Picking a Topic
25 Apr 2020
As I mentioned in my previous post, projects are important for aspiring data scientists as a way to gain experience and exhibit their skills. A question naturally arises from the need to do self-directed projects: What topics are appropriate for such projects? If you are looking to enhance your resume and your GitHub profile, you do not have constraints from your non-existent manager or client. While this freedom seems wonderful at first, many of my mentees have found it to be overwhelming.

To guide my mentees, I always advise them to pick a topic that is important to them on a personal level. Compared with the standard topics found on Kaggle and other data-science-focused sites, a personal topic has several advantages:
1. Originality
Anyone who has been around data science long enough has seen Twitter sentiment analysis or classifying images of cats and dogs a few dozen times. I am personally tired of seeing the same old capstone projects over and over again. While one could always put a new spin on an old topic, this is a difficult task for an inexperienced data scientist.
On the other hand, a topic of personal interest is unlikely to be a retread of the tired old themes in data science. Given fierce competition in the data science job market, any opportunity to stand out from the crowd is good. As I mentioned in the previous post, the purpose of the capstone project is to hone and show off one’s data science skills. Without constraints from prior works, one can explore the topic freely using any technique and thus show off one’s best self.
2. Passion
In all the data science interviews that I have been involved in, both as an interviewee and as an interviewer, there was always a presentation component. It can be difficult to make these presentations engaging because the content is mostly determined by industry standards. In each presentation, one must include an introduction of the topic and the data set, an exploratory data analysis, a statistical or machine learning model, and finally an analysis of the results from the model. Given the mechanical nature of these presentations, a personal passion for the topic will shine through. Actually caring about the topic injects excitement that is not commonly found in corporate presentations. This small difference may help you stand out from other candidates.
3. Expertise
As a burgeoning data scientist, one is unlikely to be the expert on a typical capstone project topic during the interview process. In fact, the interviewers will likely have more expertise on both the topic and the techniques involved. In my experience, it is difficult to present to people who are more knowledgeable than I am. While you cannot change the fact that the interviewers are technically more proficient, you can at least speak with authority on the topic if it is one of personal interest to you.
Final Thoughts
Picking a topic based on personal interest is a way to take advantage of the freedom of not having anyone to answer to. I have described three advantages of picking a project topic that can show off your passion and knowledge: originality, passion, and expertise. For the next step in a data science project, there has to be a data set. In my next post, I will discuss how to pick an appropriate data set.