Advice for graduates applying for data science jobs

Getting into a technical field like data science is really difficult when you're fresh out of school. Even on the off-chance that your potential employer actually gets the hiring process right, most organizations are still going to place considerable weight on experience over schooling. Yeah, there are certain schools that make it a lot easier to go from academia to industry, but otherwise you're dealing with the classic catch-22: you need experience to get the job, and you need the job to get experience.

Something that can help you – and what I would notice when reviewing applications – is having something original and interesting (even if just to you) to show and talk about. It doesn't have to be published original research. It doesn't have to be a thesis. It just has to show that you can:

  • Work with real data: In most academic programs, methods are taught using clean, ready-to-use data. So it's important to show that you can take some data you found somewhere and process it into something that you can glean insights from. It also gives you a chance to work with data about a topic that you personally find interesting. Possible sources of data include:
  • Explore it: Once you have a dataset that actually excites you, you should perform some exploratory data analysis (EDA). Produce at least one (thoroughly labeled) visualization that shows some interesting pattern or relationship. I want to see your curiosity. I want to see an understanding that you can't just jump into model-fitting without developing some familiarity with your data first.
  • Analyze it: You're going to lose your audience's interest if you just show and talk about how you followed the steps of some tutorial verbatim. If you learn from the tutorial and then apply that methodology to a different dataset, that's basically what "experience" means. And don't use an overly complicated algorithm or model if the goal doesn't require it. You might get incredible accuracy classifying with deep learning, but you'll probably have a more interesting story to tell from inference with a logistic regression. Heck, at Wikimedia we use logistic regression in our anti-harassment research.
  • Present your work: It can be a neat report with an executive summary (abstract), or it can be an interactive visualization or a slide deck. Just something better than a zip archive of scripts or Jupyter notebooks.
  • Explain your work (however complex) and results in a way that can be understood: This is where the first point is really important. If you're describing your analysis of data from a topic you're familiar with and interested in, you're going to have a much easier time explaining it to a stranger. Be prepared to talk about it to a non-technical person. Be prepared to talk about it to a technical person who may not be familiar with your particular methodology. Your interviewer may have done a lot of computational linguistics & NLP but no survival analysis, so get ready to give a brief lesson on Kaplan-Meier (K-M) curves (and vice versa).
  • Perform an analysis from start to finish: Because that's what we look for when we assign a take-home task to our candidates.
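
As an aside on the K-M curves mentioned in the list above, here is roughly what that brief lesson could look like in code. This is a minimal sketch in plain Python and not a substitute for a proper survival analysis library. The function name kaplan_meier and the toy durations are made up purely for illustration.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the survival function.

    durations: time until the event or until censoring, one per subject.
    observed:  True if the event was seen, False if the subject was censored.
    Returns a list of (time, survival_probability), one per distinct event time.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")

    # Number of events at each time, and number of subjects leaving the
    # risk set (event or censoring) at each time.
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the share of at-risk subjects who survive past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Subjects who had the event or were censored at t leave the risk set.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: e.g. months until a user stops using a product.
durations = [2, 3, 3, 5, 8, 8, 9, 12]
observed = [True, True, False, True, True, False, True, False]

curve = kaplan_meier(durations, observed)
for time, prob in curve:
    print(f"t={time:>2}  S(t)={prob:.3f}")
```

The point to get across to a non-specialist is that censored subjects (the False entries) still count toward the number at risk until they drop out, which is what separates this from naively computing the fraction of people still around.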

A lot of the time, job postings will list a number of years of experience as a requirement, but that's not as need-to-have as you or they might think. Secretly, it's actually a nice-to-have because "experience" is mostly a proxy for "candidate has previously used real data to solve a problem in a way that can be understood and used to inform a decision-making process." If you don't have experience, you can still demonstrate that you've done what a data scientist does.

Good luck~

Acknowledgement: I would like to thank Angela Bassa (Director of Data Science at iRobot) for her input on this post. In particular, the last paragraph is based entirely on her suggestions. She also created the Data Helpers website, which lists data professionals who are willing to answer questions from, promote, or mentor newcomers to the field.