Things were a little different when I wrote this in 2017. These days I constantly see new/junior data scientists get rejected because they don’t have the experience. Even those who have an impressive portfolio of projects to show off that they have the technical know-how get thumbs down. I firmly believe this is a failure of employers, not the new generation of recently graduated data scientists entering the field.
Despite the wealth of information out there about the ways in which data science can bring value to an organization (e.g. What Data Scientists Really Do, According to 35 Data Scientists by Hugo Bowne-Anderson) and what information architecture is required to make that happen, employers are hiring senior data scientists (not always at a senior salary) because they feel like that excuses them from providing guidance, direction, and support. Those data scientists then have to find ways to make improvements and impact while also building the data infrastructure themselves (or trying to convince higher-ups to give them money to hire dedicated data engineers).
All of this to say: it’s an immensely shitty situation and I’m sorry your (often very impressive!) resumes are being passed on simply because you haven’t been doing this for 5+ years. So please ignore everything below the line and instead head over to Vicki Boykis’s Data science is different now post where she suggests next steps for you:
- Don’t shoot for a data science job
- Be prepared for most of your data scientist work to not be data science. Adjust your skillset for that.
She explains them in depth in the post, so – again – I encourage you to read it yourself.

---

Getting into a technical field like data science is really difficult when you’re fresh out of school. On the off-chance that your potential employer actually gets the hiring process right, most organizations are still going to place a considerable amount of weight on experience over schooling. Like, yeah there are certain schools that make it a lot easier to go from academia to industry, but otherwise you’re dealing with the classic catch-22 situation.
Something that can help you – and what I would notice when reviewing applications – is having something original and interesting (even if just to you) to show and talk about. It doesn’t have to be published original research. It doesn’t have to be a thesis. It just has to show that you can:
- Work with real data: In most academic programs, methods are taught using clean, ready-to-use data. So it’s important to show that you can take some data you found somewhere and process it into something that you can glean insights from. It also gives you a chance to work with data about a topic that you personally find interesting. Possible sources of data include:
- Explore it: Once you have a dataset that actually excites you, you should perform some exploratory data analysis (EDA). Produce at least one (thoroughly labeled) visualization that shows some interesting pattern or relationship. I want to see your curiosity. I want to see an understanding that you can’t just jump into model-fitting without developing some familiarity with your data first.
- Analyze it: You’re going to lose a lot of interest if you just show and talk about how you followed the steps of some tutorial verbatim. If you learn from the tutorial and then apply that methodology to a different dataset, that’s basically what “experience” means. And don’t try to use an overly complicated algorithm/model if the goal doesn’t require it. You might get incredible accuracy classifying with deep learning, but you’ll probably have a more interesting story to tell from inference with a logistic regression. Heck, at Wikimedia we use that in our anti-harassment research.
- Present your work: It can be a neat report with an executive summary (abstract) or it can be an interactive visualization or a slide deck. Just something better than a zip archive of scripts or Jupyter notebooks.
- Explain your work (however complex) and results in a way that can be understood: This is where the first point is really important. If you’re describing your analysis of data from a topic you’re familiar with and are interested in, you’re going to have a much easier time explaining it to a stranger. Be prepared to talk about it to a non-technical person. Be prepared to talk about it to a technical person who may not be familiar with your particular methodology. Your interviewer may have done a lot of computational linguistics & NLP but no survival analysis, so get ready to give a brief lesson on Kaplan-Meier (K-M) curves (and vice versa).
- Perform an analysis from start to finish: Because that’s what we look for when we assign a take-home task to our candidates.
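To make the "brief lesson on K-M curves" idea concrete, here is a minimal sketch in Python of how a Kaplan-Meier survival curve is computed. It is only an illustration, not taken from any real analysis: the function name `kaplan_meier` and the toy durations are made up for this example. In an actual project you would likely reach for an established survival analysis library instead.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: how long each subject was followed
    observed:  1/True if the event happened at that time, 0/False if censored
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")

    # Number of observed events at each time point
    events = Counter(t for t, e in zip(durations, observed) if e)
    # Everyone (event or censored) leaves the risk set at their own time
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk subjects who survive time t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: 8 subjects, 3 of them censored
durations = [2, 3, 3, 5, 8, 8, 9, 12]
observed = [1, 1, 0, 1, 1, 0, 1, 0]

for t, s in kaplan_meier(durations, observed):
    print(f"t={t:>2}  S(t)={s:.3f}")
```

The thing worth explaining to a stranger is the handling of censored subjects: they count toward the at-risk group until they drop out, but they never pull the curve down themselves.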
Job postings will often include a number of years of experience as a requirement, but that’s not as need-to-have as you or they might think. Secretly, it’s actually a nice-to-have because “experience” is mostly a proxy for “candidate has previously used real data to solve a problem in a way that can be understood and used to inform a decision-making process.” If you don’t have experience, you can still demonstrate that you’ve done what a data scientist does.
I would like to thank Angela Bassa (Director of Data Science at iRobot) for her input on this post. In particular, the last paragraph is based entirely on her suggestions. She also created the Data Helpers website that lists data professionals who are able to answer questions, promote, or mentor newcomers into the field.