September 7, 2015 · Python machine learning

How to get better at data science

Recently, I finished teaching General Assembly's 11-week data science course for the fourth time. The goal of the course is to enable students to apply the entire data science workflow (using Python) to problems that interest them: forming a question, gathering and cleaning data, exploring and visualizing the data, building and evaluating machine learning models, and communicating results. The typical student is a working professional with some experience working with data, limited programming experience, and basic statistical knowledge. (Here are the course materials, here's why we teach Python, and here are the lessons I learned from the first time I taught the course.)

At the end of every course, the most common question I receive from students is this:

How can I continue to improve my data science skills?

Below is the advice I give to my students. How would you answer this question?

My advice to students

Here is my best advice for getting better at data science: Find "the thing" that motivates you to practice what you learned and to learn more, and then do that thing. That could be personal data science projects, Kaggle competitions, online courses, reading books, reading blogs, attending meetups or conferences, or something else.

If you create your own data science projects, I'd encourage you to share them on GitHub and include writeups. That will help to show others that you know how to do proper data science.

Kaggle competitions are a great way to practice data science without coming up with the problem yourself. Don't worry about how high you place; just focus on learning something new with every competition. Read the forums as much as you can, because you'll learn a lot, but not at the expense of working on the competition yourself. Also, keep in mind that you won't be practicing important parts of the data science workflow, namely generating questions, gathering data, and communicating results.

There are many online courses to consider, and new ones being created all the time:

Here is just a tiny selection of books:

There are an overwhelming number of data science blogs and newsletters. If you want to read just one site, DataTau is the best aggregator. Data Elixir is the best newsletter, though the O'Reilly Data Newsletter and Python Weekly are also good. Other notable blogs include: no free hunch (Kaggle's blog), The Yhat blog (lots of Python and R content), Practical Business Python (accessible Python content), Simply Statistics (a bit more academic), FastML (machine learning content), Win-Vector blog (great data science advice), FiveThirtyEight (data journalism), and Data School (my blog).

If you prefer podcasts, I don't have any personal recommendations, though this list gives a nice summary of seven data science podcasts that the author recommends.

Some notable data science conferences are KDD, Strata, PyCon, PyData, and SciPy. (You should also search for data-related meetups in your local community!)

If you want to go full-time with your data science education, read this guide to data science bootcamps, and this other guide which also includes part-time and online programs. Or, check out this massive list of colleges and universities with data science-related degrees.

What's your advice?

I'd love to hear from you in the comments, whether it's to share an additional resource or piece of advice, to discuss one of my recommendations, or just to let me know that you found something useful here!

P.S. Want to take an online data science course taught by me? Please subscribe to the Data School newsletter to gain priority access to my upcoming courses!
