Top content from two years of Data School
This month, Data School turns two years old! To celebrate, I want to share some of the key milestones with you, to give you a sense of where we've been and where we're going:
2014
- March: While learning version control, I get frustrated by the lack of clear and accessible information about some important Git and GitHub concepts. Once I figure it out, I write the "missing page" of GitHub's documentation about forks and pull requests.
- April: Coursera invites me to be a Teaching Assistant for "The Data Scientist's Toolbox," the first course in their Data Science Specialization. I decide that the course videos leave out some essential information about Git, and so I create a 36-minute video series to share with the students. I return as a volunteer Teaching Assistant for the next 16 course sessions, and my videos are viewed 350,000 times.
- August: As an Expert in Residence for the Data Science course at General Assembly (GA), I teach a lesson on how to use R's dplyr package for data exploration and manipulation. I decide that a wider audience would benefit from the lesson, and record a longer tutorial for YouTube. (I record a follow-up tutorial in March 2015, and both tutorials are later featured by Kaggle.)
- September: I realize that the excellent videos from Stanford's Statistical Learning course are on YouTube, but are nearly impossible to find. I catalog the videos on my blog, and come up with a "must-click" title: In-depth introduction to machine learning in 15 hours of expert videos. It remains my most viewed post (and most popular tweet), and has been on the R-bloggers list of "most visited articles of the week" every week for the past 18 months.
- November: I'm now an Instructor for GA's Data Science course, and I teach a lesson on the challenging topic of ROC curves and Area Under the Curve. I convert that lesson into my first (and only) animated video, which later becomes surprisingly popular.
- December: I finish teaching the Data Science course, and publish the 66 hours of course content on GitHub so that others can benefit. (It's still my most popular GitHub repository, though truthfully, the latest version of the repository is much more refined.)
2015
- January: I publish a 4,000-word essay on data science instruction. A commenter asks a question about teaching Python or R for data science, and I respond with a (controversial) post, which prompts some incredibly thoughtful debate in the comments section.
- February: Kaggle invites me to guest blog for them on a topic of my choosing. I volunteer to create a series of video tutorials on machine learning using Python's scikit-learn. During the eight months that follow, I spend hundreds of hours creating a 4-hour video series with companion blog posts for Kaggle. The Data School blog is very quiet during this time! :)
- October: I launch my first live online course, Machine Learning with Text in Python, in order to provide a classroom-like experience to students worldwide who are unable to attend my in-person courses.
- December: Google begins using my definition of "confusion matrix" in the snippet at the top of their search results, quoting from my simple guide to confusion matrix terminology.
2016
- March: I announce another session of Machine Learning with Text in Python starting April 9, having expanded the course content from 6 hours to 15 hours. (If you're interested in the course, you can watch a video Q&A for more information.)
So what's next for Data School? My plan is to launch additional online courses this year, while continuing to create lots of new content for the Data School blog and YouTube channel. I've got lots of exciting ideas for what to create next, but feel free to let me know your suggestions in the comments section!
How can you help? It would mean a lot to me if you would take 60 seconds right now and share this page with a friend, colleague, or group who might be interested in Data School. By growing the Data School community, you're enabling me to focus full-time on Data School and create more high-quality content.
Thanks so much for being part of the Data School community. Here's to many more great years ahead!