May 23, 2019

Data science best practices with pandas (video tutorial)

The pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, the size and complexity of the pandas library make it challenging to discover the best way to accomplish any given task.

In this in-depth tutorial, which I presented at PyCon 2019, you'll use pandas to answer questions about a real-world dataset. Through each exercise, you'll learn important data science skills as well as "best practices" for using pandas. By the end of the tutorial, you'll be more fluent at using pandas to correctly and efficiently answer your own data science questions.

"This tutorial is one of the best courses on pandas, if not the best, especially for people who don't have advanced level in pandas. I have been using pandas for some time but I discovered things in this course that were amazing for me." - S.R.

This is an intermediate-level tutorial, so if you're new to pandas, I recommend starting with my other video series: Easier data analysis with pandas.

If you want to follow along with the exercises at home, you can download the dataset and notebook from GitHub.

Here are some of the topics covered in the video:

adjusting for bias in your dataset
handling missing values
choosing an appropriate plot
customizing your plot
using the datetime data type
filtering using loc versus query
using multiple aggregation functions
checking for small sample sizes
method chaining
verifying your results using random samples
evaluating a "stringified" Python container
applying a custom function to a Series
writing lambda functions
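
To give a flavor of a few of these topics, here is a minimal sketch. It uses a tiny made-up DataFrame (the `talks` table and its column names are invented for illustration and are not the dataset from the tutorial), so treat it as one possible way to approach each task rather than the exact code shown in the video.

```python
import ast

import pandas as pd

# Hypothetical toy data, invented for illustration (not the tutorial's dataset)
talks = pd.DataFrame({
    "speaker": ["Ana", "Ben", "Ana", "Cy"],
    "views": [1200, 300, 4500, 800],
    "tags": ["['python', 'data']", "['design']", "['python']", "[]"],
    "published": ["2019-01-05", "2019-02-10", "2019-02-20", "2019-03-01"],
})

# Using the datetime data type: date parts become available through .dt
talks["published"] = pd.to_datetime(talks["published"])
talks["month"] = talks["published"].dt.month

# Filtering using loc (boolean mask) versus query (string expression)
popular_loc = talks.loc[talks["views"] > 1000]
popular_query = talks.query("views > 1000")

# Using multiple aggregation functions; the count column also helps
# you check for small sample sizes before trusting the mean
by_speaker = talks.groupby("speaker")["views"].agg(["count", "mean"])

# Evaluating a "stringified" Python container: each value in tags is a
# string that looks like a list, and ast.literal_eval safely parses it
talks["tags"] = talks["tags"].apply(ast.literal_eval)

# Applying a lambda function to a Series
talks["num_tags"] = talks["tags"].apply(lambda tags: len(tags))
```

Both filters select the same rows, so the choice between `loc` and `query` here is mostly about readability.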

Let me know if you have any questions, and I'll be happy to answer them!

P.S. If you like this video, you should check out my free course, pandas in 30 days.
