Data science best practices with pandas (video tutorial)
The pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, the size and complexity of the pandas library make it challenging to discover the best way to accomplish any given task.
In this in-depth tutorial, which I presented at PyCon 2019, you'll use pandas to answer questions about a real-world dataset. Through each exercise, you'll learn important data science skills as well as "best practices" for using pandas. By the end of the tutorial, you'll be more fluent at using pandas to correctly and efficiently answer your own data science questions.
"This tutorial is one of the best courses on pandas, if not the best, especially for people who don't have advanced level in pandas. I have been using pandas for some time but I discovered things in this course that were amazing for me." - S.R.
This is an intermediate-level tutorial, so if you're new to pandas, I recommend starting with my other video series: Easier data analysis with pandas.
If you want to follow along with the exercises at home, you can download the dataset and notebook from GitHub.
Here are some of the topics covered in the video:
- adjusting for bias in your dataset
- handling missing values
- choosing an appropriate plot
- customizing your plot
- using the datetime data type
- filtering using loc versus query
- using multiple aggregation functions
- checking for small sample sizes
- method chaining
- verifying your results using random samples
- evaluating a "stringified" Python container
- applying a custom function to a Series
- writing lambda functions
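To give a rough feel for a few of these topics, here is a minimal sketch on a made-up DataFrame. It is not taken from the tutorial or its dataset: the column names (`category`, `score`, `labels`, `date`) and all values are hypothetical, chosen only to illustrate the techniques.

```python
import ast

import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "category": ["A", "A", "B", "B", "B"],
    "score": [100, 300, 50, 150, 250],
    "labels": ["['x', 'y']", "['x']", "[]", "['y', 'z']", "['z']"],
    "date": ["2019-05-01", "2019-05-02", "2019-05-03", "2019-05-03", "2019-05-04"],
})

# Using the datetime data type: convert strings so the .dt accessor works.
df["date"] = pd.to_datetime(df["date"])

# Filtering using loc versus query: both select the same rows here.
by_loc = df.loc[df["score"] > 100]
by_query = df.query("score > 100")

# Using multiple aggregation functions: including "count" next to "mean"
# makes small sample sizes visible alongside each average.
summary = df.groupby("category")["score"].agg(["count", "mean"])

# Evaluating a "stringified" container: ast.literal_eval safely turns the
# text "['x', 'y']" back into a real Python list.
df["label_list"] = df["labels"].apply(ast.literal_eval)

# Applying a custom function to a Series, written as a lambda.
df["num_labels"] = df["label_list"].apply(lambda labels: len(labels))

print(summary)
print(df[["labels", "num_labels"]])
```

Whether `loc` or `query` reads better is partly a matter of taste: `query` tends to be more compact for simple conditions, while `loc` works with any boolean Series you can build in Python.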
Let me know if you have any questions. I'm happy to answer them!
P.S. If you like this video, you should check out my interactive pandas course, Analyzing Police Activity with pandas.