May 23, 2018 · Python tutorial

Best practices with pandas (video series)

At PyCon 2018, I presented a tutorial called "Using pandas for Better (and Worse) Data Science". Through a series of exercises, I demonstrated pandas best practices that help students answer data science questions more fluently and avoid common data science errors.

I split the tutorial into 10 videos. The first video introduces the tutorial and the dataset, and the other nine videos contain the exercises we discuss. I recommend that you watch the videos in order:

  1. Introducing the dataset (19:40)
  2. Removing columns (6:27)
  3. Comparing groups (8:42)
  4. Examining relationships (8:44)
  5. Handling missing values (5:02)
  6. Using string methods (5:55)
  7. Combining dates and times (9:11)
  8. Plotting a time series (8:48)
  9. Creating useful plots (8:47)
  10. Fixing bad data (16:31)

If you want to follow along with the exercises at home, you can download the dataset and code from GitHub. The dataset was collected by the Stanford Open Policing Project and includes a decade of traffic stop data from the state of Rhode Island.

This is an intermediate tutorial, so if you're brand new to pandas, I recommend that you start with my other video series, Easier data analysis in Python with pandas.

Please enjoy the series, and I hope to hear from you in the comments section!

Embedded videos with descriptions

1. Introducing the dataset (19:40)

This video covers the following topics: reading a CSV file, DataFrame shape, data types, NaN, missing values, booleans.
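
As a rough sketch of these topics (not the code from the video), the example below reads a tiny in-memory CSV and inspects its shape, data types, and missing values. The column names and rows are made up for illustration and are not taken from the Rhode Island dataset.

```python
import io

import pandas as pd

# Hypothetical three-row CSV; empty fields become NaN when the file is read.
csv_text = """stop_date,driver_gender,is_arrested,county_name
2005-01-02,M,False,
2005-01-18,F,True,
2005-01-23,,,
"""

df = pd.read_csv(io.StringIO(csv_text))

# shape is a (rows, columns) tuple.
shape = df.shape

# dtypes shows how pandas stored each column. A boolean column that also
# contains missing values is typically not stored with the plain bool dtype.
dtypes = df.dtypes

# isna() marks each missing value as True; summing counts them per column.
missing = df.isna().sum()

print(shape)
print(dtypes)
print(missing)
```

Because `True` counts as 1 and `False` as 0, summing the boolean result of `isna()` is a quick way to count missing values in every column at once.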

2. Removing columns (6:27)

This video covers the following topics: missing values, dropping a column, axis parameter, inplace parameter, dropna method.
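
A minimal sketch of dropping columns and rows, assuming a toy DataFrame with hypothetical column names. It returns new objects instead of using `inplace=True`, which is one of several reasonable styles.

```python
import numpy as np
import pandas as pd

# Hypothetical data: county_name is entirely missing.
df = pd.DataFrame({
    "county_name": [np.nan, np.nan, np.nan],
    "driver_gender": ["M", np.nan, "F"],
    "violation": ["Speeding", "Equipment", "Speeding"],
})

# Drop one column by name; axis="columns" is the readable form of axis=1.
without_county = df.drop("county_name", axis="columns")

# Drop every column in which all values are missing.
all_missing_dropped = df.dropna(axis="columns", how="all")

# Drop rows that are missing a value in a specific column.
complete_rows = without_county.dropna(subset=["driver_gender"])

print(without_county.columns.tolist())
print(len(complete_rows))
```

None of these calls modify `df` itself. Passing `inplace=True` to `drop` or `dropna` would change the DataFrame directly and return `None`.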

3. Comparing groups (8:42)

This video covers the following topics: filtering a DataFrame, value_counts method, normalization, groupby method.
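
The two approaches to comparing groups can be sketched as follows, again on a made-up DataFrame with hypothetical column names: filter first and then count, or let `groupby` do the split.

```python
import pandas as pd

# Hypothetical data: three stops of male drivers, two of female drivers.
df = pd.DataFrame({
    "driver_gender": ["M", "M", "M", "F", "F"],
    "violation": ["Speeding", "Speeding", "Equipment", "Speeding", "Equipment"],
})

# Approach 1: filter to one group, then compute proportions.
female = df[df["driver_gender"] == "F"]
female_share = female["violation"].value_counts(normalize=True)

# Approach 2: compute the same proportions for every group at once.
by_gender = df.groupby("driver_gender")["violation"].value_counts(normalize=True)

print(female_share)
print(by_gender)
```

With `normalize=True`, `value_counts` returns proportions instead of raw counts, which makes groups of different sizes comparable.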

4. Examining relationships (8:44)

This video covers the following topics: value_counts method, math with booleans, groupby with multiple columns, correlation versus causation.
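
"Math with booleans" relies on `True` being treated as 1 and `False` as 0, so the mean of a boolean column is the proportion of `True` values. A small sketch with invented data and hypothetical column names:

```python
import pandas as pd

# Hypothetical data: whether a search was conducted during each stop.
df = pd.DataFrame({
    "driver_gender": ["M", "M", "F", "F", "M", "F"],
    "violation": ["Speeding", "Speeding", "Speeding",
                  "Equipment", "Equipment", "Equipment"],
    "search_conducted": [True, False, False, True, True, False],
})

# The mean of a boolean Series is the fraction of True values.
overall_rate = df["search_conducted"].mean()

# Grouping by two columns gives a rate for each combination of values.
rate_by_group = (
    df.groupby(["violation", "driver_gender"])["search_conducted"].mean()
)

print(overall_rate)
print(rate_by_group)
```

Breaking the rate down by a second column helps check whether a difference between groups persists once another factor is held fixed. Even then, the result describes a relationship in the data and does not by itself show causation.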

5. Handling missing values (5:02)

This video covers the following topics: math with booleans, value_counts method, filtering a DataFrame, dropna parameter.
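
The `dropna` parameter matters because `value_counts` leaves out missing values by default. The sketch below uses a hypothetical pair of columns in which one is only filled in when the other is `True`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: search_type is only recorded when a search happened.
df = pd.DataFrame({
    "search_conducted": [False, True, False, True],
    "search_type": [np.nan, "Frisk", np.nan, "Incident to Arrest"],
})

# By default, missing values are excluded from the counts.
default_counts = df["search_type"].value_counts()

# dropna=False adds a row that counts the missing values.
counts_with_nan = df["search_type"].value_counts(dropna=False)

# Check whether search_type is always missing when no search was conducted.
missing_when_no_search = (
    df.loc[~df["search_conducted"], "search_type"].isna().all()
)

print(default_counts)
print(counts_with_nan)
print(missing_when_no_search)
```

Checking when values are missing, and not only how many, can show that the missingness is expected and does not need to be "fixed".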

6. Using string methods (5:55)

This video covers the following topics: searching strings, math with booleans, value_counts method, dropna parameter.
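
Searching strings with the `.str` accessor can be sketched like this. The values are invented, and the choice of `na=False` (treat a missing value as "no match") is an assumption made for this example, not necessarily what the video does.

```python
import numpy as np
import pandas as pd

# Hypothetical search descriptions; one stop lists two search types.
search_type = pd.Series([
    "Frisk",
    "Incident to Arrest,Protective Frisk",
    "Probable Cause",
    np.nan,
])

# str.contains returns True where the substring appears anywhere in the value.
# na=False treats missing values as non-matches, giving a clean boolean Series.
frisk = search_type.str.contains("Frisk", na=False)

# Number of rows that mention a frisk.
frisk_count = frisk.sum()

# Rate among rows that actually have a search type recorded.
frisk_rate_known = search_type.dropna().str.contains("Frisk").mean()

print(frisk_count)
print(frisk_rate_known)
```

Which denominator is appropriate (all rows, or only rows with a recorded value) depends on the question being asked, so it is worth choosing deliberately.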

7. Combining dates and times (9:11)

This video covers the following topics: string slicing, string concatenation, converting to datetime format, datetime attributes, value_counts method.
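
A minimal sketch of combining separate date and time strings into one datetime column, assuming hypothetical column names and an explicit format string chosen for this toy data:

```python
import pandas as pd

# Hypothetical data: date and time stored as separate string columns.
df = pd.DataFrame({
    "stop_date": ["2005-01-02", "2005-10-18"],
    "stop_time": ["01:55", "14:30"],
})

# String slicing: the first four characters of each date are the year.
year_text = df["stop_date"].str.slice(0, 4)

# String concatenation: join the two columns with a space in between.
combined = df["stop_date"].str.cat(df["stop_time"], sep=" ")

# Convert the combined strings to pandas datetime values.
df["stop_datetime"] = pd.to_datetime(combined, format="%Y-%m-%d %H:%M")

# Datetime attributes are available through the .dt accessor.
years = df["stop_datetime"].dt.year
hours = df["stop_datetime"].dt.hour

print(df["stop_datetime"])
print(hours.value_counts())
```

Once the column has a datetime type, attributes such as `.dt.year`, `.dt.month`, and `.dt.hour` replace fragile string slicing.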

8. Plotting a time series (8:48)

This video covers the following topics: math with booleans, groupby method, datetime attributes, line plots.
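
The series behind a time-series plot can be sketched as a groupby on a datetime attribute. The data and column names below are hypothetical, and the plotting step is left out so the example runs without matplotlib.

```python
import pandas as pd

# Hypothetical data: stop timestamps and a boolean flag for each stop.
df = pd.DataFrame({
    "stop_datetime": pd.to_datetime([
        "2005-01-02 01:10",
        "2005-01-02 01:40",
        "2005-01-03 14:05",
        "2005-01-04 14:50",
        "2005-01-05 14:20",
    ]),
    "drugs_related_stop": [True, False, False, False, True],
})

# Group by hour of day, then take the mean of the boolean column
# to get the proportion of True values within each hour.
hourly_rate = (
    df.groupby(df["stop_datetime"].dt.hour)["drugs_related_stop"].mean()
)

print(hourly_rate)
```

If matplotlib is installed, calling `hourly_rate.plot()` draws a line plot with the hour on the x-axis, because the group keys become the index of the result.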

9. Creating useful plots (8:47)

This video covers the following topics: datetime attributes, value_counts method, line plots, sorting, groupby method.
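
Sorting matters for plots because `value_counts` orders its result by frequency, not by the values themselves. A short sketch with invented timestamps:

```python
import pandas as pd

# Hypothetical stop timestamps.
stop_datetime = pd.Series(pd.to_datetime([
    "2005-01-02 23:10",
    "2005-01-02 08:40",
    "2005-01-03 23:05",
    "2005-01-04 23:50",
    "2005-01-05 08:20",
    "2005-01-06 02:00",
]))

# value_counts sorts by count, most frequent hour first.
by_frequency = stop_datetime.dt.hour.value_counts()

# sort_index reorders the result by hour, which suits a line plot over time.
by_hour = by_frequency.sort_index()

print(by_frequency)
print(by_hour)
```

A line plot of `by_frequency` would connect the hours in order of frequency and produce a misleading zigzag, while `by_hour` runs left to right through the day.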

10. Fixing bad data (16:31)

This video covers the following topics: value_counts method, filtering by multiple conditions, missing values, NaN, loc accessor, SettingWithCopyWarning.
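
As a hedged sketch of fixing bad data, the example below replaces implausible values with NaN using the `loc` accessor. The column names and the "bad" values are hypothetical stand-ins chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "1" is not a valid duration category.
df = pd.DataFrame({
    "violation_raw": ["Speeding", "Speeding", "Speeding"],
    "stop_duration": ["0-15 Min", "1", "30+ Min"],
})

# Combine conditions with | (or) and & (and), one pair of parentheses each.
bad = (df["stop_duration"] == "1") | (df["stop_duration"] == "2")

# Assign through loc with a row mask and a column label in a single step.
df.loc[bad, "stop_duration"] = np.nan

# Filtering by multiple conditions after the fix.
long_speeding = df[
    (df["violation_raw"] == "Speeding") & (df["stop_duration"] == "30+ Min")
]

print(df["stop_duration"].value_counts(dropna=False))
print(len(long_speeding))
```

Chained assignment such as `df[bad]["stop_duration"] = np.nan` is the pattern that triggers SettingWithCopyWarning in pandas versions that emit it, and it may leave `df` unchanged. The single `loc` assignment avoids that ambiguity.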

P.S. If you liked this video series, I recommend checking out my tutorial from PyCon 2019, Data science best practices with pandas!
