November 12, 2019 · Python tutorial machine learning

How to encode categorical features with scikit-learn (video)

To include categorical features in your Machine Learning model, you have to encode them numerically using "dummy" or "one-hot" encoding. But how do you do this correctly using scikit-learn?
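To make "one-hot" encoding concrete, here is a minimal sketch using scikit-learn's OneHotEncoder. The single column of letter codes is hypothetical toy data made up for illustration, not the dataset used in the video.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy feature: one categorical column with three distinct values
X = np.array([["S"], ["C"], ["S"], ["Q"]])

# OneHotEncoder returns a sparse matrix by default; convert it for display
ohe = OneHotEncoder()
encoded = ohe.fit_transform(X).toarray()

# Categories are learned during fit and stored in sorted order
categories = ohe.categories_[0].tolist()

print(categories)  # ['C', 'Q', 'S']
print(encoded)     # one row per sample, one column per category
```

Each row contains a single 1 in the column that matches its category, so the first sample ("S") becomes `[0, 0, 1]`.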

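In this 28-minute video, you'll learn how to encode categorical features with OneHotEncoder, select columns for preprocessing with ColumnTransformer, and chain preprocessing and model building together in a Pipeline.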

If you want to follow along with the code, you can download the Jupyter notebook from GitHub.

Click on a timestamp below to jump to a particular section:

0:22 Why should you use a Pipeline?
2:30 Preview of the lesson
3:35 Loading and preparing a dataset
6:11 Cross-validating a simple model
10:00 Encoding categorical features with OneHotEncoder
15:01 Selecting columns for preprocessing with ColumnTransformer
19:00 Creating a two-step Pipeline
19:54 Cross-validating a Pipeline
21:44 Making predictions on new data
23:43 Recap of the lesson
24:50 Why should you use scikit-learn (rather than pandas) for preprocessing?
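The steps in the outline above can be sketched roughly as follows. This is not the code from the video: the DataFrame, its column names, and the choice of LogisticRegression are assumptions made for illustration, and the notebook on GitHub remains the reference for the actual lesson.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy dataset: one numeric feature, one categorical feature
df = pd.DataFrame({
    "size": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "color": ["red", "blue", "red", "green", "blue",
              "red", "green", "blue", "red", "green"],
    "label": [0, 0, 0, 0, 1, 0, 1, 1, 1, 1],
})
X = df[["size", "color"]]
y = df["label"]

# One-hot encode only the categorical column; pass the numeric column through.
# handle_unknown="ignore" encodes categories unseen during fit as all zeros.
column_trans = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["color"])],
    remainder="passthrough",
)

# Two-step Pipeline: preprocessing first, then the model
pipe = Pipeline([
    ("preprocess", column_trans),
    ("model", LogisticRegression()),
])

# Cross-validate the whole Pipeline so that the encoder is fit
# on the training portion of each fold only
scores = cross_val_score(pipe, X, y, cv=2, scoring="accuracy")

# Fit on all data, then predict on new rows (including an unseen category)
pipe.fit(X, y)
X_new = pd.DataFrame({"size": [2, 9], "color": ["red", "purple"]})
predictions = pipe.predict(X_new)
```

Because the encoder lives inside the Pipeline, the same preprocessing is applied automatically to `X_new` at prediction time, which is one practical reason to do this step in scikit-learn rather than in pandas.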

P.S. Want to master Machine Learning in Python? Enroll in my online course, Building an Effective Machine Learning Workflow with scikit-learn!