November 12, 2019 · Python tutorial machine learning

How to encode categorical features with scikit-learn (video)

To include categorical features in your machine learning model, you have to encode them numerically using "dummy" or "one-hot" encoding. But how do you do this correctly using scikit-learn?
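As a rough illustration of what one-hot encoding produces, here is a minimal sketch using scikit-learn's OneHotEncoder. The toy column and its values are made up for this example and are not the dataset used in the video.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical single categorical column with three distinct values
X = np.array([["S"], ["C"], ["Q"], ["S"]])

encoder = OneHotEncoder()

# fit_transform returns a sparse matrix by default; convert it for display
encoded = encoder.fit_transform(X).toarray()

# Learned categories are sorted, so the output columns are C, Q, S
print(encoder.categories_)

# Each row contains a single 1 in the column matching its category
print(encoded)
```

Each category becomes its own column, so a feature with three distinct values turns into three numeric columns.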

In this 28-minute video, you'll learn:

If you want to follow along with the code, you can download the Jupyter notebook from GitHub.

Click on a timestamp below to jump to a particular section:

0:22 Why should you use a Pipeline?
2:30 Preview of the lesson
3:35 Loading and preparing a dataset
6:11 Cross-validating a simple model
10:00 Encoding categorical features with OneHotEncoder
15:01 Selecting columns for preprocessing with ColumnTransformer
19:00 Creating a two-step Pipeline
19:54 Cross-validating a Pipeline
21:44 Making predictions on new data
23:43 Recap of the lesson
24:50 Why should you use scikit-learn (rather than pandas) for preprocessing?
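The steps listed above can be sketched end to end roughly as follows. This is not the notebook from the video: the DataFrame, its column names, and the choice of LogisticRegression are assumptions made purely for illustration.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical feature, one numeric feature
df = pd.DataFrame({
    "color": ["red", "blue", "green", "red", "blue",
              "green", "red", "blue", "green", "red"],
    "size": [1.0, 2.0, 3.0, 1.5, 2.5, 3.5, 1.2, 2.2, 3.2, 1.8],
    "label": [0, 1, 1, 0, 1, 1, 0, 1, 1, 0],
})
X = df[["color", "size"]]
y = df["label"]

# One-hot encode only the categorical column; pass the numeric one through
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["color"]),
    remainder="passthrough",
)

# Two-step pipeline: preprocessing, then the model
pipe = make_pipeline(column_trans, LogisticRegression())

# Cross-validate the whole pipeline so the encoder is fit on each training fold only
scores = cross_val_score(pipe, X, y, cv=2, scoring="accuracy")
print(scores.mean())

# Fit on all data, then predict on new rows with the same columns
pipe.fit(X, y)
X_new = pd.DataFrame({"color": ["red", "purple"], "size": [1.1, 2.9]})
predictions = pipe.predict(X_new)
print(predictions)
```

Because the encoder lives inside the pipeline, the same preprocessing is applied to new data automatically. In this sketch, handle_unknown="ignore" means the unseen category "purple" is encoded as all zeros instead of raising an error.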


P.S. Want to master machine learning in Python? Enroll in my online course, Machine Learning with Text in Python!
