New to Data School? Start here.
This page provides a categorized guide to Data School's blog posts, videos, courses, Jupyter notebooks, and webcast recordings.
- Featured content is highlighted in yellow.
- Paid content is marked with 💲 (everything else is 100% free!)
Learning data science
- How to launch your data science career (with Python): Step-by-step guide
- Can a dentist become a data scientist?: Thoughts about changing careers and what it takes to learn data science
- How to get better at data science: Advice for students, including tons of resources
- Lessons learned from teaching an 11-week data science course: Advice for data science educators
- Should you teach Python or R for data science?: Why I (mostly) prefer Python
- Should I learn pandas before scikit-learn?: Video to help you decide how you should spend your time
- How do I stay up-to-date as a data scientist?: Video listing the sites and people I recommend following
- Find the perfect dataset for your Data Science project: Curated list of sources for unique, high-quality datasets
Python
- 💲 Python Essentials for Data Scientists: 2-hour course that will quickly get you up to speed with Python's most important features (beginner level)
- Quick reference to Python: Reference guide to 15 core language features, with a script and Jupyter notebook
- How to use Python's f-strings with pandas: Brief introduction to f-strings
- Make your own private GPT with Python: Learn how to use the power of GPT to interact with your private documents
- Simulate the Monty Hall problem in Python: Solve a classic probability problem
- How to write a great Stack Overflow question: Step-by-step guide
- Write Pythonic Code for Better Data Science: Webcast recording
Python: pandas library (data analysis, manipulation, visualization)
- pandas in 30 days: 7-hour course covering the fundamentals of data analysis with pandas (beginner/intermediate level)
- My top 25 pandas tricks: Video (intermediate level), with a Jupyter notebook
- Master Python's pandas library with these 100 tricks: Tips and tricks that will save you time and energy every time you use pandas!
- Data science best practices with pandas: Tutorial recording (intermediate level), with a Jupyter notebook
- 💲 Analyzing Police Activity with pandas: Online DataCamp course (intermediate level) with 16 videos and 34 interactive coding exercises, structured like a "case study" in which you answer interesting questions about a real dataset
- How to merge DataFrames in pandas: Video (intermediate level), with a Jupyter notebook
- Should you use "dot notation" or "bracket notation" with pandas?: What's the best way to select a Series from a DataFrame?
- Working with Time Zones & Daylight Saving Time in pandas: Learn how to create "timezone-aware" data so that your dataset handles Daylight Saving Time transitions correctly
- Best practices with pandas: 10 videos (intermediate level), with a Jupyter notebook
- 9 new pandas updates that will save you time: 2 videos (intermediate level), with Jupyter notebooks
- What's the future of the pandas library?: Discussion of what's coming in version 1.0 and beyond
- Your pandas questions answered!: Webcast recording
- Top 8 resources for learning data analysis with pandas: Other recommended resources
- Visualization with pandas and Matplotlib: Lesson notebook demonstrating different types of plots
Python: Regular expressions
- Building a dataset of Python versions with regular expressions: Learn how to use pandas, requests, and regular expressions as part of a web scraping project
- 💲 Become a Regex Superhero: 3-hour course that will teach you how to solve tricky text problems with regular expressions
Python: Beautiful Soup library (web scraping)
- Web scraping the President's lies in 16 lines of Python: 4 videos (beginner level), with a Jupyter notebook
Python: scikit-learn library (machine learning)
- Introduction to Machine Learning with scikit-learn: 4-hour course covering the fundamentals of Machine Learning in Python (beginner level)
- 50 scikit-learn tips: 3-hour course covering assorted scikit-learn functionality and Machine Learning best practices (intermediate level)
- 💲 Master Machine Learning with scikit-learn: 7.5-hour course teaching you how to solve almost any supervised Machine Learning problem using the latest scikit-learn techniques (intermediate/advanced level)
- 💲 Machine Learning with Text in Python: 14-hour course covering Natural Language Processing, feature engineering, advanced machine learning techniques, and regular expressions (intermediate/advanced level)
- Machine Learning with Text: Tutorial recording (intermediate level), with Jupyter notebooks
- Answering 59 scikit-learn questions: Webcast recording
- How to prevent data leakage in pandas & scikit-learn: Learn about data leakage, why it's problematic, and how you can prevent it
- Should you discretize continuous features for Machine Learning?: Learn how to "discretize" or "bin" your continuous features using scikit-learn, and find out why I usually don't recommend doing so
- How to update your scikit-learn code for 2018: Guide to updating your scikit-learn code to be compatible with version 0.19.1
- How to encode categorical features with scikit-learn: Video covering how to "dummy" or "one-hot" encode categorical data, with a Jupyter notebook
- Using pandas with scikit-learn to create Kaggle submissions: Video covering the basic workflow for making predictions, with a Jupyter notebook
- Getting started with machine learning in Python: Webcast recording
Machine learning overview
- What is machine learning, and how does it work?: Video covering supervised and unsupervised learning, with a Jupyter notebook
- In-depth introduction to machine learning in 15 hours of expert videos: Videos from Stanford University professors, with slides, R code, and a free PDF book
- Getting started in scikit-learn with the famous iris dataset: Video covering machine learning terminology, with a Jupyter notebook
- How to create useful features for Machine Learning: Introduction to feature engineering
- How do I select features for machine learning?: Video covering feature selection tactics
- Comparing supervised learning algorithms: Comparison chart of eight common models, which is also available as a list of advantages and disadvantages
Machine learning models in Python
- K-nearest neighbors (KNN): Video using scikit-learn, with a Jupyter notebook
- Linear regression: Video using scikit-learn, pandas, and seaborn, with a Jupyter notebook
- A friendly introduction to linear regression: Lesson using Statsmodels and scikit-learn, with a Jupyter notebook
- Example of logistic regression in Python using scikit-learn: Example classification problem, with a Jupyter notebook
- Guide to an in-depth understanding of logistic regression: Lesson using scikit-learn, with a Jupyter notebook
- Decision trees for regression and classification: Lesson notebook using scikit-learn
- Ensembling, bagging, and Random Forests: Lesson notebook using scikit-learn
- Regularized regression and classification: Lesson notebook using scikit-learn
- K-means and DBSCAN clustering: Lesson notebook using scikit-learn
Machine learning model evaluation
- Comparing machine learning models: Video covering train/test split using scikit-learn, with a Jupyter notebook
- Cross-validation: Video covering parameter tuning, model selection, and feature selection using scikit-learn, with a Jupyter notebook
- Efficiently searching for optimal tuning parameters: Video covering grid search and randomized search using scikit-learn, with a Jupyter notebook
- Evaluating a classification model: Video covering classification metrics, confusion matrix, and ROC/AUC using scikit-learn, with a Jupyter notebook
- ROC curves and Area Under the Curve explained: Animated video with transcript
- Simple guide to confusion matrix terminology: Reference guide to classification metrics
- Making sense of the confusion matrix: Video covering the confusion matrix, including advanced topics
- Solve a medical mystery with a confusion matrix: Learn how to use a confusion matrix in a diagnostic scenario
- Comparing model evaluation procedures and metrics: Reference guide to three procedures and seven metrics
Other machine learning topics
- Quick reference guide to applying and interpreting linear regression: Step-by-step guide
- Exploring the bias-variance tradeoff: Lesson notebook using Seaborn
- Feature scaling/standardization: Lesson notebook using scikit-learn
- Dummy variables: Video using pandas, with a Jupyter notebook
- Handling missing values: Video using pandas, with a Jupyter notebook
Machine learning applications
- Detecting Fraudulent Skype Users via Machine Learning: Presentation with slides
- My first Kaggle competition: Allstate Purchase Prediction Challenge: Presentation with slides, video, and R code
R: Machine learning models
- Example of linear regression and regularization in R: Example using lasso and ridge regression, with an R Markdown document
R: dplyr package (data analysis, manipulation)
- Hands-on dplyr tutorial for faster data manipulation in R: Video (beginner level), with an R Markdown document
- Going deeper with dplyr: New features in 0.3 and 0.4: Video (beginner/intermediate level), with an R Markdown document
Git and GitHub (version control)
- Git and GitHub videos for beginners: 11 videos (beginner level)
- Step-by-step guide to contributing on GitHub: Learn how to make your first open source contribution
- Git quick reference for beginners: Reference guide to common commands
- Simple guide to forks in GitHub and Git: Visual guide to forking, syncing, and pull requests
- GitHub is "just" Dropbox for Git: High-level explanation of Git and GitHub
Jupyter notebook (coding environment)
- Jupyter & IPython terminology explained: Guide to the differences between Jupyter Notebook, JupyterLab, IPython, Colab, and other related terms
- Fly through Jupyter with keyboard shortcuts: Visual guide to the 25 most useful shortcuts for Jupyter Notebook and JupyterLab
- Six easy ways to run your Jupyter Notebook in the cloud: Comparison of free cloud services for running the Jupyter Notebook (no installation required)
- Setting up Python for machine learning: Video covering basic usage of the Jupyter notebook (formerly called the IPython notebook)
- Reproducibility is not just for researchers: Explains why the Jupyter notebook is a critical tool for data analysis
conda (package and environment manager)
- What are conda, Anaconda, and Miniconda?: Explains the relationship between conda and the Anaconda/Miniconda distributions
- Get started with conda environments: Discover the benefits of conda's virtual environments and learn the six commands you need to know to get started
Other
- Why I offer location-based pricing for my courses: Discounts of up to 85% on my courses for people in 160+ countries, to account for differences in "purchasing power" across the globe
Want to be notified about new Data School content? Please subscribe to the email newsletter.