# New to Data School? Start here.

This page provides a categorized guide to Data School's **blog posts, videos, courses, Jupyter notebooks, and webcast recordings**.

**Featured content** is highlighted in yellow. **Paid content** is marked with a 💲 (everything else is **100% free!**)

### Learning data science

- How to launch your data science career (with Python): Step-by-step guide
- How to get better at data science: Advice for students, including tons of resources
- Lessons learned from teaching an 11-week data science course: Advice for data science educators
- Should you teach Python or R for data science?: Why I (mostly) prefer Python
- Should I learn pandas before scikit-learn?: Video to help you decide how you should spend your time
- How do I stay up-to-date as a data scientist?: Video listing the sites and people I recommend following

### Python

- Quick reference to Python: Reference guide to 15 core language features, with a script and Jupyter notebook
- Write Pythonic Code for Better Data Science: Webcast recording
- Best resources for "going deeper" with Python: Recommended resources for advancing your Python skills

### Python: pandas library (data analysis, manipulation, visualization)

- Easier data analysis in Python with pandas: 37 videos (beginner/intermediate level), with a Jupyter notebook
- My top 25 pandas tricks: Video (intermediate level), with a Jupyter notebook
- Master Python's pandas library with these 100 tricks: Tips and tricks that will save you time and energy every time you use pandas!
- Data science best practices with pandas: Tutorial recording (intermediate level), with a Jupyter notebook
- 💲 Analyzing Police Activity with pandas: Online DataCamp course (intermediate level) with 16 videos and 34 interactive coding exercises, structured like a "case study" in which you answer interesting questions about a real dataset
- How to merge DataFrames in pandas: Video (intermediate level), with a Jupyter notebook
- Should you use "dot notation" or "bracket notation" with pandas?: What's the best way to select a Series from a DataFrame?
- Best practices with pandas: 10 videos (intermediate level), with a Jupyter notebook
- 9 new pandas updates that will save you time: 2 videos (intermediate level), with Jupyter notebooks
- What's the future of the pandas library?: Discussion of what's coming in version 1.0 and beyond
- Your pandas questions answered!: Webcast recording
- Top 8 resources for learning data analysis with pandas: Other recommended resources
- Visualization with pandas and Matplotlib: Lesson notebook demonstrating different types of plots

### Python: Beautiful Soup library (web scraping)

- Web scraping the President's lies in 16 lines of Python: 4 videos (beginner level), with a Jupyter notebook

### Python: scikit-learn library (machine learning)

- Introduction to machine learning in Python with scikit-learn: 10 videos (beginner/intermediate level), with Jupyter notebooks
- Machine Learning with Text (tutorial): Tutorial recording (intermediate level), with Jupyter notebooks
- 💲 Machine Learning with Text (course): Data School's paid online course covering Natural Language Processing, feature engineering, advanced machine learning techniques, and regular expressions (intermediate/advanced level)
- How to update your scikit-learn code for 2018: Guide to updating your scikit-learn code to be compatible with version 0.19.1
- How to encode categorical features with scikit-learn: Video covering how to "dummy" or "one-hot" encode categorical data, with a Jupyter notebook
- Using pandas with scikit-learn to create Kaggle submissions: Video covering the basic workflow for making predictions, with a Jupyter notebook
- Getting started with machine learning in Python: Webcast recording

### Machine learning overview

- What is machine learning, and how does it work?: Video covering supervised and unsupervised learning, with a Jupyter notebook
- In-depth introduction to machine learning in 15 hours of expert videos: Videos from Stanford University professors, with slides, R code, and a free PDF book
- Getting started in scikit-learn with the famous iris dataset: Video covering machine learning terminology, with a Jupyter notebook
- How to create useful features for Machine Learning: Introduction to feature engineering
- How do I select features for machine learning?: Video covering feature selection tactics
- Comparing supervised learning algorithms: Comparison chart of eight common models, which is also available as a list of advantages and disadvantages

### Machine learning models in Python

- K-nearest neighbors (KNN): Video using scikit-learn, with a Jupyter notebook
- Linear regression: Video using scikit-learn, pandas, and seaborn, with a Jupyter notebook
- A friendly introduction to linear regression: Lesson using Statsmodels and scikit-learn, with a Jupyter notebook
- Example of logistic regression in Python using scikit-learn: Example classification problem, with a Jupyter notebook
- Guide to an in-depth understanding of logistic regression: Lesson using scikit-learn, with a Jupyter notebook
- Decision trees for regression and classification: Lesson notebook using scikit-learn
- Ensembling, bagging, and Random Forests: Lesson notebook using scikit-learn
- Regularized regression and classification: Lesson notebook using scikit-learn
- K-means and DBSCAN clustering: Lesson notebook using scikit-learn

### Machine learning model evaluation

- Comparing machine learning models: Video covering train/test split using scikit-learn, with a Jupyter notebook
- Cross-validation: Video covering parameter tuning, model selection, and feature selection using scikit-learn, with a Jupyter notebook
- Efficiently searching for optimal tuning parameters: Video covering grid search and randomized search using scikit-learn, with a Jupyter notebook
- Evaluating a classification model: Video covering classification metrics, confusion matrix, and ROC/AUC using scikit-learn, with a Jupyter notebook
- ROC curves and Area Under the Curve explained: Animated video with transcript
- Simple guide to confusion matrix terminology: Reference guide to classification metrics
- Making sense of the confusion matrix: Video covering the confusion matrix, including advanced topics
- Comparing model evaluation procedures and metrics: Reference guide to three procedures and seven metrics

### Other machine learning topics

- Quick reference guide to applying and interpreting linear regression: Step-by-step guide
- Exploring the bias-variance tradeoff: Lesson notebook using seaborn
- Feature scaling/standardization: Lesson notebook using scikit-learn
- Dummy variables: Video using pandas, with a Jupyter notebook
- Handling missing values: Video using pandas, with a Jupyter notebook

### Machine learning applications

- Detecting Fraudulent Skype Users via Machine Learning: Presentation with slides
- My first Kaggle competition: Allstate Purchase Prediction Challenge: Presentation with slides, video, and R code

### R: Machine learning models

- Example of linear regression and regularization in R: Example using lasso and ridge regression, with an R Markdown document

### R: dplyr package (data analysis, manipulation)

- Hands-on dplyr tutorial for faster data manipulation in R: Video (beginner level), with an R Markdown document
- Going deeper with dplyr: New features in 0.3 and 0.4: Video (beginner/intermediate level), with an R Markdown document

### Git and GitHub (version control)

- Git and GitHub videos for beginners: 11 videos (beginner level)
- Git quick reference for beginners: Reference guide to common commands
- Simple guide to forks in GitHub and Git: Visual guide to forking, syncing, and pull requests
- GitHub is "just" Dropbox for Git: High-level explanation of Git and GitHub

### Jupyter notebook (coding environment)

- Six easy ways to run your Jupyter Notebook in the cloud: Comparison of free cloud services for running the Jupyter Notebook (no installation required)
- Setting up Python for machine learning: Video covering basic usage of the Jupyter notebook (formerly called the IPython notebook)
- IPython Notebook is now called Jupyter Notebook: Briefly explains the history of the Jupyter notebook
- Reproducibility is not just for researchers: Explains why the Jupyter notebook is a critical tool for data analysis
