# New to Data School? Start here.

This page provides a categorized guide to Data School's **blog posts, videos, courses, Jupyter notebooks, and webcast recordings**.

**Featured content** is highlighted in yellow. **Paid content** is marked with a 💲 (everything else is **100% free!**)

If you want to be notified about new Data School content, please subscribe to the email newsletter.

If you want to support Data School and gain access to additional educational content, please join Data School Insiders on Patreon.

### Learning data science

- How to launch your data science career (with Python): Step-by-step guide
- Can a dentist become a data scientist?: Thoughts about changing careers and what it takes to learn data science
- How to get better at data science: Advice for students, including tons of resources
- Lessons learned from teaching an 11-week data science course: Advice for data science educators
- Should you teach Python or R for data science?: Why I (mostly) prefer Python
- Should I learn pandas before scikit-learn?: Video to help you decide how you should spend your time
- How do I stay up-to-date as a data scientist?: Video listing the sites and people I recommend following

### Python

- Quick reference to Python: Reference guide to 15 core language features, with a script and Jupyter notebook
- Write Pythonic Code for Better Data Science: Webcast recording
- Best resources for "going deeper" with Python: Recommended resources for advancing your Python skills

### Python: pandas library (data analysis, manipulation, visualization)

- Easier data analysis in Python with pandas: 30 videos (beginner/intermediate level), with a Jupyter notebook
- Best practices with pandas: 10 videos (intermediate level), with a Jupyter notebook
- 💲 Analyzing Police Activity with pandas: Online DataCamp course (intermediate level) with 16 videos and 34 interactive coding exercises, structured like a "case study" in which you answer interesting questions about a real dataset
- 9 new pandas updates that will save you time: 2 videos (intermediate level), with Jupyter notebooks
- What's the future of the pandas library?: Discussion of what's coming in version 1.0 and beyond
- Your pandas questions answered!: Webcast recording
- Top 8 resources for learning data analysis with pandas: Other recommended resources
- Visualization with pandas and Matplotlib: Lesson notebook demonstrating different types of plots
- Merging DataFrames: Lesson notebook demonstrating four types of joins
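
As a quick taste of the "four types of joins" mentioned in the last item, here is a minimal sketch using `pandas.merge`. The two tiny DataFrames and their column names are made up for illustration and are not taken from the lesson notebook.

```python
import pandas as pd

# Two tiny, made-up tables that share a "key" column (illustrative only)
left = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

# Count the rows produced by each join type
join_sizes = {
    how: len(pd.merge(left, right, on="key", how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(join_sizes)
```

An inner join keeps only keys present in both tables, left and right joins keep every key from one side, and an outer join keeps every key from either side.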

### Python: Beautiful Soup library (web scraping)

- Web scraping the President's lies in 16 lines of Python: 4 videos (beginner level), with a Jupyter notebook

### Python: scikit-learn library (machine learning)

- Introduction to machine learning in Python with scikit-learn: 9 videos (beginner/intermediate level), with Jupyter notebooks
- Machine Learning with Text (tutorial): Tutorial recording (intermediate level), with Jupyter notebooks
- 💲 Machine Learning with Text (course): Data School's paid online course covering Natural Language Processing, feature engineering, advanced machine learning techniques, and regular expressions (intermediate/advanced level)
- How to update your scikit-learn code for 2018: Guide to updating your scikit-learn code to be compatible with version 0.19.1
- Using pandas with scikit-learn to create Kaggle submissions: Video covering the basic workflow for making predictions, with a Jupyter notebook
- Getting started with machine learning in Python: Webcast recording

### Machine learning overview

- What is machine learning, and how does it work?: Video covering supervised and unsupervised learning, with a Jupyter notebook
- Getting started in scikit-learn with the famous iris dataset: Video covering machine learning terminology, with a Jupyter notebook
- In-depth introduction to machine learning in 15 hours of expert videos: Videos from Stanford University professors, with slides, R code, and a free PDF book
- How to create useful features for Machine Learning: Introduction to feature engineering
- How do I select features for machine learning?: Video covering feature selection tactics
- Comparing supervised learning algorithms: Comparison chart of eight common models, which is also available as a list of advantages and disadvantages

### Machine learning models in Python

- K-nearest neighbors (KNN): Video using scikit-learn, with a Jupyter notebook
- Linear regression: Video using scikit-learn, pandas, and seaborn, with a Jupyter notebook
- A friendly introduction to linear regression: Lesson using statsmodels and scikit-learn, with a Jupyter notebook
- Example of logistic regression in Python using scikit-learn: Example classification problem, with a Jupyter notebook
- Guide to an in-depth understanding of logistic regression: Lesson using scikit-learn, with a Jupyter notebook
- Decision trees for regression and classification: Lesson notebook using scikit-learn
- Ensembling, bagging, and Random Forests: Lesson notebook using scikit-learn
- Regularized regression and classification: Lesson notebook using scikit-learn
- K-means and DBSCAN clustering: Lesson notebook using scikit-learn

### Machine learning model evaluation

- Comparing machine learning models: Video covering train/test split using scikit-learn, with a Jupyter notebook
- Cross-validation: Video covering parameter tuning, model selection, and feature selection using scikit-learn, with a Jupyter notebook
- Efficiently searching for optimal tuning parameters: Video covering grid search and randomized search using scikit-learn, with a Jupyter notebook
- Evaluating a classification model: Video covering classification metrics, confusion matrix, and ROC/AUC using scikit-learn, with a Jupyter notebook
- ROC curves and Area Under the Curve explained: Animated video with transcript
- Simple guide to confusion matrix terminology: Reference guide to classification metrics
- Making sense of the confusion matrix: Video covering the confusion matrix, including advanced topics
- Comparing model evaluation procedures and metrics: Reference guide to three procedures and seven metrics
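
For readers who have not met the confusion matrix yet, the following is a rough sketch in plain Python of how its four cells and a few common metrics are computed for a binary classifier. The labels, predictions, and the helper name `confusion_counts` are hypothetical and chosen only for illustration; the resources above use scikit-learn instead.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return tp, fp, fn, tn

# Made-up true labels and predictions (illustrative only)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)   # share of all predictions that are correct
precision = tp / (tp + fp)           # share of predicted positives that are correct
recall = tp / (tp + fn)              # share of actual positives that are found
print(tp, fp, fn, tn, accuracy, precision, recall)
```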

### Other machine learning topics

- Quick reference guide to applying and interpreting linear regression: Step-by-step guide
- Exploring the bias-variance tradeoff: Lesson notebook using seaborn
- Feature scaling/standardization: Lesson notebook using scikit-learn
- Dummy variables: Video using pandas, with a Jupyter notebook
- Handling missing values: Video using pandas, with a Jupyter notebook

### Machine learning applications

- Detecting Fraudulent Skype Users via Machine Learning: Presentation with slides
- My first Kaggle competition: Allstate Purchase Prediction Challenge: Presentation with slides, video, and R code

### R: Machine learning models

- Example of linear regression and regularization in R: Example using lasso and ridge regression, with an R Markdown document

### R: dplyr package (data analysis, manipulation)

- Hands-on dplyr tutorial for faster data manipulation in R: Video (beginner level), with an R Markdown document
- Going deeper with dplyr: New features in 0.3 and 0.4: Video (beginner/intermediate level), with an R Markdown document

### Git and GitHub (version control)

- Git and GitHub videos for beginners: 11 videos (beginner level)
- Git quick reference for beginners: Reference guide to common commands
- Simple guide to forks in GitHub and Git: Visual guide to forking, syncing, and pull requests
- GitHub is "just" Dropbox for Git: High-level explanation of Git and GitHub

### Jupyter notebook (coding environment)

- Six easy ways to run your Jupyter Notebook in the cloud: Comparison of free cloud services for running the Jupyter Notebook (no installation required)
- Setting up Python for machine learning: Video covering basic usage of the Jupyter notebook (formerly called the IPython notebook)
- IPython Notebook is now called Jupyter Notebook: Briefly explains the history of the Jupyter notebook
- Reproducibility is not just for researchers: Explains why the Jupyter notebook is a critical tool for data analysis