# New to Data School? Start here.

This page provides a categorized guide to Data School's **blog posts, videos, courses, Jupyter notebooks, and webcast recordings**.

If you want to be notified about new Data School content, please subscribe to the email newsletter.

### Learning data science

- How to launch your data science career (with Python): Step-by-step guide
- How to get better at data science: Advice for students, including tons of resources
- Lessons learned from teaching an 11-week data science course: Advice for data science educators
- Should you teach Python or R for data science?: Why I (mostly) prefer Python

### Python

- Quick reference to Python: Reference guide to 15 core language features, with a script and a Jupyter notebook
- Write Pythonic Code for Better Data Science: Webcast recording

### Python: pandas library (data analysis, manipulation, visualization)

- Easier data analysis in Python with pandas: 30 videos (beginner/intermediate level), with a Jupyter notebook
- Best practices with pandas: 10 videos (intermediate level), with a Jupyter notebook
- 9 new pandas updates that will save you time: 2 videos (intermediate level), with Jupyter notebooks
- Your pandas questions answered!: Webcast recording
- Top 8 resources for learning data analysis with pandas: Other recommended resources
- Visualization with pandas and Matplotlib: Lesson notebook demonstrating different types of plots
- Merging DataFrames: Lesson notebook demonstrating four types of joins

### Python: Beautiful Soup library (web scraping)

- Web scraping the President's lies in 16 lines of Python: 4 videos (beginner level), with a Jupyter notebook

### Python: scikit-learn library (machine learning)

- Introduction to machine learning in Python with scikit-learn: 9 videos (beginner/intermediate level), with Jupyter notebooks
- Machine Learning with Text (tutorial): Tutorial recording (intermediate level), with Jupyter notebooks
- Machine Learning with Text (course): Data School's paid online course covering natural language processing, feature engineering, advanced machine learning techniques, and regular expressions (intermediate/advanced level)
- Using pandas with scikit-learn to create Kaggle submissions: Video covering the basic workflow for making predictions, with a Jupyter notebook
- Getting started with machine learning in Python: Webcast recording

### Machine learning overview

- What is machine learning, and how does it work?: Video covering supervised and unsupervised learning, with a Jupyter notebook
- Getting started in scikit-learn with the famous iris dataset: Video covering machine learning terminology, with a Jupyter notebook
- In-depth introduction to machine learning in 15 hours of expert videos: Videos from Stanford University professors, with slides, R code, and a free PDF book
- Comparing supervised learning algorithms: Comparison chart of eight common models, which is also available as a list of advantages and disadvantages

### Machine learning models in Python

- K-nearest neighbors (KNN): Video using scikit-learn, with a Jupyter notebook
- Linear regression: Video using scikit-learn, pandas, and seaborn, with a Jupyter notebook
- A friendly introduction to linear regression: Lesson using statsmodels and scikit-learn, with a Jupyter notebook
- Example of logistic regression in Python using scikit-learn: Example classification problem, with a Jupyter notebook
- Guide to an in-depth understanding of logistic regression: Lesson using scikit-learn, with a Jupyter notebook
- Decision trees for regression and classification: Lesson notebook using scikit-learn
- Ensembling, bagging, and Random Forests: Lesson notebook using scikit-learn
- Regularized regression and classification: Lesson notebook using scikit-learn
- K-means and DBSCAN clustering: Lesson notebook using scikit-learn

### Machine learning model evaluation

- Comparing machine learning models: Video covering train/test split using scikit-learn, with a Jupyter notebook
- Cross-validation: Video covering parameter tuning, model selection, and feature selection using scikit-learn, with a Jupyter notebook
- Efficiently searching for optimal tuning parameters: Video covering grid search and randomized search using scikit-learn, with a Jupyter notebook
- Evaluating a classification model: Video covering classification metrics, confusion matrix, and ROC/AUC using scikit-learn, with a Jupyter notebook
- ROC curves and Area Under the Curve explained: Animated video with transcript
- Simple guide to confusion matrix terminology: Reference guide to classification metrics
- Comparing model evaluation procedures and metrics: Reference guide to three procedures and seven metrics

### Other machine learning topics

- Quick reference guide to applying and interpreting linear regression: Step-by-step guide
- Exploring the bias-variance tradeoff: Lesson notebook using seaborn
- Feature scaling/standardization: Lesson notebook using scikit-learn
- Dummy variables: Video using pandas, with a Jupyter notebook
- Handling missing values: Video using pandas, with a Jupyter notebook

### Machine learning applications

- Detecting Fraudulent Skype Users via Machine Learning: Presentation with slides
- My first Kaggle competition (Allstate Purchase Prediction Challenge): Presentation with slides, video, and R code

### R: Machine learning models

- Example of linear regression and regularization in R: Example using lasso and ridge regression, with an R Markdown document

### R: dplyr package (data analysis, manipulation)

- Hands-on dplyr tutorial for faster data manipulation in R: Video (beginner level), with an R Markdown document
- Going deeper with dplyr (new features in 0.3 and 0.4): Video (beginner/intermediate level), with an R Markdown document

### Git and GitHub (version control)

- Git and GitHub videos for beginners: 11 videos (beginner level)
- Git quick reference for beginners: Reference guide to common commands
- Simple guide to forks in GitHub and Git: Visual guide to forking, syncing, and pull requests
- GitHub is "just" Dropbox for Git: High-level explanation of Git and GitHub

### Jupyter notebook (coding environment)

- Setting up Python for machine learning: Video covering basic usage of the Jupyter notebook (formerly known as the IPython notebook)
- Reproducibility is not just for researchers: Explanation of why the Jupyter notebook is a critical tool for data analysis