October 22, 2014 · machine learning

Quick reference guide to applying and interpreting linear regression

After learning a complex topic, I find it helpful to create a "quick reference guide" for myself, so that I can easily review the key points of that topic before applying it to a data problem or teaching it to others. When that topic is conceptual (such as linear regression), those guides tend to resemble the notes you might take from a classroom lecture. When that topic is code-based, those guides tend to contain examples and annotated lists of commands, like my quick reference guide to Git, my dplyr tutorial, or my Python reference guide.

I created this guide to linear regression a while ago, after reading the excellent An Introduction to Statistical Learning (with Applications in R) by James, Witten, Hastie, and Tibshirani. Now that I'm a Data Science instructor for General Assembly, I've made a personal commitment to sharing these guides so that my students and others can benefit from them.

Please note that this is not a tutorial; it won't teach you linear regression if you are not already familiar with it. Instead, it is a light reference guide to applying linear regression and interpreting its output, and it glosses over many nuances of the topic. However, I have listed resources for deepening your understanding (and for applying linear regression in R, Python, and other statistical packages) at the bottom of this post.

Your feedback and clarifications are welcome!

Simple Linear Regression:

Computing coefficients

How accurate is β1?

Is β1 non-zero?

How well does the model fit the data?
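As a minimal sketch of where each of these quantities shows up in practice, here's a simple regression fit with statsmodels on made-up data; the variable names and numbers are purely illustrative:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a small dataset (names and values are made up for illustration)
rng = np.random.default_rng(0)
df = pd.DataFrame({'tv': rng.uniform(0, 100, 200)})
df['sales'] = 7 + 0.05 * df['tv'] + rng.normal(0, 1, 200)

# Fit sales = beta0 + beta1 * tv by ordinary least squares
model = smf.ols('sales ~ tv', data=df).fit()

print(model.params)      # computing coefficients: beta0 (Intercept) and beta1 (tv)
print(model.bse)         # how accurate is beta1? (standard errors)
print(model.conf_int())  # confidence intervals for the coefficients
print(model.tvalues)     # is beta1 non-zero? (t-statistics...
print(model.pvalues)     # ...and their p-values)
print(model.rsquared)    # how well does the model fit the data? (R-squared)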

Multiple Linear Regression:

Computing coefficients

Is at least one coefficient non-zero?

How well does the model fit the data?

Qualitative (aka "categorical") predictors

Interactions between variables
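To illustrate these ideas in code, here's a multiple regression sketch with statsmodels on made-up data, including a dummy-coded categorical predictor and an interaction term; again, the variable names are purely illustrative:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data: two numeric predictors plus one categorical predictor
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    'tv': rng.uniform(0, 100, n),
    'radio': rng.uniform(0, 50, n),
    'region': rng.choice(['east', 'west'], n),
})
df['sales'] = (5 + 0.04 * df['tv'] + 0.2 * df['radio']
               + 0.002 * df['tv'] * df['radio']
               + 1.5 * (df['region'] == 'west')
               + rng.normal(0, 1, n))

# C() dummy-codes the categorical predictor; tv:radio adds an interaction term
model = smf.ols('sales ~ tv + radio + tv:radio + C(region)', data=df).fit()

print(model.fvalue, model.f_pvalue)  # is at least one coefficient non-zero? (F-statistic)
print(model.rsquared_adj)            # how well does the model fit? (adjusted R-squared)
print(model.params)                  # coefficients, including the dummy and interaction terms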

Problems with Linear Regression:

Non-linear relationships

Non-constant variance of residuals (aka "heteroskedasticity")

Outliers

High leverage points

Collinearity
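To make these diagnostics concrete, here's a minimal sketch of how you might check for them with statsmodels; the data is made up, and the thresholds mentioned in the comments are common rules of thumb rather than hard cutoffs:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Made-up data with two strongly correlated predictors (to trip the collinearity check)
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(0, 1, n)
df = pd.DataFrame({'x1': x1, 'x2': x1 + rng.normal(0, 0.3, n)})
df['y'] = 3 + 2 * df['x1'] + rng.normal(0, 1, n)

model = smf.ols('y ~ x1 + x2', data=df).fit()

# Non-linear relationships and heteroskedasticity: look for curvature or a funnel
# shape when plotting residuals against fitted values
diagnostics = pd.DataFrame({'fitted': model.fittedvalues, 'resid': model.resid})

# Outliers and high leverage points: studentized residuals and leverage values
influence = model.get_influence()
studentized = influence.resid_studentized_external  # values far beyond +/-3 suggest outliers
leverage = influence.hat_matrix_diag                # unusually large values = high leverage

# Collinearity: variance inflation factors (a VIF well above 5-10 is a common warning sign)
exog = model.model.exog
vif = {name: variance_inflation_factor(exog, i)
       for i, name in enumerate(model.model.exog_names) if name != 'Intercept'}
print(vif)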

Resources:
