# Example of logistic regression in Python using scikit-learn

Back in April, I provided a worked example of a real-world linear regression problem using R. These types of examples can be useful for students getting started in machine learning because they demonstrate both the machine learning workflow and the detailed commands used to execute that workflow.

## My logistic regression example

This time around, I wanted to provide a machine learning example in Python using the ever-popular scikit-learn module. For my Data Science class, I worked through a classification problem using logistic regression and posted my results online in an IPython Notebook. Here are the steps demonstrated in this example:

- loading a dataset from `statsmodels` into a `pandas` DataFrame
- exploring the data using `pandas`
- visualizing the data using `matplotlib`
- preparing the data for logistic regression using `patsy`
- building a logistic regression model using `scikit-learn`
- model evaluation using cross-validation from `scikit-learn`

After viewing the notebook online, you can easily download the notebook and re-run this code on your own computer, especially because the dataset I used is built into statsmodels.

## Related resources

- Guide to an in-depth understanding of logistic regression
- IPython Notebook introducing linear regression in Python
- 4-hour video series on machine learning in Python

## Publishing your own IPython Notebook

Much like R Markdown documents, IPython Notebooks are a great way to weave together your code, output, and explanation into a single document that can be shared with others via the IPython Notebook Viewer. And unlike R Markdown documents, IPython Notebooks are fully interactive once downloaded by a user. Making a Notebook accessible via the Notebook Viewer is as simple as posting your .ipynb file to a publicly accessible URL (such as a GitHub repo or a Gist) and pasting the link to that file on the Notebook Viewer homepage.

If you're just getting started in Python, I highly recommend downloading the Anaconda distribution of Python 2.7 since it already contains all of the most popular Python modules for data analysis and scientific computing.


Simple example of Logistic Regression in an IPython Notebook using scikit-learn, pandas, matplotlib: http://t.co/SSBOlnZIhP #machinelearning

— Kevin Markham (@justmarkham) December 9, 2014