May 22, 2014 · machine learning presentation

My first Kaggle competition: Allstate Purchase Prediction Challenge

For my data science class project with General Assembly, I competed in Kaggle's Allstate Purchase Prediction Challenge. Kaggle is a website that hosts machine learning competitions, bringing together some of the brightest minds in the field to solve predictive problems. This was my first Kaggle competition, and I was excited to flex my machine learning muscles!

It was an extremely challenging problem, requiring competitors to predict the exact car insurance options that a given customer would buy. Data provided to competitors about each customer included a partial history of the insurance quotes they reviewed, demographic data, and limited information about the car they were seeking to insure.
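To give a feel for why predicting the exact combination is so hard, here is a minimal Python sketch. It assumes an all-or-nothing scoring rule, where a prediction only counts if every option matches. The function names, the encoding of each combination as a string of option codes, and the toy data are all made up for illustration and are not taken from the competition data.

```python
def exact_match_accuracy(predicted, actual):
    """Fraction of customers whose full option combination is predicted correctly."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and the same length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def per_option_accuracy(predicted, actual):
    """Fraction of individual option values predicted correctly (for comparison only)."""
    total = sum(len(a) for a in actual)
    hits = sum(
        p_code == a_code
        for p, a in zip(predicted, actual)
        for p_code, a_code in zip(p, a)
    )
    return hits / total


# Hypothetical toy data: each string is one customer's combination of four options.
actual = ["1021", "0132", "1100"]
predicted = ["1021", "0131", "0100"]

exact = exact_match_accuracy(predicted, actual)
per_option = per_option_accuracy(predicted, actual)
```

In this toy data most individual option values are right, yet only one of the three customers is scored as correct, because a single wrong option spoils the whole combination.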

I wrote a paper about the competition, detailing my process for exploring and visualizing the data, engineering new features, and building machine learning models. At the end of the competition, I was ranked in the top 20% of the public leaderboard, but unfortunately fell significantly in the rankings once the private leaderboard was revealed. My mistake was choosing which entries to submit without cross-validating them: I had placed false confidence in the public leaderboard, which is a common error in Kaggle competitions.
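For readers new to the idea, here is a minimal sketch of how cross-validation can be used to compare candidate models before deciding what to submit, rather than relying on public leaderboard scores. It is not the code from my paper: the helper names (`k_fold_indices`, `cross_val_score`, `fit_majority`, `fit_parity`) and the toy data are hypothetical, and it assumes plain accuracy as the score and at least as many rows as folds.

```python
import random
from collections import Counter
from statistics import mean


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and split them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_score(fit, rows, labels, k=5, seed=0):
    """Mean held-out accuracy across k folds.

    `fit(train_rows, train_labels)` must return a `predict(row)` callable.
    Assumes len(rows) >= k so that no fold is empty.
    """
    scores = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held_out_set = set(held_out)
        train = [i for i in range(len(rows)) if i not in held_out_set]
        predict = fit([rows[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(rows[i]) == labels[i] for i in held_out)
        scores.append(hits / len(held_out))
    return mean(scores)


def fit_majority(train_rows, train_labels):
    """Baseline candidate: always predict the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda row: most_common


def fit_parity(train_rows, train_labels):
    """Toy candidate that happens to match how the toy labels were generated."""
    return lambda row: row % 2


# Hypothetical toy data: the label is simply whether the row value is odd.
rows = list(range(20))
labels = [r % 2 for r in rows]

candidates = {"majority": fit_majority, "parity": fit_parity}
cv_scores = {name: cross_val_score(fit, rows, labels, k=4)
             for name, fit in candidates.items()}
best = max(cv_scores, key=cv_scores.get)
```

The point of the design is that every score is computed on rows the model did not see during fitting, and is averaged over several folds, so the choice of `best` depends less on the luck of any single split.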

After the competition closed, numerous top competitors shared their strategies in the Kaggle forums. I was pleased to see that at least some of my strategies were on the right track, namely the stacking of models, optimizing for precision instead of accuracy, and seeking out unlikely option combinations.

I presented my project to the class on May 19, just hours before the competition closed, and also presented at DC Hack and Tell on May 20. My presentation slides are available on Speaker Deck, and my paper and code are available on GitHub.

A recorded version of my presentation is posted on YouTube (16 minutes) and also embedded below. As always, I welcome your feedback and hope you enjoy the presentation!

P.S. Want to get better at machine learning? I teach an online course.
