Detecting Fraudulent Skype Users via Machine Learning
In my Data Science class at General Assembly, each student gave a presentation about a real-world application of data science. My talk covered using machine learning to detect fraud on Skype, and was based on an excellent paper published by Microsoft Research in November 2013.
Although Skype already had measures in place to detect fraud (e.g., credit card fraud, spam instant messages), the research team's goal was to improve the detection of "stealthy fraudulent users" who evade Skype's defenses for a prolonged period. They built a machine learning classifier to flag potentially fraudulent users, and it detected 68% of these users at a false positive rate of 5%. The novelty of their approach was the fusing of disparate data types (profile information, Skype product usage, and Skype social activity) into a single classifier.
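To make those two numbers concrete, here is a minimal sketch of how a detection rate and a false positive rate are computed from a classifier's predictions. The function name and the toy labels are my own assumptions for illustration. The data is not from the paper; I simply constructed it so that the arithmetic lands on 68% and 5%.

```python
def detection_and_false_positive_rates(y_true, y_pred):
    """Return (detection_rate, false_positive_rate) for binary labels.

    Labels: 1 = fraudulent user, 0 = legitimate user.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud, missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate, not flagged

    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one fraudulent and one legitimate user")

    detection_rate = tp / (tp + fn)       # share of fraudulent users caught
    false_positive_rate = fp / (fp + tn)  # share of legitimate users wrongly flagged
    return detection_rate, false_positive_rate


# Hypothetical toy data (NOT the paper's): 25 fraudulent and 100 legitimate users.
y_true = [1] * 25 + [0] * 100
# Assumed predictions: 17 of the 25 fraudsters flagged, 5 of the 100 legitimate users flagged.
y_pred = [1] * 17 + [0] * 8 + [1] * 5 + [0] * 95

detection_rate, false_positive_rate = detection_and_false_positive_rates(y_true, y_pred)
print(f"detection rate: {detection_rate:.0%}, false positive rate: {false_positive_rate:.0%}")
```

Note that the two rates have different denominators: the detection rate is measured against fraudulent users only, while the false positive rate is measured against legitimate users only. When legitimate users vastly outnumber fraudulent ones, even a small false positive rate can mean a large number of wrongly flagged accounts.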
My presentation slides are embedded below, and are also available on Speaker Deck.
If you enjoyed the slides and would like more details, Microsoft's research paper provides a great introduction to the modern machine learning workflow and does not require a statistics background to understand. Alternatively, you can read a less technical article summarizing the paper.
If you would like to read about other applications of machine learning, here are a few of my favorite articles:
- How a Math Genius Hacked OkCupid to Find True Love (WIRED)
- Essay-Grading Software Offers Professors a Break (New York Times)
- 23andMe's Newest Feature Explores Your Ancestry (23andMe Blog)
- Netflix Prize (Wikipedia)
- A $1 Million Research Bargain for Netflix, and Maybe a Model for Others (New York Times)
For a longer list of articles and research papers like these, check out the GitHub repository for my Data Science class!
Detecting Fraudulent Skype Users via #MachineLearning: https://t.co/u8BHwkksmR Useful intro to #DataScience workflow pic.twitter.com/IR3wMqCe8N (Kevin Markham, @justmarkham, July 9, 2016)