Common Pitfalls In Machine Learning Projects

Last Updated on June 7, 2016

In a recent presentation, Ben Hamner described the common pitfalls in machine learning projects he and his colleagues have observed during competitions on Kaggle.

The talk was titled “Machine Learning Gremlins” and was presented in February 2014 at Strata.

In this post we take a look at the pitfalls from Ben’s talk, what they look like and how to avoid them.

Machine Learning Process

Early in the talk, Ben presented a snapshot of the process for working through a machine learning problem end-to-end.

Machine Learning Process
Taken from “Machine Learning Gremlins” by Ben Hamner

This snapshot included nine steps, as follows:

  1. Start with a business problem
  2. Source data
  3. Split data
  4. Select an evaluation metric
  5. Perform feature extraction
  6. Model training
  7. Feature selection
  8. Model selection
  9. Production system

He commented that the process is iterative rather than linear.

He also commented that each step in this process can go wrong, derailing the whole project.
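To make the middle of this process concrete, here is a minimal sketch of steps 3, 4, 6 and 8 (split data, select a metric, train models, select a model) in plain Python. It is an illustration only and not code from Ben's talk: the toy data set, the two candidate models and all function names (make_data, split, accuracy, train_majority, train_threshold, run_pipeline) are assumptions made for this example.

```python
import random

def make_data(n, rng):
    """Hypothetical toy data: one feature x, label 1 when x plus noise exceeds 0.5."""
    rows = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x + rng.gauss(0, 0.1) > 0.5 else 0
        rows.append((x, y))
    return rows

def split(rows, rng, train_frac=0.6, valid_frac=0.2):
    """Step 3: shuffle once, then cut into train / validation / test partitions."""
    rows = rows[:]
    rng.shuffle(rows)
    a = int(len(rows) * train_frac)
    b = a + int(len(rows) * valid_frac)
    return rows[:a], rows[a:b], rows[b:]

def accuracy(model, rows):
    """Step 4: the evaluation metric, fixed before any model is trained."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def train_majority(train):
    """Step 6, candidate A: always predict the most common training label."""
    ones = sum(y for _, y in train)
    label = 1 if ones * 2 >= len(train) else 0
    return lambda x: label

def train_threshold(train):
    """Step 6, candidate B: pick the cut-off on x with the best training accuracy."""
    candidates = [i / 20 for i in range(21)]
    best = max(candidates, key=lambda t: accuracy(lambda x: int(x > t), train))
    return lambda x: int(x > best)

def run_pipeline(seed=0):
    rng = random.Random(seed)
    train, valid, test = split(make_data(500, rng), rng)
    models = {
        "majority": train_majority(train),
        "threshold": train_threshold(train),
    }
    # Step 8: choose between models using the validation partition only.
    scores = {name: accuracy(model, valid) for name, model in models.items()}
    chosen = max(scores, key=scores.get)
    # The test partition is touched once, for the final estimate.
    return chosen, scores, accuracy(models[chosen], test)

if __name__ == "__main__":
    chosen, scores, test_accuracy = run_pipeline()
    print("validation scores:", scores)
    print("chosen model:", chosen, "test accuracy:", test_accuracy)
```

The design choice worth noting is that the test partition plays no part in training or model selection. In a real project each function would be revisited as the process iterates, which is where the individual steps can go wrong.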

Discriminating Dogs and Cats

Ben presented a case study problem for building an automatic cat