Assessing and Comparing Classifier Performance with ROC Curves

Last Updated on March 5, 2020 The most commonly reported measure of classifier performance is accuracy: the percent of correct classifications obtained. This metric has the advantage of being easy to understand and makes comparison of the performance of different classifiers trivial, but it ignores many of the factors which should be taken into account when honestly assessing the performance of a classifier. What Is Meant By Classifier Performance? Classifier performance is more than just a count of correct classifications. […]

Lessons Learned from Building Machine Learning Systems

Last Updated on September 5, 2016 In a recent presentation at MLConf, Xavier Amatriain described 10 lessons that he has learned about building machine learning systems as the Research/Engineering Manager at Netflix. In this post you will discover these 10 lessons in a summary from his talk and slides. 10 Lessons Learned The 10 lessons that Xavier presents can be summarized as follows: More data vs./and Better Models You might […]

How To Work Through A Problem Like A Data Scientist

Last Updated on August 15, 2020 In a 2010 post, Hilary Mason and Chris Wiggins described the OSEMN process as a taxonomy of tasks that a data scientist should feel comfortable working on. The title of the post was “A Taxonomy of Data Science” on the now-defunct dataists blog. This process has also been used as the structure of a recent book, specifically “Data Science at the Command Line: Facing the Future with Time-Tested Tools” by Jeroen Janssens published […]

Common Pitfalls In Machine Learning Projects

Last Updated on June 7, 2016 In a recent presentation, Ben Hamner described the common pitfalls in machine learning projects he and his colleagues have observed during competitions on Kaggle. The talk was titled “Machine Learning Gremlins” and was presented in February 2014 at Strata. In this post we take a look at the pitfalls from Ben’s talk, what they look like and how to avoid them. Machine Learning Process Early in the talk, Ben presented a snapshot of the process for working […]

What To Do During Machine Learning Model Runs

Last Updated on June 7, 2016 There was a recent question that asked “How to not waste-time/procrastinate while ml scripts are running?” I think this is an important question. I think answers to this question show a level of organization or maturity in your approach to work. I left a small comment on this question, but in this post I elaborate on my answer and give you a few perspectives on how to consider this question, minimize it and even […]

Choosing Machine Learning Algorithms: Lessons from Microsoft Azure

Last Updated on August 12, 2019 Microsoft recently launched support for machine learning in their Azure cloud computing platform. Buried in some of their technical documentation for the platform are some resources that you may find useful for thinking about what machine learning algorithm to use in different situations. In this post we take a look at the Microsoft recommendations for machine learning algorithms and the lessons that we can use when working through machine learning problems on any platform. […]

How to Use a Machine Learning Checklist to Get Accurate Predictions, Reliably

Last Updated on August 15, 2020 How do you get accurate results using machine learning on problem after problem? The difficulty is that each problem is unique, requiring different data sources, features, algorithms, algorithm configurations and on and on. The solution is to use a checklist that guarantees a good result every time. In this post you will discover a checklist that you can use to reliably get good results on your machine learning problems. […]

Simple 3-Step Methodology To The Best Machine Learning Algorithm

Last Updated on August 15, 2020 How do you choose the best algorithm for your dataset? Machine learning is a problem of induction where general rules are learned from specific observed data from the domain. It is infeasible (impossible?) to know what representation or what algorithm to use to best learn from the data on a specific problem beforehand, without knowing the problem so well that you probably don’t need machine learning to begin with. So what algorithm should you use […]

Deploy Your Predictive Model To Production

Last Updated on September 30, 2016 5 Best Practices For Operationalizing Machine Learning. Not all predictive models are at Google-scale. Sometimes you develop a small predictive model that you want to put in your software. I recently received this reader question: Actually, there is a part that is missing in my knowledge about machine learning. All tutorials give you the steps up until you build your machine learning model. How could you use this model? In this post, we look at […]

Machine Learning Performance Improvement Cheat Sheet

Last Updated on May 22, 2019 32 Tips, Tricks and Hacks That You Can Use To Make Better Predictions. The most valuable part of machine learning is predictive modeling. This is the development of models that are trained on historical data and make predictions on new data. And the number one question when it comes to predictive modeling is: How can I get better results? This cheat sheet contains my best advice distilled from years of my own application and […]
