A Simple Intuition for Overfitting, or Why Testing on Training Data is a Bad Idea

Last Updated on August 21, 2016
When you first start out with machine learning, you load a dataset and try models. You might think to yourself, why can’t I just build a model with all of the data and evaluate it on the same dataset? It seems reasonable. More data to train the model is better, right? Evaluating the model and reporting results on the same dataset will tell you how good the model is, right? Wrong. In this post […]
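
The intuition can be made concrete with a minimal sketch on made-up data. The toy dataset, the 20% label-noise level and the helper names are all hypothetical and do not come from the post. A 1-nearest-neighbour model simply memorizes its training set, so scoring it on that same data reports a perfect result, while scoring it on fresh data from the same source tells a different story.

```python
import random


def make_data(n, noise, rng):
    """Hypothetical toy data: x is uniform in [0, 1), the true label is 1 when
    x > 0.5, and each label is then flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


def nearest_neighbour_predict(train, x):
    """1-nearest-neighbour: return the label of the closest training point.
    Asked about one of its own training points, it finds that point itself,
    so it effectively memorizes the training set."""
    _, label = min(train, key=lambda point: abs(point[0] - x))
    return label


def accuracy(train, dataset):
    """Proportion of points in `dataset` whose label the model predicts correctly."""
    correct = sum(1 for x, y in dataset if nearest_neighbour_predict(train, x) == y)
    return correct / len(dataset)


rng = random.Random(7)
train = make_data(200, noise=0.2, rng=rng)
test = make_data(200, noise=0.2, rng=rng)  # held out: never used to build the model

train_acc = accuracy(train, train)  # evaluating on the training data
test_acc = accuracy(train, test)    # evaluating on unseen data
print(f"accuracy on training data: {train_acc:.2f}")
print(f"accuracy on held-out data: {test_acc:.2f}")
```

Because every training point is its own nearest neighbour, the training score is 1.00 no matter how noisy the labels are; only the held-out score says anything about how the model is likely to do on data it has not seen.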

Template for Working through Machine Learning Problems in Weka

Last Updated on August 22, 2019
When you are getting started in Weka, you may feel overwhelmed. There are so many datasets, so many filters and so many algorithms to choose from. There is too much choice. There are too many things you could be doing. (Photo: “Too Much Choice” by emilio labrador, some rights reserved.) A structured process is key. I have talked about process and the need for tasks like spot-checking algorithms to overcome the overwhelm and start learning […]

Biggest Mistake I Made When Starting Machine Learning, And How To Avoid It

Last Updated on August 22, 2019
When I first got started in machine learning, I implemented algorithms by hand. It was really slow going. I was a terrible programmer at the time. I was trying to figure out the algorithms from books, how to use them on problems and how to write code, all at the same time. This was the biggest mistake I made when getting started. It made everything three times harder and killed my motivation. A friend […]

Feature Selection to Improve Accuracy and Decrease Training Time

Last Updated on August 16, 2020
When working on a problem, you are always looking to get the most out of the data that you have available. You want the best accuracy you can get. Typically, the biggest wins come from better understanding the problem you are solving. This is why I stress spending so much time up front defining your problem, analyzing the data, and preparing datasets for your models. A key part of data preparation is creating transforms […]

Project Spotlight: Stack Exchange Clustering using Mahout with Konstantin Slisenko

Last Updated on August 16, 2020
This is a project spotlight with Konstantin Slisenko, a programmer and machine learning enthusiast. Could you please introduce yourself? My name is Konstantin Slisenko, and I’m from Belarus. I graduated from the Belarusian State University of Informatics and Radioelectronics and am currently taking a master’s course. I’m a Java developer and work at the JazzTeam company. I like to learn new technologies, and I’m currently interested in big data and machine learning. I like to participate […]

Market Basket Analysis with Association Rule Learning

Last Updated on August 22, 2019
The promise of data mining was that algorithms would crunch data and find interesting patterns that you could exploit in your business. The exemplar of this promise is market basket analysis (Wikipedia calls it affinity analysis): given a pile of transactional records, discover interesting purchasing patterns that could be exploited in the store, for example through offers and product layout. In this post, you will work through a market basket analysis tutorial using association rule learning […]
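
The core bookkeeping behind association rule learning can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in the tutorial: the transactions and the 0.5 minimum-support threshold are made up, and only one-item -> one-item rules are scored, using the two standard measures, support (how often items appear together) and confidence (how often the consequent appears when the antecedent does).

```python
from itertools import permutations

# Hypothetical transactional records: each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


MIN_SUPPORT = 0.5  # hypothetical threshold chosen for this toy example

# Score every ordered one-item -> one-item rule and keep the frequent ones.
items = sorted(set().union(*transactions))
rules = []
for a, c in permutations(items, 2):
    s = support({a, c}, transactions)
    if s >= MIN_SUPPORT:
        rules.append((a, c, s, confidence({a}, {c}, transactions)))

for a, c, s, conf in rules:
    print(f"{a} -> {c}: support={s:.2f}, confidence={conf:.2f}")
```

Scoring every pair is fine for a handful of products; practical algorithms such as Apriori instead prune candidate itemsets whose subsets are already infrequent, which matters once a store has thousands of items.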

Project Spotlight: Event Recommendation in Python with Artem Yankov

Last Updated on June 7, 2016
This is a project spotlight with Artem Yankov. Could you please introduce yourself? My name is Artem Yankov, and I have worked as a software engineer at Badgeville for the last 3 years. I use Ruby and Scala there, although my prior background includes various languages such as Assembly, C/C++, Python, Clojure and JS. I love hacking on small projects and exploring different fields; for instance, two almost random fields I’ve looked at were […]

Classification Accuracy is Not Enough: More Performance Measures You Can Use

Last Updated on June 20, 2019
When you build a model for a classification problem, you almost always want to look at the accuracy of that model: the number of correct predictions as a proportion of all predictions made. This is the classification accuracy. In a previous post, we looked at evaluating the robustness of a model for making predictions on unseen data using cross-validation and multiple cross-validation, where we used classification accuracy and average classification accuracy. Once you have a […]
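
As a minimal illustration (the labels below are invented for the example), classification accuracy is just correct predictions divided by all predictions. The sketch also computes recall, one common additional measure, to show how a model that never predicts the positive class can still score high accuracy on an imbalanced dataset.

```python
def classification_accuracy(actual, predicted):
    """Correct predictions as a proportion of all predictions made."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def recall(actual, predicted, positive=1):
    """Proportion of actual positive cases that the model predicted as positive."""
    preds_for_positives = [p for a, p in zip(actual, predicted) if a == positive]
    if not preds_for_positives:
        return 0.0
    hits = sum(1 for p in preds_for_positives if p == positive)
    return hits / len(preds_for_positives)


# Hypothetical imbalanced labels: 9 negatives (0) and 1 positive (1).
actual = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
# A lazy "model" that always predicts the majority class.
predicted = [0] * len(actual)

print(classification_accuracy(actual, predicted))  # 0.9
print(recall(actual, predicted))                   # 0.0
```

Accuracy alone reports 90% here, even though the model never finds the one case that may matter most; that gap is why a single accuracy number is rarely the whole story.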

Machine Learning Tips from a World Class Practitioner: Phil Brierley

Last Updated on June 7, 2016
Phil Brierley won the Heritage Health Prize Kaggle machine learning competition. Phil was trained as a mechanical engineer and has a background in data mining with his company Tiberius Data Mining. He is heavily into R these days and keeps a blog at Another Data Mining Blog. In October 2013 he presented to the Melbourne Users of R special interest group. The title of his talk was “Techniques to improve the accuracy of your Predictive Models” and you can […]

5 Steps to Thinking Like a Designer in Machine Learning

Last Updated on June 7, 2016
This is a guest post by Kevin Dalias. I recently had the chance to attend Strata 2014 in Santa Clara, and since it was my first time at the conference, I tried to attend as many sessions as I could to understand what really makes data science tick these days. And of course, I heard plenty of the usual “a data scientist must be…” bullet points, but session after session, a new addition to the […]
