Feature Selection to Improve Accuracy and Decrease Training Time

Last Updated on August 16, 2020 Working on a problem, you are always looking to get the most out of the data that you have available. You want the best accuracy you can get. Typically, the biggest wins are in better understanding the problem you are solving. This is why I stress you spend so much time up front defining your problem, analyzing the data, and preparing datasets for your models. A key part of data preparation is creating transforms […]

Project Spotlight: Stack Exchange Clustering using Mahout with Konstantin Slisenko

Last Updated on August 16, 2020 This is a project spotlight with Konstantin Slisenko, a programmer and machine learning enthusiast. Could you please introduce yourself? My name is Konstantin Slisenko, I’m from Belarus. I graduated from the Belarusian State University of Informatics and Radioelectronics. I am currently taking a master’s course. I’m a Java developer and work at JazzTeam. I like to learn new technologies. I’m currently interested in big data and machine learning. I like to participate […]

Market Basket Analysis with Association Rule Learning

Last Updated on August 22, 2019 The promise of Data Mining was that algorithms would crunch data and find interesting patterns that you could exploit in your business. The exemplar of this promise is market basket analysis (Wikipedia calls it affinity analysis). Given a pile of transactional records, discover interesting purchasing patterns that could be exploited in the store, such as offers and product layout. In this post you will work through a market basket analysis tutorial using association rule learning […]

Project Spotlight: Event Recommendation in Python with Artem Yankov

Last Updated on June 7, 2016 This is a project spotlight with Artem Yankov. Could you please introduce yourself? My name is Artem Yankov, I have worked as a software engineer for Badgeville for the last 3 years. There I use Ruby and Scala, although my prior background includes various languages such as Assembly, C/C++, Python, Clojure and JS. I love hacking on small projects and exploring different fields, for instance two almost random fields I’ve looked at were […]

Classification Accuracy is Not Enough: More Performance Measures You Can Use

Last Updated on June 20, 2019 When you build a model for a classification problem you almost always want to look at the accuracy of that model as the number of correct predictions from all predictions made. This is the classification accuracy. In a previous post, we have looked at evaluating the robustness of a model for making predictions on unseen data using cross-validation and multiple cross-validation where we used classification accuracy and average classification accuracy. Once you have a […]

Machine Learning Tips from a World Class Practitioner: Phil Brierley

Last Updated on June 7, 2016 Phil Brierley won the Heritage Health Prize Kaggle machine learning competition. Phil was trained as a mechanical engineer and has a background in data mining with his company Tiberius Data Mining. He is heavily into R these days and keeps a blog at Another Data Mining Blog. In October 2013 he presented to the Melbourne Users of R special interest group. The title of his talk was “Techniques to improve the accuracy of your Predictive Models” and you can […]

5 Steps to Thinking Like a Designer in Machine Learning

Last Updated on June 7, 2016 This is a guest post by Kevin Dalias. I recently had the chance to attend Strata 2014 in Santa Clara, and since it was my first time at the conference, I tried to attend as many sessions as I could to understand what really makes data science tick these days. And of course, I heard plenty of the usual “a data scientist must be…” bullet points, but session after session, a new addition to the […]

Case Study: Predicting the Onset of Diabetes Within Five Years (part 1 of 3)

Last Updated on August 22, 2019 This is a guest post by Igor Shvartser, a clever young student I have been coaching. This post is part 1 in a 3-part series on modeling the famous Pima Indians Diabetes dataset that will introduce the problem and the data. Part 2 will investigate feature selection and spot checking algorithms, and Part 3 will investigate improvements to the classification accuracy and final presentation of results. Kick-start your project with my […]

BigML Review: Discover the Clever Features in This Machine Learning as a Service Platform

Last Updated on August 16, 2020 Machine Learning has been commoditized into a service. This is a recent trend that looks like it will develop into the mainstream, like commoditized storage and virtualization. It is the natural next step. In this review you will learn about BigML, which provides commoditized machine learning as a service for business analysts and application integration. About BigML BigML was co-founded by a group of five guys in 2011. Francisco Martin seems to be active […]

Project Spotlight: Face Recognition with Shashank Singh

Last Updated on June 18, 2019 This is a project spotlight with Shashank Singh, a programmer and machine learning enthusiast. Could you please introduce yourself? I did a Bachelor of Technology in Computer Science. I co-founded a startup at 23 and spectacularly crashed it by my 26th birthday. After that I was feeling particularly low and pretty dry of inspiration for quite some time. I moved to Mumbai, India to join Idyllic Software and I came in contact with amazing people with such […]
