10 Standard Datasets for Practicing Applied Machine Learning

Last Updated on May 20, 2020

The key to getting good at applied machine learning is practicing on lots of different datasets. This is because each problem is different, requiring subtly different data preparation and modeling methods. In this post, you will discover 10 top standard machine learning datasets that you can use for practice. Let’s dive in. Update Mar/2018: Added alternate link to download the Pima Indians and Boston Housing datasets as the originals appear to have been taken […]

Read more

How to Get Started with Kaggle

Last Updated on March 11, 2017

4-Step Process for Getting Started and Getting Good at Competitive Machine Learning. Kaggle is a community and site for hosting machine learning competitions. Competitive machine learning can be a great way to develop and practice your skills, as well as demonstrate your capabilities. In this post, you will discover a simple 4-step process to get started and get good at competitive machine learning on Kaggle. Let’s get started. […]

Read more

How to Train a Final Machine Learning Model

The machine learning model that we use to make predictions on new data is called the final model. There can be confusion in applied machine learning about how to train a final model. This error is seen with beginners to the field who ask questions such as: How do I predict with cross validation? Which model do I choose from cross-validation? Do I use the model after preparing it on the training dataset? This post will clear up the confusion. […]

Read more

7 Ways to Handle Large Data Files for Machine Learning

Exploring and applying machine learning algorithms to datasets that are too large to fit into memory is pretty common. This leads to questions like: How do I load my multiple gigabyte data file? Algorithms crash when I try to run my dataset; what should I do? Can you help me with out-of-memory errors? In this post, I want to offer some common suggestions you may want to consider. […]

Read more

What is the Difference Between Test and Validation Datasets?

Last Updated on August 14, 2020

A validation dataset is a sample of data held back from training your model that is used to give an estimate of model skill while tuning the model’s hyperparameters. The validation dataset is different from the test dataset, which is also held back from the training of the model but is instead used to give an unbiased estimate of the skill of the final tuned model when comparing or selecting between final models. There is much […]

Read more
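
The distinction drawn in the excerpt above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the post: the toy data, the tiny 1-D k-nearest-neighbors classifier, and the helper names `three_way_split`, `knn_predict`, and `accuracy` are all assumptions made for the example. The validation set is used to pick the hyperparameter `k`, and the test set is touched once at the end to estimate the skill of the tuned model.

```python
import random


def three_way_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows and split them into train, validation, and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


def knn_predict(train, x, k):
    """Majority vote among the k training rows whose input is closest to x."""
    neighbors = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(1 for _, label in neighbors if label)
    return votes * 2 > k


def accuracy(train, rows, k):
    """Fraction of rows whose label matches the k-NN prediction."""
    correct = sum(knn_predict(train, x, k) == label for x, label in rows)
    return correct / len(rows)


# Hypothetical toy data: the label is True when x exceeds 50.
data = [(x, x > 50) for x in range(100)]
train, val, test = three_way_split(data)

# Tune the hyperparameter k using the validation set only.
candidates = [1, 3, 5, 7]
best_k = max(candidates, key=lambda k: accuracy(train, val, k))

# Estimate the skill of the tuned model once, on the held-back test set.
test_accuracy = accuracy(train, test, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

Because the test rows play no part in choosing `k`, the final accuracy figure is not biased by the tuning process, which is the point the excerpt makes.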

How Much Training Data is Required for Machine Learning?

Last Updated on May 23, 2019

The amount of data you need depends both on the complexity of your problem and on the complexity of your chosen algorithm. This is a fact, but does not help you if you are at the pointy end of a machine learning project. A common question I get asked is: How much data do I need? I cannot answer this question directly for you, or for anyone. But I can give you a handful […]

Read more

What is the Difference Between a Parameter and a Hyperparameter?

Last Updated on June 17, 2019

It can be confusing when you get started in applied machine learning. There are so many terms to use and many of the terms may not be used consistently. This is especially true if you have come from another field of study that may use some of the same terms as machine learning, but they are used differently. For example: the terms “model parameter” and “model hyperparameter.” Not having a clear definition for these […]

Read more

How to Plan and Run Machine Learning Experiments Systematically

Machine learning experiments can take a long time. Hours, days, and even weeks in some cases. This gives you a lot of time to think and plan for additional experiments to perform. In addition, the average applied machine learning project may require tens to hundreds of discrete experiments in order to find a data preparation model and model configuration that gives good or great performance. The drawn-out nature of the experiments means that you need to carefully plan and manage […]

Read more

Why Applied Machine Learning Is Hard

How to Handle the Intractability of Applied Machine Learning. Applied machine learning is challenging. You must make many decisions where there is no known “right answer” for your specific problem, such as: What framing of the problem to use? What input and output data to use? What learning algorithm to use? What algorithm configuration to use? This is challenging for beginners who expect that you can calculate or be told what data to use or how to best configure an […]

Read more

So, You are Working on a Machine Learning Problem…

Last Updated on January 9, 2019

So, you’re working on a machine learning problem. I want to really nail down where you’re at right now. Let me make some guesses…

1) You Have a Problem

So you have a problem that you need to solve. Maybe it’s your problem, an idea you have, a question, or something you want to address. Or maybe it is […]

Read more