10 Standard Datasets for Practicing Applied Machine Learning

Last Updated on May 20, 2020

The key to getting good at applied machine learning is practicing on lots of different datasets.

This is because each problem is different, requiring subtly different data preparation and modeling methods.

In this post, you will discover 10 top standard machine learning datasets that you can use for practice.

Let’s dive in.

  • Update Mar/2018: Added alternate link to download the Pima Indians and Boston Housing datasets as the originals appear to have been taken down.
  • Update Feb/2019: Minor update to the expected default RMSE for the insurance dataset.

Overview

A structured Approach

Each dataset is summarized in a consistent way. This makes them easy to compare and navigate for you to practice a specific data preparation technique or modeling method.

The aspects that you need to know about each dataset are:

  1. Name: How to refer to the dataset.
  2. Problem Type: Whether the problem is regression or classification.
  3. Inputs and Outputs: The numbers and known names of input and output features.
  4. Performance: Baseline performance for comparison using the Zero Rule algorithm, as well as best known performance (if known).
  5. Sample: A snapshot of the first 5 rows of raw data.
  6. Links: Where you
    To finish reading, please visit source site