Tour of Data Preparation Techniques for Machine Learning

Last Updated on June 30, 2020

Predictive modeling machine learning projects, such as classification and regression, always involve some form of data preparation.

The specific data preparation required for a dataset depends on the specifics of the data, such as the variable types, and on the algorithms that will be used to model it, which may impose their own expectations or requirements on the data.

Nevertheless, there is a collection of standard data preparation algorithms that can be applied to structured data (e.g. data that forms a large table, as in a spreadsheet). These algorithms can be grouped by type into a framework that is helpful when comparing and selecting techniques for a specific project.

In this tutorial, you will discover the common data preparation tasks performed in a predictive modeling machine learning project.

After completing this tutorial, you will know:

  • Techniques such as data cleaning can identify and fix errors in data like missing values.
  • Data transforms can change the scale, type, and probability distribution of variables in the dataset.
  • Techniques such as feature selection and dimensionality reduction can reduce the number of input variables.
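
To make the three groups of techniques above concrete, here is a minimal sketch that applies one simple example of each to a tiny made-up table. It is an illustration only: the data, the helper names (impute_mean, drop_constant_columns, standardize), and the choice of mean imputation, zero-variance filtering, and standardization are assumptions made for this example, not the only or recommended options. It uses just the Python standard library; in practice a library such as scikit-learn would typically be used.

```python
from statistics import mean, pstdev

# Made-up toy table: rows are examples, columns are input variables.
# None marks a missing value; the third column never changes.
rows = [
    [1.0, 200.0, 5.0],
    [2.0, None, 5.0],
    [3.0, 400.0, 5.0],
    [None, 600.0, 5.0],
]

def impute_mean(table):
    """Data cleaning: replace each missing value with its column mean."""
    columns = list(zip(*table))
    means = [mean(v for v in col if v is not None) for col in columns]
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in table]

def drop_constant_columns(table):
    """Feature selection: keep only columns whose values actually vary."""
    columns = list(zip(*table))
    keep = [j for j, col in enumerate(columns) if pstdev(col) > 0.0]
    return [[row[j] for j in keep] for row in table], keep

def standardize(table):
    """Data transform: rescale each column to zero mean, unit variance."""
    columns = list(zip(*table))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in table]

cleaned = impute_mean(rows)
selected, kept = drop_constant_columns(cleaned)  # run before scaling to avoid dividing by zero
prepared = standardize(selected)

print("kept column indices:", kept)
for row in prepared:
    print(row)
```

The order of the steps matters in this sketch: missing values are filled first so that column statistics can be computed, and constant columns are removed before standardizing because their standard deviation is zero.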
