How to Choose Data Preparation Methods for Machine Learning

Last Updated on July 15, 2020

Data preparation is an important part of a predictive modeling project.

Correct application of data preparation will transform raw data into a representation that allows learning algorithms to get the most out of the data and make skillful predictions. The problem is that choosing a transform, or sequence of transforms, that results in a useful representation is very challenging, so much so that it may be considered more of an art than a science.

In this tutorial, you will discover strategies that you can use to select data preparation techniques for your predictive modeling datasets.

After completing this tutorial, you will know:

  • Data preparation techniques can be chosen based on detailed knowledge of the dataset and algorithm, and this is the most common approach.
  • Data preparation techniques can be grid searched as just another hyperparameter in the modeling pipeline.
  • Data transforms can be applied to a training dataset in parallel to create many extracted features on which feature selection can be applied and a model trained.
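As a rough preview of the last two strategies, here is a minimal sketch using scikit-learn. It is not the tutorial's own code: the synthetic dataset, the choice of logistic regression, the candidate transforms, and the step names ("prep", "features", "select", "model") are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler, QuantileTransformer, StandardScaler

# Synthetic stand-in data (assumption; substitute your own dataset).
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           random_state=1)

# Strategy A: grid search the data preparation step like any other
# hyperparameter. "passthrough" means no transform is applied.
pipeline = Pipeline([
    ("prep", "passthrough"),
    ("model", LogisticRegression(max_iter=1000)),
])
param_grid = {"prep": ["passthrough", StandardScaler(), MinMaxScaler()]}
search = GridSearchCV(pipeline, param_grid, scoring="accuracy", cv=5)
search.fit(X, y)
print("Best preparation step:", search.best_params_["prep"])

# Strategy B: apply several transforms in parallel, concatenate their
# outputs as extracted features, select a subset, then fit a model.
union = FeatureUnion([
    ("standard", StandardScaler()),
    ("minmax", MinMaxScaler()),
    ("quantile", QuantileTransformer(n_quantiles=100,
                                     output_distribution="normal")),
    ("pca", PCA(n_components=3)),
])
parallel = Pipeline([
    ("features", union),
    ("select", SelectKBest(score_func=f_classif, k=10)),
    ("model", LogisticRegression(max_iter=1000)),
])
parallel.fit(X, y)

# Three transforms keep all 10 columns and PCA adds 3 components.
n_extracted = parallel.named_steps["features"].transform(X).shape[1]
print("Extracted features before selection:", n_extracted)
```

Keeping the transforms inside a Pipeline means each one is fit only on the training folds during cross-validation, which helps avoid data leakage when comparing preparation methods.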

Kick-start your project with my new book Data Preparation for Machine Learning, including step-by-step tutorials and the Python source code files for all examples.
