How to Use Discretization Transforms for Machine Learning

Last Updated on August 28, 2020

Numerical input variables may have a highly skewed or non-standard distribution.

This could be caused by outliers in the data, multi-modal distributions, highly exponential distributions, and more.

Many machine learning algorithms prefer or perform better when numerical input variables have a standard probability distribution.

The discretization transform provides an automatic way to change a numeric input variable to have a different data distribution, which in turn can be used as input to a predictive model.
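
As a rough illustration of what such a transform does, the sketch below implements two common binning rules in plain Python: equal-width bins, which split the observed range into intervals of the same size, and approximately equal-frequency bins, which place cut points so that each bin holds a similar number of values. The helper names (`uniform_edges`, `quantile_edges`, `discretize`) and the toy `values` list are hypothetical and chosen only for illustration. This is not the tutorial's own code or any library's API.

```python
from bisect import bisect_right

# Hypothetical helpers for illustration only; not a library API.

def uniform_edges(values, n_bins):
    """Equal-width bin edges spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

def quantile_edges(values, n_bins):
    """Approximate equal-frequency edges taken from sorted positions."""
    s = sorted(values)
    n = len(s)
    edges = [s[0]]
    for i in range(1, n_bins):
        edges.append(s[(i * n) // n_bins])
    edges.append(s[-1])
    return edges

def discretize(values, edges):
    """Map each value to an ordinal bin index in 0..n_bins-1."""
    inner = edges[1:-1]  # interior cut points only
    # bisect_right counts the interior edges <= v, which is the bin index
    return [bisect_right(inner, v) for v in values]

# A small, right-skewed toy sample with one large value.
values = [1, 1, 2, 2, 3, 4, 6, 9, 15, 50]

print(discretize(values, uniform_edges(values, 3)))   # [0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
print(discretize(values, quantile_edges(values, 3)))  # [0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

On skewed data, equal-width bins can leave most values in a single bin and some bins empty. Equal-frequency bins spread the values more evenly across the ordinal labels. The quantile rule here is a simplification; library implementations typically compute percentiles with interpolation, so their edges may differ slightly.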

In this tutorial, you will discover how to use discretization transforms to map numerical values to discrete categories for machine learning.

After completing this tutorial, you will know:

Many machine learning algorithms prefer or perform better when numerical input variables with non-standard probability distributions are made discrete.
Discretization transforms are a technique for transforming numerical input or output variables to have discrete ordinal labels.
How to use the KBinsDiscretizer to change the structure and distribution of numeric variables to improve the performance of predictive models.
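
A minimal sketch of the `KBinsDiscretizer` class from scikit-learn's `sklearn.preprocessing` module follows, assuming scikit-learn and NumPy are installed. It contrasts the `"uniform"` (equal-width) and `"quantile"` (equal-frequency) strategies on one synthetic, exponentially distributed column. The data, the random seed and `n_bins=10` are arbitrary choices made for illustration. They are not taken from the tutorial's own examples, which are not part of this excerpt.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic, right-skewed input (illustrative assumption): 1,000 rows, 1 column.
rng = np.random.default_rng(1)
X = rng.exponential(scale=2.0, size=(1000, 1))

# Equal-width bins: the long right tail leaves the upper bins sparsely populated.
uniform = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="uniform")
X_uniform = uniform.fit_transform(X)

# Equal-frequency bins: each bin receives roughly the same number of rows.
quantile = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="quantile")
X_quantile = quantile.fit_transform(X)

# encode="ordinal" returns float bin indices 0.0..9.0; count rows per bin.
counts_uniform = np.bincount(X_uniform[:, 0].astype(int), minlength=10)
counts_quantile = np.bincount(X_quantile[:, 0].astype(int), minlength=10)

print("uniform bin counts: ", counts_uniform)
print("quantile bin counts:", counts_quantile)
print("quantile bin edges: ", quantile.bin_edges_[0].round(2))
```

With `encode="ordinal"`, each value is replaced by a single integer-valued bin index, so the output can be fed directly to a model that accepts numeric inputs. Setting `encode="onehot"` would instead produce a sparse one-hot matrix. Which strategy helps a given model is an empirical question.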

Kick-start your project with my new book Data Preparation for Machine Learning, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.
