Iterative Imputation for Missing Values in Machine Learning

Last Updated on August 18, 2020

Datasets may have missing values, and this can cause problems for many machine learning algorithms.

As such, it is good practice to identify and replace missing values for each column in your input data prior to modeling your prediction task. This is called missing data imputation, or imputing for short.

A more sophisticated approach defines a model that predicts each feature with missing values as a function of all the other features, and repeats this estimation several times. Each repetition lets the refined estimates for the other features serve as inputs in the next round of predicting missing values. This is generally referred to as iterative imputation.
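To make the repeated-estimation idea concrete, here is a minimal from-scratch sketch using only NumPy. It is not the tutorial's own code. It assumes a simple round-robin scheme: missing cells start as column means, then on each pass every feature with missing values is regressed on all other features (ordinary least squares with an intercept), and only the originally missing cells are overwritten. The function name `iterative_impute` and the `n_iter` parameter are hypothetical. scikit-learn's `IterativeImputer` class is built on the same idea with more options (choice of estimator, stopping tolerance).

```python
import numpy as np

def iterative_impute(X, n_iter=10):
    """Hypothetical round-robin iterative imputer (illustrative sketch only).

    Each feature with missing values is modeled as a linear function of all
    other features; the loop repeats so refined estimates feed later rounds.
    """
    X = np.array(X, dtype=float)      # work on a float copy
    missing = np.isnan(X)             # mask of originally missing cells
    n_rows, n_cols = X.shape

    # Initial guess: fill every missing cell with its column mean
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])

    for _ in range(n_iter):
        for j in range(n_cols):
            rows_missing = missing[:, j]
            if not rows_missing.any():
                continue  # nothing to impute in this column
            others = [k for k in range(n_cols) if k != j]
            # Inputs: current estimates of the other features plus an intercept
            A = np.column_stack([X[:, others], np.ones(n_rows)])
            observed = ~rows_missing
            # Fit least squares on rows where feature j was actually observed
            coef, *_ = np.linalg.lstsq(A[observed], X[observed, j], rcond=None)
            # Overwrite only the cells that were missing in the input
            X[rows_missing, j] = A[rows_missing] @ coef
    return X

# Two features related by x2 = 2 * x1 + 1, with one gap in each column
data = [[1.0, 3.0], [2.0, 5.0], [3.0, np.nan], [4.0, 9.0], [np.nan, 11.0]]
print(iterative_impute(data))
```

Because the two columns in this toy data are exactly linearly related, the regressions recover the gaps (row 3 gets 7.0, row 5 gets 5.0). On real data, the estimates would only approximate the unknown values.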

In this tutorial, you will discover how to use iterative imputation strategies for missing data in machine learning.

After completing this tutorial, you will know:

Missing values must be marked with NaN values and can be replaced with iteratively estimated values.
How to load a CSV file with missing values, mark the missing values with NaN values, and report the number and percentage of missing values for each column.
How to impute missing values with iterative models as a data preparation method.
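The loading and reporting steps listed above can be sketched with the standard library alone. This is an illustrative example, not the tutorial's code. The CSV contents are made up, and the `"?"` missing-value marker is an assumption (datasets use various markers). The helper names `load_with_nan` and `missing_report` are hypothetical.

```python
import csv
import io
import math

# Hypothetical CSV contents; "?" is ASSUMED to mark a missing value
RAW_CSV = """1,?,3.5
2,4.0,?
?,5.0,2.0
4,6.0,1.0
"""

def load_with_nan(text, marker="?"):
    """Parse CSV text into rows of floats, replacing the marker with NaN."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        rows.append([math.nan if field.strip() == marker else float(field)
                     for field in record])
    return rows

def missing_report(rows):
    """Return (column index, missing count, missing percentage) per column."""
    n_rows = len(rows)
    report = []
    for j in range(len(rows[0])):
        count = sum(1 for row in rows if math.isnan(row[j]))
        report.append((j, count, 100.0 * count / n_rows))
    return report

rows = load_with_nan(RAW_CSV)
for j, count, pct in missing_report(rows):
    print(f"> column {j}: {count} missing ({pct:.1f}%)")
```

If you use pandas instead, `read_csv` accepts an `na_values` argument (for example `na_values="?"`) that performs the same marking while parsing.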
