How to Combine Oversampling and Undersampling for Imbalanced Classification
Resampling methods are designed to add or remove examples from the training dataset in order to change the class distribution.
Once the class distributions are more balanced, the suite of standard machine learning classification algorithms can be fit successfully on the transformed datasets.
Oversampling methods duplicate or create new synthetic examples in the minority class, whereas undersampling methods delete or merge examples in the majority class. Both types of resampling can be effective when used in isolation, although they can be even more effective when used in combination.
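To make the idea concrete, here is a minimal sketch of one manual combination using the imbalanced-learn library: SMOTE oversampling of the minority class followed by random undersampling of the majority class, chained with the library's Pipeline. The synthetic dataset and the sampling ratios (0.1 and 0.5) are illustrative assumptions, not prescriptions.

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline

# synthetic binary classification dataset with an approximate 1:100 class distribution
X, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0, random_state=1)
print('Before:', Counter(y))

# oversample the minority class to 10 percent of the majority class,
# then undersample the majority class to twice the size of the minority class
over = SMOTE(sampling_strategy=0.1)
under = RandomUnderSampler(sampling_strategy=0.5)
pipeline = Pipeline(steps=[('over', over), ('under', under)])

# apply both transforms in sequence to the dataset
X_resampled, y_resampled = pipeline.fit_resample(X, y)
print('After:', Counter(y_resampled))
```

Running this would print the class counts before and after resampling, showing the distribution move from roughly 1:100 to roughly 1:2.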
In this tutorial, you will discover how to combine oversampling and undersampling techniques for imbalanced classification.
After completing this tutorial, you will know:
- How to define a sequence of oversampling and undersampling methods to be applied to a training dataset or used when evaluating a classifier model.
- How to manually combine oversampling and undersampling methods for imbalanced classification.
- How to use pre-defined and well-performing combinations of resampling methods for imbalanced classification, as shown in the sketch following this list.
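As a sketch of the pre-defined combinations, the imbalanced-learn library provides SMOTEENN (SMOTE oversampling followed by Edited Nearest Neighbours cleaning) and SMOTETomek (SMOTE followed by Tomek Links removal) in its imblearn.combine module. The example below evaluates SMOTEENN inside a modeling pipeline; the decision tree classifier and the ROC AUC metric are illustrative choices, not requirements of the method.

```python
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from imblearn.combine import SMOTEENN
from imblearn.pipeline import Pipeline

# synthetic binary classification dataset with an approximate 1:100 class distribution
X, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0, random_state=1)

# SMOTE oversampling followed by Edited Nearest Neighbours cleaning, then a classifier;
# the imblearn Pipeline applies the resampling step to the training folds only
pipeline = Pipeline(steps=[('resample', SMOTEENN()), ('model', DecisionTreeClassifier())])

# evaluate with repeated stratified k-fold cross-validation
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(pipeline, X, y, scoring='roc_auc', cv=cv, n_jobs=-1)
print('Mean ROC AUC: %.3f' % mean(scores))
```

Swapping SMOTEENN for SMOTETomek in the pipeline gives the other widely used pre-defined combination.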
Kick-start your project with my new book Imbalanced Classification with Python, including step-by-step tutorials and the Python source code files for all examples.
Let’s get started.