Develop an Intuition for Severely Skewed Class Distributions
Last Updated on January 14, 2020
An imbalanced classification problem is one that involves predicting a class label when the class labels in the training dataset are not equally distributed.
A challenge for beginners working with imbalanced classification problems is understanding what a specific skewed class distribution means. For example, what is the difference between a 1:10 and a 1:100 class ratio, and what are the implications of each?
Differences in the class distribution for an imbalanced classification problem will influence the choice of data preparation methods and modeling algorithms. Therefore, it is critical that practitioners develop an intuition for the implications of different class distributions.
In this tutorial, you will discover how to develop a practical intuition for imbalanced and highly skewed class distributions.
After completing this tutorial, you will know:
- How to create a synthetic dataset for binary classification and plot the examples by class.
- How to create synthetic classification datasets with any given class distribution.
- How different skewed class distributions actually look in practice.
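As a preview of these points, the following is a minimal sketch of one way to generate a two-feature binary dataset with a chosen class ratio and summarize its labels. It assumes scikit-learn's `make_classification()` function with its `weights` argument; the helper name `make_imbalanced` and the default values (10,000 examples, a 0.99 majority fraction, a seed of 1) are hypothetical choices made for illustration, not the settings used later in the tutorial.

```python
# Sketch: create a synthetic binary dataset with a skewed class distribution.
from collections import Counter

from sklearn.datasets import make_classification


def make_imbalanced(n_samples=10000, majority_fraction=0.99, seed=1):
    """Hypothetical helper: return (X, y) with roughly the requested class ratio."""
    X, y = make_classification(
        n_samples=n_samples,
        n_features=2,            # two input features so the data can be plotted
        n_redundant=0,           # no redundant features
        n_clusters_per_class=1,  # one cluster of points per class
        weights=[majority_fraction],  # fraction for class 0; class 1 gets the rest
        flip_y=0,                # do not randomly flip labels (no label noise)
        random_state=seed,
    )
    return X, y


# Roughly a 1:100 ratio of minority to majority examples.
X, y = make_imbalanced()
counts = Counter(y)
print(counts)
```

Changing `majority_fraction` to, say, 0.9 or 0.999 gives a quick feel for how few minority examples remain as the skew grows. Plotting the two columns of `X` colored by `y` would show the same effect visually.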
Let’s get started.
- Update Jan/2020: Updated for changes in scikit-learn v0.22 API.