Undersampling Algorithms for Imbalanced Classification

Last Updated on January 20, 2020

Resampling methods are designed to change the composition of a training dataset for an imbalanced classification task. Most of the attention on resampling methods for imbalanced classification is focused on oversampling the minority class. Nevertheless, a suite of techniques has been developed for undersampling the majority class that can be used in conjunction with effective oversampling methods. There are many different types of undersampling techniques, although most can be grouped into those that select […]
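As a taste of the techniques covered, below is a minimal sketch of random undersampling using the imbalanced-learn library; the synthetic dataset and its 1:100 class ratio are assumptions for illustration only.

```python
# A minimal sketch of random undersampling with imbalanced-learn.
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler

# create a synthetic binary dataset with an approximate 1:100 class distribution
X, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0, random_state=1)
print('Before:', Counter(y))

# randomly delete examples from the majority class until the classes are balanced
undersample = RandomUnderSampler(sampling_strategy='majority', random_state=1)
X_under, y_under = undersample.fit_resample(X, y)
print('After:', Counter(y_under))
```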

Read more

How to Combine Oversampling and Undersampling for Imbalanced Classification

Last Updated on August 21, 2020

Resampling methods are designed to add or remove examples from the training dataset in order to change the class distribution. Once the class distributions are more balanced, the suite of standard machine learning classification algorithms can be fit successfully on the transformed datasets. Oversampling methods duplicate or create new synthetic examples in the minority class, whereas undersampling methods delete or merge examples in the majority class. Both types of resampling can be effective when […]
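The sketch below chains SMOTE oversampling with random undersampling using an imbalanced-learn Pipeline; the specific sampling ratios are illustrative assumptions, not a recommendation.

```python
# A minimal sketch combining oversampling and undersampling in one pipeline.
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline

X, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0, random_state=1)
print('Before:', Counter(y))

# first oversample the minority class to 10 percent of the majority,
# then undersample the majority class down to twice the minority
over = SMOTE(sampling_strategy=0.1, random_state=1)
under = RandomUnderSampler(sampling_strategy=0.5, random_state=1)
pipeline = Pipeline(steps=[('over', over), ('under', under)])
X_res, y_res = pipeline.fit_resample(X, y)
print('After:', Counter(y_res))
```

Applying a modest amount of each transform, rather than fully balancing with either alone, often changes the distribution less drastically while still helping the classifier.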

Read more

Tour of Data Sampling Methods for Imbalanced Classification

Machine learning techniques often fail or give misleadingly optimistic performance on classification datasets with an imbalanced class distribution. The reason is that many machine learning algorithms are designed to operate on classification data with an equal number of observations for each class. When this is not the case, algorithms can learn that the class with very few examples is not important and can be ignored while still achieving good performance. Data sampling provides a collection of techniques that transform a training dataset […]
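A short demonstration of that misleadingly optimistic performance: a baseline that always predicts the majority class scores roughly 99 percent accuracy on a synthetic 1:100 dataset while never detecting a single minority example.

```python
# A naive majority-class baseline looks accurate on an imbalanced dataset.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0, random_state=1)

# a model that ignores the inputs and always predicts the most frequent class
model = DummyClassifier(strategy='most_frequent')
model.fit(X, y)
yhat = model.predict(X)
print('Accuracy: %.3f' % accuracy_score(y, yhat))  # ~0.990, yet useless for the minority class
```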

Read more

Cost-Sensitive Logistic Regression for Imbalanced Classification

Last Updated on August 28, 2020

Logistic regression does not support imbalanced classification directly. Instead, the training algorithm used to fit the logistic regression model must be modified to take the skewed distribution into account. This can be achieved by specifying a class weighting configuration that influences how much the logistic regression coefficients are updated during training. The weighting can penalize the model less for errors made on examples from the majority class and penalize the model […]
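A minimal sketch of this class weighting with scikit-learn follows; the 1:100 weighting mirrors the synthetic class distribution and is an assumption for illustration.

```python
# A minimal sketch of weighted logistic regression in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0, random_state=1)

# penalize errors on the minority class (label 1) 100 times more than the majority
weights = {0: 1.0, 1: 100.0}
model = LogisticRegression(solver='lbfgs', class_weight=weights)
scores = cross_val_score(model, X, y, scoring='roc_auc', cv=3)
print('Mean ROC AUC: %.3f' % scores.mean())
```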

Read more

Cost-Sensitive Decision Trees for Imbalanced Classification

Last Updated on August 21, 2020

The decision tree algorithm is effective for balanced classification, although it does not perform well on imbalanced datasets. The split points of the tree are chosen to best separate examples into two groups with minimum mixing. When both groups are dominated by examples from one class, the criterion used to select a split point will see good separation when, in fact, the examples from the minority class are being ignored. This problem can be […]
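A minimal sketch using scikit-learn's class_weight argument follows; the 'balanced' heuristic, which weights classes by inverse frequency, is one reasonable default rather than the only option.

```python
# A minimal sketch of a cost-sensitive decision tree in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0, random_state=1)

# weight the impurity calculation so minority examples count more at each split
model = DecisionTreeClassifier(class_weight='balanced', random_state=1)
scores = cross_val_score(model, X, y, scoring='f1', cv=3)
print('Mean F1: %.3f' % scores.mean())
```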

Read more

Cost-Sensitive SVM for Imbalanced Classification

Last Updated on August 21, 2020

The Support Vector Machine algorithm is effective for balanced classification, although it does not perform well on imbalanced datasets. The SVM algorithm finds a hyperplane decision boundary that best splits the examples into two classes. The split is made soft through the use of a margin that allows some points to be misclassified. By default, this margin favors the majority class on imbalanced datasets, although it can be updated to take the importance of […]
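A minimal sketch of a weighted SVM in scikit-learn follows; class_weight scales the soft-margin penalty C per class so that margin violations on the minority class cost more. The 1:100 weighting and small dataset size are illustrative assumptions.

```python
# A minimal sketch of a class-weighted SVM in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, weights=[0.99], flip_y=0, random_state=1)

# errors on the minority class (label 1) are penalized 100x more than the majority
model = SVC(gamma='scale', class_weight={0: 1.0, 1: 100.0})
scores = cross_val_score(model, X, y, scoring='roc_auc', cv=3)
print('Mean ROC AUC: %.3f' % scores.mean())
```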

Read more

How to Develop a Cost-Sensitive Neural Network for Imbalanced Classification

Last Updated on August 21, 2020

Deep learning neural networks are a flexible class of machine learning algorithms that perform well on a wide range of problems. Neural networks are trained using the backpropagation of error algorithm, which involves calculating the errors made by the model on the training dataset and updating the model weights in proportion to those errors. The limitation of this method of training is that examples from each class are treated the same, which for imbalanced datasets […]
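A minimal sketch with Keras follows; the class_weight argument to fit() scales each class's contribution to the loss during backpropagation. The tiny architecture and 1:100 weighting are illustrative assumptions.

```python
# A minimal sketch of a cost-sensitive neural network in Keras.
from sklearn.datasets import make_classification
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

X, y = make_classification(n_samples=10000, n_features=20, weights=[0.99], flip_y=0, random_state=1)

# a small binary classification network
model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(20,)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')

# errors on minority (label 1) examples contribute 100x more to the loss
model.fit(X, y, epochs=5, batch_size=64, class_weight={0: 1.0, 1: 100.0}, verbose=0)
```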

Read more

How to Configure XGBoost for Imbalanced Classification

Last Updated on August 21, 2020

The XGBoost algorithm is effective for a wide range of regression and classification predictive modeling problems. It is an efficient implementation of the stochastic gradient boosting algorithm and offers a range of hyperparameters that give fine-grained control over the model training procedure. Although the algorithm performs well in general, even on imbalanced classification datasets, it offers a way to tune the training algorithm to pay more attention to misclassification of the minority class for […]
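A minimal sketch of the relevant hyperparameter, scale_pos_weight, follows; setting it to the ratio of negative to positive examples (about 99 for this synthetic dataset) is a commonly used heuristic.

```python
# A minimal sketch of XGBoost's scale_pos_weight for an imbalanced dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0, random_state=1)

# scale the gradient for the positive (minority) class by the class ratio
model = XGBClassifier(scale_pos_weight=99, eval_metric='logloss')
scores = cross_val_score(model, X, y, scoring='roc_auc', cv=3)
print('Mean ROC AUC: %.3f' % scores.mean())
```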

Read more

Cost-Sensitive Learning for Imbalanced Classification

Most machine learning algorithms assume that all misclassification errors made by a model are equal. This is often not the case for imbalanced classification problems where missing a positive or minority class case is worse than incorrectly classifying an example from the negative or majority class. There are many real-world examples, such as detecting spam email, diagnosing a medical condition, or identifying fraud. In all of these cases, a false negative (missing a case) is worse or more costly than […]
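A small worked example of the unequal-cost idea: scoring predictions with a cost matrix in which a false negative (a missed case) costs far more than a false positive. The 1-versus-100 costs are illustrative assumptions.

```python
# Scoring predictions with unequal misclassification costs.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# each false positive costs 1; each false negative (missed case) costs 100
total_cost = fp * 1 + fn * 100
print('FP=%d, FN=%d, total cost=%d' % (fp, fn, total_cost))  # FP=1, FN=2, cost=201
```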

Read more

A Gentle Introduction to Threshold-Moving for Imbalanced Classification

Last Updated on August 28, 2020

Classification predictive modeling typically involves predicting a class label. Nevertheless, many machine learning algorithms are capable of predicting a probability or score of class membership, and this must be interpreted before it can be mapped to a crisp class label. This is achieved by using a threshold, such as 0.5, where all values equal to or greater than the threshold are mapped to one class and all other values are mapped to another class. For […]
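A minimal sketch of threshold-moving follows: predict probabilities on a held-out set, then search a grid of candidate thresholds for the one that maximizes the F-measure instead of assuming the default of 0.5.

```python
# A minimal sketch of tuning the decision threshold for F-measure.
from numpy import arange, argmax
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)

model = LogisticRegression(solver='lbfgs')
model.fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # probability of the positive class

# map probabilities to crisp labels at each candidate threshold and score them
thresholds = arange(0.0, 1.0, 0.01)
scores = [f1_score(y_test, (probs >= t).astype('int')) for t in thresholds]
ix = argmax(scores)
print('Best threshold=%.2f, F1=%.3f' % (thresholds[ix], scores[ix]))
```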

Read more