Articles About Machine Learning

Imbalanced Classification With Python (7-Day Mini-Course)

Last Updated on August 18, 2020 Imbalanced Classification Crash Course. Get on top of imbalanced classification in 7 days. Classification predictive modeling is the task of assigning a label to an example. Imbalanced classification refers to those classification tasks where the distribution of examples across the classes is not equal. Practical imbalanced classification requires a suite of specialized data preparation techniques, learning algorithms, and performance metrics. In this crash course, you will discover how you can get started […]
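As a concrete illustration of an unequal class distribution, the minimal sketch below creates a synthetic binary dataset in which roughly 99 percent of the examples belong to one class. The dataset and the 99:1 ratio are assumptions chosen for demonstration, using scikit-learn's make_classification.

from collections import Counter
from sklearn.datasets import make_classification

# synthetic binary dataset: roughly 99% of examples in class 0 (assumed ratio)
X, y = make_classification(n_samples=10000, n_classes=2,
                           weights=[0.99, 0.01], flip_y=0, random_state=1)
# summarize the class distribution, e.g. Counter({0: 9900, 1: 100})
print(Counter(y))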

Read more

SMOTE for Imbalanced Classification with Python

Last Updated on August 21, 2020 Imbalanced classification involves developing predictive models on classification datasets that have a severe class imbalance. The challenge of working with imbalanced datasets is that most machine learning techniques will ignore, and in turn have poor performance on, the minority class, although typically it is performance on the minority class that is most important. One approach to addressing imbalanced datasets is to oversample the minority class. The simplest approach involves duplicating examples in the minority […]
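A minimal sketch of oversampling with SMOTE is given below, assuming the imbalanced-learn library is installed; the synthetic dataset and its 1 percent minority class are illustrative assumptions.

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# synthetic dataset with an assumed 1% minority class
X, y = make_classification(n_samples=10000, weights=[0.99, 0.01],
                           flip_y=0, random_state=1)
print('Before:', Counter(y))
# SMOTE synthesizes new minority examples by interpolating between a
# minority point and one of its nearest minority-class neighbors
X_res, y_res = SMOTE(random_state=1).fit_resample(X, y)
print('After:', Counter(y_res))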

Read more

Undersampling Algorithms for Imbalanced Classification

Last Updated on January 20, 2020 Resampling methods are designed to change the composition of a training dataset for an imbalanced classification task. Most of the attention on resampling methods for imbalanced classification is focused on oversampling the minority class. Nevertheless, a suite of techniques has been developed for undersampling the majority class that can be used in conjunction with effective oversampling methods. There are many different types of undersampling techniques, although most can be grouped into those that select […]
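As one example of this family, the sketch below uses random undersampling from the imbalanced-learn library to delete majority-class examples; the synthetic dataset and the target ratio of 0.5 are assumptions for illustration.

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler

X, y = make_classification(n_samples=10000, weights=[0.99, 0.01],
                           flip_y=0, random_state=1)
print('Before:', Counter(y))
# randomly delete majority-class examples until the minority class is
# half the size of the majority class (the 0.5 target is an assumption)
undersample = RandomUnderSampler(sampling_strategy=0.5, random_state=1)
X_res, y_res = undersample.fit_resample(X, y)
print('After:', Counter(y_res))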

Read more

How to Combine Oversampling and Undersampling for Imbalanced Classification

Last Updated on August 21, 2020 Resampling methods are designed to add or remove examples from the training dataset in order to change the class distribution. Once the class distributions are more balanced, the suite of standard machine learning classification algorithms can be fit successfully on the transformed datasets. Oversampling methods duplicate or create new synthetic examples in the minority class, whereas undersampling methods delete or merge examples in the majority class. Both types of resampling can be effective when […]
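One way to combine the two, sketched below under the assumption that the imbalanced-learn library is available, is to chain SMOTE oversampling and random undersampling in an imblearn Pipeline; the specific sampling ratios are illustrative assumptions.

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline

X, y = make_classification(n_samples=10000, weights=[0.99, 0.01],
                           flip_y=0, random_state=1)
# first oversample the minority class up to 10% of the majority class,
# then undersample the majority class down to twice the minority size
over = SMOTE(sampling_strategy=0.1, random_state=1)
under = RandomUnderSampler(sampling_strategy=0.5, random_state=1)
pipeline = Pipeline(steps=[('over', over), ('under', under)])
X_res, y_res = pipeline.fit_resample(X, y)
print(Counter(y_res))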

Read more

Tour of Data Sampling Methods for Imbalanced Classification

Machine learning techniques often fail or give misleadingly optimistic performance on classification datasets with an imbalanced class distribution. The reason is that many machine learning algorithms are designed to operate on classification data with an equal number of observations for each class. When this is not the case, algorithms can learn that the few minority class examples are not important and can be ignored in order to achieve good performance. Data sampling provides a collection of techniques that transform a training dataset […]
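A practical detail when evaluating any data sampling method is that sampling must be applied to the training folds only, never the test folds. The sketch below, assuming imbalanced-learn, wraps a sampler and a model in a pipeline so that cross-validation handles this correctly; the random oversampler, decision tree, and F1 metric are illustrative choices.

from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import Pipeline

X, y = make_classification(n_samples=10000, weights=[0.99, 0.01],
                           flip_y=0, random_state=1)
# the pipeline applies the sampler to the training folds only during CV
model = Pipeline(steps=[('sample', RandomOverSampler(random_state=1)),
                        ('model', DecisionTreeClassifier())])
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring='f1', cv=cv, n_jobs=-1)
print('Mean F1: %.3f' % mean(scores))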

Read more

Cost-Sensitive Logistic Regression for Imbalanced Classification

Last Updated on August 28, 2020 Logistic regression does not support imbalanced classification directly. Instead, the training algorithm used to fit the logistic regression model must be modified to take the skewed distribution into account. This can be achieved by specifying a class weighting configuration that is used to influence the amount that logistic regression coefficients are updated during training. The weighting can penalize the model less for errors made on examples from the majority class and penalize the model […]
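In scikit-learn this class weighting is exposed through the class_weight argument of LogisticRegression. The sketch below is a minimal illustration; the synthetic dataset and the 1:100 weighting are assumptions.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=10000, weights=[0.99, 0.01],
                           flip_y=0, random_state=1)
# errors on the minority class (1) are penalized 100 times more than
# errors on the majority class (0); the 1:100 ratio is an assumption
weights = {0: 1.0, 1: 100.0}
model = LogisticRegression(solver='lbfgs', class_weight=weights)
model.fit(X, y)

Alternatively, class_weight='balanced' sets the weights inversely proportional to the class frequencies in the training data, avoiding a hand-tuned ratio.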

Read more

Cost-Sensitive Decision Trees for Imbalanced Classification

Last Updated on August 21, 2020 The decision tree algorithm is effective for balanced classification, although it does not perform well on imbalanced datasets. The split points of the tree are chosen to best separate examples into two groups with minimum mixing. When both groups are dominated by examples from one class, the criterion used to select a split point will see good separation when, in fact, the examples from the minority class are being ignored. This problem can be […]
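In scikit-learn, the class_weight argument of DecisionTreeClassifier weights each class's contribution to the split criterion. A minimal sketch, with an assumed synthetic dataset:

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=10000, weights=[0.99, 0.01],
                           flip_y=0, random_state=1)
# 'balanced' weights classes inversely proportional to their frequency,
# so minority examples count more when evaluating candidate split points
model = DecisionTreeClassifier(class_weight='balanced')
model.fit(X, y)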

Read more

Cost-Sensitive SVM for Imbalanced Classification

Last Updated on August 21, 2020 The Support Vector Machine algorithm is effective for balanced classification, although it does not perform well on imbalanced datasets. The SVM algorithm finds a hyperplane decision boundary that best splits the examples into two classes. The split is made soft through the use of a margin that allows some points to be misclassified. By default, this margin favors the majority class on imbalanced datasets, although it can be updated to take the importance of […]
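The class importance can be set via the class_weight argument of scikit-learn's SVC, which scales the regularization parameter C per class. A minimal sketch with an assumed dataset and weighting:

from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=10000, weights=[0.99, 0.01],
                           flip_y=0, random_state=1)
# the margin penalty C for class 1 is scaled by 100, so misclassifying
# minority examples costs more; the 1:100 ratio is an assumption
model = SVC(kernel='rbf', gamma='scale', class_weight={0: 1.0, 1: 100.0})
model.fit(X, y)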

Read more

How to Develop a Cost-Sensitive Neural Network for Imbalanced Classification

Last Updated on August 21, 2020 Deep learning neural networks are a flexible class of machine learning algorithms that perform well on a wide range of problems. Neural networks are trained using the backpropagation of error algorithm, which involves calculating errors made by the model on the training dataset and updating the model weights in proportion to those errors. The limitation of this method of training is that examples from each class are treated the same, which for imbalanced datasets […]
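In Keras, a per-class weighting can be passed to fit() via the class_weight argument so that minority-class errors contribute more to the loss. The sketch below is a minimal illustration assuming TensorFlow/Keras; the network size, dataset, and 1:100 weighting are assumptions.

from sklearn.datasets import make_classification
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# small synthetic dataset with two input features (assumed setup)
X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
                           weights=[0.99, 0.01], flip_y=0, random_state=1)
model = Sequential()
model.add(Dense(10, activation='relu', input_dim=2))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')
# scale each example's contribution to the loss: errors on the minority
# class (1) cost 100 times more (the 1:100 ratio is an assumption)
model.fit(X, y, epochs=10, batch_size=128,
          class_weight={0: 1.0, 1: 100.0}, verbose=0)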

Read more

How to Configure XGBoost for Imbalanced Classification

Last Updated on August 21, 2020 The XGBoost algorithm is effective for a wide range of regression and classification predictive modeling problems. It is an efficient implementation of the stochastic gradient boosting algorithm and offers a range of hyperparameters that give fine-grained control over the model training procedure. Although the algorithm performs well in general, even on imbalanced classification datasets, it offers a way to tune the training algorithm to pay more attention to misclassification of the minority class for […]
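That tuning is exposed through XGBoost's scale_pos_weight hyperparameter, which increases the weight of positive (minority) class examples during boosting. A minimal sketch, assuming the xgboost package and a synthetic dataset; the heuristic shown sets the weight to the ratio of negative to positive examples.

from collections import Counter
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=10000, weights=[0.99, 0.01],
                           flip_y=0, random_state=1)
counter = Counter(y)
# common heuristic: scale_pos_weight = count(negative) / count(positive)
ratio = counter[0] / counter[1]
model = XGBClassifier(scale_pos_weight=ratio)
model.fit(X, y)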

Read more