How to Calculate Feature Importance With Python

Last Updated on August 20, 2020 Feature importance refers to techniques that assign a score to input features based on how useful they are at predicting a target variable. There are many types and sources of feature importance scores, although popular examples include statistical correlation scores, coefficients calculated as part of linear models, decision trees, and permutation importance scores. Feature importance scores play an important role in a predictive modeling project, including providing insight into the data, insight into the […]

Gradient Boosting with Scikit-Learn, XGBoost, LightGBM, and CatBoost

Last Updated on August 28, 2020 Gradient boosting is a powerful ensemble machine learning algorithm. It’s popular for structured predictive modeling problems, such as classification and regression on tabular data, and is often the main algorithm or one of the main algorithms used in winning solutions to machine learning competitions, like those on Kaggle. There are many implementations of gradient boosting available, including standard implementations in scikit-learn and efficient third-party libraries. Each uses a different interface and even different names […]

What Is Argmax in Machine Learning?

Last Updated on August 19, 2020 Argmax is a mathematical function that you may encounter in applied machine learning. For example, you may see “argmax” or “arg max” used in a research paper to describe an algorithm. You may also be instructed to use the argmax function in your algorithm implementation. This may be the first time that you encounter the argmax function and you may wonder what it is and how it works. In this tutorial, you will […]
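
As a rough illustration of the idea (not taken from the tutorial itself), argmax returns the position of the largest value rather than the value. The minimal sketch below assumes a plain Python list of scores; the helper name `argmax` and the example probabilities are hypothetical.

```python
def argmax(values):
    """Return the index of the largest value (the first one if there are ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical predicted probabilities for three classes.
probabilities = [0.1, 0.7, 0.2]

# The largest value is 0.7, which sits at index 1.
predicted_class = argmax(probabilities)
print(predicted_class)
```

In practice, a library routine such as `numpy.argmax` is typically used instead of a hand-written helper.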

10 Clustering Algorithms With Python

Last Updated on August 20, 2020 Clustering or cluster analysis is an unsupervised learning problem. It is often used as a data analysis technique for discovering interesting patterns in data, such as groups of customers based on their behavior. There are many clustering algorithms to choose from and no single best clustering algorithm for all cases. Instead, it is a good idea to explore a range of clustering algorithms and different configurations for each algorithm. In this tutorial, you will […]

4 Types of Classification Tasks in Machine Learning

Last Updated on August 19, 2020 Machine learning is a field of study concerned with algorithms that learn from examples. Classification is a task that requires the use of machine learning algorithms that learn how to assign a class label to examples from the problem domain. An easy-to-understand example is classifying emails as “spam” or “not spam.” There are many different types of classification tasks that you may encounter in machine learning and specialized approaches to […]

Stacking Ensemble Machine Learning With Python

Last Updated on August 17, 2020 Stacking or Stacked Generalization is an ensemble machine learning algorithm. It uses a meta-learning algorithm to learn how to best combine the predictions from two or more base machine learning algorithms. The benefit of stacking is that it can harness the capabilities of a range of well-performing models on a classification or regression task and make predictions that have better performance than any single model in the ensemble. In this tutorial, you will discover […]

One-vs-Rest and One-vs-One for Multi-Class Classification

Last Updated on September 7, 2020 Not all classification predictive models support multi-class classification. Algorithms such as the Perceptron, Logistic Regression, and Support Vector Machines were designed for binary classification and do not natively support classification tasks with more than two classes. One approach for using binary classification algorithms for multi-class classification problems is to split the multi-class classification dataset into multiple binary classification datasets and fit a binary classification model on each. Two different examples of this approach are the […]
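
The teaser above is truncated, so the following is only a sketch of the two strategies named in the post title, under the usual assumption that one-vs-rest builds one binary dataset per class and one-vs-one builds one per pair of classes. The function names and the color labels are hypothetical, and only the label relabeling is shown, not model fitting.

```python
from itertools import combinations


def one_vs_rest(labels):
    """Build one binary label vector per class: 1 for that class, 0 for the rest."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def one_vs_one(labels):
    """Build one binary dataset per pair of classes.

    Each dataset keeps only the rows belonging to the pair, stored as
    (row_index, binary_label), where 1 marks the first class of the pair.
    """
    classes = sorted(set(labels))
    datasets = {}
    for a, b in combinations(classes, 2):
        datasets[(a, b)] = [
            (i, 1 if y == a else 0) for i, y in enumerate(labels) if y in (a, b)
        ]
    return datasets


# Hypothetical three-class target column.
labels = ["red", "blue", "green", "red"]

ovr = one_vs_rest(labels)  # 3 classes -> 3 binary problems
ovo = one_vs_one(labels)   # 3 classes -> 3 * (3 - 1) / 2 = 3 binary problems
print(len(ovr), len(ovo))
```

A binary classifier would then be fit on each of these relabeled datasets, which is the part this sketch leaves out.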

How to Handle Big-p, Little-n (p >> n) in Machine Learning

Last Updated on August 19, 2020 What if I have more columns than rows in my dataset? Machine learning datasets are often structured or tabular data composed of rows and columns. The columns that are fed as input to a model are called predictors or “p” and the rows are samples or “n”. Most machine learning algorithms assume that there are many more samples than there are predictors, denoted as p << n. These problems often require specialized data preparation and […]

How to Develop Voting Ensembles With Python

Last Updated on September 7, 2020 Voting is an ensemble machine learning algorithm. For regression, a voting ensemble involves making a prediction that is the average of multiple other regression models. In classification, a hard voting ensemble involves summing the votes for crisp class labels from other models and predicting the class with the most votes. A soft voting ensemble involves summing the predicted probabilities for class labels and predicting the class label with the largest sum probability. In this […]
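
The three combination rules described above can be sketched in plain Python. This is a minimal illustration, assuming the member models' predictions are already available as lists; the function names and the example numbers are hypothetical and do not come from the tutorial.

```python
from collections import Counter


def regression_vote(predictions):
    """Average the numeric predictions of several regression models."""
    return sum(predictions) / len(predictions)


def hard_vote(labels):
    """Return the crisp class label predicted by the most models."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Sum per-class probabilities across models and return the class index
    with the largest total."""
    n_classes = len(probabilities[0])
    totals = [sum(p[c] for p in probabilities) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: totals[c])


# Hypothetical outputs from three models for a single example.
model_probabilities = [[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]
model_labels = [0, 1, 1]  # the crisp label implied by each row above

print(regression_vote([1.0, 2.0, 3.0]))
print(hard_vote(model_labels))
print(soft_vote(model_probabilities))
```

With these numbers the two classification rules disagree: two of the three models vote for class 1, but the summed probability is larger for class 0 because the first model is very confident.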

How to Develop a Random Forest Ensemble in Python

Last Updated on September 7, 2020 Random forest is an ensemble machine learning algorithm. It is perhaps the most popular and widely used machine learning algorithm given its good or excellent performance across a wide range of classification and regression predictive modeling problems. It is also easy to use given that it has few key hyperparameters and sensible heuristics for configuring these hyperparameters. In this tutorial, you will discover how to develop a random forest ensemble for classification and regression. […]
