How to Configure k-Fold Cross-Validation

Last Updated on August 26, 2020

The k-fold cross-validation procedure is a standard method for estimating the performance of a machine learning algorithm on a dataset. A common value for k is 10, but how do we know that this configuration is appropriate for our dataset and our algorithms? One approach is to explore the effect of different k values on the estimate of model performance and compare this to an ideal test condition. This can help to choose an […]
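
To make the role of k concrete, here is a minimal sketch of how a dataset's row indices can be divided into k folds. The helper name `kfold_indices` is hypothetical and written for illustration only; it is not taken from the article or from any library, and it does not shuffle the rows.

```python
def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds.

    Returns a list of (train_indices, test_indices) pairs.
    Hypothetical helper for illustration; rows are not shuffled.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    splits = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, test))
        start += size
    return splits


# Compare the held-out fold sizes for a few k values on a 10-row dataset.
for k in (2, 5, 10):
    sizes = [len(test) for _, test in kfold_indices(10, k)]
    print(k, sizes)
```

Larger k values leave fewer rows in each held-out fold and more rows for training, which is one reason the choice of k can change the performance estimate.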

Repeated k-Fold Cross-Validation for Model Evaluation in Python

Last Updated on August 26, 2020

The k-fold cross-validation procedure is a standard method for estimating the performance of a machine learning algorithm or configuration on a dataset. A single run of the k-fold cross-validation procedure may result in a noisy estimate of model performance. Different splits of the data may produce very different results. Repeated k-fold cross-validation provides a way to improve the estimate of a machine learning model's performance. This involves simply repeating the cross-validation procedure multiple […]
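
The idea of repeating the procedure with a different split each time can be sketched as follows. This is an illustration under stated assumptions, not the article's code: `repeated_kfold_indices`, `mean_score`, and the caller-supplied `score_fn` are hypothetical names, and the shuffling uses only the standard library.

```python
import random
from statistics import mean


def repeated_kfold_indices(n_samples, k, n_repeats, seed=1):
    """Yield (train, test) index lists for n_repeats shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a different split of the data on every repeat
        for fold in range(k):
            test = indices[fold::k]  # every k-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield train, test


def mean_score(score_fn, n_samples, k, n_repeats, seed=1):
    """Average a caller-supplied score over all folds and all repeats.

    score_fn(train, test) is a hypothetical callback that fits a model on
    the train indices and returns its score on the test indices.
    """
    return mean(
        score_fn(train, test)
        for train, test in repeated_kfold_indices(n_samples, k, n_repeats, seed)
    )
```

Averaging over k × n_repeats scores rather than k scores is what smooths out the noise from any single split.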

How to Use XGBoost for Time Series Forecasting

Last Updated on August 27, 2020

XGBoost is an efficient implementation of gradient boosting for classification and regression problems. It is fast and performs well, often among the best, on a wide range of predictive modeling tasks, and it is a favorite among data science competition winners, such as those on Kaggle. XGBoost can also be used for time series forecasting, although it requires that the time series dataset be transformed into a supervised learning problem first. It also […]
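
One common way to transform a time series into a supervised learning problem is a sliding window, where the previous few observations become the input features and the next observation becomes the target. The sketch below assumes that approach; `series_to_supervised` and its `n_lags` parameter are illustrative names, and the resulting rows could then be passed to any regressor.

```python
def series_to_supervised(series, n_lags):
    """Turn a sequence into (X, y) rows using the previous n_lags values as inputs."""
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be at least 1 and shorter than the series")
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))  # lagged observations as features
        y.append(series[t])                   # the next value as the target
    return X, y


X, y = series_to_supervised([10, 20, 30, 40, 50], n_lags=2)
# X == [[10, 20], [20, 30], [30, 40]] and y == [30, 40, 50]
```

Because the rows are ordered in time, any evaluation of a model trained on them should respect that order rather than shuffle the rows.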

Multi-Class Imbalanced Classification

Last Updated on August 21, 2020

Imbalanced classification refers to prediction tasks where the distribution of examples across class labels is not equal. Most imbalanced classification examples focus on binary classification tasks, yet many of the tools and techniques for imbalanced classification also directly support multi-class classification problems. In this tutorial, you will discover how to use the tools of imbalanced classification with a multi-class dataset. After completing this tutorial, you will know: About the glass identification standard imbalanced multi-class […]

How to use Seaborn Data Visualization for Machine Learning

Last Updated on August 19, 2020

Data visualization provides insight into the distribution of, and relationships between, variables in a dataset. This insight can be helpful in selecting data preparation techniques to apply prior to modeling and the types of algorithms that may be most suited to the data. Seaborn is a data visualization library for Python that runs on top of the popular Matplotlib data visualization library, while providing a simple interface and more attractive plots. In this tutorial, […]

A Gentle Introduction to Computational Learning Theory

Last Updated on September 7, 2020

Computational learning theory, or statistical learning theory, refers to mathematical frameworks for quantifying learning tasks and algorithms. These are sub-fields of machine learning that a machine learning practitioner does not need to know in great depth in order to achieve good results on a wide range of problems. Nevertheless, these are sub-fields where having a high-level understanding of some of the more prominent methods may provide insight into the broader task of learning […]

Plot a Decision Surface for Machine Learning Algorithms in Python

Last Updated on August 26, 2020

Classification algorithms learn how to assign class labels to examples, although their decisions can appear opaque. A popular diagnostic for understanding the decisions made by a classification algorithm is the decision surface. This is a plot that shows how a fitted machine learning algorithm predicts over a coarse grid across the input feature space. A decision surface plot is a powerful tool for understanding how a given model “sees” the prediction task and how it […]

Why Do I Get Different Results Each Time in Machine Learning?

Last Updated on August 27, 2020

Are you getting different results for your machine learning algorithm? Perhaps your results differ from a tutorial and you want to understand why. Perhaps your model is making different predictions each time it is trained, even when it is trained on the same dataset each time. This is to be expected and might even be a feature of the algorithm, not a bug. In this tutorial, you will discover why you can expect […]
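
One source of run-to-run variation is the random number generator used during training. The toy sketch below stands in for a stochastic training run; `noisy_score` is a hypothetical function invented for illustration, not a real model, and the 0.80 baseline is an arbitrary placeholder value.

```python
import random


def noisy_score(seed=None):
    """Stand-in for a stochastic training run: returns a randomly perturbed score.

    With seed=None the generator is seeded from the system, so results vary.
    """
    rng = random.Random(seed)
    return round(0.80 + rng.uniform(-0.05, 0.05), 3)


# Unseeded runs will usually differ from one call to the next.
print(noisy_score(), noisy_score())

# Fixing the seed makes the run repeatable.
print(noisy_score(seed=7) == noisy_score(seed=7))  # True
```

Fixing a seed makes a single run reproducible, but it does not remove the underlying variance, so reporting the mean over several runs is often more informative than one seeded result.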

How to Calculate the Bias-Variance Trade-off with Python

Last Updated on August 26, 2020

The performance of a machine learning model can be characterized in terms of the bias and the variance of the model. A model with high bias, such as linear regression, makes strong assumptions about the form of the unknown underlying function that maps inputs to outputs in the dataset. A model with high variance, such as an unpruned decision tree, is highly dependent upon the specifics of the training dataset. We desire models with low […]
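
For squared-error loss, the two quantities described above combine in the standard bias-variance decomposition. The notation here is an assumption introduced for illustration: the data are generated as y = f(x) + ε with zero-mean noise of variance σ², f̂ is the model fit on a randomly drawn training dataset, and the expectation is taken over training datasets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

The first two terms depend on the choice of model, while the last term is a property of the data and cannot be reduced by any model.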

Hypothesis Test for Comparing Machine Learning Algorithms

Last Updated on September 1, 2020

Machine learning models are chosen based on their mean performance, often calculated using k-fold cross-validation. The algorithm with the best mean performance is expected to be better than those algorithms with worse mean performance. But what if the difference in the mean performance is caused by a statistical fluke? The solution is to use a statistical hypothesis test to evaluate whether the difference in the mean performance between any two algorithms is real or […]
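
As a rough illustration of what such a test computes, the sketch below calculates a paired t statistic from two algorithms' scores on the same folds. This is an assumption about one possible test, not necessarily the one the article uses, and `paired_t_statistic` is a hypothetical helper. Scores from cross-validation folds share training data and are not independent, so a naive paired t-test on them can be overconfident.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two score lists measured on the same folds."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation of the per-fold differences
    if sd == 0:
        raise ValueError("per-fold differences have zero variance")
    # Mean difference divided by its standard error.
    return mean(diffs) / (sd / sqrt(len(diffs)))


# Illustrative numbers only, not real experimental results.
t = paired_t_statistic([3, 5, 7], [2, 3, 4])
print(round(t, 3))
```

The statistic would then be compared against a t distribution with n - 1 degrees of freedom to obtain a p-value.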
