How to Use an Empirical Distribution Function in Python

Last Updated on August 28, 2020. An empirical distribution function provides a way to model and sample cumulative probabilities for a data sample that does not fit a standard probability distribution. As such, it is sometimes called the empirical cumulative distribution function, or ECDF for short. In this tutorial, you will discover the empirical probability distribution function. After completing this tutorial, you will know: Some data samples cannot be summarized using a standard distribution. An empirical distribution function provides a […]
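
As a rough illustration of the idea in this excerpt, the sketch below builds an ECDF with only the Python standard library. The helper name `make_ecdf` and the sample values are made up for illustration and are not taken from the tutorial.

```python
from bisect import bisect_right


def make_ecdf(sample):
    """Return a function F(x) giving the fraction of observations <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")

    def ecdf(x):
        # bisect_right counts how many sorted observations are <= x.
        return bisect_right(ordered, x) / n

    return ecdf


# Hypothetical data sample (illustrative values only).
data = [2.1, 3.4, 3.4, 5.0, 7.8]
ecdf = make_ecdf(data)

print(ecdf(1.0))   # below every observation
print(ecdf(3.4))   # three of the five observations are <= 3.4
print(ecdf(10.0))  # above every observation
```

The function is a step function: it jumps by 1/n at each observation, so it needs no assumption about the shape of the underlying distribution.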

A Gentle Introduction to Model Selection for Machine Learning

Given easy-to-use machine learning libraries like scikit-learn and Keras, it is straightforward to fit many different machine learning models on a given predictive modeling dataset. The challenge of applied machine learning, therefore, becomes how to choose among a range of different models that you can use for your problem. Naively, you might believe that model performance is sufficient, but should you consider other concerns, such as how long the model takes to train or how easy it is to explain […]

A Gentle Introduction to the Bayes Optimal Classifier

Last Updated on August 19, 2020. The Bayes Optimal Classifier is a probabilistic model that makes the most probable prediction for a new example. It is described using Bayes Theorem, which provides a principled way of calculating a conditional probability. It is also closely related to Maximum a Posteriori (MAP), a probabilistic framework that finds the most probable hypothesis for a training dataset. In practice, the Bayes Optimal Classifier is computationally expensive, if not intractable […]
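
The difference between picking the most probable hypothesis (MAP) and making the most probable prediction can be seen in a small sketch. The three hypotheses, their posterior probabilities, and their predicted labels below are invented for illustration; the only technique assumed is the standard rule of weighting each hypothesis's prediction by its posterior.

```python
from collections import defaultdict

# Hypothetical posteriors P(h | D) for three candidate hypotheses.
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}

# Hypothetical label each hypothesis predicts for one new example.
predictions = {"h1": "positive", "h2": "negative", "h3": "negative"}

# MAP: take the single most probable hypothesis and use its prediction.
map_hypothesis = max(posteriors, key=posteriors.get)
map_prediction = predictions[map_hypothesis]

# Bayes optimal: sum the posterior mass behind each possible label,
# then choose the label with the largest total.
label_mass = defaultdict(float)
for hypothesis, posterior in posteriors.items():
    label_mass[predictions[hypothesis]] += posterior
bayes_optimal_prediction = max(label_mass, key=label_mass.get)

print(map_hypothesis, map_prediction)
print(dict(label_mass), bayes_optimal_prediction)
```

Here the single best hypothesis says "positive", but 60% of the posterior mass sits behind "negative", so the two approaches disagree. Summing over every hypothesis is also what makes the approach expensive when the hypothesis space is large.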

How to Use Out-of-Fold Predictions in Machine Learning

Last Updated on August 28, 2020. Machine learning algorithms are typically evaluated using resampling techniques such as k-fold cross-validation. During the k-fold cross-validation process, predictions are made on test sets composed of data not used to train the model. These predictions are referred to as out-of-fold predictions, a type of out-of-sample prediction. Out-of-fold predictions play an important role in machine learning in both estimating the performance of a model when making predictions on new data in the future, so-called the […]
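
A minimal sketch of how out-of-fold predictions are collected, using only the standard library. The helper `k_fold_indices`, the toy "model" that predicts the training mean, and the target values are all assumptions made for illustration; a real workflow would typically use a library such as scikit-learn instead.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_mean_model(train_targets):
    """Toy 'model': always predicts the mean of the training targets."""
    return sum(train_targets) / len(train_targets)


# Hypothetical regression targets (illustrative values only).
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# One slot per example; each is filled exactly once, by the model
# that did NOT see that example during training.
oof = [None] * len(y)
for test_idx in k_fold_indices(len(y), k=3):
    held_out = set(test_idx)
    train_y = [y[i] for i in range(len(y)) if i not in held_out]
    prediction = fit_mean_model(train_y)
    for i in test_idx:
        oof[i] = prediction

# Score all out-of-fold predictions together (mean absolute error).
mae = sum(abs(t - p) for t, p in zip(y, oof)) / len(y)
print(oof)
print(mae)
```

Because every prediction comes from a model that never saw the corresponding example, the pooled error is an estimate of performance on new data.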

Develop an Intuition for Bayes Theorem With Worked Examples

Last Updated on August 19, 2020. Bayes Theorem provides a principled way for calculating a conditional probability. It is a deceptively simple calculation, providing a method that is easy to use for scenarios where our intuition often fails. The best way to develop an intuition for Bayes Theorem is to think about the meaning of the terms in the equation and to apply the calculation many times in a range of different real-world scenarios. This will provide the context for […]
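
One way to apply the calculation is the familiar diagnostic-test scenario sketched below. The function name `bayes_posterior` and all three input probabilities are hypothetical and chosen only to show how intuition can mislead; they are not taken from the tutorial.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) using Bayes Theorem.

    prior               -- P(A)
    likelihood          -- P(B | A)
    false_positive_rate -- P(B | not A)
    """
    # Total probability of the evidence: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical scenario: a condition with 1% prevalence, a test that
# detects it 90% of the time, and a 5% false positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(posterior, 4))  # roughly 0.15, far lower than the 90% many people guess
```

Even with a fairly accurate test, the low prior means most positive results come from the much larger group without the condition.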

How to Develop Super Learner Ensembles in Python

Last Updated on August 17, 2020. Selecting a machine learning algorithm for a predictive modeling problem involves evaluating many different models and model configurations using k-fold cross-validation. The super learner is an ensemble machine learning algorithm that combines all of the models and model configurations that you might investigate for a predictive modeling problem and uses them to make a prediction as good as or better than any single model that you may have investigated. The super learner algorithm is an application […]

Tune Hyperparameters for Classification Machine Learning Algorithms

Last Updated on August 28, 2020. Machine learning algorithms have hyperparameters that allow you to tailor the behavior of the algorithm to your specific dataset. Hyperparameters are different from parameters, which are the internal coefficients or weights for a model found by the learning algorithm. Unlike parameters, hyperparameters are specified by the practitioner when configuring the model. Typically, it is challenging to know what values to use for the hyperparameters of a given algorithm on a given dataset; therefore, it […]

How to Transform Target Variables for Regression in Python

Last Updated on August 18, 2020. Data preparation is a big part of applied machine learning. Correctly preparing your training data can mean the difference between mediocre and extraordinary results, even with very simple linear algorithms. Performing data preparation operations, such as scaling, is relatively straightforward for input variables and has been made routine in Python via the scikit-learn Pipeline class. On regression predictive modeling problems where a numerical value must be predicted, it can also be critical to scale […]

Arithmetic, Geometric, and Harmonic Means for Machine Learning

Last Updated on August 19, 2020. Calculating the average of a variable or a list of numbers is a common operation in machine learning. It is an operation you may use every day either directly, such as when summarizing data, or indirectly, such as a smaller step in a larger procedure when fitting a model. The average is a synonym for the mean, a number that represents the expected value of a probability distribution. As such, there are multiple […]
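
The three means named in the title can be compared side by side with Python's built-in `statistics` module (`geometric_mean` requires Python 3.8 or later). The input values are made up for illustration.

```python
from statistics import geometric_mean, harmonic_mean, mean

# Hypothetical positive values (illustrative only).
values = [2.0, 8.0]

arithmetic = mean(values)           # (2 + 8) / 2
geometric = geometric_mean(values)  # square root of 2 * 8
harmonic = harmonic_mean(values)    # 2 / (1/2 + 1/8)

print(arithmetic, geometric, harmonic)

# For positive values the three means are ordered:
# harmonic <= geometric <= arithmetic.
print(harmonic <= geometric <= arithmetic)
```

Which mean is appropriate depends on the data: the arithmetic mean suits values on a common additive scale, the geometric mean suits multiplicative quantities such as growth factors, and the harmonic mean suits rates.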

Best Results for Standard Machine Learning Datasets

Last Updated on August 28, 2020. It is important that beginner machine learning practitioners practice on small real-world datasets. So-called standard machine learning datasets contain actual observations, fit into memory, and are well studied and well understood. As such, they can be used by beginner practitioners to quickly test, explore, and practice data preparation and modeling techniques. A practitioner can confirm whether they have the data skills required to achieve a good result on a standard machine learning dataset. A […]
