Articles About Machine Learning

How to Connect Model Input Data With Predictions for Machine Learning

Last Updated on August 19, 2020 Fitting a model to a training dataset is so easy today with libraries like scikit-learn. A model can be fit and evaluated on a dataset in just a few lines of code. It is so easy that it has become a problem. The same few lines of code are repeated again and again and it may not be obvious how to actually use the model to make a prediction. Or, if a prediction is […]

What Does Stochastic Mean in Machine Learning?

Last Updated on July 24, 2020 The behavior and performance of many machine learning algorithms are referred to as stochastic. Stochastic refers to a variable process where the outcome involves some randomness and has some uncertainty. It is a mathematical term and is closely related to “randomness” and “probabilistic” and can be contrasted to the idea of “deterministic.” The stochastic nature of machine learning algorithms is an important foundational concept in machine learning and is required to be understood in […]
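
As a small illustration of the contrast between stochastic and deterministic behavior, the sketch below uses only the Python standard library. The function name noisy_estimate and the choice of a Gaussian sample mean are assumptions made for this example; they are not taken from the tutorial.

```python
import random

def noisy_estimate(seed=None, n=1000):
    """Hypothetical stochastic procedure: the mean of n random Gaussian draws."""
    rng = random.Random(seed)  # a private generator, so the seed fully controls it
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return sum(draws) / n

# Fixing the seed makes the stochastic procedure repeatable.
a = noisy_estimate(seed=1)
b = noisy_estimate(seed=1)
print(a == b)  # True

# A different seed gives a different, but similarly distributed, outcome.
c = noisy_estimate(seed=2)
print(a, c)
```

Each call involves randomness, so the result carries some uncertainty, yet the same seed reproduces the same result exactly.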

How to Save and Reuse Data Preparation Objects in Scikit-Learn

Last Updated on June 30, 2020 It is critical that any data preparation performed on a training dataset is also performed on a new dataset in the future. This may include a test dataset when evaluating a model or new data from the domain when using a model to make predictions. Typically, the model fit on the training dataset is saved for later use. The correct solution to preparing new data for the model in the future is to also […]

3 Ways to Encode Categorical Variables for Deep Learning

Last Updated on August 27, 2020 Machine learning and deep learning models, like those in Keras, require all input and output variables to be numeric. This means that if your data contains categorical data, you must encode it to numbers before you can fit and evaluate a model. The two most popular techniques are an integer encoding and a one hot encoding, although a newer technique called learned embedding may provide a useful middle ground between these two methods. In […]
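
The two most popular encodings mentioned above can be sketched in plain Python. This is a minimal illustration, not the Keras or scikit-learn workflow; the helper names integer_encode and one_hot_encode and the toy color data are assumptions made for this example.

```python
def integer_encode(values):
    """Map each distinct category to an integer, in sorted category order."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[v] for v in values], categories

def one_hot_encode(codes, n_categories):
    """Turn each integer code into a binary vector with a single 1."""
    rows = []
    for code in codes:
        row = [0] * n_categories
        row[code] = 1
        rows.append(row)
    return rows

colors = ["red", "green", "blue", "green"]  # toy categorical variable
codes, categories = integer_encode(colors)
one_hot = one_hot_encode(codes, len(categories))

print(categories)  # ['blue', 'green', 'red']
print(codes)       # [2, 1, 0, 1]
print(one_hot)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

The integer encoding is compact but implies an ordering between categories, whereas the one hot encoding avoids that ordering at the cost of one column per category.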

How to Perform Feature Selection with Categorical Data

Last Updated on August 18, 2020 Feature selection is the process of identifying and selecting a subset of input features that are most relevant to the target variable. Feature selection is often straightforward when working with real-valued data, such as using the Pearson’s correlation coefficient, but can be challenging when working with categorical data. The two most commonly used feature selection methods for categorical input data when the target variable is also categorical (e.g. classification predictive modeling) are the chi-squared […]

How to Choose a Feature Selection Method For Machine Learning

Last Updated on August 20, 2020 Feature selection is the process of reducing the number of input variables when developing a predictive model. It is desirable to reduce the number of input variables to both reduce the computational cost of modeling and, in some cases, to improve the performance of the model. Statistical-based feature selection methods involve evaluating the relationship between each input variable and the target variable using statistics and selecting those input variables that have the strongest relationship […]

How to Use an Empirical Distribution Function in Python

Last Updated on August 28, 2020 An empirical distribution function provides a way to model and sample cumulative probabilities for a data sample that does not fit a standard probability distribution. As such, it is sometimes called the empirical cumulative distribution function, or ECDF for short. In this tutorial, you will discover the empirical probability distribution function. After completing this tutorial, you will know: Some data samples cannot be summarized using a standard distribution. An empirical distribution function provides a […]
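
The idea can be sketched with the Python standard library: the empirical distribution function at x is the fraction of observations less than or equal to x. The helper name make_ecdf and the small data sample are assumptions made for this example and may differ from the approach used in the tutorial.

```python
from bisect import bisect_right

def make_ecdf(sample):
    """Return a function F where F(x) is the fraction of observations <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must contain at least one observation")

    def ecdf(x):
        # bisect_right gives the count of sorted observations that are <= x
        return bisect_right(ordered, x) / n

    return ecdf

sample = [2.0, 3.5, 3.5, 5.0, 9.0]  # toy data sample
ecdf = make_ecdf(sample)

print(ecdf(1.0))  # 0.0
print(ecdf(3.5))  # 0.6
print(ecdf(9.0))  # 1.0
```

No distribution is fit here; the cumulative probabilities come directly from the sorted observations.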

A Gentle Introduction to Model Selection for Machine Learning

Given easy-to-use machine learning libraries like scikit-learn and Keras, it is straightforward to fit many different machine learning models on a given predictive modeling dataset. The challenge of applied machine learning, therefore, becomes how to choose among a range of different models that you can use for your problem. Naively, you might believe that model performance is sufficient, but should you consider other concerns, such as how long the model takes to train or how easy it is to explain […]

A Gentle Introduction to the Bayes Optimal Classifier

Last Updated on August 19, 2020 The Bayes Optimal Classifier is a probabilistic model that makes the most probable prediction for a new example. It is described using the Bayes Theorem that provides a principled way for calculating a conditional probability. It is also closely related to the Maximum a Posteriori: a probabilistic framework referred to as MAP that finds the most probable hypothesis for a training dataset. In practice, the Bayes Optimal Classifier is computationally expensive, if not intractable […]

How to Use Out-of-Fold Predictions in Machine Learning

Last Updated on August 28, 2020 Machine learning algorithms are typically evaluated using resampling techniques such as k-fold cross-validation. During the k-fold cross-validation process, predictions are made on test sets composed of data not used to train the model. These predictions are referred to as out-of-fold predictions, a type of out-of-sample prediction. Out-of-fold predictions play an important role in machine learning in both estimating the performance of a model when making predictions on new data in the future, so-called the […]
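
The process described above can be sketched in plain Python. To keep the example self-contained, the "model" is a stand-in that simply predicts the mean of the training targets; that stand-in, the contiguous fold split, and the helper names k_fold_indices and out_of_fold_predictions are all assumptions made for this illustration, not the tutorial's code.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def out_of_fold_predictions(y, k):
    """Predict every example using a model fit only on the other folds."""
    oof = [None] * len(y)
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        # Stand-in model (assumption): predict the mean of the training targets.
        prediction = sum(train_y) / len(train_y)
        for i in test_idx:
            oof[i] = prediction
    return oof

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy target values
oof = out_of_fold_predictions(y, k=3)
print(oof)  # [4.5, 4.5, 3.5, 3.5, 2.5, 2.5]

# Scoring the out-of-fold predictions estimates performance on unseen data.
mae = sum(abs(p - t) for p, t in zip(oof, y)) / len(y)
print(mae)
```

Every example receives exactly one prediction, and that prediction always comes from a model that never saw the example during training.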
