A Gentle Introduction to Markov Chain Monte Carlo for Probability

Probabilistic inference involves estimating an expected value or density using a probabilistic model. Often, directly inferring values is not tractable with probabilistic models, and instead, approximation methods must be used. Markov Chain Monte Carlo sampling provides a class of algorithms for systematic random sampling from high-dimensional probability distributions. Unlike Monte Carlo sampling methods that are able to draw independent samples from the distribution, Markov Chain Monte Carlo methods draw samples where the next sample is dependent on the existing sample, […]

A Gentle Introduction to Maximum a Posteriori (MAP) for Machine Learning

Density estimation is the problem of estimating the probability distribution for a sample of observations from a problem domain. Typically, estimating the entire distribution is intractable, and instead, we are happy to have a point summary of the distribution, such as the mean or mode. Maximum a Posteriori, or MAP for short, is a Bayesian-based approach to estimating a distribution and model parameters that best explain an observed dataset. This flexible probabilistic framework can be used to provide a Bayesian […]

14 Different Types of Learning in Machine Learning

Last Updated on November 11, 2019 Machine learning is a large field of study that overlaps with and inherits ideas from many related fields such as artificial intelligence. The focus of the field is learning, that is, acquiring skills or knowledge from experience. Most commonly, this means synthesizing useful concepts from historical data. As such, there are many different types of learning that you may encounter as a practitioner in the field of machine learning: from whole fields of study […]

How to Save a NumPy Array to File for Machine Learning

Last Updated on August 19, 2020 Developing machine learning models in Python often requires the use of NumPy arrays. NumPy arrays are efficient data structures for working with data in Python, and machine learning models like those in the scikit-learn library, and deep learning models like those in the Keras library, expect input data in the format of NumPy arrays and make predictions in the format of NumPy arrays. As such, it is common to need to save NumPy arrays […]
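
As a minimal sketch of what saving an array can look like: the excerpt is cut off before it names any file formats, so the three used below (CSV text, binary .npy, compressed .npz) are an assumption, chosen because NumPy supports them directly. The file names and the temporary directory are placeholders.

```python
import os
import tempfile

import numpy as np

# Example array standing in for model inputs or predictions.
data = np.arange(10, dtype=float).reshape(2, 5)

# Placeholder location; any writable directory would do.
tmp_dir = tempfile.mkdtemp()

# 1. Human-readable CSV text.
csv_path = os.path.join(tmp_dir, "data.csv")
np.savetxt(csv_path, data, delimiter=",")
loaded_csv = np.loadtxt(csv_path, delimiter=",")

# 2. Binary .npy file, which preserves dtype and shape.
npy_path = os.path.join(tmp_dir, "data.npy")
np.save(npy_path, data)
loaded_npy = np.load(npy_path)

# 3. Compressed .npz archive, which can hold several named arrays.
npz_path = os.path.join(tmp_dir, "data.npz")
np.savez_compressed(npz_path, data=data)
with np.load(npz_path) as archive:
    loaded_npz = archive["data"]
```

CSV is convenient when the file must be opened in other tools, while the binary formats are usually the better fit for large arrays that only NumPy will read back.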

How to Connect Model Input Data With Predictions for Machine Learning

Last Updated on August 19, 2020 Fitting a model to a training dataset is so easy today with libraries like scikit-learn. A model can be fit and evaluated on a dataset in just a few lines of code. It is so easy that it has become a problem. The same few lines of code are repeated again and again and it may not be obvious how to actually use the model to make a prediction. Or, if a prediction is […]

What Does Stochastic Mean in Machine Learning?

Last Updated on July 24, 2020 The behavior and performance of many machine learning algorithms are referred to as stochastic. Stochastic refers to a variable process where the outcome involves some randomness and has some uncertainty. It is a mathematical term and is closely related to “randomness” and “probabilistic” and can be contrasted with the idea of “deterministic.” The stochastic nature of machine learning algorithms is an important foundational concept in machine learning and is required to be understood in […]

How to Save and Reuse Data Preparation Objects in Scikit-Learn

Last Updated on June 30, 2020 It is critical that any data preparation performed on a training dataset is also performed on a new dataset in the future. This may include a test dataset when evaluating a model or new data from the domain when using a model to make predictions. Typically, the model fit on the training dataset is saved for later use. The correct solution to preparing new data for the model in the future is to also […]
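
One way this idea can be sketched, going by the post title rather than the truncated excerpt: fit a data preparation object on the training data, serialize it, and reuse the restored object on new data. The choice of MinMaxScaler and pickle, and the toy arrays, are assumptions made for illustration.

```python
import pickle

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy training data; the scaler learns each column's min and max from it.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = MinMaxScaler().fit(train)

# Serialize the fitted scaler. In practice the bytes would be written to a
# file stored next to the saved model.
saved_bytes = pickle.dumps(scaler)

# Later: restore the scaler and apply the same transform to new data.
restored = pickle.loads(saved_bytes)
new_rows = np.array([[2.0, 20.0], [4.0, 40.0]])
new_scaled = restored.transform(new_rows)
```

The restored scaler reuses the minimum and maximum learned from the training data, so new rows outside that range map outside [0, 1] instead of being rescaled on their own.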

3 Ways to Encode Categorical Variables for Deep Learning

Last Updated on August 27, 2020 Machine learning and deep learning models, like those in Keras, require all input and output variables to be numeric. This means that if your data contains categorical data, you must encode it to numbers before you can fit and evaluate a model. The two most popular techniques are an integer encoding and a one hot encoding, although a newer technique called learned embedding may provide a useful middle ground between these two methods. In […]
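
The two popular encodings named above can be sketched with plain NumPy. The color values are invented for illustration, and the learned embedding is left out because it requires training a model.

```python
import numpy as np

# Hypothetical categorical column.
colors = np.array(["red", "green", "blue", "green"])

# Integer encoding: each distinct category is mapped to an integer index.
# np.unique returns the sorted categories and the index of each value.
categories, codes = np.unique(colors, return_inverse=True)

# One hot encoding: one binary column per category, with a single 1 per row.
one_hot = np.eye(len(categories), dtype=int)[codes]
```

Integer encoding implies an ordering between categories that may not exist in the data, which is one reason a one hot encoding is often preferred for unordered categories.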

How to Perform Feature Selection with Categorical Data

Last Updated on August 18, 2020 Feature selection is the process of identifying and selecting a subset of input features that are most relevant to the target variable. Feature selection is often straightforward when working with real-valued data, such as by using Pearson’s correlation coefficient, but can be challenging when working with categorical data. The two most commonly used feature selection methods for categorical input data when the target variable is also categorical (e.g. classification predictive modeling) are the chi-squared […]

How to Choose a Feature Selection Method For Machine Learning

Last Updated on August 20, 2020 Feature selection is the process of reducing the number of input variables when developing a predictive model. It is desirable to reduce the number of input variables to both reduce the computational cost of modeling and, in some cases, to improve the performance of the model. Statistical-based feature selection methods involve evaluating the relationship between each input variable and the target variable using statistics and selecting those input variables that have the strongest relationship […]
