A Gentle Introduction to Logistic Regression With Maximum Likelihood Estimation

Last Updated on October 28, 2019

Logistic regression is a model for binary classification predictive modeling. The parameters of a logistic regression model can be estimated by the probabilistic framework called maximum likelihood estimation. Under this framework, a probability distribution for the target variable (class label) must be assumed and then a likelihood function defined that calculates the probability of observing the outcome given the input data and the model. This function can then be optimized to find the set […]
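The idea in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code: it assumes a Bernoulli distribution for the class label, a single input feature, a small synthetic dataset, and plain gradient descent as the optimizer. The function names (sigmoid, negative_log_likelihood, fit) and the learning rate are hypothetical choices made for the example.

```python
import math
import random


def sigmoid(z):
    # Map a real-valued score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def negative_log_likelihood(b0, b1, xs, ys):
    # Bernoulli negative log-likelihood of the labels given the inputs and coefficients.
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit(xs, ys, lr=0.1, n_iter=2000):
    # Minimize the mean negative log-likelihood with gradient descent
    # (an assumed optimizer; other optimizers would work as well).
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        g0 = sum(errors) / n
        g1 = sum(e * x for e, x in zip(errors, xs)) / n
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1


# Synthetic data: labels drawn from a logistic model with made-up coefficients.
random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(200)]
ys = [1 if random.random() < sigmoid(-0.5 + 2.0 * x) else 0 for x in xs]

b0, b1 = fit(xs, ys)
print("fitted coefficients:", b0, b1)
print("NLL at zeros:", negative_log_likelihood(0.0, 0.0, xs, ys))
print("NLL at fit:  ", negative_log_likelihood(b0, b1, xs, ys))
```

Minimizing the negative log-likelihood is equivalent to maximizing the likelihood, so the fitted coefficients should score a lower value than the all-zero starting point.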

Probabilistic Model Selection with AIC, BIC, and MDL

Last Updated on August 28, 2020

Model selection is the problem of choosing one from among a set of candidate models. It is common to choose a model that performs the best on a hold-out test dataset or to estimate model performance using a resampling technique, such as k-fold cross-validation. An alternative approach to model selection involves using probabilistic statistical measures that attempt to quantify both the model performance on the training dataset and the complexity of the model. Examples […]
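As a rough illustration of how such measures trade off fit against complexity, the sketch below uses the standard textbook definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of parameters, n the number of training samples, and ln(L) the maximized log-likelihood. The article itself may use a different but related form, and the numbers passed in are made up for the example.

```python
import math


def aic(log_likelihood, num_params):
    # Akaike Information Criterion: lower is better.
    return 2 * num_params - 2 * log_likelihood


def bic(log_likelihood, num_params, num_samples):
    # Bayesian Information Criterion: penalizes parameters more as the sample grows.
    return num_params * math.log(num_samples) - 2 * log_likelihood


# Hypothetical fitted model: log-likelihood -120.0, 3 parameters, 100 samples.
aic_value = aic(-120.0, 3)
bic_value = bic(-120.0, 3, 100)
print("AIC:", aic_value)
print("BIC:", bic_value)
```

With both scores, the candidate model with the lowest value is preferred; BIC applies the heavier penalty once the sample has more than about seven observations, because ln(n) then exceeds 2.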

A Gentle Introduction to Expectation-Maximization (EM Algorithm)

Last Updated on August 28, 2020

Maximum likelihood estimation is an approach to density estimation for a dataset by searching across probability distributions and their parameters. It is a general and effective approach that underlies many machine learning algorithms, although it requires that the training dataset is complete, e.g. all relevant interacting random variables are present. Maximum likelihood becomes intractable if there are variables that interact with those in the dataset but were hidden or not observed, so-called latent variables. […]

A Gentle Introduction to Monte Carlo Sampling for Probability

Monte Carlo methods are a class of techniques for randomly sampling a probability distribution. There are many problem domains where describing or estimating the probability distribution is relatively straightforward, but calculating a desired quantity is intractable. This may be due to many reasons, such as the stochastic nature of the domain or an exponential number of random variables. Instead, a desired quantity can be approximated by using random sampling, referred to as Monte Carlo methods. These methods were initially used […]
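A small sketch of the approach described here: approximating a quantity by drawing random samples and averaging. The quantity chosen for the example, the probability that a standard normal variable exceeds 1, is an assumption made for illustration, as are the function name and sample size. It has a known exact value, which makes the approximation easy to check.

```python
import math
import random


def mc_tail_probability(threshold, n_samples, rng):
    # Estimate P(X > threshold) for X ~ Normal(0, 1) as the fraction of samples above it.
    hits = sum(1 for _ in range(n_samples) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n_samples


rng = random.Random(1)  # fixed seed so the run is repeatable
estimate = mc_tail_probability(1.0, 100_000, rng)

# Exact value from the normal CDF, used only to check the estimate.
exact = 0.5 * math.erfc(1.0 / math.sqrt(2.0))
print("Monte Carlo estimate:", estimate)
print("Exact value:         ", exact)
```

The error of the estimate shrinks roughly with the square root of the number of samples, so quadrupling the sample size halves the expected error.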

A Gentle Introduction to Markov Chain Monte Carlo for Probability

Probabilistic inference involves estimating an expected value or density using a probabilistic model. Often, directly inferring values is not tractable with probabilistic models, and instead, approximation methods must be used. Markov Chain Monte Carlo sampling provides a class of algorithms for systematic random sampling from high-dimensional probability distributions. Unlike Monte Carlo sampling methods that are able to draw independent samples from the distribution, Markov Chain Monte Carlo methods draw samples where the next sample is dependent on the existing sample, […]
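The dependence between successive samples can be seen in a minimal random-walk Metropolis sampler, one common Markov Chain Monte Carlo algorithm. This is a sketch under stated assumptions rather than the article's implementation: the target is a standard normal density known only up to a constant, and the function names, step size, and burn-in length are hypothetical choices.

```python
import math
import random


def log_target(x):
    # Log of an unnormalized standard normal density.
    return -0.5 * x * x


def metropolis(n_samples, step=1.0, start=0.0, seed=0):
    rng = random.Random(seed)
    samples = []
    current = start
    for _ in range(n_samples):
        # Propose a move that depends on the current sample.
        proposal = current + rng.gauss(0.0, step)
        log_accept = log_target(proposal) - log_target(current)
        # Accept with probability min(1, target(proposal) / target(current)).
        if rng.random() < math.exp(min(0.0, log_accept)):
            current = proposal
        samples.append(current)
    return samples


samples = metropolis(20_000)
kept = samples[1_000:]  # discard early samples as burn-in
mean = sum(kept) / len(kept)
variance = sum((s - mean) ** 2 for s in kept) / len(kept)
print("sample mean:", mean)
print("sample variance:", variance)
```

Because each proposal is centered on the current value, consecutive samples are correlated, which is why more draws are needed than with independent Monte Carlo sampling to reach the same accuracy.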

A Gentle Introduction to Maximum a Posteriori (MAP) for Machine Learning

Density estimation is the problem of estimating the probability distribution for a sample of observations from a problem domain. Typically, estimating the entire distribution is intractable, and instead, we are happy to have a summary of the distribution, such as the mean or mode. Maximum a Posteriori, or MAP for short, is a Bayesian-based approach to estimating a distribution and model parameters that best explain an observed dataset. This flexible probabilistic framework can be used to provide a Bayesian […]

14 Different Types of Learning in Machine Learning

Last Updated on November 11, 2019

Machine learning is a large field of study that overlaps with and inherits ideas from many related fields such as artificial intelligence. The focus of the field is learning, that is, acquiring skills or knowledge from experience. Most commonly, this means synthesizing useful concepts from historical data. As such, there are many different types of learning that you may encounter as a practitioner in the field of machine learning: from whole fields of study […]

How to Save a NumPy Array to File for Machine Learning

Last Updated on August 19, 2020

Developing machine learning models in Python often requires the use of NumPy arrays. NumPy arrays are efficient data structures for working with data in Python, and machine learning models like those in the scikit-learn library, and deep learning models like those in the Keras library, expect input data in the format of NumPy arrays and make predictions in the format of NumPy arrays. As such, it is common to need to save NumPy arrays […]
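A short sketch of two common ways to do this, using NumPy's save/load functions for the binary .npy format and savetxt/loadtxt for CSV. The file names and the temporary directory are assumptions made so the example cleans up after itself; the article may cover other formats as well.

```python
import os
import tempfile

import numpy as np

data = np.arange(6, dtype=float).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    # Binary .npy format: preserves dtype and shape exactly.
    npy_path = os.path.join(tmp, "data.npy")
    np.save(npy_path, data)
    loaded_npy = np.load(npy_path)

    # CSV text format: human-readable and portable to other tools.
    csv_path = os.path.join(tmp, "data.csv")
    np.savetxt(csv_path, data, delimiter=",")
    loaded_csv = np.loadtxt(csv_path, delimiter=",")

print(loaded_npy)
print(loaded_csv)
```

The .npy format is usually the better choice for round-tripping arrays between Python programs, while CSV is handy when the data must be opened elsewhere.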

How to Connect Model Input Data With Predictions for Machine Learning

Last Updated on August 19, 2020

Fitting a model to a training dataset is so easy today with libraries like scikit-learn. A model can be fit and evaluated on a dataset in just a few lines of code. It is so easy that it has become a problem. The same few lines of code are repeated again and again and it may not be obvious how to actually use the model to make a prediction. Or, if a prediction is […]

What Does Stochastic Mean in Machine Learning?

Last Updated on July 24, 2020

The behavior and performance of many machine learning algorithms are referred to as stochastic. Stochastic refers to a variable process where the outcome involves some randomness and has some uncertainty. It is a mathematical term that is closely related to “randomness” and “probabilistic” and can be contrasted with the idea of “deterministic.” The stochastic nature of machine learning algorithms is an important foundational concept in machine learning and is required to be understood in […]
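The contrast between stochastic and deterministic behavior can be shown with a pseudorandom number generator. In this sketch, the function noisy_measurements and its parameters are hypothetical: it simulates a process whose outcome varies from run to run, and fixing the seed makes that same process repeatable.

```python
import random


def noisy_measurements(n, seed):
    # A stochastic process: a true value of 10.0 plus Gaussian noise on each reading.
    rng = random.Random(seed)
    return [10.0 + rng.gauss(0.0, 1.0) for _ in range(n)]


run_a = noisy_measurements(5, seed=1)
run_b = noisy_measurements(5, seed=1)  # same seed: identical sequence
run_c = noisy_measurements(5, seed=2)  # different seed: different sequence

print(run_a)
print(run_b)
print(run_c)
```

This is why results from stochastic algorithms are usually reported as a summary over several runs, with the seed fixed only when a specific run needs to be reproduced.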
