Statistics in Plain English for Machine Learning

Last Updated on August 8, 2019

There is an ocean of books on statistics; where do you start? A big problem in choosing a beginner book on statistics is that a book may suffer from one of two common problems. It may be a mathematical textbook filled with derivations, special cases, and proofs for each statistical method, with little sense of the intuition behind the method or how to use it. Or it may be a playbook for a proprietary or […]

How to Calculate Nonparametric Rank Correlation in Python

Last Updated on August 8, 2019

Correlation is a measure of the association between two variables. It is easy to calculate and interpret when both variables have a well-understood Gaussian distribution. When we do not know the distribution of the variables, we must use nonparametric rank correlation methods. In this tutorial, you will discover rank correlation methods for quantifying the association between variables with a non-Gaussian distribution. After completing this tutorial, you will know: How rank correlation methods work […]
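
As a rough illustration of the idea in this excerpt, the sketch below computes Spearman's rank correlation using only the Python standard library: each variable is replaced by its ranks, and the ordinary Pearson correlation is then taken over those ranks. The function names `rank`, `pearson`, and `spearman` are hypothetical helpers chosen for this example, not code from the tutorial, which may well rely on a library instead.

```python
def rank(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


# A monotonic but non-linear relationship: the ranks agree perfectly.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman(x, y))
```

Because only the ordering of the values matters, the coefficient here is very close to 1.0 even though the relationship between `x` and `y` is not linear.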

A Gentle Introduction to Effect Size Measures in Python

Last Updated on August 8, 2019

Statistical hypothesis tests report on the likelihood of the observed results given an assumption, such as no association between variables or no difference between groups. Hypothesis tests do not comment on the size of the effect if the association or difference is statistically significant. This highlights the need for standard ways of calculating and reporting a result. Effect size methods refer to a suite of statistical tools from the field of estimation statistics […]

A Gentle Introduction to Statistical Power and Power Analysis in Python

Last Updated on April 24, 2020

The statistical power of a hypothesis test is the probability of detecting an effect, if there is a true effect present to detect. Power can be calculated and reported for a completed experiment to comment on the confidence one might have in the conclusions drawn from the results of the study. It can also be used as a tool to estimate the number of observations or sample size required in order to detect an […]
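
To make the sample-size use of power concrete, here is a minimal sketch that assumes a two-sided comparison of two equal-sized groups and a normal approximation, n = 2 * ((z_(1 - alpha/2) + z_power) / d)^2 per group, where d is a standardized effect size. The function name `samples_per_group` and its defaults are assumptions made for this example; the tutorial itself may use a different, more exact method, and an exact t-test calculation typically asks for slightly more observations.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate observations needed per group (normal approximation).

    effect_size is a standardized difference between the group means.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = standard_normal.inv_cdf(power)          # desired probability of detection
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Smaller effects require more observations to detect with the same power.
for d in (0.8, 0.5, 0.2):
    print(d, samples_per_group(d))
```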

All of Statistics for Machine Learning

Last Updated on August 8, 2019

A foundation in statistics is required to be effective as a machine learning practitioner. The book “All of Statistics” was written specifically to provide a foundation in probability and statistics for computer science undergraduates who may have an interest in data mining and machine learning. As such, it is often recommended to machine learning practitioners interested in expanding their understanding of statistics. In this post, you will discover the book “All […]

The Role of Randomization to Address Confounding Variables in Machine Learning

Last Updated on July 31, 2020

A large part of applied machine learning is about running controlled experiments to discover what algorithm or algorithm configuration to use on a predictive modeling problem. A challenge is that some aspects of the problem and the algorithm, called confounding variables, cannot be controlled (held constant) and must instead be controlled for. An example is the use of randomness in a learning algorithm, such as random initialization or random choices during learning. The solution […]

Difference Between a Batch and an Epoch in a Neural Network

Last Updated on October 26, 2019

Stochastic gradient descent is a learning algorithm that has a number of hyperparameters. Two hyperparameters that often confuse beginners are the batch size and number of epochs. They are both integer values and seem to do the same thing. In this post, you will discover the difference between batches and epochs in stochastic gradient descent. After reading this post, you will know: Stochastic gradient descent is an iterative learning algorithm that uses a training […]
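
The distinction can be sketched with a little arithmetic: the batch size sets how many samples are processed before each weight update, while the number of epochs sets how many complete passes are made over the training set. The function name `count_updates` and the dataset figures below are hypothetical values chosen for illustration, and the sketch assumes a final partial batch is still processed rather than dropped.

```python
from math import ceil


def count_updates(n_samples, batch_size, n_epochs):
    """Return (batches per epoch, total weight updates) for a training run."""
    batches_per_epoch = ceil(n_samples / batch_size)  # a last partial batch still counts
    return batches_per_epoch, batches_per_epoch * n_epochs


# Hypothetical run: 200 training samples, batch size 5, 1,000 epochs.
per_epoch, total = count_updates(n_samples=200, batch_size=5, n_epochs=1000)
print(per_epoch, total)
```

In this hypothetical run the model is updated 40 times per pass over the data and 40,000 times overall, which shows that the two hyperparameters control different things even though both are integers.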

When to Use MLP, CNN, and RNN Neural Networks

Last Updated on August 19, 2019

What neural network is appropriate for your predictive modeling problem? It can be difficult for a beginner to the field of deep learning to know what type of network to use. There are so many types of networks to choose from, and new methods are being published and discussed every day. To make things worse, most neural networks are flexible enough that they work (make a prediction) even when used with the wrong type of […]

How to Calculate McNemar’s Test to Compare Two Machine Learning Classifiers

Last Updated on August 8, 2019

The choice of a statistical hypothesis test is a challenging open problem for interpreting machine learning results. In his widely cited 1998 paper, Thomas Dietterich recommended McNemar’s test in those cases where it is expensive or impractical to train multiple copies of classifier models. This describes the current situation with deep learning models that are both very large and are trained and evaluated on large datasets, often requiring days or weeks to train […]
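
As a hedged sketch of how such a comparison might look, the code below applies McNemar's test with a continuity correction to the two disagreement counts from a single shared test set: the cases only the first classifier got right and the cases only the second got right. The function name `mcnemar` and the counts are hypothetical and are not taken from the post; the p-value uses the chi-squared distribution with one degree of freedom, which is usually considered a reasonable approximation only when the disagreement counts are not small.

```python
from math import erfc, sqrt


def mcnemar(only_first_correct, only_second_correct):
    """McNemar's test with continuity correction.

    Returns (statistic, p_value) for two classifiers evaluated on the
    same test set, using only the cases where they disagree.
    """
    b, c = only_first_correct, only_second_correct
    if b + c == 0:
        # The classifiers never disagree, so there is no evidence of a difference.
        return 0.0, 1.0
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical disagreement counts from one test set.
statistic, p_value = mcnemar(only_first_correct=10, only_second_correct=25)
print(statistic, p_value)
if p_value < 0.05:
    print("The classifiers appear to make errors in different proportions.")
else:
    print("No significant difference in the proportions of errors.")
```

Only one trained copy of each model is needed, since the test depends on how the two sets of predictions disagree on the same examples.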

How to Configure the Number of Layers and Nodes in a Neural Network

Last Updated on August 6, 2019

Artificial neural networks have two main hyperparameters that control the architecture or topology of the network: the number of layers and the number of nodes in each hidden layer. You must specify values for these parameters when configuring your network. The most reliable way to configure these hyperparameters for your specific predictive modeling problem is via systematic experimentation with a robust test harness. This can be a tough pill to swallow for beginners to […]
