How to Get the Most From Your Machine Learning Data

Last Updated on June 30, 2020. The data that you use, and how you use it, will likely define the success of your predictive modeling project. Data and the framing of your problem may be the point of biggest leverage on your project. Choosing the wrong data or the wrong framing for your problem may lead to a model with poor performance or, at worst, a model that cannot converge. It is not possible to analytically calculate what data to […]

Read more

The Model Performance Mismatch Problem (and what to do about it)

What To Do If Model Test Results Are Worse than Training. The procedure when evaluating machine learning models is to fit and evaluate them on training data, then verify that the model has good skill on a held-back test dataset. Often, you will get very promising performance when evaluating the model on the training dataset and poor performance when evaluating it on the test set. In this post, you will discover techniques and issues to consider when you […]
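The train-versus-test gap described above is easy to reproduce with a model that deliberately overfits. This is an illustrative sketch with synthetic data (the dataset, seed, and "model" are all made up for the example), not code from the post:

```python
import random

random.seed(3)

def make_data(n):
    # made-up noisy threshold problem: label is 1 when x > 0.5,
    # flipped 20% of the time so it cannot be learned perfectly
    rows = []
    for _ in range(n):
        x = random.random()
        y = int(x > 0.5)
        if random.random() < 0.2:
            y = 1 - y
        rows.append((x, y))
    return rows

train, test = make_data(200), make_data(200)

# a "model" that memorizes the training set and guesses 0 elsewhere
memory = dict(train)
predict = lambda x: memory.get(x, 0)

train_acc = sum(predict(x) == y for x, y in train) / len(train)
test_acc = sum(predict(x) == y for x, y in test) / len(test)
print(train_acc, test_acc)  # perfect on train, near chance on test
```

The memorizing model scores 100% on the data it has seen and roughly chance on held-back data, which is the performance mismatch in its most extreme form.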

Read more

How To Know if Your Machine Learning Model Has Good Performance

After you develop a machine learning model for your predictive modeling problem, how do you know if the performance of the model is any good? This is a common question I am asked by beginners. As a beginner, you often seek an answer to this question, e.g. you want someone to tell you whether an accuracy of x% or an error score of x is good or not. In this post, you will discover how to answer this question for […]
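One common way to ground the "is x% good?" question is to compare against a naive baseline, such as always predicting the majority class. A minimal sketch with made-up labels (the values are illustrative only):

```python
from collections import Counter

# made-up binary labels for an imbalanced problem
labels = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]

# the Zero Rule baseline: always predict the most common class
majority, count = Counter(labels).most_common(1)[0]
baseline_acc = count / len(labels)
print(majority, baseline_acc)  # 0 0.7
```

Here a model reporting 70% accuracy would be no better than the baseline, so the raw number alone says little without this reference point.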

Read more

Introduction to Random Number Generators for Machine Learning in Python

Last Updated on July 31, 2020. Randomness is a big part of machine learning. Randomness is used as a tool or a feature in preparing data and in learning algorithms that map input data to output data in order to make predictions. In order to understand the need for statistical methods in machine learning, you must understand the source of randomness in machine learning. The source of randomness in machine learning is a mathematical trick called a pseudorandom number generator. […]
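The defining property of a pseudorandom number generator is that it is deterministic: seeding it with the same value replays the same sequence. A minimal sketch using Python's standard `random` module (the seed value is arbitrary):

```python
import random

random.seed(42)          # fix the pseudorandom sequence
first = [random.random() for _ in range(3)]

random.seed(42)          # re-seeding replays the exact same sequence
second = [random.random() for _ in range(3)]

print(first == second)   # True: same seed, same "random" numbers
```

This is why seeding is used to make experiments reproducible.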

Read more

How to Remove Outliers for Machine Learning

Last Updated on August 18, 2020. When modeling, it is important to clean the data sample to ensure that the observations best represent the problem. Sometimes a dataset can contain extreme values that are outside the range of what is expected and unlike the other data. These are called outliers and often machine learning modeling and model skill in general can be improved by understanding and even removing these outlier values. In this tutorial, you will discover outliers and how […]
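One common way to flag such extreme values is the interquartile-range (IQR) rule: treat anything more than 1.5 IQRs outside the middle 50% of the data as an outlier. A minimal pure-Python sketch with made-up data (the values and the 1.5 factor are illustrative, not from the post):

```python
import statistics

def iqr_bounds(values, k=1.5):
    # quartiles of the sample; quantiles(n=4) returns [Q1, median, Q3]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [10, 12, 11, 13, 12, 11, 95]     # 95 is an obvious outlier
low, high = iqr_bounds(data)
cleaned = [v for v in data if low <= v <= high]
print(cleaned)  # [10, 12, 11, 13, 12, 11]
```

The 1.5 multiplier is a convention, not a law; widening it to 3 flags only the most extreme points.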

Read more

How to Calculate Correlation Between Variables in Python

Last Updated on August 20, 2020. There may be complex and unknown relationships between the variables in your dataset. It is important to discover and quantify the degree to which variables in your dataset are dependent upon each other. This knowledge can help you better prepare your data to meet the expectations of machine learning algorithms, such as linear regression, whose performance will degrade with the presence of these interdependencies. In this tutorial, you will discover that correlation is the […]
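The standard measure of linear dependence between two variables is Pearson's correlation coefficient: the covariance of the two variables divided by the product of their standard deviations. A minimal pure-Python sketch with made-up data (a library such as NumPy or SciPy would normally do this for you):

```python
import math

def pearson(x, y):
    # covariance of x and y, normalized by both standard deviations
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]                 # perfectly linearly related to x
print(round(pearson(x, y), 3))       # 1.0
```

Values near +1 or -1 signal the strong interdependencies that can degrade linear regression; values near 0 indicate no linear relationship.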

Read more

A Gentle Introduction to Calculating Normal Summary Statistics

Last Updated on August 8, 2019. A sample of data is a snapshot from a broader population of all possible observations that could be taken of a domain or generated by a process. Interestingly, many observations fit a common pattern or distribution called the normal distribution, or more formally, the Gaussian distribution. A lot is known about the Gaussian distribution, and as such, there are whole sub-fields of statistics and statistical methods that can be used with Gaussian data. In […]
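A Gaussian sample is fully summarized by two statistics, the mean and the standard deviation. A minimal sketch using Python's standard `statistics` module with a made-up sample:

```python
import statistics

# made-up measurements assumed to be roughly Gaussian around 5.0
sample = [4.9, 5.1, 5.0, 4.8, 5.2, 5.0, 4.95, 5.05]

mu = statistics.mean(sample)
sigma = statistics.stdev(sample)   # sample (n-1) standard deviation
print(round(mu, 2), round(sigma, 3))
```

Note `stdev` uses the n-1 (sample) denominator; `pstdev` is the population version.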

Read more

A Gentle Introduction to the Law of Large Numbers in Machine Learning

Last Updated on August 8, 2019. We have an intuition that more observations are better. This is the same intuition behind the idea that if we collect more data, our sample of data will be more representative of the problem domain. There is a theorem in statistics and probability that supports this intuition that is a pillar of both of these fields and has important implications in applied machine learning. The name of this theorem is the law of large […]
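The law of large numbers can be demonstrated directly: as the sample grows, the sample mean converges on the population mean. A minimal sketch drawing from a uniform distribution on [0, 1], whose population mean is known to be 0.5 (the seed and sample sizes are arbitrary):

```python
import random

random.seed(1)

# sample means approach the population mean (0.5) as n grows
for n in (10, 1_000, 100_000):
    sample_mean = sum(random.random() for _ in range(n)) / n
    print(n, round(sample_mean, 3))
```

The small sample can land well away from 0.5; the large one almost never does.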

Read more

A Gentle Introduction to the Central Limit Theorem for Machine Learning

Last Updated on January 14, 2020. The central limit theorem is an often quoted, but misunderstood pillar from statistics and machine learning. It is often confused with the law of large numbers. Although the theorem may seem esoteric to beginners, it has important implications about how and why we can make inferences about the skill of machine learning models, such as whether one model is statistically better than another and confidence intervals on a model's skill. In this tutorial, you will […]
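The distinction from the law of large numbers is that the central limit theorem describes the *distribution* of sample means: means of repeated samples from any distribution tend toward a Gaussian. A minimal sketch using a uniform (distinctly non-Gaussian) source distribution, with arbitrary seed and sample sizes:

```python
import random
import statistics

random.seed(7)

# means of many small samples drawn from a uniform distribution
means = [statistics.mean(random.random() for _ in range(30))
         for _ in range(2000)]

# the sampling distribution of the mean clusters around the
# population mean (0.5) and is approximately Gaussian in shape
print(round(statistics.mean(means), 2))
```

Histogramming `means` would show the familiar bell shape even though no individual draw is Gaussian; this is what justifies confidence intervals on a model's skill estimates.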

Read more

Statistics Books for Machine Learning

Last Updated on August 14, 2020. Statistical methods are used at each step in an applied machine learning project. This means it is important to have a strong grasp of the fundamentals of the key findings from statistics and a working knowledge of relevant statistical methods. Unfortunately, statistics is not covered in many computer science and software engineering degree programs. Even if it is, it may be taught in a bottom-up, theory-first manner, making it unclear which parts are relevant […]

Read more