A Gentle Introduction to Nonparametric Statistics

Last Updated on November 10, 2019
A large portion of the field of statistics and statistical methods is dedicated to data where the distribution is known. Samples of data for which we already know, or can easily identify, the distribution are called parametric data. In common usage, parametric often refers to data drawn from a Gaussian distribution. Data for which the distribution is unknown or cannot be easily identified is called nonparametric. In the case […]

A Gentle Introduction to Normality Tests in Python

Last Updated on August 8, 2019
An important decision point when working with a sample of data is whether to use parametric or nonparametric statistical methods. Parametric statistical methods assume that the data has a known and specific distribution, often a Gaussian distribution. If a data sample is not Gaussian, then the assumptions of parametric statistical tests are violated and nonparametric statistical methods must be used. There is a range of techniques that you can use to check if your […]
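The excerpt does not say which techniques the tutorial covers. As one hedged illustration of a normality test, the sketch below implements the Jarque-Bera test using only the standard library; the choice of test, the function name `jarque_bera`, the 0.05 threshold and the generated data are all assumptions made for this example. Under the null hypothesis of normality, the statistic is asymptotically chi-squared with 2 degrees of freedom, whose survival function is exactly exp(-x / 2), so no statistics library is needed for the p-value.

```python
import math
import random

def jarque_bera(sample):
    """Jarque-Bera normality test. Returns (statistic, p_value).

    H0: the sample was drawn from a Gaussian distribution.
    The p-value uses the asymptotic chi-squared(2) distribution.
    """
    n = len(sample)
    mean = sum(sample) / n
    # Biased (population) central moments, as in the usual JB definition.
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a Gaussian
    stat = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-stat / 2.0)  # chi-squared(2) survival function
    return stat, p_value

random.seed(1)
gaussian = [random.gauss(50, 5) for _ in range(1000)]
skewed = [random.expovariate(1.0) for _ in range(1000)]

alpha = 0.05  # assumed significance level
for name, data in [("gaussian", gaussian), ("exponential", skewed)]:
    stat, p = jarque_bera(data)
    verdict = "fail to reject H0 (looks Gaussian)" if p > alpha else "reject H0 (not Gaussian)"
    print(f"{name}: JB={stat:.2f}, p={p:.3f} -> {verdict}")
```

Because the p-value relies on an asymptotic approximation, this test is best suited to reasonably large samples.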

A Gentle Introduction to Statistical Hypothesis Testing

Last Updated on April 10, 2020
Data must be interpreted in order to add meaning. We can interpret data by assuming a specific structure for our outcome and using statistical methods to confirm or reject the assumption. The assumption is called a hypothesis, and the statistical tests used for this purpose are called statistical hypothesis tests. Whenever we want to make claims about the distribution of data or whether one set of results is different from another set of results in […]
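To make the hypothesis-testing idea concrete, here is a minimal sketch of one such test, a two-sided permutation test for a difference in means. It is a stand-in chosen because it needs only the standard library, not necessarily a test the tutorial uses; the function name, the 5,000 resamples, the 0.05 threshold and the generated `control`/`treatment` data are illustrative assumptions. The hypothesis H0 says both samples come from the same distribution, so shuffling the group labels should not matter; the p-value is how often a random relabeling produces a difference at least as large as the observed one.

```python
import random

def permutation_test(a, b, n_resamples=5000, seed=None):
    """Two-sided permutation test for a difference in means.

    H0: both samples come from the same distribution, so the group labels
    are exchangeable. Returns an estimated p-value.
    """
    rng = random.Random(seed)
    n_a = len(a)
    observed = abs(sum(a) / n_a - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabeling of the pooled data
        left, right = pooled[:n_a], pooled[n_a:]
        if abs(sum(left) / len(left) - sum(right) / len(right)) >= observed:
            count += 1
    # Add-one correction: the observed labeling is itself a valid permutation.
    return (count + 1) / (n_resamples + 1)

rng = random.Random(7)
control = [rng.gauss(50, 5) for _ in range(50)]
treatment = [rng.gauss(55, 5) for _ in range(50)]
p = permutation_test(control, treatment, seed=1)
alpha = 0.05
print(f"p={p:.4f}:", "reject H0" if p <= alpha else "fail to reject H0")
```

A small p-value is evidence against H0; it is not the probability that H0 is true.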

How to Calculate Nonparametric Statistical Hypothesis Tests in Python

Last Updated on August 8, 2019
In applied machine learning, we often need to determine whether two data samples have the same or different distributions. We can answer this question using statistical significance tests, which quantify how likely a difference at least as large as the observed one would be if the samples had the same distribution. If the data does not have the familiar Gaussian distribution, we must resort to nonparametric versions of the significance tests. These tests operate in a similar manner but are distribution free, requiring that real […]
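As a hedged sketch of how a distribution-free test works, the code below computes the Mann-Whitney U test, a common rank-based nonparametric test for two independent samples. It operates on ranks rather than raw values, so it makes no Gaussian assumption. This is a simplified version written for illustration: it uses the large-sample normal approximation without a tie correction, and the helper names and generated data are assumptions, not taken from the tutorial.

```python
import math
import random

def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = rank_with_ties(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p

rng = random.Random(4)
sample1 = [rng.gauss(50, 5) for _ in range(100)]
sample2 = [rng.gauss(55, 5) for _ in range(100)]
u, p = mann_whitney_u(sample1, sample2)
print(f"U={u:.1f}, p={p:.4g}")
```

For small samples or many ties, an exact or tie-corrected version of the test is preferable to this approximation.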

How to Calculate Parametric Statistical Hypothesis Tests in Python

Last Updated on August 8, 2019
Parametric statistical methods usually refer to methods that assume the data samples have a Gaussian distribution. In applied machine learning, we need to compare data samples, specifically the means of the samples, perhaps to see if one technique performs better than another on one or more datasets. To quantify this question and interpret the results, we can use parametric hypothesis testing methods such as the Student’s t-test and ANOVA. In this tutorial, you will […]
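The Student’s t-test statistic for two independent samples can be sketched with the standard library alone. This assumes both samples are Gaussian with equal variances, which is the classic pooled-variance form. The accuracy scores below are hypothetical. The sketch stops at the statistic and degrees of freedom, because the p-value needs the CDF of the t distribution; `scipy.stats.ttest_ind` computes both if SciPy is available.

```python
import math
from statistics import mean, variance

def students_t_independent(a, b):
    """Student's t statistic for two independent samples (pooled variance).

    Returns (t, degrees_of_freedom). Assumes Gaussian samples with equal variances.
    """
    n1, n2 = len(a), len(b)
    # statistics.variance uses the n - 1 (sample) denominator.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / standard_error
    return t, n1 + n2 - 2

# Hypothetical accuracy scores of two models on ten repeated evaluations.
model_a = [0.81, 0.84, 0.79, 0.83, 0.85, 0.80, 0.82, 0.84, 0.81, 0.83]
model_b = [0.78, 0.80, 0.77, 0.79, 0.81, 0.76, 0.80, 0.78, 0.79, 0.77]
t, df = students_t_independent(model_a, model_b)
print(f"t={t:.3f}, df={df}")
```

A large absolute t relative to the t distribution with `df` degrees of freedom suggests the sample means differ.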

How to Transform Data to Better Fit The Normal Distribution

Last Updated on August 8, 2019
A large portion of the field of statistics is concerned with methods that assume a Gaussian distribution: the familiar bell curve. If your data has a Gaussian distribution, the parametric methods are powerful and well understood. This gives some incentive to use them if possible, even if your data does not have a Gaussian distribution. It is possible that your data does not look Gaussian or fails a normality test, but can be transformed […]
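One widely used family of such transforms is the Box-Cox power transform. The sketch below is a minimal version that assumes strictly positive data and a hand-picked lambda (estimating the best lambda from the data is left out). The data are generated from a log-normal distribution, an assumption chosen because its logarithm (lambda = 0) is exactly Gaussian. Skewness before and after gives a rough check of how close to symmetric the data became.

```python
import math
import random

def box_cox(values, lam):
    """Box-Cox power transform for strictly positive data.

    lam = 0 is the log transform; lam = 0.5 is a scaled, shifted square root.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def skewness(values):
    """Sample skewness (0 for a perfectly symmetric distribution)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

random.seed(2)
# Log-normal data is right-skewed, but its logarithm is Gaussian.
data = [random.lognormvariate(0, 0.8) for _ in range(1000)]
transformed = box_cox(data, 0)
print(f"skewness before: {skewness(data):.2f}")
print(f"skewness after log (lambda=0): {skewness(transformed):.2f}")
```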

A Gentle Introduction to k-fold Cross-Validation

Last Updated on August 3, 2020
Cross-validation is a statistical method used to estimate the skill of machine learning models. It is commonly used in applied machine learning to compare and select a model for a given predictive modeling problem because it is easy to understand, easy to implement, and results in skill estimates that generally have a lower bias than other methods. In this tutorial, you will discover a gentle introduction to the k-fold cross-validation procedure for estimating the […]
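The core of k-fold cross-validation is the split itself. The data are divided into k folds, and each fold serves as the test set exactly once while the remaining folds form the training set. The sketch below produces contiguous folds whose sizes differ by at most one. It assumes the data are already shuffled if their order is meaningful, and the generator name is hypothetical. The model's skill estimate would then be the mean of the k test scores.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    When n_samples is not divisible by k, the first n_samples % k folds
    get one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```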

A Gentle Introduction to the Bootstrap Method

Last Updated on August 8, 2019
The bootstrap method is a resampling technique used to estimate statistics on a population by sampling a dataset with replacement. It can be used to estimate summary statistics such as the mean or standard deviation. It is used in applied machine learning to estimate the skill of machine learning models when making predictions on data not included in the training data. A desirable property of the results from estimating machine learning model skill is […]
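As a minimal sketch of sampling with replacement, the code below draws bootstrap samples the same size as the dataset, records each sample's mean, and summarizes the spread of those means with a simple percentile interval. The 1,000 resamples, the 95% level and the generated data are illustrative assumptions.

```python
import random
from statistics import mean

def bootstrap_means(data, n_resamples=1000, seed=None):
    """Return the mean of each bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample has the same size as the original dataset.
    return [mean(rng.choices(data, k=n)) for _ in range(n_resamples)]

def percentile_interval(stats, alpha=0.05):
    """Simple percentile interval, roughly 95% for alpha=0.05."""
    ordered = sorted(stats)
    lower = ordered[int((alpha / 2) * len(ordered))]
    upper = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return lower, upper

rng = random.Random(1)
data = [rng.gauss(10, 2) for _ in range(50)]
means = bootstrap_means(data, seed=1)
lower, upper = percentile_interval(means)
print(f"sample mean={mean(data):.2f}, 95% bootstrap interval=({lower:.2f}, {upper:.2f})")
```

The same loop estimates model skill if the statistic is a model's score on the observations left out of each resample, rather than a mean.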

Confidence Intervals for Machine Learning

Last Updated on August 8, 2019
Much of machine learning involves estimating the performance of a machine learning algorithm on unseen data. Confidence intervals are a way of quantifying the uncertainty of an estimate. They can be used to add bounds or a likelihood to a population parameter, such as a mean, estimated from a sample of independent observations from the population. Confidence intervals come from the field of estimation statistics. In this tutorial, you will discover confidence intervals and […]
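For a classification accuracy estimated on n independent test examples, a common quick interval is the normal approximation to the binomial (the Wald interval), acc ± z · sqrt(acc · (1 - acc) / n). The sketch below assumes that method and uses hypothetical counts of 176 correct predictions out of 200. The approximation is rough for small n or accuracies near 0 or 1.

```python
import math
from statistics import NormalDist

def accuracy_interval(correct, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for an accuracy.

    Treats each of the n test predictions as an independent Bernoulli trial.
    Returns (accuracy, lower, upper), clipped to [0, 1].
    """
    acc = correct / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    radius = z * math.sqrt(acc * (1 - acc) / n)
    return acc, max(0.0, acc - radius), min(1.0, acc + radius)

# Hypothetical result: 176 of 200 test examples classified correctly.
acc, lower, upper = accuracy_interval(correct=176, n=200)
print(f"accuracy={acc:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```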

Prediction Intervals for Machine Learning

Last Updated on May 1, 2020
From a machine learning perspective, a prediction is a single point that hides the uncertainty of that prediction. Prediction intervals provide a way to quantify and communicate the uncertainty in a prediction. They are different from confidence intervals, which instead seek to quantify the uncertainty in a population parameter such as a mean or standard deviation. Prediction intervals describe the uncertainty for a single specific outcome. In this tutorial, you will discover the prediction […]
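The difference from a confidence interval is easiest to see in simple linear regression. The sketch below assumes ordinary least squares on one input with Gaussian noise, and it uses a normal quantile in place of the t quantile, which is reasonable only for large samples. The data are generated for illustration. The leading `1` under the square root is the noise of the single new observation. Dropping it would give a confidence interval for the mean response instead, which is much narrower.

```python
import math
import random
from statistics import NormalDist

def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def prediction_interval(xs, ys, x_new, confidence=0.95):
    """Prediction interval for one new y at x_new (normal-quantile approximation)."""
    n = len(xs)
    b0, b1 = fit_line(xs, ys)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))  # residual std. error
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    # 1 = noise of the new outcome; the other terms = uncertainty in the fitted line.
    se_pred = s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    y_hat = b0 + b1 * x_new
    return y_hat, y_hat - z * se_pred, y_hat + z * se_pred

rng = random.Random(3)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1.5) for x in xs]
y_hat, lower, upper = prediction_interval(xs, ys, x_new=5.0)
print(f"prediction={y_hat:.2f}, 95% PI=({lower:.2f}, {upper:.2f})")
```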
