A Gentle Introduction to Statistical Sampling and Resampling
Last Updated on August 8, 2019
Data is the currency of applied machine learning. Therefore, it is important that it is both collected and used effectively.
Data sampling refers to statistical methods for selecting observations from the domain with the objective of estimating a population parameter. Data resampling, by contrast, refers to methods for economically using a collected dataset to improve the estimate of the population parameter and to help quantify the uncertainty of that estimate.
Both data sampling and data resampling are required in a predictive modeling problem.
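To make the distinction concrete, here is a minimal sketch that uses only the Python standard library. It assumes a synthetic Gaussian population (mean 50, standard deviation 5) invented purely for illustration, and it uses the bootstrap as one example of a resampling method. The helper names `draw_sample` and `bootstrap_means` are hypothetical and are not taken from any library.

```python
import random
import statistics


def draw_sample(population, n, rng):
    """Sampling: select n observations at random, without replacement."""
    return rng.sample(population, n)


def bootstrap_means(sample, n_resamples, rng):
    """Resampling: redraw from the sample with replacement, keeping each mean."""
    size = len(sample)
    return [
        statistics.mean(rng.choices(sample, k=size))
        for _ in range(n_resamples)
    ]


rng = random.Random(1)  # fixed seed so repeated runs give the same numbers

# Assumption: a made-up population of 10,000 Gaussian values.
population = [rng.gauss(50, 5) for _ in range(10_000)]

# Sampling step: gather 100 observations and estimate the population mean.
sample = draw_sample(population, 100, rng)
point_estimate = statistics.mean(sample)

# Resampling step: reuse the same 100 observations to gauge uncertainty.
means = sorted(bootstrap_means(sample, 1000, rng))
lower, upper = means[25], means[974]  # approximate 95% percentile interval

print(f"population mean:   {statistics.mean(population):.2f}")
print(f"sample estimate:   {point_estimate:.2f}")
print(f"approx. 95% range: {lower:.2f} to {upper:.2f}")
```

Note that the resampling step never touches the population again. It works only with the 100 observations already collected, which is what makes it an economical use of the data.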
In this tutorial, you will discover statistical sampling and statistical resampling methods for gathering and making best use of data.
After completing this tutorial, you will know:
- Sampling is an active process of gathering observations with the intent of estimating a population parameter.
- Resampling is a methodology of economically using a data sample to improve the accuracy of an estimate of a population parameter and to quantify its uncertainty.
- Resampling methods, in fact, make use of a nested resampling method.
Let’s get started.