A Gentle Introduction to the Bootstrap Method
Last Updated on August 8, 2019
The bootstrap method is a resampling technique used to estimate statistics on a population by sampling a dataset with replacement.
It can be used to estimate summary statistics such as the mean or standard deviation. It is used in applied machine learning to estimate the skill of machine learning models when making predictions on data not included in the training data.
A desirable property of the bootstrap is that the estimated model skill can be presented with confidence intervals, a feature not readily available with other methods such as cross-validation.
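To make the idea concrete, here is a minimal sketch of a bootstrap estimate of a mean with a percentile-based interval. The function name `bootstrap_mean_ci`, the toy data, the 1,000 repeats, and the 95% level are assumptions chosen for illustration, and the percentile approach is only one of several ways to form an interval from bootstrap results.

```python
import numpy as np

def bootstrap_mean_ci(data, n_repeats=1000, alpha=0.05, seed=1):
    """Estimate the mean of data with a percentile bootstrap interval.

    Hypothetical helper for illustration only; n_repeats, alpha and seed
    are assumed defaults, not recommendations.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.size
    # Each repeat draws n observations with replacement and records the mean.
    means = np.array([
        rng.choice(data, size=n, replace=True).mean()
        for _ in range(n_repeats)
    ])
    # The interval is read off the empirical distribution of those means.
    lower = np.percentile(means, 100 * alpha / 2)
    upper = np.percentile(means, 100 * (1 - alpha / 2))
    return means.mean(), lower, upper

data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
estimate, lower, upper = bootstrap_mean_ci(data)
print(f"mean={estimate:.3f}, 95% interval=({lower:.3f}, {upper:.3f})")
```

The same pattern applies to model skill: replace the mean with a score computed from a model fit on each bootstrap sample.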
In this tutorial, you will discover the bootstrap resampling method for estimating the skill of machine learning models on unseen data.
After completing this tutorial, you will know:
- The bootstrap method involves iteratively resampling a dataset with replacement.
- When using the bootstrap, you must choose the size of the sample and the number of repeats.
- The scikit-learn library provides a function that you can use to resample a dataset for the bootstrap method.
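As a quick preview of the last point, the sketch below draws a single bootstrap sample with scikit-learn's `resample()` function from `sklearn.utils`. The toy data, the sample size of 4, and the random seed are assumptions for illustration. The observations that are never drawn are collected separately here (often called the out-of-bag sample), since these are the ones a model would not have seen during training.

```python
from sklearn.utils import resample

# Toy dataset (assumed for illustration); values are distinct on purpose.
data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

# Draw one bootstrap sample: 4 observations, chosen with replacement.
boot = resample(data, replace=True, n_samples=4, random_state=1)

# Observations that were never drawn form the out-of-bag sample.
# Membership by value is only safe because the values above are distinct.
oob = [x for x in data if x not in boot]

print("Bootstrap sample:", boot)
print("Out-of-bag sample:", oob)
```

Repeating this draw many times, and computing a statistic or model score each time, gives the distribution of estimates that the bootstrap summarizes.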