Stochastic Gradient Boosting with XGBoost and scikit-learn in Python
Last Updated on August 27, 2020
A simple technique for ensembling decision trees involves training trees on subsamples of the training dataset.
Training each tree on a random subset of the rows in the training data is called bagging. When a random subset of the columns (features) is also taken when calculating each split point, this is called random forest.
The same sampling ideas can be applied to the gradient tree boosting model, where the approach is called stochastic gradient boosting.
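To make the sampling idea concrete, here is a minimal sketch using only the Python standard library. It is not XGBoost's implementation: the helper `sample_indices`, the dataset dimensions, and the 50% fractions are hypothetical, chosen for illustration. It draws indices without replacement, which is an assumption of this sketch; classic bagging draws rows with replacement instead.

```python
import random


def sample_indices(n_items, fraction, rng):
    """Return a sorted random subset of range(n_items), drawn without replacement."""
    k = max(1, int(n_items * fraction))
    return sorted(rng.sample(range(n_items), k))


rng = random.Random(7)  # fixed seed so the sketch is repeatable
n_rows, n_cols, n_trees = 10, 6, 3  # hypothetical dataset shape and ensemble size

plan = []
for tree in range(n_trees):
    rows = sample_indices(n_rows, 0.5, rng)  # row subsample for this tree
    cols = sample_indices(n_cols, 0.5, rng)  # column subsample for this tree
    plan.append((rows, cols))
    print(f"tree {tree}: rows={rows} cols={cols}")
```

Each tree sees a different half of the rows and half of the columns. Sampling columns once per tree, as here, is the simplest variant; sampling them again at each split point follows the same pattern inside the tree-building loop.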
In this post you will discover stochastic gradient boosting and how to tune the sampling parameters using XGBoost with scikit-learn in Python.
After reading this post you will know:
- The rationale behind training trees on subsamples of data and how this can be used in gradient boosting.
- How to tune row-based subsampling in XGBoost using scikit-learn.
- How to tune column-based subsampling by both tree and split-point in XGBoost.
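As a rough preview of what tuning these settings involves, the sketch below lists candidate values for three XGBoost sampling parameters and enumerates every combination. The parameter names `subsample`, `colsample_bytree`, and `colsample_bylevel` are real XGBoost parameters; the candidate values are arbitrary assumptions for illustration, and the enumeration uses only the standard library rather than an actual search. Note that `colsample_bylevel` samples columns once per depth level; newer XGBoost releases also offer `colsample_bynode`, which samples per split.

```python
from itertools import product

# Candidate values are illustrative assumptions, not recommendations.
param_grid = {
    "subsample": [0.5, 0.75, 1.0],          # fraction of rows sampled per tree
    "colsample_bytree": [0.5, 0.75, 1.0],   # fraction of columns sampled per tree
    "colsample_bylevel": [0.5, 0.75, 1.0],  # fraction of columns sampled per depth level
}

# Enumerate every combination a grid search would evaluate.
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

print(len(candidates))
print(candidates[0])
```

A dictionary of this shape is what scikit-learn's `GridSearchCV` accepts as its `param_grid` argument, so the same grid could be handed to a search over an XGBoost model. A value of 1.0 disables sampling for that parameter.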
Let’s get started.
- Update Jan/2017: Updated to reflect changes in scikit-learn API version 0.18.1.