How to Train to the Test Set in Machine Learning
Training to the test set is a type of overfitting in which a model is deliberately prepared to achieve good performance on a given test set at the expense of increased generalization error.
This type of overfitting is common in machine learning competitions, where the complete training dataset is provided but only the input portion of the test set is available. One approach to training to the test set involves constructing a training set that most closely resembles the test set and then using it as the basis for training a model. The model is expected to perform better on the test set, but most likely worse on the training dataset and on any new data in the future.
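As a rough illustration of "constructing a training set that most resembles the test set", here is a minimal sketch that assumes resemblance is measured by Euclidean distance: for each test input it keeps the k nearest training rows and discards the rest. The helper name `select_training_subset`, the distance measure and the toy data are all hypothetical choices for illustration, not necessarily how the tutorial itself builds the subset.

```python
import math


def select_training_subset(X_train, y_train, X_test, k=1):
    """Hypothetical helper: keep only the training rows nearest the test inputs.

    Assumes "resembles" means small Euclidean distance. For each test input,
    the k closest training rows are kept; duplicates are dropped and the
    original training order is preserved.
    """
    chosen = set()
    for x in X_test:
        # Rank every training row by its distance to this test input
        # (sorted() is stable, so ties go to the earlier training row).
        ranked = sorted(range(len(X_train)), key=lambda i: math.dist(x, X_train[i]))
        chosen.update(ranked[:k])
    indices = sorted(chosen)
    return [X_train[i] for i in indices], [y_train[i] for i in indices]


# Toy data (made up for illustration): only the test *inputs* are used.
X_train = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [9.0, 9.0]]
y_train = [0, 0, 1, 1]
X_test = [[0.9, 1.2], [8.5, 9.1]]

X_sub, y_sub = select_training_subset(X_train, y_train, X_test, k=1)
print(X_sub, y_sub)  # [[1.0, 1.0], [9.0, 9.0]] [0, 1]
```

A model would then be fit on `X_sub` and `y_sub` instead of the full training set. Because the subset is chosen using the test inputs, any score on that test set is optimistically biased, which is exactly the leakage this tutorial warns about.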
Although overfitting the test set is not desirable, it can be interesting to explore as a thought experiment, and doing so can provide more insight into both machine learning competitions and avoiding overfitting in general.
In this tutorial, you will discover how to intentionally train to the test set for classification and regression problems.
After completing this tutorial, you will know:
- Training to the test set is a type of data leakage that may occur in machine learning competitions.
- One