How to Handle Big-p, Little-n (p >> n) in Machine Learning
Last Updated on August 19, 2020
What if I have more Columns than Rows in my dataset?
Machine learning datasets are often structured or tabular data consisting of rows and columns.
The columns that are fed as input to a model are called predictors, and their number is denoted “p”; the rows are samples, and their number is denoted “n”. Most machine learning algorithms assume that there are many more samples than there are predictors, denoted as p << n.
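As a quick illustration of the notation, the following minimal Python sketch counts the rows (n) and columns (p) of a tabular dataset held as a list of equal-length rows and reports which regime it falls in. The helper name `describe_shape` and the plain-list representation are assumptions made for this example. How much larger counts as "much larger" is a judgment call, so the sketch only reports the direction of the inequality.

```python
def describe_shape(rows):
    """Return (n, p, regime) for a dataset given as a list of equal-length rows.

    n is the number of samples (rows) and p the number of predictors (columns).
    describe_shape is a hypothetical helper written only for this example.
    """
    n = len(rows)
    if n == 0:
        raise ValueError("dataset has no rows")
    p = len(rows[0])
    if any(len(row) != p for row in rows):
        raise ValueError("all rows must have the same number of columns")
    if p < n:
        regime = "p < n"  # the usual case: more samples than predictors
    elif p > n:
        regime = "p > n"  # more predictors than samples
    else:
        regime = "p = n"
    return n, p, regime


# A toy dataset with 4 samples and 2 predictors.
tall = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(describe_shape(tall))  # (4, 2, 'p < n')

# A toy dataset with 2 samples and 6 predictors.
wide = [[0.1] * 6, [0.2] * 6]
print(describe_shape(wide))  # (2, 6, 'p > n')
```

With a NumPy array or pandas DataFrame, the same two counts come directly from its `shape` attribute as `(n, p)`.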
Sometimes, however, a dataset has many more predictors than samples, a situation referred to as “big-p, little-n” and denoted as p >> n. These problems often require specialized data preparation and modeling algorithms to address them correctly.
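One concrete reason standard methods struggle is that, with more columns than rows, ordinary least squares no longer has a unique solution. The n-by-p matrix X has rank at most n < p, so there are nonzero weight vectors v with Xv = 0, and adding any such v to a fitted weight vector leaves every training prediction unchanged. The NumPy sketch below demonstrates this on random data (the sizes, the seed, and the penalty `lam` are arbitrary choices for illustration) and then shows that adding an L2 penalty, as in ridge regression, makes the system uniquely solvable. Ridge is used here only as one familiar example of a method suited to p > n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 20  # 5 samples, 20 predictors: p >> n
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# rank(X) = rank(X^T X) <= n < p, so the p x p matrix X^T X is singular.
print(np.linalg.matrix_rank(X))  # 5, less than p = 20

# lstsq returns the minimum-norm solution out of infinitely many exact fits.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Rows n..p-1 of Vt from the full SVD span the null space of X.
_, _, Vt = np.linalg.svd(X)
v = Vt[-1]                 # a unit vector with X @ v approximately 0
w_other = w_ls + 10.0 * v  # a very different weight vector...

# ...that fits the training data exactly as well.
print(np.allclose(X @ w_ls, y), np.allclose(X @ w_other, y))  # True True

# Ridge: X^T X + lam * I is positive definite for lam > 0, so the solution is unique.
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(w_ridge.shape)  # (20,)
```

The data alone cannot distinguish `w_ls` from `w_other`; some extra assumption, such as the penalty here, is needed to pick one answer.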
In this tutorial, you will discover the challenge of big-p, little-n or p >> n machine learning problems.
After completing this tutorial, you will know:
- Most machine learning problems have many more samples than predictors and most machine learning algorithms make this assumption during the training process.
- Some modeling problems have many more predictors than samples, referred to as p >> n.
- Which algorithms to explore when modeling machine learning datasets with more predictors than samples.