7 Ways to Handle Large Data Files for Machine Learning
Exploring and applying machine learning algorithms to datasets that are too large to fit into memory is pretty common.
This leads to questions like:
- How do I load my multiple gigabyte data file?
- My algorithm crashes when I try to run it on my dataset; what should I do?
- Can you help me with out-of-memory errors?
In this post, I want to offer some common suggestions you may want to consider.
1. Allocate More Memory
Some machine learning tools or libraries may be limited by a default memory configuration.
Check if you can re-configure your tool or library to allocate more memory.
A good example is Weka, where you can increase the memory as a parameter when starting the application.
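As a rough sketch, if you launch Weka from the command line you can pass a larger Java heap with the -Xmx flag. The 4 GB heap size and the weka.jar location below are assumptions; adjust them for your own machine and install.

```python
import subprocess

# Launch Weka with a larger Java heap (4 GB here -- an assumed value, pick one
# that fits your machine). Assumes `java` is on the PATH and weka.jar is in
# the current working directory.
subprocess.run(["java", "-Xmx4g", "-jar", "weka.jar"])
```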
2. Work with a Smaller Sample
Are you sure you need to work with all of the data?
Take a smaller sample of your data, such as the first 1,000 or 100,000 rows, or better still a random sample of the same size, as in the sketch below.
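For example, here is a minimal sketch using pandas; the file name data.csv, the 100,000-row cutoff, and the 1% sampling rate are placeholders to adapt to your data.

```python
import pandas as pd
import random

# Option 1: load only the first 100,000 rows instead of the whole file.
sample = pd.read_csv("data.csv", nrows=100_000)

# Option 2: draw an approximate 1% random sample while reading, by skipping
# non-header rows at random (row 0 is kept as the header).
sample = pd.read_csv(
    "data.csv",
    skiprows=lambda i: i > 0 and random.random() > 0.01,
)
print(sample.shape)
```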