7 Ways to Handle Large Data Files for Machine Learning

Exploring and applying machine learning algorithms to datasets that are too large to fit into memory is a common problem.

This leads to questions like:

  • How do I load my multiple gigabyte data file?
  • My algorithm crashes when I run it on my dataset; what should I do?
  • Can you help me with out-of-memory errors?

In this post, I want to offer some common suggestions you may want to consider.

Photo by Gareth Thompson, some rights reserved.

1. Allocate More Memory

Some machine learning tools or libraries may be limited by a default memory configuration.

Check if you can re-configure your tool or library to allocate more memory.

A good example is Weka, where you can increase the memory as a parameter when starting the application.
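Weka runs on the Java Virtual Machine, so its available memory is controlled by the standard Java heap flag `-Xmx`. As a minimal sketch, assuming you launch Weka from Python and that the jar path and the 4 GB heap size below are placeholders for your own installation, you might start it like this:

```python
import subprocess

# Launch Weka with a larger Java heap (here 4 GB) via the standard -Xmx flag.
# The jar location is a placeholder -- point it at your own Weka install.
subprocess.run(["java", "-Xmx4g", "-jar", "/path/to/weka.jar"])
```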

2. Work with a Smaller Sample

Are you sure you need to work with all of the data?

Take a random sample of your data, such as 1,000 or 100,000 rows, and use it to explore and prototype before committing to the full dataset; one way to do this is sketched below.
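As a rough sketch, assuming your data is in a CSV file and you are using the pandas library (the file name, row counts, and sampling fraction are placeholders), you could either load only part of the file or draw a random sample in chunks:

```python
import pandas as pd

# Quick option: load only the first 100,000 rows of the file.
sample = pd.read_csv("large_dataset.csv", nrows=100_000)

# Better option: read the file in chunks and keep a small random fraction
# of each chunk, so the sample is spread across the whole file.
chunks = pd.read_csv("large_dataset.csv", chunksize=100_000)
random_sample = pd.concat(
    chunk.sample(frac=0.01, random_state=1) for chunk in chunks
)
```

Working with a smaller sample lets you iterate quickly, and you can always refit your chosen model on the full dataset later.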