Don’t bother trying to estimate Pandas memory usage
You have a file with data you want to process with Pandas, and you want to make sure you won’t run out of memory.
How do you estimate memory usage given the file size?
At times you may see estimates like these:
- “Have 5 to 10 times as much RAM as the size of your dataset”, or
- “several times the size of your dataset”, or
- 2×-3× the size of the dataset.
Any of these estimates can turn out too low or too high, depending on the situation.
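To make that concrete, here is a minimal sketch that compares the size of some CSV text with the size of the resulting DataFrame. The helper `memory_to_file_ratio` and the two synthetic datasets are assumptions made up for illustration; they are not from any real file. Note that it only measures the DataFrame once loaded, not the peak memory used while parsing.

```python
import io

import pandas as pd


def memory_to_file_ratio(csv_text: str) -> float:
    """Return (DataFrame size in memory) / (CSV size in bytes).

    Hypothetical helper for illustration only.
    """
    file_bytes = len(csv_text.encode("utf-8"))
    df = pd.read_csv(io.StringIO(csv_text))
    # deep=True also counts the contents of object (string) columns.
    memory_bytes = df.memory_usage(deep=True).sum()
    return memory_bytes / file_bytes


# Synthetic data: one-character integers take 2 bytes per row in the file
# (digit plus newline) but 8 bytes per row in memory as int64.
small_ints = "value\n" + "\n".join("1" for _ in range(100_000)) + "\n"

# Synthetic data: long decimal strings take 19 bytes per row in the file
# but still only 8 bytes per row in memory as float64.
long_floats = "value\n" + "\n".join("0.1234567890123456" for _ in range(100_000)) + "\n"

ratio_small = memory_to_file_ratio(small_ints)
ratio_long = memory_to_file_ratio(long_floats)

print(f"small ints:  memory is {ratio_small:.2f}x the file size")
print(f"long floats: memory is {ratio_long:.2f}x the file size")
```

The first ratio comes out well above 1 and the second well below 1, so a single multiplier cannot describe both files.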
In fact, I will go so far as to say that estimating memory usage is just not worth doing.
In particular, this article will:
- Demonstrate the very broad range of memory usage you will see just from loading the data, before any