Measuring the memory usage of a Pandas DataFrame
How much memory is your Pandas DataFrame or Series using?
Pandas provides an API for measuring this information, but a variety of implementation details means the results can be confusing or misleading.
Consider the following example:
>>> import pandas as pd
>>> series = pd.Series(["abcdefhjiklmnopqrstuvwxyz" * 10
... for i in range(1_000_000)])
>>> series.memory_usage()
8000128
>>> series.memory_usage(deep=True)
307000128
Which is correct: is memory usage 8MB or 300MB?
Neither!
In this special case, it’s actually 67MB, at least with the default Python interpreter.
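To see where the two reported numbers come from, here is a rough sketch that rebuilds them by hand without Pandas. It assumes a 64-bit Python where each object reference takes 8 bytes, and it takes the 128-byte index overhead directly from the output above. The helper names `estimate_shallow` and `estimate_deep` are made up for this illustration and are not part of Pandas.

```python
import sys

N = 1_000_000
POINTER_SIZE = 8      # assumption: bytes per object reference on a 64-bit build
INDEX_OVERHEAD = 128  # taken from the output above: 8000128 - 8 * 1_000_000

# The same 250-character string used in the example above.
text = "abcdefhjiklmnopqrstuvwxyz" * 10


def estimate_shallow(n):
    """Approximate memory_usage(): one pointer per row, plus the index."""
    return n * POINTER_SIZE + INDEX_OVERHEAD


def estimate_deep(n, per_string_size):
    """Approximate memory_usage(deep=True): the shallow figure plus the
    size of the string for every row, counted once per row."""
    return estimate_shallow(n) + n * per_string_size


# The exact per-string size depends on the Python version.
per_string = sys.getsizeof(text)

print(estimate_shallow(N))           # 8000128, matching memory_usage()
print(estimate_deep(N, per_string))  # 307000128 when per_string is 299
```

The deep figure in the output above implies 299 bytes per string, since (307000128 - 8000128) / 1,000,000 = 299. Both reported numbers are therefore simple per-row sums: the first counts only the pointers, and the second adds one full string size for every row, regardless of how those string objects are actually stored.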