Saving memory with Pandas 1.3’s new string dtype
When you’re loading many strings into Pandas, you’re going to use a lot of memory.
If you have only a limited number of distinct string values, you can save memory with categoricals, but that only helps in a limited number of situations.
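To make the categorical savings concrete, here is a minimal sketch. The data is made up for illustration: a column with 300,000 entries but only three distinct values. The object dtype is requested explicitly so the comparison does not depend on which default your installed Pandas version picks.

```python
import pandas as pd

# Hypothetical data: many rows, but only three distinct strings.
values = pd.Series(["red", "green", "blue"] * 100_000, dtype=object)

# A categorical stores each distinct string once, plus a small
# integer code per row.
as_category = values.astype("category")

# deep=True also counts the memory of the Python string objects.
object_bytes = values.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"object:   {object_bytes:,} bytes")
print(f"category: {category_bytes:,} bytes")
```

The exact numbers will vary with your Python and Pandas versions, but the categorical version should be much smaller here. If nearly every string were unique, the savings would mostly disappear.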
With Pandas 1.3, there’s a new option that can save memory on a large number of strings as well, simply by changing to a new column type.
Let’s see how.
Pandas’ different string dtypes
Every pandas.Series, and every column in a pandas.DataFrame, has a dtype: the type of object stored inside it.
By default, Pandas will store strings using the object dtype, meaning it stores strings as a NumPy array of pointers to normal Python objects.
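As a quick illustration, the following sketch inspects a small Series stored with the object dtype. The names are invented sample data, and the dtype is passed explicitly so the result does not depend on the default of the Pandas version you have installed.

```python
import pandas as pd

# Hypothetical sample data, stored with the object dtype.
names = pd.Series(["alice", "bob", "carol"], dtype=object)

print(names.dtype)  # object

# Each element is an ordinary Python str object.
first = names.iloc[0]
print(type(first))

# Without deep=True only the array of pointers (and the index) is
# counted; with deep=True the Python string objects are included too.
shallow_bytes = names.memory_usage(deep=False)
deep_bytes = names.memory_usage(deep=True)
print(shallow_bytes, deep_bytes)
```

The gap between the two numbers is the memory used by the Python string objects themselves, which is why a column of strings can use far more memory than its row count suggests.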
In Pandas 1.0, a new "string" dtype was added,