Loading SQL data into Pandas without running out of memory
You have some data in a relational database, and you want to process it with Pandas. So you use Pandas' handy read_sql() API to get a DataFrame, and promptly run out of memory.
The problem: you’re loading all the data into memory at once.
If you have enough rows in the SQL query’s results, it simply won’t fit in RAM.
Pandas does have a batching option for read_sql(), which can reduce memory usage, but it's still not perfect: it also loads all the data into memory at once!
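For reference, here's a minimal sketch of that batching API, read_sql()'s chunksize parameter; the SQLite URL and users table are hypothetical stand-ins:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical database URL and table, for illustration only:
engine = create_engine("sqlite:///example.db")

# With chunksize set, read_sql() returns an iterator of
# DataFrames instead of one big DataFrame:
for chunk in pd.read_sql("SELECT * FROM users", engine, chunksize=1000):
    print(len(chunk))  # stand-in for your per-chunk processing
```

Each chunk holds at most 1000 rows, which sounds like exactly what we want; the catch, as noted above, is what happens in memory before those chunks ever reach your code.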
So how do you process larger-than-memory queries with Pandas? Let's find out.
Iteration #1: Just load the data
As a starting point, let's just look at the naive, but often sufficient, method of loading data from a SQL database into a Pandas DataFrame.
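Here's a minimal sketch of that naive approach; the connection string and users table are hypothetical stand-ins for your own database:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string and table name:
engine = create_engine("postgresql://localhost/mydb")

# Load the entire query result into a single DataFrame;
# memory usage grows with the number of rows returned:
df = pd.read_sql("SELECT * FROM users", engine)

print(df.memory_usage(deep=True).sum())  # total bytes held in RAM
```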