Loading SQL data into Pandas without running out of memory

You have some data in a relational database, and you want to process it with Pandas.
So you use Pandas’ handy read_sql() API to get a DataFrame—and promptly run out of memory.

The problem: you’re loading all the data into memory at once.
If you have enough rows in the SQL query’s results, it simply won’t fit in RAM.

Pandas does have a batching option for read_sql(): pass a chunksize and you get an iterator of smaller DataFrames instead of one big one. That can reduce Pandas' own memory usage, but it's still not a complete fix: depending on the database driver, the full result set may still be loaded into memory before Pandas ever sees it!
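To make the batching option concrete, here's a minimal sketch of how chunksize changes the shape of read_sql()'s return value. It uses an in-memory SQLite database as a stand-in for a real relational database; the table name and row counts are illustrative, not from the original setup.

```python
import sqlite3
import pandas as pd

# Stand-in database: an in-memory SQLite table with 1000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(i, i * 0.5) for i in range(1000)],
)

# With chunksize, read_sql() returns an iterator of DataFrames,
# each with at most 100 rows, instead of one 1000-row DataFrame.
total = 0
for chunk in pd.read_sql("SELECT * FROM measurements", conn, chunksize=100):
    total += len(chunk)

print(total)  # 1000
```

Note that only Pandas is batching here: whether the database driver streams rows or fetches them all up front is a separate question.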

So how do you process larger-than-memory queries with Pandas?
Let’s find out.

Iteration #1: Just load the data

As a starting point, let's just look at the naive—but often sufficient—method of loading data from a SQL database into a DataFrame.
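A minimal sketch of this naive approach, again using an in-memory SQLite database as a stand-in (the table and column names are hypothetical):

```python
import sqlite3
import pandas as pd

# Stand-in database with 1000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1000)],
)

# The naive approach: pull the entire query result into one DataFrame.
# Every row is in memory at once, so a large enough table won't fit.
df = pd.read_sql("SELECT * FROM users", conn)

print(len(df))  # 1000
```

For small result sets this is perfectly fine; the trouble only starts once the query's results no longer fit in RAM.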