Processing large JSON files in Python without running out of memory

If you need to process a large JSON file in Python, it’s very easy to run out of memory.
Even if the raw data fits in memory, the Python representation can use significantly more memory than the raw text.

And that means either slow processing, as your program swaps to disk, or crashing when you run out of memory.

One common solution is streaming parsing, aka lazy parsing, iterative parsing, or chunked processing.
Let’s see how you can apply this technique to JSON processing.

The problem: Python’s memory-inefficient JSON loading

For illustrative purposes, we’ll be using this JSON file, large enough at 24MB that it has a noticeable memory impact when loaded.
It encodes a list of JSON objects (i.e. dictionaries), which look to be data from GitHub.
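One way to see the impact is to load the whole file with json.load() and check the process's memory afterwards. Here's a minimal sketch, assuming the file is saved locally as large-file.json (a made-up name) and using the third-party psutil package to report resident memory:

```python
import json
import psutil  # third-party: pip install psutil

# Load the entire file into a single Python object (a list of dicts).
# json.load() parses everything up front, so the raw text and the
# resulting Python objects must both fit in memory at the same time.
with open("large-file.json") as f:
    data = json.load(f)

# Report how much resident memory the process is using afterwards.
rss_mb = psutil.Process().memory_info().rss / (1024 * 1024)
print(f"Loaded {len(data)} objects; resident memory: {rss_mb:.0f} MB")
```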

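Streaming parsing avoids this by never building the full list: the file is read incrementally and objects are handed to your code one at a time. Here's a minimal sketch using the third-party ijson library; the file name is the same assumed large-file.json, and the per-object processing is left as a placeholder:

```python
import ijson  # third-party streaming JSON parser: pip install ijson

# Iterate over the top-level array one object at a time instead of
# loading the whole list. The "item" prefix selects each element of a
# top-level JSON array.
with open("large-file.json", "rb") as f:
    count = 0
    for obj in ijson.items(f, "item"):
        # Process each dictionary here; only one object is held in
        # memory at a time (plus a small parser buffer).
        count += 1

print(f"Processed {count} objects")
```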