The fastest way to read a CSV in Pandas
You have a large CSV that you're going to read into Pandas, but every time you load it, you have to wait.
That slows down your development feedback loop, and might meaningfully slow down your production processing.
But it's possible to read the data in faster.
Let’s see how.
In this article we’ll cover:
- Pandas’ default CSV reading.
- The faster, more parallel CSV reader introduced in v1.4.
- A different approach that can make things even faster.
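As a rough preview of the first two items, here is a minimal sketch of how the parser is selected through the `engine` parameter of `read_csv()`. The tiny in-memory CSV and its column names are made up for illustration, and the PyArrow-based engine assumes Pandas 1.4 or later with the optional `pyarrow` package installed.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large CSV file; the columns are invented.
csv_bytes = b"route,delay_minutes\n12,4\n12,9\n7,0\n"

# Default parsing: Pandas uses its built-in C parser.
df_default = pd.read_csv(io.BytesIO(csv_bytes))

# PyArrow-based parsing, chosen with engine="pyarrow" (Pandas 1.4+).
# pyarrow is an optional dependency, so fall back if it is not installed.
try:
    df_arrow = pd.read_csv(io.BytesIO(csv_bytes), engine="pyarrow")
except ImportError:
    df_arrow = df_default

print(df_default.shape, df_arrow.shape)
```

On a file this small the engine makes no practical difference; any speedup only matters for large inputs like the one used in the rest of this article.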
Reading a CSV, the default way
I happened to have an 850MB CSV lying around with the local transit authority’s bus delay data, as one does.
Here’s the default way of loading it with Pandas: