Pandas vectorization: faster code, slower code, bloated memory
When you’re processing data with Pandas, so-called “vectorized” operations can significantly speed up your code.
Or at least, that’s the theory.
In practice, Pandas vectorized operations can sometimes make your code slower, or at least no faster.
And they can also significantly increase memory usage.
Let’s dig in and see what vectorization means in Pandas, when and why it helps, and when it’s harmful.
Vectorization: what it means, and how it speeds up your code
Vectorization can mean different things, as discussed in a more in-depth article on what vectorization means in Python.
For our purposes, two of those meanings are relevant:
- Batch API: An API that can process multiple items of data at once.
- A native-code loop: In addition to exposing a batch