The limits of Python vectorization as a performance technique
Vectorization in Python, as implemented by NumPy, can give you faster operations by using fast, low-level code to operate on bulk data.
And Pandas builds on NumPy to provide similarly fast functionality.
But vectorization isn’t a magic bullet that will solve all your problems: sometimes it will come at the cost of higher memory usage, sometimes the operation you need isn’t supported, and sometimes it’s just not relevant.
For each problem, there are alternative solutions that can address the problem.
In this article, we’ll:
- Recap why vectorization is useful.
- Go over the various limits and problems with vectorization.
- Consider some solutions to each problem: PyPy, Numba, and compiled extensions.
Vectorization has multiple meanings, we’re focusing on fast, low-level loops
To summarize a more detailed explanation of