Speeding up Cython with SIMD
Cython allows you to write compiled extensions for Python, by translating Python-y code to C or C++.
Often you’ll use it to speed up your software, and it’s especially useful for implementing small data science or scientific computing algorithms.
But what happens when Cython is too slow?
Often there are still speed improvements you can make.
In a previous article we focused on examples of optimizing your code to take advantage of things like instruction-level parallelism.
In this article, we’ll focus on another CPU feature, Single Instruction, Multiple Data (SIMD), specifically in the context of Cython.
As we’ll see, in some situations you can take advantage of SIMD with only minimal changes to your code.
Parts of this article are excerpted from a book I’m working on