Two kinds of thread pools, and why you need both
When you’re doing large-scale data processing with Python, threads are a good way to achieve parallelism.
This is especially true if you’re doing numeric processing, where the global interpreter lock (GIL) is typically not an issue.
And if you’re using threading, thread pools are a good way to make sure you don’t use too many resources.
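As a point of reference, here is a minimal sketch of a bounded thread pool built with the standard library's `concurrent.futures.ThreadPoolExecutor`. The `checksum` and `process_chunks` functions and the `MAX_WORKERS` value are hypothetical names chosen for illustration. `hashlib` stands in for a numeric library only so the sketch has no third-party dependencies: like many numeric routines, it releases the GIL while working on large buffers.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Arbitrary value for illustration; it is not a recommendation.
MAX_WORKERS = 4


def checksum(chunk: bytes) -> str:
    # hashlib releases the GIL while hashing large buffers, so several
    # of these calls can make progress at the same time.
    return hashlib.sha256(chunk).hexdigest()


def process_chunks(chunks, max_workers=MAX_WORKERS):
    # The pool never runs more than max_workers tasks at once, which
    # bounds how many chunks are being processed concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(checksum, chunks))


# Eight 1 MB chunks, each filled with a different byte value.
chunks = [bytes([i]) * 1_000_000 for i in range(8)]
digests = process_chunks(chunks)
```

However many chunks are submitted, at most `max_workers` of them are in flight at a time; the rest wait in the pool's queue.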
But how many threads should your thread pool have?
And do you need just one thread pool, or more than one?
In this article we’ll see that for data processing batch jobs:
- There are two kinds of thread pools, each for different use cases.
- Each kind requires a different configuration.
- You might need both.
Setting the scene: performance architecture is situational
There is no such thing as performance in