Using portable SIMD in stable Rust

In a previous post we saw that you can speed up code significantly on a single core using SIMD: Single Instruction, Multiple Data.
These specialized CPU instructions let you, for example, add four pairs of numbers with a single instruction, instead of the usual one pair at a time.
The performance improvement you get compounds with multi-core parallelism: you can benefit from both SIMD and threading at the same time.
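To make the "four additions in one instruction" idea concrete, here is a minimal sketch using the architecture-specific intrinsics that stable Rust exposes in `std::arch`. It assumes an x86-64 target, where SSE is always available, and falls back to plain scalar code elsewhere. The `add4` helper is hypothetical and exists only for illustration. Note that tying code to one architecture like this is exactly the portability problem discussed next.

```rust
// Sketch: element-wise addition of four f32 pairs with a single SSE
// instruction (`addps`) on x86-64, with a scalar fallback on other targets.
// `add4` is a hypothetical helper name used only for this illustration.

#[cfg(target_arch = "x86_64")]
fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};

    let mut out = [0.0f32; 4];
    // SAFETY: SSE is part of the x86-64 baseline, so these instructions are
    // always available; each pointer refers to exactly four valid f32 values.
    unsafe {
        let va = _mm_loadu_ps(a.as_ptr()); // load 4 floats into one 128-bit register
        let vb = _mm_loadu_ps(b.as_ptr());
        let sum = _mm_add_ps(va, vb); // one instruction adds all 4 lanes
        _mm_storeu_ps(out.as_mut_ptr(), sum);
    }
    out
}

#[cfg(not(target_arch = "x86_64"))]
fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    // Scalar fallback: one addition per element.
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}

fn main() {
    let sum = add4([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]);
    println!("{:?}", sum); // prints [11.0, 22.0, 33.0, 44.0]
}
```

Both versions produce the same result; only the x86-64 one is guaranteed to use a single SIMD addition.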
Unfortunately, SIMD instructions are specific to both the CPU architecture and the CPU model.
For example, the ARM CPUs used in modern Macs have different SIMD instructions than x86-64 CPUs.
And even if you only care about x86-64, different models support different instructions; the i7-12700K CPU in my current computer doesn’t support AVX-512 SIMD, for example.
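Because support varies from model to model, code that uses a particular extension typically has to check what the current CPU offers. Stable Rust provides runtime detection macros in `std::arch` for this. The sketch below reports the widest SIMD extension it finds. The `widest_simd` helper and its labels are hypothetical, and checking `avx512f` only detects the AVX-512 foundation subset, so treat the answer as a rough indicator rather than a full capability list.

```rust
// Sketch: runtime SIMD feature detection in stable Rust.
// `widest_simd` is a hypothetical helper used only for illustration.

#[cfg(target_arch = "x86_64")]
fn widest_simd() -> &'static str {
    if is_x86_feature_detected!("avx512f") {
        "AVX-512" // foundation subset only; other AVX-512 subsets need their own checks
    } else if is_x86_feature_detected!("avx2") {
        "AVX2"
    } else {
        // SSE2 is part of the x86-64 baseline, so it is always present.
        "SSE2"
    }
}

#[cfg(target_arch = "aarch64")]
fn widest_simd() -> &'static str {
    if std::arch::is_aarch64_feature_detected!("neon") {
        "NEON"
    } else {
        "none"
    }
}

#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn widest_simd() -> &'static str {
    // No detection implemented for other architectures in this sketch.
    "unknown"
}

fn main() {
    // On an i7-12700K, for example, this would report AVX2 rather than AVX-512.
    println!("Widest SIMD extension detected: {}", widest_simd());
}
```

A program can use the result to pick between several implementations of the same function, at the cost of writing (and testing) one version per instruction set.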
One way to