Spent a good chunk of my weekend implementing vectorized versions of log2 (max rel err: 3.090759E-06f) and exp2 (max rel err: 1.8041857E-07f), with optional NaN support.
I used fpminimax from Sollya to fit the polynomial approximations, and the results are pretty good: almost 40x faster for log2, 6x faster for exp2!
The .NET 7 JIT does a pretty good job of optimizing the functions, and the really nice thing is that the same C# code runs on ARM64, SSE2, and AVX2 thanks to System.Numerics.Vector<T>! 😎
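
For a rough idea of the technique, here is a minimal sketch of a vectorized exp2 built on System.Numerics.Vector<T>. It is not the implementation described above: the class name `FastMathSketch` is hypothetical, the coefficients are plain Taylor terms of 2^f (not fpminimax output, so less accurate for the same degree), the input is clamped to the normal float range, and NaN handling is left out.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, for illustration only.
static class FastMathSketch
{
    // Taylor coefficients of 2^f = sum_k (ln 2)^k * f^k / k!, for k = 0..6.
    // Assumption: these stand in for minimax coefficients from Sollya.
    static readonly float[] C =
    {
        1f, 0.6931472f, 0.2402265f, 0.05550411f,
        0.009618129f, 0.001333356f, 0.0001540353f
    };

    public static Vector<float> Exp2(Vector<float> x)
    {
        // Clamp so that 2^n stays a normal float (no denormal/overflow handling here).
        x = Vector.Min(Vector.Max(x, new Vector<float>(-126f)), new Vector<float>(127f));

        // Range reduction: x = n + f, with n an integer and f in [-0.5, 0.5).
        Vector<float> n = Vector.Floor(x + new Vector<float>(0.5f));
        Vector<float> f = x - n;

        // Evaluate the polynomial in f with Horner's scheme.
        Vector<float> p = new Vector<float>(C[6]);
        for (int i = 5; i >= 0; i--)
        {
            p = p * f + new Vector<float>(C[i]);
        }

        // Build 2^n directly from bits: biased exponent (n + 127) shifted into place.
        Vector<int> bits = Vector.ShiftLeft(Vector.ConvertToInt32(n) + new Vector<int>(127), 23);
        return p * Vector.AsVectorSingle(bits);
    }
}
```

Calling `FastMathSketch.Exp2(new Vector<float>(3f))` fills every lane with 8. Because the code only uses the portable `Vector<T>` operations, the JIT picks the instruction set and vector width for the machine it runs on. `Vector.ShiftLeft` requires .NET 7 or later.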

