Got sucked into some random FFT optimization this weekend. LiFFT, my little header-only FFT lib, was never meant to be blazingly fast, just "good enough for my little audio experiments", BUT...
After some SIMD optimization, I have very similar performance to Pocket FFT at basically all sizes, but PFFFT (Fantastic name!) and FFTW still have me beat by well over 2x at certain sizes. On the one hand, both of those are way bigger than my dinky 200 SLoC toy lib, but on the other, how do they beat it by so much!?