Got sucked into some random FFT optimization this weekend. LiFFT, my little header-only FFT lib, was never meant to be blazingly fast, just "good enough for my little audio experiments", BUT...

After some SIMD optimization, I get very similar performance to Pocket FFT at basically all sizes, but PFFFT (fantastic name!) and FFTW still have me beat by well over 2x at certain sizes. Sure, both of those are way bigger than my dinky 200 SLoC toy lib, but how do they beat it by so much!?

@slembcke writing FFTs that are actually fast is a real art form. I wrote the FFT library I use in my projects about a decade ago and tweaked it to get it fast on NEON and amd64. Heaps of fun.

@nickappleton It's such an attractive optimization problem to obsess about. You can go low level and do inline assembler, you can SIMD it, you can fiddle with it at the arithmetic level. At the symbolic level it has so many little trig-like identities you can apply, and it's linear so you can go nuts reordering things...

I mean, I say I'm surprised that PFFFT and FFTW are so much faster than mine, but also not really. >_< My goal was something like "How fast can I make it in < 200 lines?"