I resumed tinkering with the non-realtime #PhaseVocoder #TimeStretch project I started in 2020. Made the various algorithm options command-line parameters instead of hard-coding them.
It uses the FFTW3 library to do the heavy lifting, but it's single-threaded. I did some maths following the Wikipedia page [1], and managed to turn one FFT of size 16N into 16 FFTs of size N, some multiplications by "twiddle factors", and N FFTs of size 16. This lets me utilize all cores on my 16-thread CPU, because the individual FFTs within each stage are independent of each other and can run in parallel (#ParallelComputing using #OpenMP).
I'm pretty sure my resulting FFT array is shuffled, but my phase vocoder algorithm is bin-order-agnostic so it doesn't matter.
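A minimal sketch of that kind of split, to show where the shuffle comes from. This is not the project's code: it's plain Python with a naive DFT standing in for the FFTW3 calls, no OpenMP, and the names `dft`, `split_fft` and `unshuffle` are made up for illustration. With r = 16 it matches the 16 × N description above; the loops in stages 1 and 3 are the independent pieces that could be farmed out to threads.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, standing in for a library FFT call."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def split_fft(x, r):
    """DFT of length r*n done as r size-n DFTs, twiddles, then n size-r DFTs.

    The result is returned in the order the last stage produces it:
    position k2*r + k1 holds frequency bin k2 + n*k1 (i.e. it is shuffled).
    """
    total = len(x)
    n = total // r
    assert r * n == total, "length must be a multiple of r"

    # Stage 1: r independent size-n DFTs over the strided sub-sequences.
    rows = [dft(x[n1::r]) for n1 in range(r)]

    # Stage 2: twiddle factors exp(-2*pi*i * n1 * k2 / (r*n)).
    for n1 in range(r):
        for k2 in range(n):
            rows[n1][k2] *= cmath.exp(-2j * cmath.pi * n1 * k2 / total)

    # Stage 3: n independent size-r DFTs, one per column k2.
    out = []
    for k2 in range(n):
        out.extend(dft([rows[n1][k2] for n1 in range(r)]))
    return out


def unshuffle(y, r):
    """Put the output of split_fft back into natural bin order."""
    n = len(y) // r
    x = [0j] * len(y)
    for k2 in range(n):
        for k1 in range(r):
            x[k2 + n * k1] = y[k2 * r + k1]
    return x
```

If whatever consumes the bins treats each one independently and the inverse transform applies the matching inverse ordering, the `unshuffle` step can be skipped, which is why a bin-order-agnostic algorithm doesn't care.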
Experimenting by recreating (parts of) 9 Beet Stretch [2], which AFAICT originally used granular synthesis (probably time domain?). The phase vocoder sounds a bit less spacy than the web stream I found.
[1] https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm#The_radix-2_DIT_case
[2] https://en.wikipedia.org/wiki/9_Beet_Stretch