In my everlasting quest to beat big triangle, I revisited an #OpenCL FFT implementation I made for uni in 2021. A couple of days in, with the knowledge I have now, I've already got it down to 0.0033 seconds end-to-end latency (including host/device transfers) for a 1-million-point complex FFT. This is on the low-cost 5700 XT.
Graphs and new code will follow soon; check https://github.com/Dantali0n/oCLFFT
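For a rough sense of scale, here is a back-of-the-envelope sketch of what 0.0033 s means in throughput. It assumes "1 million points" means N = 2^20 and single-precision complex samples (8 bytes each); neither is stated in the post. It uses the conventional 5·N·log2(N) FFT operation count from the benchFFT/FFTW benchmarks. Because the measured time also includes host/device transfers, the resulting figure understates what the kernels alone achieve.

```python
import math

# Assumption: "1 million points" = N = 2**20 = 1,048,576 (power of two).
N = 2 ** 20
latency_s = 0.0033  # end-to-end latency from the post, transfers included

# Conventional FFT operation count (benchFFT/FFTW style): 5 * N * log2(N).
flops = 5 * N * math.log2(N)
gflops = flops / latency_s / 1e9

# Assumption: complex float32 samples, i.e. 2 x 4 bytes = 8 bytes per point.
bytes_per_direction = N * 8
mib_per_direction = bytes_per_direction / 2**20

print(f"nominal throughput: {gflops:.1f} GFLOP/s")           # -> 31.8 GFLOP/s
print(f"data per transfer direction: {mib_per_direction:.0f} MiB")  # -> 8 MiB
```

So under these assumptions the whole round trip, moving about 8 MiB each way over PCIe, runs at a nominal ~32 GFLOP/s.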
