I ported my fork of #z80ai to #ZXSpectrum. Now it really does run at 3.5MHz (the CP/M version on #ZXSpectrumNext must have been running at 28MHz). This simple convo takes 4.5 minutes :)

(Optimizations are surely possible. I also pessimized it a bit by adding border colors, just so I wouldn't get bored waiting for a reply.)

Grab the source and .tap file here: https://github.com/RCL/z80ai/tree/main/examples/tinychat/build_tap

It is of course very primitive. If you have a #ZXSpectrumNext, you can run the examples out of the box in the Next's standard CP/M (you may need to build CHAT.COM, but it's a breeze). At 3.5MHz the token generation speed isn't great, of course. #z80ai