Honestly, the thing that will probably kill LLMs the hardest is someone writing a small language model that fits in JavaScript in a browser and hits comparable benchmarks.

Why bother with all those GPUs and all that energy usage if your Raspberry Pi could get comparable results?

Is this possible? I dunno. I'm not specialized in this.

But if I wanted to fuck the GenAI bubble over and had the relevant background experience? This is what I'd explore.

@soatok About a year ago, a bunch of friends were trying to do this. Various Chinese companies and universities had just released a bunch of relatively efficient models, and my friends ran them on phones and Pis with a wait of 1-5 minutes per response. IMO that's too long to be really competitive, but it's real close. Idk where things are now, but I'd guess it's only a matter of time until someone makes a decent model that runs entirely on a phone's GPU, nice and fast.
@TommyTorty10 @soatok Chinese models are nearly there. DeepSeek R1 and Kimi K2 both being able to run on not much more than a Pi to get extremely decent results for the power needed.

@nicfitzgerald @TommyTorty10 @soatok I think these are not the large models themselves, but "distilled" models trained using the large model as a guide.

Still very impressive.
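(For anyone unfamiliar with the term: "distillation" usually means training a small student model to imitate a big teacher model's output distribution, typically via a temperature-softened KL divergence loss. A minimal sketch in plain Python, with toy made-up logits rather than any real model:)

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing the teacher's
    # relative preferences among "wrong" answers, not just its top pick.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) over temperature-softened distributions.
    # The student is trained to minimize this, usually mixed with the
    # ordinary cross-entropy loss on ground-truth labels.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy check: a student that roughly agrees with the teacher scores a
# lower loss than one that disagrees.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.3]
far_student = [0.3, 1.2, 3.5]
assert distillation_loss(teacher, close_student) < distillation_loss(teacher, far_student)
```

That's why the small releases can punch above their size: the big model's full output distribution is a much richer training signal than bare labels.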

@kakurady @TommyTorty10 @soatok I think they're both originally full models, but the versions people run on small hardware are the distilled releases.