Honestly, the thing most likely to kill LLMs is someone writing a small language model that runs in JavaScript in a browser and hits comparable benchmarks.

Why bother with all those GPUs and all that energy usage if your Raspberry Pi could get comparable results?

@soatok Ollama lets u run models locally, and others have run models on phones, so i wouldn't be surprised if someone has already done this too

but currently the quality of the responses suffers. i'm excited about the future tho, because hopefully in 10 years a small local model will match the quality of today's best models (Claude, ChatGPT, Gemini)