Honestly, the thing that will probably kill LLMs the hardest is someone writing a small language model that's small enough to run as JavaScript in a browser and hits comparable benchmarks.

Why bother with all those GPUs and all that energy usage if your Raspberry Pi could get comparable results?

Is this possible? I dunno. I'm not specialized in this.

But if I wanted to fuck the GenAI bubble over and had the relevant background experience? This is what I'd explore.

There's a lot of interesting discussion in the replies.

My idea is to fight fire with fire. Not everyone has the stomach for that. That's okay. You don't gotta use those tools.

@soatok about a year ago, a bunch of friends were trying to do this. Various Chinese companies and universities had just released a bunch of relatively efficient models, and my friends ran them on phones and pi's with a wait of 1-5 minutes for each response. Imo, that's too long to be really competitive, but it's real close. Idk where things are now, but Id guess that it's only a matter of time until someone makes a decent model that can run entirely on the gpu of a phone nice and fast.
@TommyTorty10 @soatok Chinese models are nearly there. DeepSeek R1 and Kimi K2 can both run on not much more than a Pi and get extremely decent results for the power needed.

@nicfitzgerald @TommyTorty10 @soatok I think these are not the large models themselves, but "distilled" models trained using the large model as a guide.

Still very impressive.

@kakurady @TommyTorty10 @soatok I think they're both originally full models but they released the distilled versions of them.
@TommyTorty10 @soatok
If RAM just hadn't shot up in price, more and more phones would have more and more RAM to run more and more capable models.
Along with ML accelerators in silicon.
@soatok makes me think, earlier computers were as obtuse, crude and walled-off as LLMs are nowadays. But computers now can fit in a person's pocket while being thousands of times as powerful as ENIAC. So looking at the past, a lightweight, locally-run LLM as powerful as the ones we have access to nowadays sounds perfectly reasonable to me.
@soatok If you want it just to be able to use language, sure. But they want a vastly overfitted model that lossily compresses the volume of human writing and can spit back out obfuscated plagiarism of arbitrary parts.

@dalias One model per language.

Want it to generate C? Download the C model.

Want it to write bad poetry? Download the Vogon I mean English model.

@soatok Right but that's not all they want. They want it to generate obfuscated plagiarism of poetry. They want it to generate "copyright-free" copies of arbitrary FOSS programs, songs, etc. This inherently requires the largeness of the model because the plagiarism is buried in the overfitting.
@soatok If you had to give it the things you wanted copied as explicit input, the plagiarism and copyright infringement would be obvious to users and courts. Making it gigantic ambient state obfuscated in the model is how they get away with it.

@dalias @soatok we agree that this is a thing these companies want, in the present day, now that they've seen the potential for theft-at-scale

we don't think it's the line of reasoning that brought us here

@dalias @soatok we think the original motivation was the usual large-company thing of starting from the conclusion they want to be true, then pretending like it is.

it would have gone like this: for large companies to dominate this market, there has to be something they can do that small companies can't. what is that? spend more money on training it.

@dalias @soatok our main reason for thinking about this is that our friends at DAIR who were part of Google's ML Fairness team have spoken publicly about the company's (lack of) reasoning for increasing model sizes
@ireneista @soatok I think it's correct that they didn't originally set out to make plagiarism machines, but they did set out to fake intelligence. Ability to use language doesn't do that. Their approach depended on assimilating a gigantic corpus of ambient "knowledge" in a form where it's not actually usable as knowledge but can be convincingly regurgitated in different permutations to look like the machine understands it.
@dalias @soatok yes, we agree with that. there was a hope at first that there might be other pieces of the architecture, but nobody has made that work, yet.
@lritter @soatok "They" being the AI enthusiasts or people who feel like they're getting something of value from "AI". They almost surely don't frame what they want to themselves or others in terms of the plagiarism, but it'd be useless to them without that outcome.

@dalias @soatok i don't mean to be the bearer of bad news, but if these systems only plagiarized, it would certainly be easier to dismiss 'em. they also inter- and extrapolate, which, if a human had done it, would be counted as original work. of course that's not all that creative work is, and so it's quite limited.

overfitting is undesirable, because it turns a fuzzy database into a regular database - then it's not plagiarism, it's a straight up copyright violation.

@lritter @soatok Um, no, you're not being the "bearer of bad news". AI propagandism isn't news. An interpolation of existing works to cover up that it's essentially the same as someone else's work is plagiarism if a human does it too. This is why the early architects of free software always insisted on clean room reimplementations based on a specification someone else had worked out, not reading reverse engineered or leaked proprietary code then pretending they could forget it and write something equivalent.
@dalias @soatok i see you have your mind made up. but what i said isn't AI propagandism - just facts. i find the implication insulting, tbh. but you're in war mode, i get it. happy hunting.
@dalias @soatok what i want is definitely not that. what i want is something like siri, except not total ass.
@ariadne @soatok I mean "they" as in the AI enthusiasts.

@soatok I share this sentiment. Eventually models will be good enough to run on an RPi.

Then we will be free to build on those. I don’t have the expertise or experience to create that model either, so I focus on building tools for that future.

If those corps can lay people off and still create shareholder value, it will also work the other way around for mom-and-pop small businesses that don’t hire many people from the get-go.

@soatok I might have something that could take a shot at it - a v2 of something I first wrote in 2008...

@soatok this is a real “who would win” meme idea. And honestly, I don’t care for AI but in general I wish there was more interest in doing things efficiently instead of just throwing more and more resources at things.

I think about it every time I see posts about the average size of a webpage, or user testing on cheaper/older mobile devices.

@soatok AI is a cancer. Killing one kind of cancer isn't gonna make much of a difference. Sure, you can kill LLMs, but that just stops text slop. It does not really stop video slop or audio slop.
@snow You gotta make the whole cancer impossible to ever profit from, so The Money will criminalize the whole thing.

@soatok ollama allows u to run models locally, and others have run ai on phones, so i wouldn't be surprised if someone has already done this as well

but currently the quality of the responses suffers. am excited about the future tho because the best models today (claude, chatgpt, gemini) will hopefully be the same quality as a small local model in 10 years
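(For anyone curious what "locally" looks like in practice: querying a model served by ollama is only a few lines. A minimal Python sketch, assuming the ollama daemon is running on its default port 11434 and a small model has already been pulled; the model name "llama3.2" is just an example.)

```python
# Minimal sketch: query a locally running model through ollama's HTTP API.
# Assumes the ollama daemon is on its default port (11434) and that a small
# model (here "llama3.2", just an example) has already been pulled.
import json
import urllib.request

def ask_local_model(prompt: str, model: str = "llama3.2") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON response instead of a token stream
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ask_local_model("Explain a Markov chain in one sentence."))
```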

@soatok

If anyone is thinking about smol models, one should go sniff around the Hugging Face Smol Models Research first. https://huggingface.co/HuggingFaceTB

Having said that though, I know some like the idea of a smol model, but then they get annoyed when the usability tradeoff is lack of general knowledge/needing to do tool use. Witness the reception of OpenAI's gpt-oss-20b for example.


@soatok already on it :))))))
@ariadne Oh hell yeah

@soatok i should clarify: i am working on two models, one which takes an input and tries to spit out structured data

and another which takes structured data and outputs prose

@soatok in an assistant scenario, this allows the assistant to ascertain what the user wants, and then allows the assistant to report back with the results.
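(A toy sketch of the two-model shape described above, just to make the data flow concrete. Every name here is hypothetical and both "models" are stubbed out; nothing here reflects the actual project.)

```python
# Toy illustration of the two-model assistant pipeline: one model turns
# free-form input into structured data, another turns structured data back
# into prose. Both are stubbed; only the shape of the flow is shown.
from dataclasses import dataclass

@dataclass
class Intent:
    action: str      # e.g. "weather_lookup"
    arguments: dict  # e.g. {"city": "Winnipeg"}

def understand(user_input: str) -> Intent:
    # Model 1: text in, structured data out (stubbed here; a real version
    # would run a small parsing/classification model).
    return Intent(action="weather_lookup", arguments={"city": "Winnipeg"})

def report(result: dict) -> str:
    # Model 2: structured data in, prose out (stubbed here; a real version
    # would run a small text-generation model).
    return f"It's currently {result['temp_c']} degrees in {result['city']}."

intent = understand("what's the weather like in winnipeg?")
result = {"city": intent.arguments["city"], "temp_c": -20}  # pretend tool call
print(report(result))
```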
@soatok all of this will be AGPL because fuck Big Tech
@soatok fine-tuning and distilling LLMs into small models that can run in very limited environments is already a thing, but I'm pretty sure that building tiny language models for very specific purposes is still relatively underexplored.

I'm not into LLMs though, and I barely have any idea whether this makes much sense.
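(For context, the core of distillation is training a small "student" model to match the softened output distribution of a large "teacher". A rough PyTorch sketch of the standard loss; the temperature value and the commented training step are illustrative only.)

```python
# Rough sketch of knowledge distillation: push a small "student" toward the
# softened output distribution of a large "teacher" model.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    # Soften both distributions; KL divergence measures how far the student
    # is from the teacher. Scaling by T^2 keeps gradient magnitudes stable.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * temperature ** 2

# Hypothetical training step:
#   with torch.no_grad():
#       t_logits = teacher(batch)   # big model, frozen
#   s_logits = student(batch)       # small model being trained
#   loss = distillation_loss(s_logits, t_logits)
#   loss.backward(); optimizer.step()
```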

@nullenvk @soatok It's totally buried now, but a couple years ago all the big players were shitting their pants because small language models were outperforming their LLMs (in general tasks, IIRC). Not even distilled; it just didn't take much. They all have their quiet small language models on the side now because of that. SLMs significantly outperform LLMs in their narrower areas, particularly in science, IIRC.

It's not a matter of whether it's possible; it's that pretending LLMs are the end-all-be-all is worth a LOT of money.

@soatok Maybe if they throw linear algebra at the wall for long enough, they'll find themselves the right basis. :P
@soatok I'm sure it's extremely possible to get a specialized LLM to run on a toaster at this point. But isn't that the whole point of these companies throwing all the money and hardware at training? To pore over the massive content of the existing live Internet to build up a generalized LLM's values?

Albeit, perhaps a positive of specializing is that it's finite, rather than the extremely undefined scope of a generalized LLM. There's only so much to the Rust / C / COBOL languages, compared to a model that takes in any plaintext language and outputs a desired product to some approximation.

I doubt, however, that using the tools of the hyper-capitalists is an effective way to dismantle the oversized house of these tech billionaires and their unsightly toys. It takes community, and an outright denial that these billionaires should use community land for selfish grift and exploitation.
@soatok you deny these data centers that drink our clean water and pollute our air, and then eventually these tech billionaires will have to justify why they haven't built or scaled their companies as much as all the investment they took in would suggest.

It should then fall like a house of cards. Because the cost of making things happen, when met with resistance, gets to become a non-trivial cost. But that requires all of us to be vigilant as to what's getting built near us and having a voice to dissent. We have to be that wrench in the gears that turn against community.
@soatok For anyone who wants to run an AI model on their phone, this might be useful:
https://github.com/google-ai-edge/gallery

@soatok You know: It strikes me that a lot of what these LLMs are being used for is essentially summarizing text. Sure, not entirely.

It turns out that we were already looking into how to do this well before ChatGPT, but it never took off.

Those models had some procedural guard rails to ensure they were at least somewhat accurately shortening the text, judging by word frequencies. Sure, this wasn't actually summarizing, but at least they strove not to lie!
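(That pre-ChatGPT approach was typically extractive: score each sentence by how frequent its words are across the document and keep the top few, so the output can only contain sentences the source actually said. A rough sketch; the scoring heuristic here is the simplest possible one, and "article.txt" is hypothetical.)

```python
# Sketch of pre-LLM extractive "summarization": score sentences by the
# frequency of their words across the whole document, keep the top few in
# their original order. It only shortens the text; it cannot assert anything
# the source didn't say, which is exactly the guard rail being described.
import re
from collections import Counter

def extract_summary(text: str, num_sentences: int = 3) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    return " ".join(s for s in sentences if s in top)

# print(extract_summary(open("article.txt").read()))  # hypothetical input
```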

@soatok apparently you can even run an LLM in a font fuglede.github.io/llama.ttf/

(requires HarfBuzz+WASM; haven't tried it myself)
llama.ttf is a font file which is also a large language model and an inference engine for that model.

@soatok
I think it's only a matter of when. DeepSeek already managed to make a huge efficiency leap, and with the current pace of development, I wouldn't be surprised if someone pulled that off in 2026.
@soatok i think we're a bunch of innovations away from this goal, it's not implausible. but that's just inference. the part that still sucks time and energy is training. cutting datacenter dependence here and democratizing model construction would truly spell the end of their gatekeeping. here though, i have no ideas.
@soatok this is what deepseek almost did, sadly only almost.

@soatok Maybe, but I wouldn’t bet on that. They would try to extrapolate whatever method you used to make it run on a Raspberry Pi so it scales up to data-center level again. If it can’t run better this way because of diminishing returns or whatever, it has to run more often instead. The large energy-chugging data centers are the point, not the performance of the AI. Same as how more energy-efficient LEDs didn’t lead to less power consumption but to more lamps in use.

Maybe that won’t happen here, but like I said, I’m not sure.

@muellermeier Right. To be a death knell, this would need to be something that "satisfies" while obviating data centers.
@soatok With more and more new personal compute platforms featuring an NPU, a local SLM should absolutely be the outcome to strive for. Local processing of streaming text-to-speech voices. Local uncensored image descriptions. Something useful like that, which a user might actually want a system with an NPU for. But that doesn't sell token subscriptions or gatekeep access.
@soatok the thing that will kill LLMs the hardest is the fact that u need to charge like 1k a month to make it profitable after investors stop dumping money in and who tf would pay that much 
@soatok i think if you were able to do this, you might have also come up with the best compression algorithm ever designed
@cameron Heh, it wouldn't be lossless though
@soatok the algo would get the vibe of it right. 😉
@soatok I don't think many people would be able to tell the difference between Cleverbot & ChatGPT. I'm sure they'll get away with something as light as a Markov chain.
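(For the record, a word-level Markov chain generator really is this small. A toy sketch; "corpus.txt" stands in for whatever text you'd train it on.)

```python
# A word-level Markov chain text generator: no neural network, just counts of
# which word follows each pair of words in a corpus.
import random
from collections import defaultdict

def build_chain(text: str, order: int = 2) -> dict:
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain: dict, length: int = 30) -> str:
    state = random.choice(list(chain.keys()))
    out = list(state)
    for _ in range(length):
        followers = chain.get(tuple(out[-len(state):]))
        if not followers:  # dead end: no observed continuation
            break
        out.append(random.choice(followers))
    return " ".join(out)

# chain = build_chain(open("corpus.txt").read())  # hypothetical corpus
# print(generate(chain))
```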
killing it as in making it ubiquitous?!?