Honestly, the thing that will probably kill LLMs the hardest is someone writing a small language model that's small enough to run as JavaScript in a browser and hits comparable benchmarks.

Why bother with all those GPUs and all that energy usage if your Raspberry Pi could get comparable results?

Is this possible? I dunno. I'm not specialized in this.

But if I wanted to fuck the GenAI bubble over and had the relevant background experience? This is what I'd explore.

There's a lot of interesting discussion in the replies.

My idea is to fight fire with fire. Not everyone has the stomach for that. That's okay. You don't gotta use those tools.

@soatok about a year ago, a bunch of friends were trying to do this. Various Chinese companies and universities had just released a bunch of relatively efficient models, and my friends ran them on phones and pi's with a wait of 1-5 minutes for each response. Imo, that's too long to be really competitive, but it's real close. Idk where things are now, but Id guess that it's only a matter of time until someone makes a decent model that can run entirely on the gpu of a phone nice and fast.
@TommyTorty10 @soatok Chinese models are nearly there. DeepSeek R1 and Kimi K2 can both run on not much more than a Pi and get extremely decent results for the power needed.

@nicfitzgerald @TommyTorty10 @soatok I think these are not the large models themselves, but "distilled" models trained using the large model as a guide.

Still very impressive.

@kakurady @TommyTorty10 @soatok I think they're both originally full models but they released the distilled versions of them.
@TommyTorty10 @soatok
If RAM just hadn't shot up in price, more and more phones would have more and more RAM to run more and more capable models.
Along with ML accelerators in silicon.
@soatok makes me think, earlier computers were as obtuse, crude and walled-off as LLMs are nowadays. But computers now can fit in a person's pocket while being thousands of times as powerful as ENIAC. So looking at the past, a lightweight, locally-run LLM as powerful as the ones we have access to nowadays sounds perfectly reasonable to me.
@soatok If you want it just to be able to use language, sure. But they want a vastly overfitted model that lossily compresses the volume of human writing and can spit back out obfuscated plagiarism of arbitrary parts.

@dalias One model per language.

Want it to generate C? Download the C model.

Want it to write bad poetry? Download the Vogon I mean English model.

@soatok Right but that's not all they want. They want it to generate obfuscated plagiarism of poetry. They want it to generate "copyright-free" copies of arbitrary FOSS programs, songs, etc. This inherently requires the largeness of the model because the plagiarism is buried in the overfitting.
@soatok If you had to give it the things you wanted copied as explicit input, the plagiarism and copyright infringement would be obvious to users and courts. Making it gigantic ambient state obfuscated in the model is how they get away with it.

@dalias @soatok we agree that this is a thing these companies want, in the present day, now that they've seen the potential for theft-at-scale

we don't think it's the line of reasoning that brought us here

@dalias @soatok we think the original motivation was the usual large-company thing of starting from the conclusion they want to be true, then pretending like it is.

it would have gone like this: for large companies to dominate this market, there has to be something they can do that small companies can't. what is that? spend more money on training it.

@dalias @soatok our main reason for thinking about this is that our friends at DAIR who were part of Google's ML Fairness team have spoken publicly about the company's (lack of) reasoning for increasing model sizes
@ireneista @soatok I think it's correct that they didn't originally set out to make plagiarism machines, but they did set out to fake intelligence. Ability to use language doesn't do that. Their approach depended on assimilating a gigantic corpus of ambient "knowledge" in a form where it's not actually usable as knowledge but can be convincingly regurgitated in different permutations to look like the machine understands it.
@dalias @soatok yes, we agree with that. there was a hope at first that there might be other pieces of the architecture, but nobody has made that work, yet.
@lritter @soatok "They" being the AI enthusiasts or people who feel like they're getting something of value from "AI". They almost surely don't frame what they want to themselves or others in terms of the plagiarism, but it'd be useless to them without that outcome.

@dalias @soatok i don't mean to be the bearer of bad news, but if these systems only plagiarized, it would certainly be easier to dismiss 'em. they also inter- and extrapolate, which, if a human had done it, would be counted as original work. of course that's not all that creative work is, and so it's quite limited.

overfitting is undesirable, because it turns a fuzzy database into a regular database - then it's not plagiarism, it's a straight up copyright violation.

@lritter @soatok Um, no, you're not being the "bearer of bad news". AI propagandism isn't news. An interpolation of existing works to cover up that it's essentially the same as someone else's work is plagiarism if a human does it too. This is why the early architects of free software always insisted on clean room reimplementations based on a specification someone else had worked out, not reading reverse engineered or leaked proprietary code then pretending they could forget it and write something equivalent.
@dalias @soatok i see you have your mind made up. but what i said isn't AI propagandism - just facts. i find the implication insulting, tbh. but you're in war mode, i get it. happy hunting.
@dalias @soatok what i want is definitely not that. what i want is something like siri, except not total ass.
@ariadne @soatok I mean "they" as in the AI enthusiasts.

@soatok I share this sentiment. Eventually models will be good enough to run on an RPi.

Then we will be free to build on those. I don’t have the expertise or experience to create that model either, so I focus on building tools for that future.

If those corps can lay people off and still create shareholder value, it will also work the other way around for mom-and-pop small businesses that don’t hire many people from the get-go.

@soatok I might have something that could take a shot at it - a v2 of something I first wrote in 2008...

@soatok this is a real “who would win” meme idea. And honestly, I don’t care for AI but in general I wish there was more interest in doing things efficiently instead of just throwing more and more resources at things.

I think about it every time I see posts about the average size of a webpage, or user testing on cheaper/older mobile devices.

@soatok AI is a cancer. Killing one kind of cancer isn't gonna make much of a difference. Sure, you can kill LLMs, but that just stops text slop. It does not really stop video slop or audio slop.
@snow You gotta make the whole cancer impossible to ever profit from, so The Money will criminalize the whole thing.

@soatok ollama allows u to run models locally, and others have run ai on phones, so i wouldn't be surprised if someone has already done this as well

but currently the quality of the responses suffers. am excited about the future tho because the best models today (claude, chatgpt, gemini) will hopefully be the same quality as a small local model in 10 years
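(For anyone curious what "locally" looks like in practice: querying a model served by ollama is only a few lines. A minimal Python sketch, assuming the ollama daemon is running on its default port 11434 and a small model has already been pulled; the model name "llama3.2" is just an example.)

```python
# Minimal sketch: query a locally running model through ollama's HTTP API.
# Assumes the ollama daemon is on its default port (11434) and that a small
# model (here "llama3.2", just an example) has already been pulled.
import json
import urllib.request

def ask_local_model(prompt: str, model: str = "llama3.2") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON response instead of a token stream
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ask_local_model("Explain a Markov chain in one sentence."))
```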

@soatok

If anyone is thinking about smol models, one should go sniff around the Hugging Face Smol Models Research first. https://huggingface.co/HuggingFaceTB

Having said that though, I know some like the idea of a smol model, but then they get annoyed when the usability tradeoff is lack of general knowledge/needing to do tool use. Witness the reception of OpenAI's gpt-oss-20b for example.


@soatok already on it :))))))
@ariadne Oh hell yeah

@soatok i should clarify: i am working on two models, one which takes an input and tries to spit out structured data

and another which takes structured data and outputs prose

@soatok in an assistant scenario, this allows the assistant to ascertain what the user wants, and then allows the assistant to report back with the results.
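(A toy sketch of the two-model shape described above, just to make the data flow concrete. Every name here is hypothetical and both "models" are stubbed out; nothing here reflects the actual project.)

```python
# Toy illustration of the two-model assistant pipeline: one model turns
# free-form input into structured data, another turns structured data back
# into prose. Both are stubbed; only the shape of the flow is shown.
from dataclasses import dataclass

@dataclass
class Intent:
    action: str      # e.g. "weather_lookup"
    arguments: dict  # e.g. {"city": "Winnipeg"}

def understand(user_input: str) -> Intent:
    # Model 1: text in, structured data out (stubbed here; a real version
    # would run a small parsing/classification model).
    return Intent(action="weather_lookup", arguments={"city": "Winnipeg"})

def report(result: dict) -> str:
    # Model 2: structured data in, prose out (stubbed here; a real version
    # would run a small text-generation model).
    return f"It's currently {result['temp_c']} degrees in {result['city']}."

intent = understand("what's the weather like in winnipeg?")
result = {"city": intent.arguments["city"], "temp_c": -20}  # pretend tool call
print(report(result))
```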
@soatok all of this will be AGPL because fuck Big Tech
@soatok fine-tuning and distilling LLMs into small models that can run in very limited environments is already a thing, but I'm pretty sure that building tiny language models for very specific purposes is still relatively underexplored.

I'm not into LLMs though, and I barely have any idea whether this makes much sense.
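(For context, the core of distillation is training a small "student" model to match the softened output distribution of a large "teacher". A rough PyTorch sketch of the standard loss; the temperature value and the commented training step are illustrative only.)

```python
# Rough sketch of knowledge distillation: push a small "student" toward the
# softened output distribution of a large "teacher" model.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    # Soften both distributions; KL divergence measures how far the student
    # is from the teacher. Scaling by T^2 keeps gradient magnitudes stable.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * temperature ** 2

# Hypothetical training step:
#   with torch.no_grad():
#       t_logits = teacher(batch)   # big model, frozen
#   s_logits = student(batch)       # small model being trained
#   loss = distillation_loss(s_logits, t_logits)
#   loss.backward(); optimizer.step()
```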

@nullenvk @soatok It's totally buried now, but a couple years ago all the big players were shitting their pants because small language models were outperforming their LLMs (in general tasks, IIRC). Not even distilled; it just didn't take much. They all have their quiet small language models on the side now because of that. SLMs significantly outperform LLMs in their narrower areas, particularly in science, IIRC.

It's not a matter of whether it's possible; it's that pretending LLMs are the end-all-be-all is worth a LOT of money.

@soatok Maybe if they throw linear algebra at the wall for long enough, they'll find themselves the right basis. :P
@soatok I'm sure it's extremely possible to get a specialized LLM to run on a toaster at this point. But isn't that the whole point of these companies throwing all the money and hardware at training? To pore over the massive content of the existing live Internet to build up a generalized LLM's values?

Albeit, perhaps a positive of specializing is that it's finite, rather than the extremely undefined scope of a generalized LLM. There's only so much to the Rust / C / COBOL languages, compared to a model that takes in any plaintext language and outputs a desired product to some approximation.

I doubt, however, that using the tools of the hyper-capitalists is an effective way to dismantle the oversized house of these tech billionaires and their unsightly toys. It takes community, and an outright denial that these billionaires should use community land for selfish grift and exploitation.
@soatok you deny these data centers that drink our clean water and pollute our air, and then eventually these tech billionaires will have to justify why they haven't built or scaled their companies as much as all the investment they took in would suggest.

It should then fall like a house of cards. Because the cost of making things happen, when met with resistance, gets to become a non-trivial cost. But that requires all of us to be vigilant as to what's getting built near us and having a voice to dissent. We have to be that wrench in the gears that turn against community.
@soatok For anyone who wants to run an AI model on their phone, this might be useful:
https://github.com/google-ai-edge/gallery

@soatok You know: It strikes me that a lot of what these LLMs are being used for is essentially summarizing text. Sure, not entirely.

It turns out that we were already looking into how to do this well before ChatGPT, but it never took off.

Those models had some procedural guard rails to ensure they were at least somewhat accurately shortening the text, judging by word frequencies. Sure, this wasn't actually summarizing, but at least they strove not to lie!
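(That pre-ChatGPT approach was typically extractive: score each sentence by how frequent its words are across the document and keep the top few, so the output can only contain sentences the source actually said. A rough sketch; the scoring heuristic here is the simplest possible one, and "article.txt" is hypothetical.)

```python
# Sketch of pre-LLM extractive "summarization": score sentences by the
# frequency of their words across the whole document, keep the top few in
# their original order. It only shortens the text; it cannot assert anything
# the source didn't say, which is exactly the guard rail being described.
import re
from collections import Counter

def extract_summary(text: str, num_sentences: int = 3) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    return " ".join(s for s in sentences if s in top)

# print(extract_summary(open("article.txt").read()))  # hypothetical input
```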

@soatok apparently you can even run an LLM in a font fuglede.github.io/llama.ttf/

(requires HarfBuzz+WASM; haven't tried it myself)
llama.ttf is a font file which is also a large language model and an inference engine for that model.

@soatok
I think it's only a matter of when. DeepSeek already managed to make a huge efficiency leap, and with the current pace of development, I wouldn't be surprised if someone pulled that off in 2026.
@soatok i think we're a bunch of innovations away from this goal, it's not implausible. but that's just inference. the part that still sucks time and energy is training. cutting datacenter dependence here and democratizing model construction would truly spell the end of their gatekeeping. here though, i have no ideas.
@soatok this is what deepseek almost did, sadly only almost.

@soatok Maybe, but I wouldn’t bet on that. They would try to extrapolate whatever method you used to make it run on a Raspberry Pi so it scales up to data-center level again. If it can’t run better this way because of diminishing returns or whatever, it has to run more often instead. The large energy-chugging data centers are the point, not the performance of the AI. Same as how more energy-efficient LEDs didn’t lead to less power consumption but to more lamps in use.

Maybe that won’t happen here, but like I said, I’m not sure.

@muellermeier Right. To be a death knell, this would need to be something that "satisfies" while obviating data centers.
@soatok With more and more new personal compute platforms featuring an NPU, a local SLM should absolutely be the outcome to strive for. Local processing of streaming text-to-speech voices. Local uncensored image descriptions. Something useful like that, which a user might actually want a system with an NPU for. But that doesn't sell token subscriptions or gatekeep access.
@soatok the thing that will kill LLMs the hardest is the fact that u need to charge like 1k a month to make it profitable after investors stop dumping money in and who tf would pay that much 
@soatok i think if you were able to do this, you might have also come up with the best compression algorithm ever designed
@cameron Heh, it wouldn't be lossless though
@soatok the algo would get the vibe of it right. 😉
@soatok I don't think many people would be able to tell the difference between Cleverbot & ChatGPT. I'm sure they'll get away with something as light as a Markov chain.
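(For the record, a word-level Markov chain generator really is this small. A toy sketch; "corpus.txt" stands in for whatever text you'd train it on.)

```python
# A word-level Markov chain text generator: no neural network, just counts of
# which word follows each pair of words in a corpus.
import random
from collections import defaultdict

def build_chain(text: str, order: int = 2) -> dict:
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain: dict, length: int = 30) -> str:
    state = random.choice(list(chain.keys()))
    out = list(state)
    for _ in range(length):
        followers = chain.get(tuple(out[-len(state):]))
        if not followers:  # dead end: no observed continuation
            break
        out.append(random.choice(followers))
    return " ".join(out)

# chain = build_chain(open("corpus.txt").read())  # hypothetical corpus
# print(generate(chain))
```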
killing it as in making it ubiquitous?!?