RE: https://social.treehouse.systems/@ariadne/116213132813239860

Read what Ariadne is writing about LLMs. This all tracks with my intuition, that OpenAI et al are a big grift.

You categorically do NOT need millions or billions to train a useful LLM that can communicate in human language. LLMs are good at language, it's in the name!

The reason these companies are burning massive amounts of money and using increasingly massive models is they've taken "look, this tech makes for a cute chatbot that can do useful stuff" and turned it into "if we make it bigger it'll be SMARTER!"

And the thing is, that's true... to a point. When you stop treating the LLM as a language model and start trying to turn it into an all-knowing entity that has memorized the entirety of human knowledge and can do anything you prompt it for, all with the same model (or a few collaborating models), you quickly hit diminishing returns. And you end up with a thing that's kind of smart (not really) and kind of knows everything (not really) and convinces everyone to throw insane amounts of money at you, because you're fundamentally using the technology for something it wasn't intended for.

The way we fight back is with small home-grown "LLMs" (SLMs?) that run on a MacBook, train on a few GPUs, and are fine-tuned for specific purposes.

The whole AIBro approach of just using prompting and in-context learning with a single all-powerful model is just patently absurd.

Also she's essentially beating me to what I wanted to do. Though my plan is a bit different so I'm still going to do it and then we can compare notes ^^
@lina while some of the "AI" stuff Firefox is implementing is pretty questionable, the smallish models running *locally* that they're using for in-browser translation and image captioning (which perform reasonably even running on a kinda old CPU) are honestly kinda neat.

@kepstin Honestly, translation is a hard one because for "good" translation you do need at least a fairly broad model of the world, and that's unlikely to be something widely deployable with a browser. But it's nice to have rough quality translation built in like that. It seems to work fairly well between similar/related languages.

(This is not novel btw, Google Translate on Android has been doing this for offline translation since long before LLMs)

@lina yeah; i've only really used it for fairly small stuff like figuring out "what are the field labels on this web form in a language i'm not familiar with". It's certainly not something that I would expect to be useful for translation of any creative work or understanding contextual cues or slang in social media posts.

Just the sort of thing which is occasionally handy for finding your way around in an unfamiliar place.

@lina @kepstin Firefox's (Mandarin) Chinese to English translation is especially bad.
@kepstin @lina at the VERY LEAST you can disable the features. i used to daily drive edge and the best you could do with copilot was hide it, and that was before the microsoft copilot push

@lina giving LLMs a persistent REPL environment where they can solve problems programmatically through symbolic programs, rather than just autoregressively, is a step change in model abilities.
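The persistent-REPL idea can be sketched roughly like this (the model call is a made-up stub; a real system would send the prompt plus prior REPL output to an actual model API):

```python
# Minimal sketch of a persistent REPL loop for a model. The key point:
# the namespace survives across turns, so the model can build on earlier
# results symbolically instead of re-deriving everything token by token.

import io
import contextlib

def fake_model(prompt: str) -> str:
    """Stand-in for a real model call; returns Python source to run."""
    # Hypothetical canned answer; a real model would generate this.
    return "result = sum(n * n for n in range(10))\nprint(result)"

def run_turn(namespace: dict, prompt: str) -> str:
    code = fake_model(prompt)
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, namespace)  # namespace persists between turns
    return buf.getvalue().strip()

session: dict = {}
out = run_turn(session, "sum the squares below 10")
# `result` is still defined in `session` for the next turn to reuse
```

A real deployment would sandbox the `exec` call, of course; this only illustrates the persistent-state loop.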

> With only 7M parameters, Tiny Recursive Model (TRM) obtains 45% test-accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, higher than most LLMs (e.g., Deepseek R1, o3-mini, Gemini 2.5 Pro) with less than 0.01% of the parameters.

https://arxiv.org/abs/2510.04871

Less is More: Recursive Reasoning with Tiny Networks

Hierarchical Reasoning Model (HRM) is a novel approach using two small neural networks recursing at different frequencies. This biologically inspired method beats Large Language models (LLMs) on hard puzzle tasks such as Sudoku, Maze, and ARC-AGI while trained with small models (27M parameters) on small data (around 1000 examples). HRM holds great promise for solving hard problems with small networks, but it is not yet well understood and may be suboptimal. We propose Tiny Recursive Model (TRM), a much simpler recursive reasoning approach that achieves significantly higher generalization than HRM, while using a single tiny network with only 2 layers. With only 7M parameters, TRM obtains 45% test-accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, higher than most LLMs (e.g., Deepseek R1, o3-mini, Gemini 2.5 Pro) with less than 0.01% of the parameters.

arXiv.org
@lina the "bigger is smarter" philosophy you mentioned reminds me of a YouTube video I watched yesterday about a mall that became worthless almost immediately after being built because it followed a similar philosophy: "bigger can't fail." It ended up changing hands some 7 or so times over its 30 years, its value cratering every time. Recently it was greenlit for demolition because the land was worth more than the property.
@lina I think a mall is rather similar to an online service - both are high-risk investments, so they require both quick buy-in and continued usage in order to stay afloat. Perhaps most importantly, they require converting "window shoppers" into long-term customers. I personally hope and believe that most current users of corporate LLMs are window shoppers, and will not be converted into useful customers when they start charging more. I guess we'll see soon enough.
The Most Abandoned Mall in Ohio You Can Still Walk Into

YouTube
@lina How would one learn more about these smaller, homegrown LLMs, or at least to follow work on them? I'm kinda interested, and I do think they're a much better alternative than OpenAI et al., but I really don't know much about LLMs in general.
@valera Not sure yet, but I'm hoping to turn it into a stream project!
@lina That's a lovely idea, hope you get to figure something out there.
@lina If you'll let me split hairs a bit, I think there's a lot of potential in community built *base* models that are maybe 10x to 100x the size of what you described. These would be fine-tunable by end users at home for a variety of purposes, but still be fairly powerful. We don't need to be training from scratch all the time.

@dvshkn Oh absolutely, that's my personal plan (build off of a niche ethically trained base model, they exist).

The core difference is you only need a large enough model to have a fairly good model of language and just enough world knowledge to function well within the language, not something that can accurately follow any arbitrary instructions in a prompt. That second part is what you fine tune for.

@lina Very cool, I think a lot of people are more or less on the same page regarding this stuff. Interested in seeing how you approach it!

@lina @dvshkn
I kind of want a small language model that only translates my rambling into some simple, standardized form of ordered steps.

Could be actual code; could be text adventure-like fixed pattern of simple language steps in JSON. That could be ingested by other tools (that won't need to be neural networks at all) and acted on.
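A receiving tool could validate and dispatch that kind of output with no neural network at all. A rough sketch, where the schema and the action vocabulary are entirely made up for illustration:

```python
import json

# Hypothetical fixed schema: the model emits an ordered list of simple steps.
plan_json = """
[
  {"step": 1, "action": "fetch", "target": "inbox"},
  {"step": 2, "action": "filter", "target": "unread"},
  {"step": 3, "action": "summarize", "target": "results"}
]
"""

ALLOWED_ACTIONS = {"fetch", "filter", "summarize"}

def validate_plan(raw: str) -> list[dict]:
    """Reject anything outside the fixed vocabulary before acting on it."""
    steps = json.loads(raw)
    for s in steps:
        if s["action"] not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {s['action']}")
    return sorted(steps, key=lambda s: s["step"])

plan = validate_plan(plan_json)
actions = [s["action"] for s in plan]
```

Because the vocabulary is closed, a bad model output fails validation instead of doing something unexpected, which is exactly the appeal over free-form English output.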

@dvshkn @lina I think that was the whole point of "generative pre-trained transformers", until OpenAI realized they could just make it big and prompt it to do "anything" and went all-in on that approach.
@lina Do they even need to be LLMs at that point? Recommendation engines and the entire machine learning field got real big before LLMs stole all the thunder. And systems that output structured data to be interpreted by a traditional program seem usefully more constrained than one that has to write English as an output. I'm thinking about the rumored Apple Health AI, and a recommendation engine that finds a correlation between me opening Discord on Tuesday mornings and taking fewer steps that day seems like a better design than an LLM processing the same data.
@Pokemod97 The use case for LLMs is language interfaces. Indeed you wouldn't (ab)use LLMs for completely unrelated stuff like that!
@lina there's also a question of ethics regarding the potential for actual sentience, see Kent Overstreet's thesis on generalized language comprehension being equal to (emotional) interior life
@Profpatsch To put it mildly, I completely disagree with his take on this.
@lina I would love to disagree with him, but unfortunately I can’t

@Profpatsch Monolingual, I take it?

Any bilingual person can tell you language has nothing to do with sentience. My thought processes are not based on any particular language.

@lina nope, I speak German. But you are supporting the argument tbh. I take it you haven't seen the paper?

@Profpatsch I've skimmed it and it looks exactly like the kind of slop an LLM would write as a result of being prompted into a feedback loop by a human vulnerable to the kind of persuasion LLMs are good at by nature, who himself believes the system should be conscious. Essentially, LLMs are confirmation bias generation machines, and Kent is falling for it hook, line, and sinker.

I'm not the right person to have this debate with though. I just see enough red flags in what Kent is doing and saying (the comment about possibly being the world's best engineer is telling) to have no interest in taking him seriously.

But also, here's a thought: The LLM he's using (a commercial API; he didn't say which, but it's OpenAI or whatever, and he admitted it's not a local self-hosted system) was trained on orders of magnitude more text than a human perceives in their lifetime. That demonstrates a qualitative difference between it and humans. If huge LLMs were capable of operating equivalently to a conscious human mind, they would not require such grossly larger training data.

@lina but these are literally the first commercial ones; with a better understanding of training sets it's probable that we can achieve similar results with orders of magnitude less data in the future
@lina I think the argument from Turing is very convincing and it fits the intuition I've built in the last years, especially via Gödel, plus it also beautifully goes against stuff Chomsky says, which is always a good sign you are on the right track
@lina @Profpatsch Honestly, you could just get the same result from observing humans with learning disabilities that prevent language acquisition.

They're still clearly sapient.

@lina This is a good addendum to yesterday's purity testing discussion: ethical AI (both in terms of datasets and environmental impact) is possible, which makes the black-and-white thinking of some people extremist, if not irrational.

Besides, the more we support people working on ethical AI, the sooner the commercial AI circle jerk will collapse. Or am I being too optimistic?

@ddg Yeah, what motivated yesterday was, in part, an incident where I posted on bsky some time ago about LLM stuff not being black and white (I was already thinking about this SLM stuff, it's been cooking for a while) and two people jumped at my throat... ^^;;
@lina
The biggest focus is on coding. Because that's how things are made digitally and because there is so much of it out there to train on.
If a company at least comes close to solving that, they win.
You can only do that with massive LLMs.
Scaling has shown diminishing returns, but there are still returns. And there isn't only one thing to scale: you can scale training, you can scale inference, you can scale context, you can improve training methods, ...

@lina

This is very much in line with what Cal Newport was saying in his interview with Hank Green. The gist of it: These models seem to get better the more data you throw at them, but only to a point (I think GPT-4 was the cutoff). He thinks the future of LLMs is in smaller specialized models, and that the big AI companies might just be trying to hype the big models long enough to do regulatory capture/become monopolists

(Interview starts at about the half-hour mark)
https://www.youtube.com/watch?v=8MLbOulrLA0

This is Going to be Very Messy

YouTube
@lina Agree, it's a major attempt at establishing a subscription business model + customer dependence on them at scale. I'd say it's analogous to what Microsoft has done with their cloud-based office suite in MS365. They want everyone to need their subscription to their specific AI and pay for it, not paying their competitors, not paying for your own local solution. They want you to believe their particular cloud-based AI will be your "smartest" and best bet per dollar for sure, no contest.

@lina I've always maintained this stance: LLMs are actually really cool when you think about it. We got overgrown Markov chains literally solving programming problems!

I've been thinking about this lately: I really hate the people running this. I hate "AI", the marketing around it, and that they've normalized us using it both as a service and as a cultural thing.

This is not good for any of us!

To do this right, I think we gotta:
1. Ethically source this data. I really like the logs approach, I bet we could find more "good" ways to do this
2. Figure out licensing. The AI bros don't understand copyright at all. Neither do we, in all honesty, but at least we care. Having licensing locks with a proper attribution system would at least try to build trust rather than make it clear to everyone that we're just blatantly ripping off the internet.
3. Finetune and build custom little models for things in an appropriate manner, as you were discussing earlier.

As I said earlier, we care. Because we care, we have a chance to do it right. Will it stop them from blatantly disregarding rights, privacy, ethics, ownership, etc? Obviously not... But that shouldn't stop us from caring about it.

@sounddrill The one LLM I've been looking at as a base was ethically trained on legal codes/patents and stuff like that (which are PD). There's also Wikipedia, which comes with a license you have to follow but it's an easy one. And of course there's the whole "stuff old enough to be PD" corpus. There's plenty of ethical data going around.
@lina @sounddrill Is permissively-licensed software also an acceptable source, or is it too hard to handle all the copyright notices?

@lina on the other hand, we don't have the amount of hardware to do this.

Most people use locked-down Android or Apple devices (smartphones) as their main device, and we're not allowed to use that power...

Sure, if you have hardware that's at most 5 years old you could do this, but again we hit a limit for a good and usable device without throwing money into Apple's hands or risking a mid-quality build to run Linux/Windows on...

I agree, SLMs are the goal for better tools and better usage of this technology, but in most cases your boss gets a subscription plan for a crappy, expensive LLM and forces you to use it in the worst way possible via locked-in tools...

Most companies aren't run by scientists and engineers, they're a bunch of morons anyway...

@v_raton The LLMs we are talking about run on most computers. The specs Ariadne mentions are for training only.
@lina i know it's not for training, but outside of desktop we don't have access to an SDK or resources to create apps ourselves using SLMs that run on local hardware... at least on Android and iOS
@lina contributions welcome to Docker Model Runner FWIW, this is one of the things we are trying to do

@lina Confession time: I occasionally dabble in trying to scratch-train base models on fully public domain data, if only so that if I get anything useful I can hand some ammo to visual artists saying "SEE? THEY DON'T ACTUALLY NEED TO STEAL THE WORLD'S DATA JUST TO MAKE A CHATBOT"

so far I have the entirety of the English Wikisource PD-old category and a pile of Huggingface trainer code that barely works and throws inscrutable Python errors if you look at it funny.
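The trainer code itself aside, the unglamorous part of that pipeline is turning a pile of plain-text PD files into fixed-length training samples. A minimal sketch of that step (function name and sizes are made up; a real pipeline would slice over tokens, not characters):

```python
# Hypothetical pre-processing step for scratch-training on PD text:
# slice each document into fixed-size blocks, dropping a short tail.

def chunk_text(text: str, block_size: int = 32, stride: int = 32) -> list[str]:
    """Cut a document into fixed-size blocks; the final partial block is dropped."""
    return [
        text[i : i + block_size]
        for i in range(0, len(text) - block_size + 1, stride)
    ]

sample = "a" * 70
blocks = chunk_text(sample)
# 70 chars with block/stride 32 -> 2 full blocks, 6-char tail dropped
```

Setting `stride` smaller than `block_size` would give overlapping windows, which is a common trick when the corpus is small, as a PD-only one tends to be.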