RE: https://social.treehouse.systems/@ariadne/116213132813239860

Read what Ariadne is writing about LLMs. This all tracks with my intuition that OpenAI et al. are a big grift.

You categorically do NOT need millions or billions to train a useful LLM that can communicate in human language. LLMs are good at language, it's in the name!

The reason these companies are burning massive amounts of money and using increasingly massive models is they've taken "look, this tech makes for a cute chatbot that can do useful stuff" and turned it into "if we make it bigger it'll be SMARTER!"

And the thing is, that's true... to a point. When you stop treating the LLM as a language model and start trying to turn it into an all-knowing entity that has memorized the entirety of human knowledge and can do anything you prompt it for, all with the same model (or a few collaborating models), you quickly hit diminishing returns. You end up with a thing that's kind of smart (not really) and kind of knows everything (not really), and that convinces everyone to throw insane amounts of money at you, because you're fundamentally using the technology for something it wasn't intended for.

The way we fight back is with small home-grown "LLMs" (SLMs?) that run on a MacBook, train on a few GPUs, and are fine-tuned for specific purposes.

The whole AIBro approach of using prompting and in-context learning with a single all-powerful model is just patently absurd.

@lina If you'll let me split hairs a bit, I think there's a lot of potential in community built *base* models that are maybe 10x to 100x the size of what you described. These would be fine-tunable by end users at home for a variety of purposes, but still be fairly powerful. We don't need to be training from scratch all the time.

@dvshkn Oh absolutely, that's my personal plan (build off of a niche ethically trained base model, they exist).

The core difference is you only need a model large enough to have a fairly good model of language, plus just enough world knowledge to function well within that language, not something that can accurately follow any arbitrary instructions in a prompt. That second part is what you fine-tune for.

@lina Very cool, I think a lot of people are more or less on the same page regarding this stuff. Interested in seeing how you approach it!

@lina @dvshkn
I kind of want a small language model that only translates my rambling into some simple, standardized form of ordered steps.

Could be actual code; could be a text-adventure-like fixed pattern of simple language steps in JSON. That output could then be ingested by other tools (which wouldn't need to be neural networks at all) and acted on.
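To make that concrete, here's a minimal sketch of what such a fixed-pattern JSON plan and a non-neural consumer might look like. The format, field names (`steps`, `action`, `target`), and the example plan are all invented for illustration, not an existing standard:

```python
import json

# Hypothetical output from a small "rambling -> ordered steps" model:
# a flat, fixed-pattern list of steps, each with an action and a target.
model_output = """
{
  "steps": [
    {"action": "open", "target": "inbox"},
    {"action": "filter", "target": "unread"},
    {"action": "archive", "target": "newsletters"}
  ]
}
"""

def run_steps(raw: str) -> list[str]:
    """Validate and 'execute' the plan with plain code -- no neural net needed."""
    plan = json.loads(raw)
    log = []
    for i, step in enumerate(plan["steps"], start=1):
        # A real tool would dispatch on the action; here we just record each step.
        if not {"action", "target"} <= step.keys():
            raise ValueError(f"step {i} is missing required fields")
        log.append(f"{i}. {step['action']} {step['target']}")
    return log

for line in run_steps(model_output):
    print(line)
```

Because the pattern is fixed, the consuming tool stays dumb and deterministic: it can reject malformed steps outright instead of trying to interpret free-form model output.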