Mastodawn

now that i am... writing my own agentic LLM framework thing... because if you're going to have a shitposting IRC bot you may as well go completely overkill, i have Opinions on the state of the world.

openclaw, especially, seems to be hot garbage, actually, because i was able to teach my LLM (which i trained from scratch on the highest quality artisanal IRC logs, 2003 to present, so i can assure you it is not a very good LLM) to use tools in the context of my own framework quite easily.

Show thread

Ariadne Conill 🐰

Mar 11

first of all, when i began i was quite skeptical on commercial AI.

this exercise has only made me more skeptical, for a few reasons:

first: you actually can hit the "good enough" point for text prediction with very little data. 80GB of low-quality (but ethically sourced from $HOME/logs) training data yielded a bot that can compose english and french prose reasonably well. if i additionally trained it on a creative commons licensed source like a wikipedia dump, it would probably be *way* more than enough. i don't have the compute power to do that though.

second: reasoning models seem to largely be "mixture of experts" which are just more LLMs bolted on to each other. there's some cool consensus stuff going on, but that's all there is. this could possibly be considered a form of "thinking" in the framing of minsky's society of mind, but i don't think there is enough here that i would want to invest in companies doing this long term.

third: from my own experiences teaching my LLM how to use tools, i can tell you that claude code and openai codex are just chatbots with a really well-written system prompt backed by a "mixture of experts" model. it is like that one scene where neo unlocks god mode in the matrix, i see how all this bullshit works now. (there is still a lot i do not know about the specifics, but i'm a person who works on the fuzzy side of things so it does not matter).

fourth: i built my own LLM with a threadripper, some IRC logs gathered from various hard drives, a $10k GPU, a look at the qwen3 training scripts (i have Opinions on py3-transformers) and few days of training. it is pretty capable of generating plausible text. what is the big intellectual property asset that OpenAI has that the little guys can't duplicate? if i can do it in my condo, a startup can certainly compete with OpenAI.

given these things, I really just don't understand how it is justifiable for all of this AI stuff to be some double-digit % of global GDP.

if anything, i just have stronger conviction in that now.

Show thread

mirth Mar 12

@ariadne Having studied up a bit myself I can fill in a few pieces. Reasoning models just have been trained to chatter on in some kind of preamble that is intended to be hidden or de-emphasized in the UI, possibly wrapped in tags like <reasoning>blah blah blah</reasoning>, followed by a shorter answer. Mixture of experts is an orthogonal idea to structure the models so predictions can be run using only a in order to use less compute. Both ideas make models hard to train for different reasons.

Show thread

Ariadne Conill 🐰

Mar 12

@mirth sure, but the "thinking" ones do some consensus stuff to ensure it doesn't go off course

Show thread

mirth Mar 12

@ariadne Not at prediction time, they do another stage of training that works a bit differently but the resulting model is structurally identical to the input model. I think you're very right about the lack of defensibility though, if you wanted to catch up with the leading labs in a year or two you could probably do it with around $200M and the charisma to recruit the people who know how to do this stuff.

Show thread

mirth Mar 12

@ariadne I should say by "catch up" I mean to get to parity, my impression is the model research is kind of like drug development where a lot of the cost is paying for all the experiments that don't work, as a result it's much easier to catch up than to get out "ahead" whatever that means. Setting aside the ethical issues, the functional issue of how to effectively use plausible-sounding crap generators as part of reliable software systems remains unsolved.

Show thread

Andrea (Drea) Tamar Pinski Mar 12

@mirth @ariadne This here explains why the US companies are so upset with China here.

Show thread

Ariadne Conill 🐰

Mar 12

@pinskia @mirth yep they broke the illusion.

IMO the real reason OpenAI reserved all of this RAM and shit is to prevent competitors from buying it

Show thread

Janne Moren Mar 12

@ariadne @pinskia @mirth
What they are doing is forcing competitors to do more with less. Smaller models with a clever architecture, not huge monoliths trained by brute force. Might come back to bite them sooner or later.

I'd like to see more hybrid models, where the LLM largely sticks to being the language module, and other models (possibly not even NN) specialize in other functions.

Show thread

Ariadne Conill 🐰

Mar 12

@jannem @pinskia @mirth yes, this is what i eventually want to build. a set of libre building blocks for building ethical, libre and personal agentic systems that are self-contained.

the shit Big AI is doing is not interesting to me, but SLMs and other specialized neural models legitimately provide a useful set of tools to have in the toolbox.

today, however, I just want to prove the ideas out by shitposting in IRC ;)

Show thread

Ariadne Conill 🐰

Mar 12

@jannem @pinskia @mirth that said, i think that OpenAI and other hardware/resource hoarders need to be called out on the fact that they don't need all of this to ship product

there really is no need to destroy the climate or make professional GPUs cost as much as a recent vintage used car

Show thread

LisPi

@ariadne @jannem @pinskia @mirth > make professional GPUs cost as much as a recent vintage used car

Or a new one. There are 10k$ new electrical cars (very baseline) from China.