Reading analysis of the Claude Code leak (not reading the code itself, of course) is evidence towards what I had kind of suspected, that the whole thing is a giant magic trick not only in the straightforward LLMentalist way, but also in the sleight of hand way off making you think that this pile of regexes and JSON schema validation loops is *actually* the LLM doing LLM things.
Like, you don't need LLMs, the things that work, work well, and that have worked well for decades are all there, being called by the chatbots... you can just actually use those without 500k lines of spaghetti code and markdown files tricking you into thinking that the JSON parser is alive and has feelings.

@xgranade the worst part?

It occurred to me that we can already easily tokenize code, and know if a string of tokens is valid.

So they could just have "start json" and "end json" tokens and not pick invalid tokens in the middle

@astraluma It continues to be incredibly strange to me that llmbros keep limiting their approach to in-band signaling.

@xgranade it's fucking expensive

But is it any more expensive than what they're already doing

@xgranade @astraluma clearly they have learned nothing from people with blueboxes.....
@freya @astraluma Yes, though I might also submit that "clearly they have learned nothing" is true even more generally.
@xgranade @astraluma you're not wrong. and I say this as a girlie who uses LLMs on the regular for accessibility stuff, even I, a girl about as far from an outright no AI girlie as you can find, think these fucking techbros are incredibly, stunningly fucking useless
@xgranade @freya @astraluma how can they learn anything when all they ever read are summaries?
@xgranade @astraluma I'm not even sure how you'd do out-of-band signalling in an LLM, the model fundamentally sees it all as just a long blob

@orman as far as I know there are special tokens to mark whether the content following is a system message, a chatbot message or a user's message.

These tokens are special in the way that you can't inject them through a user message. You need direct access to the token stream to insert them (not the text that is to be tokenized).

But yes in the end it's still just a large sequence of tokens - but so is e.g. escaping some text for presentation in an HTML document.

@astraluma @xgranade p sure this is what openai's done for years
@astraluma @xgranade actually p sure this is what anthropic does too i havent looked at the leak either but i dont trust the analysis referenced in the op https://platform.claude.com/docs/en/build-with-claude/structured-outputs
Structured outputs

Get validated JSON results from agent workflows

Claude API Docs
jonny (good kind) (@[email protected])

So the reason that Claude code is capable of outputting valid json is because if the prompt text suggests it should be JSON then it enters a special loop in the main query engine that just validates it against JSON schema (it looks like the schema just validates that something in fact and object and its keys are strings) and then feeds the data with the error message back into itself until it is valid JSON or a retry limit is reached. This code is so eye wateringly spaghetti so I am still trying to see if this is true, but this seems to be how it not only returns json to the user, but how it handles *all* LLM-to-JSON, including internal output from its tools. There appears to be an unconditional hook where if the JSON output tool is present in the session config at all, then all tool calls must be followed by the "force into JSON" loop. If that's true, that's just *mind blowingly expensive* edit: please note that unless I say otherwise all evaluations here are just from my skimming through the code on my phone and have not been validated in any way that should cause you to be upset with me for impugning the good name of anthropic edit2: this is both much worse and not as bad as i thought on first read - https://neuromatch.social/@jonny/116326861737478342

neurospace.live
@astraluma @xgranade oh. they're doing exactly what you suggested, the referenced retry logic is for if it never emits the start json token
@astraluma @xgranade the post is just describing that in the most confusing way possible as far as i can tell (and their code is also possibly written confusingly but whatever that's their problem)
@xgranade I have wondered how many posts about how LLMs enabled them to do something seemingly esoteric they wouldnโ€™t have written otherwise could have easily been posts about how the seemingly esoteric thing is not that scary or esoteric.
@xgranade given enough time, LLM dev tooling just evolves into plain old deterministic language servers that have just been given the funding they needed to become useful tools.
@zkat "Don't sent me an LLM written e-mail, send me the prompt you used that has all the information I needed anyway" as applied to code means "don't send me slopware, send me the formal type systems, schemas, and parsers you used to validate the output of your slopware."
@xgranade @zkat "But I didn't validate a thing" ๐Ÿ—ฟ ๐Ÿ’€
@xgranade Sounds like you're saying all that spaghetti bookending and validation could be reduced to some kind of big messy heuristic expert system without LLMs. Chatbots have been done this way, but maybe there's also a way to chatscript a procedural code generator. It almost seems possible, and yet we all know the size of the code library is the thing the AI is really selling, not the guardrails. It's very tempting to use laundered code for some of the more nuanced bits of a project.

@xgranade

And they do it by flooding dollars into data center shaped incinerators.

Itโ€™s magic

@xgranade code pretending to be LLM pretending to be code
@xgranade can you link the analysis you're using? I definitely don't want to read the actual code, I'd better play a game or something than deciphering spaghetti code :p