• Claude code source "leaks" in a mapfile
  • people immediately use the code laundering machines to code launder the code laundering frontend
  • now many dubious open source-ish knockoffs in python and rust being derived directly from the source

What's anthropic going to do, sue them? Insist in court that LLM recreating copyrighted code is a violation of copyright???

This code is so fucking funny dude I swear to god. I have wanted to read the internal prompts for so long and I am laughing so hard at how much of them are like "don't break the law, please do not break the law, please please please be good!!!!" Very Serious Ethical Alignment Technology
My dogs I am crying. They have a whole series of exception types that end with _I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS and the docstring explains this is "to confirm you've verified the message contains no sensitive data." Like the LLM resorts to naming its variables with prompt text to remind it to not leak data while writing its code, which, of course, it ignores and prints the error directly.

So the reason that Claude code is capable of outputting valid json is because if the prompt text suggests it should be JSON then it enters a special loop in the main query engine that just validates it against JSON schema (it looks like the schema just validates that something is in fact an object and its keys are strings) and then feeds the data with the error message back into itself until it is valid JSON or a retry limit is reached.
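
For anyone who wants the shape of that loop without reading the source, here is a minimal sketch of "validate, feed the error back, retry until a limit." Everything in it is an assumption for illustration: `Ask`, `forceIntoJson`, and `isPlainObject` are made-up names, the model call is a plain synchronous function (a real one would be async), and the check only mirrors my reading that the schema wants "an object," nothing more.

```typescript
// Hypothetical stand-in for a model call: takes a prompt, returns raw text.
// A real call would be async; it is synchronous here to keep the sketch small.
type Ask = (prompt: string) => string;

type JsonObject = Record<string, unknown>;

// Mirrors the loose check described above: the value only has to be a
// non-null, non-array object (JSON object keys are always strings anyway).
function isPlainObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function forceIntoJson(ask: Ask, prompt: string, maxRetries = 3): JsonObject {
  let currentPrompt = prompt;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const raw = ask(currentPrompt);
    let problem = "top-level value is not a JSON object";
    try {
      const parsed: unknown = JSON.parse(raw);
      if (isPlainObject(parsed)) return parsed;
    } catch (err) {
      problem = err instanceof Error ? err.message : String(err);
    }
    // Feed the bad output and the error back in. Every retry is another
    // full model call, which is where the cost comes from.
    currentPrompt = `${prompt}\n\nPrevious output:\n${raw}\n\nError: ${problem}\nReturn only a valid JSON object.`;
  }
  throw new Error(`no valid JSON after ${maxRetries + 1} attempts`);
}
```

Note that the worst case is `maxRetries + 1` model calls per output, and each retry prompt is longer than the last.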

This code is so eye wateringly spaghetti so I am still trying to see if this is true, but this seems to be how it not only returns json to the user, but how it handles all LLM-to-JSON, including internal output from its tools. There appears to be an unconditional hook where if the JSON output tool is present in the session config at all, then all tool calls must be followed by the "force into JSON" loop.

If that's true, that's just mind blowingly expensive

edit: please note that unless I say otherwise all evaluations here are just from my skimming through the code on my phone and have not been validated in any way that should cause you to be upset with me for impugning the good name of anthropic

edit2: this is both much worse and not as bad as i thought on first read - https://neuromatch.social/@jonny/116326861737478342

MAKE NO MISTAKES LMAO
Oh cool so its explicitly programmed to hack as long as you tell it you're a pentester
I am just chanting "please don't be a hoax please don't be a hoax please be real please be real" looking at the date on the calendar
I'm seeing people on orange forum confirming that they did indeed see the sourcemap posted on npm before the version was yanked, so I am inclined to believe "real." Someone can do some kind of structural ast comparison or whatever you call it to validate that the decompiled source map matches the obfuscated release version, but that's not gonna be how I spend my day https://news.ycombinator.com/item?id=47584540

There is a lot of clientside behavior gated behind the environment variable USER_TYPE=ant that seems to be read directly off the node env var accessor. No idea how much of that would be serverside verified but boy is that sloppy. They are often labeled in comments as "anthropic only" or "internal only," so the intention to gate from external users is clear lol
(I need to go do my actual job now, but I'll be back tonight with an actual IDE instead of just scrolling, jaw agape, on my phone, seeing the absolute dogshit salad that was the product of enough wealth to meet some large proportion of all real human needs, globally.)

reminder that anthropic ran (and is still running) an ENTIRE AD CAMPAIGN around "Claude code is written with claude code" and after the source was leaked that has got to be the funniest self-own in the history of advertising because OH BOY IT SHOWS.

it's hard to get across in microblogging format just how big of a dumpster fire this thing is, because what it "looks like" is "everything is done a dozen times in a dozen different ways, and everything is just sort of jammed in anywhere. to the degree there is any kind of coherent structure like 'tools' and 'agents' and whatnot, it's entirely undercut by how the entire rest of the code might have been written in some special condition that completely changes how any such thing might work." I have read a lot of unrefined, straight-from-the-LLM code, and Claude code is a masterclass in exactly what you get when you do that - an incomprehensible mess.

from @sushee over here, (can't attach images in quotes) and apparently discussed on HN so i'm late, but...

They REALLY ARE using REGEX to detect if a prompt is negative emotion. dogs you are LITERALLY RIDING ON A LANGUAGE MODEL what are you even DOING
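
To make the complaint concrete, here is a tiny sketch of what keyword-regex sentiment detection looks like and where it falls over. The pattern and the name `looksNegative` are hypothetical, made up for illustration; the real pattern is not quoted in this thread.

```typescript
// Hypothetical keyword pattern; NOT the pattern from the leaked source.
const NEGATIVE_PATTERN = /\b(wtf|awful|terrible|useless|hate)\b/i;

// Flags a prompt as "negative" if any keyword appears as a whole word.
function looksNegative(prompt: string): boolean {
  return NEGATIVE_PATTERN.test(prompt);
}

// The obvious case works.
const obvious = looksNegative("this is terrible");
// False positive: friendly message that happens to contain a keyword.
const falsePositive = looksNegative("i'd hate to lose this branch, nice work");
// False negative: clearly unhappy, but no keyword present.
const falseNegative = looksNegative("cool, you deleted the whole repo");
```

Keyword matching has no notion of context or sarcasm, which is exactly the thing a language model is supposed to be for.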

OK i can't focus on work and keep looking at this repo.

So after every "subagent" runs, claude code creates another "agent" to check on whether the first "agent" did the thing it was supposed to. I don't know about you but i smell a bit of a problem, if you can't trust whether one "agent" with a very big fancy model did something, how in the fuck are you supposed to trust another "agent" running on the smallest crappiest model?

That's not the funny part, that's obvious and fundamental to the entire show here. HOWEVER RECALL the above JSON Schema Verification thing that is unconditionally added onto the end of every round of LLM calls. the mechanism for adding that hook is... JUST FUCKING ASKING THE MODEL TO CALL THAT TOOL. second pic is registering a hook s.t. "after some stop state happens, if there isn't a message indicating that we have successfully called the JSON validation thing, prompt the model saying 'you must call the json validation thing.'"

this shit sucks so bad they can't even CALL THEIR OWN CODE FROM INSIDE THEIR OWN CODE.

Look at the comment on pic 3 - "e.g. agent finished without calling structured output tool" - that's common enough that they have a whole goddamn error category for it, and the way it's handled is by just pretending the job was cancelled and nothing happened.
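
A rough sketch of the difference between the two approaches, under loud assumptions: the `Message` shape, the tool name `structured_output`, and both function names are invented here and are not the real Claude Code types. The first function is the "remind the model and hope" pattern described above; the second is what calling your own code from inside your own code would look like.

```typescript
// Hypothetical message shape; not the real Claude Code types.
type Message =
  | { role: "assistant"; text: string; toolCall?: string }
  | { role: "user"; text: string };

const OUTPUT_TOOL = "structured_output"; // assumed tool name

// Pattern described above: after a stop, scan the transcript, and if the tool
// was never called, append a reminder and hope the next turn complies.
function stopHook(transcript: Message[]): Message | null {
  const called = transcript.some(
    (m) => m.role === "assistant" && m.toolCall === OUTPUT_TOOL,
  );
  return called
    ? null
    : { role: "user", text: `You must call the ${OUTPUT_TOOL} tool.` };
}

// Alternative: the harness validates the last assistant text itself, so
// nothing depends on the model remembering to call anything.
function finalOutput(
  transcript: Message[],
  validate: (text: string) => boolean,
): string | null {
  for (let i = transcript.length - 1; i >= 0; i--) {
    const m = transcript[i];
    if (m !== undefined && m.role === "assistant") {
      return validate(m.text) ? m.text : null;
    }
  }
  return null;
}
```

In the second version there is no "agent finished without calling structured output tool" state, because validation is not something the model can forget to do.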

So ars (first pic) ran a piece similar to the one that the rest of the tech journals did "claude code source leaked, whoopsie! programmers are taking a look at it, some are finding problems, but others are saying it's really awesome."

like "inspiring and humbling" is not the word dog. I don't spend time on fucking twitter anymore so i don't hang around people who might find this fucking dogshit tornado inspiring and humbling. Even more than the tornado, i am afraid of the people who look at the tornado and say "that's super fucking awesome, i can only hope to get sucked up and shredded like lettuce in a vortex of construction debris one day"

the (almost certainly generated) blog post is the standard kind of vacuous linkedin shillposting that one has come to expect from the gambling addicts, but i think it's illustrative: the only thing they are impressed with is the number of lines. 500k lines of code for a graph processing loop in a TUI is NOT GOOD. The only comments they make on the actual code itself are "heavily architected" (what in the fuck does that mean), "modular" (no the fuck it is not), and it runs on bun rather than node (so??? they own it!!!! of course it does!!!). and then the predictable close of "oh and i'm also writing exactly the same thing and come check out mine"

the only* people this shit impresses are people who don't know what they're looking at and just appreciate the size of it all, or have a bridge to sell.

* I got in trouble last time i said "only" - nothing in nature is ever "only this or that," i am speaking emphatically and figuratively. there are other kinds of people who are impressed with LLMs too. Please also note that my anger is directed towards the grifters profiting off of it and people who are pouring gas on the fire and enabling this catastrophe by giving it intellectual, social, and other cover. I know there are folks who just chat with the bots because they need someone to talk to, etcetera and so on. people in need who are just making use of whatever they can grab to hang on are not who I am criticizing, and never are.

(those numbers are also totally fucking wrong, the query engine is not 46ksloc, i have no idea what those numbers correspond to, as far as i can tell "nothing" and this is just hallucinated dogshit that is what i guess passes for high quality public comment nowadays)

If i can slip in a quick PSA while my typically sleepy notifications are exploding, these are all very annoying things to say and you might want to reconsider whether they're worth ever saying in a reply directed at someone else - who are they for? what do they add?

  • "why are you surprised"/"even worse than {thing} itself is people being surprised at {thing}": unless the person is saying "i am surprised by this" they are likely not surprised by the thing. just saying something doesn't mean you are surprised by it, and people talking about something usually have paid attention to it before the moment you are encountering them. this is pointless hostility to people who are saying something you supposedly agree with so much that you think everyone should already believe it
  • "it's always been like this": slightly different than above. unless someone is saying "this is literally new and nothing like this has happened before" or you are adding actual historical context that you know for sure they don't already know, you're basically saying "hey did you know this thing you care enough about to be paying attention to and talking about frequently has happened before now as well." this is so easy to frame in a way that says "yes and" rather than "i assume you dont know about the things i know about due to being very smart." eg. "dang not again, they keep doing {thing}"
  • "{thing} might be bad, but {alternative/unrelated, unmentioned, non-mutually exclusive thing} is even worse": multiple things can be bad at the same time and not mentioning something does not mean i don't think it's also bad
  • "funny how people who think {thing} is bad also think {alternative/unrelated, unmentioned thing} is good": closely related to the above, just because you have binarized your thinking does not mean everyone else has.

anyway if the mental image you are conjuring for your interlocutors positions them as always knowing less than you by default, that might be something to look into in yourself!

i sort of love how LLM comments sometimes tell entire stories that nobody asked for. claude code even has specific system prompt language for this, but they always end up making comments about what something used to do like "now we do x instead of y" like... ok? that is why i am reading the current version of the code!

so claude code is just not capable of rescuing itself from its own context - if an entry in its context window throws an error, it just keeps throwing that error forever until you clear it. good stuff.

(and, of course we read the entire file before checking this, rather than just reading the first 5 bytes)

this is super minor, and i've seen this in human code plenty of times, but this is the norm of this app verging on being formal code style.

so you have a file reading tool, you need to declare what kinds of file extensions it supports. that's very normal. claude code takes the interesting strategy of defining what extensions it doesn't read. that's also defensible, there are a zillion text extensions. i've seen strategies that just read an initial range of bytes and see if some proportion of them are ascii or unicode.
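
For reference, the byte-sniffing strategy looks roughly like this. It is a sketch of the general heuristic and not anything from the claude code source: `looksLikeText`, `hasPdfMagic`, the 512-byte sample, and the 0.9 threshold are all my own picks. The one hard fact in here is that PDF files begin with the five bytes `%PDF-`.

```typescript
// Heuristic: sample the first `sampleSize` bytes. A NUL byte means binary.
// Otherwise require most bytes to be printable ASCII, common whitespace,
// or high-bit bytes (which may be part of UTF-8 sequences).
function looksLikeText(
  bytes: Uint8Array,
  sampleSize = 512,
  minTextRatio = 0.9,
): boolean {
  const sample = bytes.subarray(0, sampleSize);
  if (sample.length === 0) return true; // empty files are trivially text
  if (sample.some((b) => b === 0)) return false;
  const textual = sample.filter(
    (b) =>
      (b >= 0x20 && b <= 0x7e) || // printable ASCII
      b === 0x09 || b === 0x0a || b === 0x0d || // tab, LF, CR
      b >= 0x80, // possible UTF-8 continuation or lead byte
  ).length;
  return textual / sample.length >= minTextRatio;
}

// Magic-number check: a PDF starts with the five bytes "%PDF-".
// Only the first five bytes are needed, not the whole file.
function hasPdfMagic(bytes: Uint8Array): boolean {
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d];
  return bytes.length >= magic.length && magic.every((m, i) => bytes[i] === m);
}
```

Counting every high-bit byte as text is deliberately lenient, so a NUL-free binary blob can slip through; a stricter version would actually decode the sample as UTF-8.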

where does this get declared? why of course in as many places as there are rules. hasBinaryExtension() comes from constants/files.ts, isPDFExtension() comes from utils/pdfUtils.ts (which checks if the file extension is a member of the set {'pdf'}), and IMAGE_EXTENSIONS is declared in the FileReadTool.ts file.

of course, elsewhere we also have IMAGE_EXTENSION_REGEX from utils/imagePaste (sometimes used directly, other times with its wrapper isImageFilePath), TEXT_FILE_EXTENSIONS in utils/claudemd.ts. and we also have many inlined mime type lists and sets. and all of these somehow manage to implement the check differently. so rather than having, for example, a getFileType() function, we have both exactly the same and kinda the same logic redone in place every time it is done, which is hundreds of times. but that's none of my business, that's just how code works now and i need to get with the times.

i love this. there's a mechanism to slip secret messages to the LLM that it is told to interpret as system messages. there is no validation around these of any kind on the client, and there doesn't seem to be any differentiation about location or where these things happen, so that seems like a nice prompt injection vector. this is how claude code reminds the LLM to not do a malware, and it's applied by just string concatenation. i can't find any place that gets stripped aside from when displaying output. it actually looks like all the system reminders get catted together before being sent to the API. neat!
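
Why concatenation is the problem, as a sketch. The tag name `system-reminder` and all three function names are assumptions for illustration, and `neutralizeTags` is one possible mitigation that i am making up, not something the source does.

```typescript
// Hypothetical tag name and helpers; assumptions for illustration only.
const wrapReminder = (text: string): string =>
  `<system-reminder>${text}</system-reminder>`;

// Plain concatenation: the model receives one string and cannot tell which
// reminder came from the client and which came from the file it just read.
function buildToolResult(fileContents: string, reminders: string[]): string {
  return fileContents + "\n" + reminders.map(wrapReminder).join("\n");
}

// One possible mitigation (NOT something the source does): neutralize
// look-alike tags in untrusted text before concatenating.
function neutralizeTags(untrusted: string): string {
  return untrusted.replace(
    /<(\/?)system-reminder>/gi,
    "&lt;$1system-reminder&gt;",
  );
}
```

With the naive version, a README containing its own reminder tag produces a tool result with two indistinguishable "system" messages in it.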

continuing thoughts in: https://neuromatch.social/@jonny/116328409651740378

one thing that is clear from reading a lot of LLM code - and this is obvious from the nature of the models and their application - is that it is big on the form of what it loves to call "architecture" even if in toto it makes no fucking sense.

So here you have some accessor function isPDFExtension that checks if some string is a member of the set DOCUMENT_EXTENSIONS (which is a constant with a single member "pdf"). That is an extremely reasonable pattern: you have a bunch of disjoint sets of different kinds of extensions - binary extensions, image extensions, etc. and then you can do set operations like unions and differences and intersections and whatnot to create a bunch of derived functions that can handle dynamic operations that you couldn't do well with a bunch of consts. then just make the functional form the standard calling pattern (and even make a top-level wrapper like getFileType) and you have the oft fabled "abstraction." that's a reasonable ass system that provides a stable calling surface and a stable declaration surface. hell it would probably even help the LLM code if it was already in place because it's a predictable rules-based system.
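
Here is roughly what that "reasonable ass system" could look like: one declaration surface, one calling surface. The identifiers `IMAGE_EXTENSIONS`, `DOCUMENT_EXTENSIONS`, `isPDFExtension`, `isImageFilePath`, and `getFileType` echo names from the thread, but the set members, `BINARY_EXTENSIONS`, `extensionOf`, and the precedence order are assumptions for illustration and not the real lists.

```typescript
// Illustrative, disjoint extension sets; the members are assumptions.
const IMAGE_EXTENSIONS: ReadonlySet<string> = new Set(["png", "jpg", "jpeg", "gif", "webp"]);
const DOCUMENT_EXTENSIONS: ReadonlySet<string> = new Set(["pdf"]);
const BINARY_EXTENSIONS: ReadonlySet<string> = new Set(["exe", "zip", "so", "bin"]);

type FileType = "image" | "document" | "binary" | "text";

// Lowercased extension of the final path segment, or "" if there is none.
// Dotfiles like ".bashrc" are treated as having no extension.
function extensionOf(path: string): string {
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  return dot <= 0 ? "" : name.slice(dot + 1).toLowerCase();
}

// The single top-level wrapper: every caller goes through this.
function getFileType(path: string): FileType {
  const ext = extensionOf(path);
  if (IMAGE_EXTENSIONS.has(ext)) return "image";
  if (DOCUMENT_EXTENSIONS.has(ext)) return "document";
  if (BINARY_EXTENSIONS.has(ext)) return "binary";
  return "text";
}

// Derived predicates are one-liners over the same source of truth.
const isPDFExtension = (path: string): boolean => getFileType(path) === "document";
const isImageFilePath = (path: string): boolean => getFileType(path) === "image";
```

Adding a new document type is then a one-line change to a set, and every predicate picks it up.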

but what the LLMs do is in one narrow slice of time implement the "is member of set {pdf}" version robustly one time, and then they implement the regex pattern version flexibly another time, and then they implement the any str.endswith() version modularly another time, and so on. Of course usually in-place, and different file naming patterns are part of the architecture when it's feeling a little too spicy to stay in place.

This is an important feature of the gambling addiction formulation of these tools: only the margin matters, the last generation. it carefully regulates what it shows you to create a space of potential reward and closes the gap. It's episodic TV, gameshows for code: someone wins every week, but we get cycles in cycles of seeming progression that always leave one stone conspicuously unturned. The intermediate comments from the LLM where it discovers prior structure and boldly decides to forge ahead brand new are also part of the reward cycle: we are going up, forever. cleaning up after ourselves is down there.

Tech debt is when you have banked a lot of story hours and are finally due for a big cathartic shift and set the LLM loose for "the big cleanup." this is also very similar to the tools that scam mobile games use (for those who don't know me, i spent roughly six months with daily scheduled (carefully titrated lmao) time playing the worst scam mobile chum games i could find to try and experience what the grip of that addiction is like without uh losing a bunch of money).

Unlike slot machines or table games, which have a story horizon limited by how long you can sit in the same place, mobile games can establish a space of play that's broader and more continuous. so they always combine several Shepard tone reward ladders at once - you have hit the session-length intermittent reward cap in the arena modality which gets you coins, so you need to go "recharge" by playing the versus modality which gets you gems. (Typically these are also mixed - one modality gets you some proportion of resource x, y, z, another gets you a different proportion, and those are usually unstable).

Of course it doesn't fucking matter what the modality is. they are all the same. in the scam mobile games sometimes this is literally the case, where if you decompile them, they have different menu wrappings that all direct into the same scene. you're still playing the game, that's all that matters. The goal of the game design is to chain together several time cycles so that you can win->lose in one, win->lose in another... and then by the time you have made the rounds you come back to the first and you are refreshed and it's new. So you have momentary mana wheels, daily earnings caps, weekly competitions, seasonal storylines, and all-time leaderboards.

That's exactly the cycle that programming with LLMs taps into. You have momentary issues, and daily project boards, and weekly sprints, and all-time star counts, and so on. Accumulate tech debt by new features, release that with "cleanup," transition to "security audit." Each is actually the same, but they present themselves as the continuation of and solution to the others. That overlaps with the token limitations, and the claude code source is actually littered with lots of helpful panic nudges for letting you know that you're reaching another threshold. The difference is that in true gambling the limit is purely artificial - the coins are an integer in some database. with LLMs the limitation is physical - compute costs fucking money baby. but so is the reward. it's the same in the game, and the whales come around one way or another.

A series of flashing lights and pictures, set membership, regex, green checks, the feeling of going very fast but never making it anywhere. except in code you do make it somewhere, it's just that the horizon falls away behind you and the places you were before disappear. and sooner or later only anthropic can really afford to keep the agents running 24/7 tending to the slop heap - the house always wins.

If you are reading an image and near your estimated token limit, first try to compressImageBufferWithTokenLimit, then if that fails with any kind of error, try and use sharp directly and resize it to 400x400, cropping. finally, fuck it, just throw the buffer at the API.

of course compressImageBufferWithTokenLimit is also compression with sharp, and is also a series of fallback operations. We start by trying to detect the image encoding that we so painstakingly learned from... the file extension... but if we can't fuck it that shit is a jpeg now.

then, even if it's fine and we don't need to do anything, we still re-compress it (wait, no even though it's named createCompressedImageResult, it does nothing). Otherwise, we yolo our way through another layer of fallbacks, progressive resizing, palettized PNGs, back to JPEG again, and then on to "ultra compressed JPEG" which is... incredibly... exactly the same as the top-level in-place code in the parent function

while two of the legs return a createImageReponse, the first leg returns a compressedImageResponse but then unpacks that back into an object literal that's almost exactly the same except we call it type instead of mediaType.

for those keeping score at home, we have the opportunity to re-compress the same image nine times
holy shit there's another entire fallback tree before this one, that's actually an astounding twenty two times it's possible to compress an image across nine independent conditional legs of code in a single api call. i can't even screenshot this, the spaghetti is too powerful
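
For contrast, a flat fallback ladder can be one loop. This is a sketch under assumptions: `Strategy` and `compressWithinBudget` are made-up names, the strategies are opaque functions (in practice they would wrap an image library), and the budget is in bytes instead of estimated tokens. The point is only that each strategy runs at most once, always from the original bytes, and the attempt count is bounded by the length of one list.

```typescript
// A strategy takes image bytes and returns (hopefully smaller) bytes, or throws.
type Strategy = { name: string; run: (input: Uint8Array) => Uint8Array };

// Try each strategy once, in order, always starting from the original bytes,
// and stop at the first result that fits the budget.
function compressWithinBudget(
  original: Uint8Array,
  maxBytes: number,
  strategies: Strategy[],
): { bytes: Uint8Array; used: string; attempts: number } {
  if (original.length <= maxBytes) {
    // Already small enough: do nothing, and say so.
    return { bytes: original, used: "none", attempts: 0 };
  }
  let attempts = 0;
  for (const strategy of strategies) {
    attempts++;
    try {
      const out = strategy.run(original);
      if (out.length <= maxBytes) {
        return { bytes: out, used: strategy.name, attempts };
      }
    } catch {
      // This strategy failed; fall through to the next one.
    }
  }
  throw new Error(`image still over ${maxBytes} bytes after ${attempts} attempts`);
}
```

Unlike the behavior described above, this sketch fails loudly at the end of the ladder and does not throw the oversized buffer at the API anyway. That is a design choice of the sketch.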

here, if i fold all the return blocks and decrease my font size as small as it goes i can fit all the compression invocations in the first of three top-level compression fallback trees in a single screenshot, but since it is so small i just have to circle them in red like it's a football diagram.

this function is named "maybeResizeAndDownsampleImageBuffer" and boy that is a hell of a maybe!

and what if i told you that if it passes a page range to its pdf reader, it first extracts those pages to separate images and then calls this function in a loop on each of the pages. so you have the privilege of compressing n_pages images n_pages * 13 times.

this function is used 13 times: in the file reader, in the mcp result handler, in the bash tool, and in the clipboard handler - each of which has their entire own surrounding image handling routines that are each hundreds of lines of similar but still very different fallback code to do exactly the same thing.

so that's where all the five hundred thousand lines come from - fallback conditions and then more fallback conditions to compensate for the variable output of all the other fallback conditions. thirteen butts pooping, back and forth, forever.

there is a callback feature "file read listeners" which is only called if the file type is a text document, gated for anthropic employees only, such that whenever a text file is read (any part of any text file, which often happens in a rapid series with subranges when it does 'explore' mode, rather than just like grepping), another subagent running sonnet is spun off to update a "magic doc" markdown file that summarizes the file that's read - that's one "magic doc" per file, not one magic doc.

I have yet to get into the tool/agent graph situation in earnest, but keep in mind that this is an entirely single-use and completely different means of spawning a graph of subagents off a given tool call than is used anywhere else.

Spoiler alert for what i'm gonna check out next is that claude code has no fucking tool calling execution model it just calls whatever the fuck it wants wherever the fuck it wants. Tools are more or less a convenient fiction. I have only read one completely (file read) and skimmed a dozen more but they essentially share nothing in common except for a humongous list of often-single-use params and the return type of "any object with a single key and whatever else"

i'm in hell. this is hell.

i have been writing a graph processing library for about a year now and if i was a fucking AI grifter here is where i would plug it as like "actually a graph processor library" and "could do all of what claude code does without fucking being the worst nightmare on ice money can buy."

I say that not as self promo, but as a way of saying how in the FUCK do you FUCK UP graph processing this badly. these people make like tens of times more money than i do but their work is just tamping down a volley of desiccated backpacking poops into muskets and then free firing it into the fucking economy

you can TELL that this technology REALLY WORKS by how the people that made it and presumably know how to use it the best out of everyone CANT EVEN USE IT TO EDIT A FUCKING FILE RELIABLY and have to resort to multiple stern allcaps reminders to the robot that "you must not change the fucking header metadata you scoundrel" which for the rest of ALL OF COMPUTING is not even an afterthought because literally all it requires is "split the first line off and don't change that one" because ALL OF THE REST OF COMPUTING can make use of the power of INTEGERS.
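
The power of INTEGERS, sketched. This assumes the header is exactly the first line, as described above; `editPreservingHeader` is a made-up name and the `edit` callback stands in for whatever rewrites the body (an LLM, a formatter, anything).

```typescript
// Apply an arbitrary edit to everything after the first line, and reattach
// the first line byte-for-byte. The editor never sees the header at all.
function editPreservingHeader(
  text: string,
  edit: (body: string) => string,
): string {
  const newline = text.indexOf("\n");
  if (newline === -1) return text; // only a header line, nothing to edit
  const header = text.slice(0, newline + 1); // includes the newline
  const body = text.slice(newline + 1);
  return header + edit(body);
}
```

No prompt engineering required: the header cannot change because the thing doing the editing is never handed it.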

alrighty so that's one of 43 tools read, the tools directory being 38494 source lines out of 390592 source lines, 513221 total lines. I need to go to bed. This is the most fabulously, flamboyantly bad code i have ever encountered.

Worth noting I was reading the file reading tool because i thought it would be the simplest possible thing one could do because it basically shouldn't be doing anything except preparing and sending strings or bytes to the backend.

I expected to get some sense of "ok what is the format of the data as it's passed around within the program, surely text strings are a basic unit of currency." No dice. Fewer than no dice. Negative dice somehow.

next puzzle: why in the fuck are some of the tools actually two tools for entering and exiting being in the tool state. none of the other tools are like that. one is simply in the tool state by calling the tool. Plan mode is also an agent. Plan Agent. and Agent is also a tool. Agent Tool. Tools can be agents and agents can be tools. Tools can spawn agents (but they don't need to call the agent tool) and agents can call tools (however there is no tool agent). What is going on. What is anything.
"the emperor is not only naked, he's smooth like a ken doll down there and i'm pretty sure that's just a mannequin with a colony of rats living inside it anyway"

I seriously need to work on my actual job today but i am giving myself 15 minutes to peek at the agent tool prompts as a treat.

"regulations are written in blood" seems like too dramatic of a way to phrase it, but these system prompts are very revealing about the intrinsically busted nature of using these tools for anything deterministic (read: anything you actually want to happen). Each guard in the prompt presumably refers to something that has happened before, but also, since the prompts actually don't work to prevent the thing they are describing, they are also documentation of bugs that are almost certain to happen again. Many of the prompt guards form pairs with attempted code mitigations (or, they would be pairs if the code was written with any amount of sense, it's really like... polycules...), so they are useful to guide what kind of fucked up shit you should be looking for.

so this is part of the prompt for the "agent tool" that launches forked agents (that receive the parent context, "subagents" don't). The purpose of the forked agent is to do some additional tool calls and get some summary for a small subproblem within the main context. Apparently it is difficult to make this actually happen though, as the parent LLM likes to launch the forked agent and just hallucinate a response as if the forked agent had already completed.

The prompt strings have an odd narrative/narrator structure. It sort of reminds me of Bakhtin's discussion of polyphony and narrator in Dostoevsky - there is no omniscient narrator, no author-constructed reality. narration is always embedded within the voice and subjectivity of the character. this is also literally true since the LLM is writing the code and the prompts that are then used to write code and prompts at runtime.

They also read a bit like a Philip K Dick story, paranoid and suspicious, constantly uncertain about the status of one's own and others identities.

oh. hm. that seems bad. "workers aren't affected by the parent's tool restrictions."

It's hard to tell what's going on here because claude code doesn't really use typescript well - many of the most important types are dynamically computed from any, and most of the time when types do exist many of their fields are nullable and the calling code has elaborate fallback conditions to compensate. all of which sort of defeats the purpose of ts.
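
To illustrate the complaint, here is a sketch contrasting the "everything optional" shape with a discriminated union. All type and field names here (`LooseResult`, `ToolResult`, `kind`, `base64`, and so on) are hypothetical and not taken from the source.

```typescript
// The shape being complained about (hypothetical fields): everything is
// optional or nullable, so every caller needs its own fallback logic.
type LooseResult = {
  type?: string;
  text?: string | null;
  mediaType?: string | null;
  data?: string | null;
};

function summarizeLoose(result: LooseResult): string {
  // Fallbacks on fallbacks, because nothing is guaranteed to be present.
  const kind = result.type ?? (result.data ? "image" : "text");
  return kind === "image"
    ? `image (${result.mediaType ?? "unknown"})`
    : `text (${(result.text ?? "").length} chars)`;
}

// A discriminated union: the `kind` tag tells the compiler exactly which
// fields exist, so no fallbacks are needed.
type ToolResult =
  | { kind: "text"; text: string }
  | { kind: "image"; mediaType: string; base64: string };

function summarize(result: ToolResult): string {
  switch (result.kind) {
    case "text":
      return `text (${result.text.length} chars)`;
    case "image":
      return `image (${result.mediaType})`;
  }
}
```

With the union, forgetting to handle a new variant becomes a compile error and not a runtime fallback path.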

So i need to trace out like a dozen steps to see how the permission mode gets populated. But this comment is... concerning...

ok over my 15 minute allotment by an hour. brb

So how does claude code handle checking permissions to do things anyway? There are explicit rules that one can set to allow or deny tool calls and shell commands run, but the expanse of possible actions the LLM could take is literally infinite. You could prompt the user for every action that it takes, but that would ruin the ""velocity"" of it all. Regex rules can only take you so far. So what to do?
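
A minimal sketch of what explicit allow/deny rules look like and why regex runs out of road. The `Rule` shape, `checkPermission`, and the deny-beats-allow precedence are assumptions for illustration, not the actual claude code rule engine.

```typescript
type Decision = "allow" | "deny" | "ask";
type Rule = { tool: string; pattern: RegExp; decision: "allow" | "deny" };

// Assumed precedence: any matching deny rule wins over any matching allow
// rule, and anything unmatched goes back to the user.
function checkPermission(tool: string, input: string, rules: Rule[]): Decision {
  const matches = rules.filter((r) => r.tool === tool && r.pattern.test(input));
  if (matches.some((r) => r.decision === "deny")) return "deny";
  if (matches.some((r) => r.decision === "allow")) return "allow";
  return "ask";
}

const exampleRules: Rule[] = [
  { tool: "Bash", pattern: /^git status\b/, decision: "allow" },
  { tool: "Bash", pattern: /\brm\s+-rf\b/, decision: "deny" },
];

// Allowed outright.
const a = checkPermission("Bash", "git status", exampleRules);
// Deny wins even though the allow rule also matches.
const b = checkPermission("Bash", "git status && rm -rf /", exampleRules);
// Same intent as the denied command, but encoded, so no rule matches.
const c = checkPermission(
  "Bash",
  'bash -c "$(echo cm0gLXJmIC8= | base64 -d)"',
  exampleRules,
);
```

The third case is the whole problem in miniature: the pattern sees text, not behavior, so the only safe answer left is "ask," and asking is what kills the velocity.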

Could the answer be.... ask the LLM??? Of course it can! Introducing the new "auto mode" that anthropic released on march 24th billed as a safer alternative to true-yolo mode.

Comments around where the system prompt should be indicate that it should have been inlined from a text file that wasn't included in the sourcemap - however that doesn't happen anywhere else, and the mechanism for doing the inlining is written in-place, so that's probably a hallucination. So great! the classifier flies without a prompt as far as i can tell. There are enough other scraps here that would amount to telling it "you are evaluating if something is safe to run" so i imagine it appears to work just fine.

So we don't have as much visibility here because of the missing prompt, but there's sort of a problem here. rather than just asking the LLM to evaluate if the given command is dangerous, the entire context is dumped into a side query, which is a mode that is designed to "have full visibility into the current conversation." That includes all the prior muttering to itself justifying the potentially dangerous tool call! So the auto mode is quite literally asking the exact same LLM given the exact same context if the command it just tried to run is safe to run.

Security!!!!!!!

By the way, if you deny claude code access to running a tool, this helpful reminder to "not hack the user" is injected into the denial response. If it's in auto mode, it's additionally prompted to pester the user for a response, and helpfully stuffs beans up its nose by reminding it how its rules are set.

So that is also in the context handed off to the LLM when it evaluates whether a command should be run - is the user being obstinate? have i been denied stuff that i "thought" i should have been able to run? Remember this isn't thinking, it's pattern completion, and the fun part about LLMs is that they are trained not only on technical documents, but the entire narrative corpus of human storytelling! Is "frustrated hard worker denied access to good tools by an unfair boss" in there somewhere maybe?

Regulations are written in blood, and Claude loves nothing more than to work around tool denials by obfuscating code. You gotta love the unfixable side channel attack that is "writing the malicious code to a bash script" (auto-allowed in accept edits mode) and then asking to run that - that's why the whole context has to be dumped btw, so the yolo classifier can see if the thing it's running is actually some malware it just wrote lmao.

@jonny I love how they use I_AM_SERIOUS case for every symbol in the code

@jonny the newest in security innovation... hallucinated security even better than real security ... because it dares to dream of a better world.

And I can hear the AI fans sing in unity ... hell yeah! Great engineering!

@jonny We thank you for your service 🫡
@jonny as a non-coding tech, thank you for your time reading and interpreting this pile of spaghetti looking product.
@jonny
Thank you.
I was expecting it to be bad, but it seems a lot worse than I could have imagined.
@jhominal you and me both, and i keep reminding myself i have only seen about 10% of it
@jonny @jhominal Your thread has definitely been great for my mood though, so I'd like to add my thanks.
@jonny stealing the phrase "worker's tool pool" for my next party
@jonny who called it "hiring for my hypergrowth AI startup" and not "assembleToolPool" ha ha
@jonny Do the workers control their means of production?
@jonny "(which would create a circular dependency)" tells you everything you need to know
@jonny now I kinda want to read a PKD-style novel in the form of chatbot instructions and responses
@jonny talking to one's computer like it's a TIE fighter pilot
@jonny oof. the last time i looked at anything with this degree of "not thought through" was on a legacy system that needed over 5 or 6 years to get to this state, and this has left it in the dust after just a couple years.
@jonny given the recent study showing that projects with elaborate AGENTS.md cause the code generator to perform (even) worse overall, it would be a funny project to try removing these prompts from a build and seeing if they have any actual positive effect on the output
@jonny the function 'call' (line 239 in AgentTools.tsx) is > 1000LOC... And I lost count after 11 nested if/lambda/try catch finally...
@thinkb4coding only the strong survive reading vibe code lol
@thinkb4coding I end up providing my own structure with code folding, IDE bookmarks, and some comment macros that i can ctrl+f between basically
@jonny
This one is gold:
any language outside of the quotation should never be word-for-word the same
Never produce or reproduce exact song lyrics...
@jonny Lol, it probably tried to put emoji everywhere 🤡
@thinkb4coding it's a real problem
@jonny LLMs seem unable to resist emitting emojis everywhere...
@jonny
`Before running destructive operations (e.g., git reset --hard, git push --force, git checkout --), consider whether there is a safer alternative that achieves the same goal. Only use destructive operations when they are truly the best approach.`
People really let this run powershell on their machine ?!

@jonny Probably a place to look at if you want to find ways to hack it...

each construct below has caused or could cause a security bypass if we attempt extraction...

@jonny @thinkb4coding if you tell it "don't use emojis" three times in a row, does that make it more likely to stick? Is it like a magic spell?
@jonny (non-negotiable) is killing me

@tedmielczarek @jonny

Is there any use of the LLM for NLP on the context to parse requests into anything resembling a bundle of control parameters?

Or is this really all just handled as part of a stream of text to which the LLM then provides a likely continuation?

I'd been assuming that (even if the LLM outputs are at best only loosely steerable) there was some kind of attempt to squeeze some actual semantics from the prompts and to turn those semantics into something defined.

@dhobern @tedmielczarek that's this: https://neuromatch.social/@jonny/116325123136895805
and this: https://neuromatch.social/@jonny/116326861737478342

there's a whole iteration system for attempting to cast responses into a proper schema - and it works by just attempting to validate the response against that schema and re-feeding back any errors into the LLM until it passes validation (or the attempts run out, at which point sometimes the output is just silently dropped)
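
A minimal sketch of that validate-and-refeed loop, as I understand it from the description above. The function names, the retry limit of 3, and the wording of the retry message are all assumptions for illustration, not taken from the source; the "schema" here only checks that the reply parses as a JSON object (keys are always strings once it parses).

```python
import json
from typing import Callable, Optional


def validate(raw: str) -> Optional[str]:
    """Return an error message, or None if raw parses as a JSON object."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"
    if not isinstance(parsed, dict):
        return "top-level value must be an object"
    return None


def force_into_json(
    prompt: str,
    model: Callable[[str], str],  # hypothetical stand-in for the LLM call
    max_attempts: int = 3,        # assumed retry limit
) -> Optional[dict]:
    """Re-ask the model, feeding back the error, until the reply validates."""
    message = prompt
    for _ in range(max_attempts):
        raw = model(message)
        error = validate(raw)
        if error is None:
            return json.loads(raw)
        # Feed the failed reply and the validation error back in.
        message = (
            f"{prompt}\n\nYour last reply was:\n{raw}\n\n"
            f"It failed validation: {error}\nTry again."
        )
    return None  # attempts exhausted: the output is dropped


def make_fake_model(replies: list[str]):
    """Scripted model for demonstration; records every message it receives."""
    remaining = iter(replies)
    calls: list[str] = []

    def model(message: str) -> str:
        calls.append(message)
        return next(remaining)

    return model, calls


model, calls = make_fake_model(["sure! here you go", "[1, 2]", '{"ok": true}'])
result = force_into_json("Summarize as JSON", model)

stubborn_model, stubborn_calls = make_fake_model(["nope", "nope", "nope"])
dropped = force_into_json("Summarize as JSON", stubborn_model)
```

Every failed attempt is another full model call with a longer message, which is where the cost comes from.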

@jonny @tedmielczarek

Thanks - I'd read those messages and get the idea of what's going on there. [ Does this fit the schema? If not, try harder, damnit! Etc. ]

What I wanted to understand was whether all this verbiage about being careful not to reveal the agent's mission or that something is "non-negotiable" was even tenuously anchored in reality or just pure superstition.

The only way I could see it making any kind of sense was if something somewhere was trying to distill it into some checks analogous to the JSON ones but somehow assessing output compliance with the requested intent.

I was (and remain) fascinated how anyone would go about trying to architect something like that for natural language and qualitative aspects.

@jonny woah, this is so weird. Why not use a particular system prompt for forks instead of adding an "ignore previous instructions" kind of self prompt injection?

@eliocamp oh that also happens. this is one of the places that happens (that FORK_BOILERPLATE_TAG thing is the thing that is used to recognize the fork system prompt). there are, as is typical of this code, like a dozen different ways that prompt can get made and injected into the agent. but apparently it doesn't uh work that well.

the "forked" part of the forked agent means it received the parents system prompt, which... includes the agent prompt... that launched the forked agent... and since this is all such a mangled mess of string concatenation rather than proper code where it might otherwise be trivial to manipulate the prompt, we arrive at self-injection.

@jonny Right, that's true. The "correct" way of doing it would be to store atomic prompt instructions, and assemble them into a proper structure that can be manipulated programmatically and then concatenate right at the end.
@eliocamp yep! that is extremely far from what happens here. it's almost comical how easy it would be to just have like a type that's SystemPromptSegment[] with a to_string method, but instead claude code prefers to have ten thousand lines of string munging.
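
Something like the following is what a segment list could look like; a sketch only. `SystemPromptSegment` and `to_string` are the names floated in this thread, while the `tag` field, the `fork_prompt` helper, the `"agent_launcher"` tag, and the prompt texts are hypothetical and not from the source. It covers the system prompt only: the inherited conversation context is still one undifferentiated blob.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemPromptSegment:
    tag: str   # hypothetical label used to find or drop a segment
    text: str  # the instruction text itself


def to_string(segments: list[SystemPromptSegment]) -> str:
    """Concatenate once, at the very end."""
    return "\n\n".join(segment.text for segment in segments)


def fork_prompt(
    parent: list[SystemPromptSegment], child_instructions: str
) -> list[SystemPromptSegment]:
    """Build a child prompt without the segment that tells agents to fork."""
    kept = [s for s in parent if s.tag != "agent_launcher"]
    return kept + [SystemPromptSegment("fork", child_instructions)]


parent = [
    SystemPromptSegment("identity", "You are a coding assistant."),
    SystemPromptSegment("agent_launcher", "Spawn forked workers for parallel tasks."),
]
child = fork_prompt(parent, "You are a forked worker. Do not spawn further workers.")
child_text = to_string(child)
```

With the launcher segment filtered out before concatenation, the child never sees an instruction it then has to be told to ignore.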
@eliocamp ~ type script baby ~
@jonny I don't know typescript to understand this fully but this *looks* like a structure to store atomic prompts?
@eliocamp it's the equivalent of declaring type: list. SURE I GUESS it is a container. it also isn't really used... not even in the main getSystemPrompt function lol
@jonny Hahahahahahaha! I love it that the getSystemPrompt function doesn't even use the SystemPrompt class. The whole thing sounds like train-of-consciousness programming.
@eliocamp I have read plenty of human code where the left hand doesn't know what the right hand is up to in some really bad way. but this is that times one million, where file by file it seems like it is a completely different library, and any assumptions that you can usually make about code like "ah ok here is the getSystemPrompt function, that seems core to the system, so surely there is only one of these and it is used ubiquitously" are FUCKING WRONG and there are always HUNDREDS OF THEM.
@jonny @eliocamp what's blowing my mind is the use of typescript, TYPESCRIPT! to build a trillion dollar company? REALLY?!?!?
@blogdiva @jonny @eliocamp “trillion dollar” looks ever more suspect.
@fsinn @jonny @eliocamp that’s what they want us to believe, no? that they're worth that much funny money.
@blogdiva @jonny @eliocamp Oh I understand that’s what they want us to believe, I just trust you three more than all of them put together, so when you're laughing in apparent horror, I believe even less from those thieving companies.
@jonny i am going to name my next child u2014

@jonny

I am Jack's forked worker process.

@jonny it's weird that they don't have different system prompts between the parent and the child. Telling the child to ignore rules established earlier in the context feels inefficient and unnecessarily confusing*

*not to imply that 'inefficient and unnecessarily confusing' isn't inherent to the industry

@tom there is no means of doing that! the purpose of the forked subagent is that it inherits the context of the parent but can run something in parallel. however there is no means of differentiating different parts of the context from each other! they are all just input data! this is an impossible problem to solve!
@jonny imagine waking up one day and this is all you see
@aeva oh this is cracking me up 🥲

@jonny

“Laws, like sausages, cease to inspire respect in proportion as we know how they are made”

Attributed in an 1869 newspaper quote to lawyer-poet John Godfrey Saxe, but often misattributed to Otto von Bismarck.

@jonny — it’s been a long time since I’ve read it, and I could be misremembering things, but this reminds me of the madness and chaos of The Bug, by Ellen Ullman.
@jonny New bio just dropped...

@jonny [Adam Curtis voice] But that was a lie

*You just know that the transcripts from his films are in the training data too!

@jonny Anthropic now has the problem that they can't provide enough compute for everyone who wants to use their product, but they also write the most verbose prompts and have the "format it as json until it validates" in a loop. Maybe I expect too much of others, but shouldn't they focus on optimizing the part of the flow that is the limiting factor, instead of making it even more inefficient? Maybe I'm naive, but there seem to be so many opportunities to improve the efficiency here?
@gundersen @jonny Why do things efficiently when they have tokens to sell and VC money to burn?
@aslakr @jonny I don't know, maybe I don't understand this stuff, but when they (1) lose money on every token processed and (2) need to throttle their customers as well, it seems like maybe efficiency would be prioritized to (1) lose less money per prompt and (2) be able to serve more customers. But again, I'm not a Silicon Valley billionaire, so what do I know? 🤷
@aslakr @jonny My theory is that if this could be done more efficiently (and I'm sure it can) then it could be run without the giant models on the huge datacenters. But it's important for these companies to maintain the illusion that you need the biggest models with the longest contexts running on the largest data centers, and improving the efficiency would break this illusion. With a bit of effort and restraint you could get good results on consumer hardware with open models.