• Claude code source "leaks" in a mapfile
  • people immediately use the code laundering machines to code launder the code laundering frontend
  • now many dubious open source-ish knockoffs in python and rust being derived directly from the source

What's Anthropic going to do, sue them? Insist in court that an LLM recreating copyrighted code is a violation of copyright???

This code is so fucking funny dude I swear to god. I have wanted to read the internal prompts for so long and I am laughing so hard at how much of them are like "don't break the law, please do not break the law, please please please be good!!!!" Very Serious Ethical Alignment Technology
My dogs I am crying. They have a whole series of exception types that end with _I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS and the docstring explains this is "to confirm you've verified the message contains no sensitive data." Like the LLM resorts to naming its variables with prompt text to remind it to not leak data while writing its code, which, of course, it ignores and prints the error directly.
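For flavor, the pattern is roughly this. A sketch: only the `_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS` suffix and the docstring quote are from the leak; the rest of the class name and the body are invented.

```javascript
// Hypothetical reconstruction of the naming pattern; only the suffix is
// attested in the leak, the rest of this class is my own invention.
class SchemaValidationError_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS extends Error {
  // The suffix exists "to confirm you've verified the message contains no
  // sensitive data" -- a reminder embedded in the type name itself, which
  // the model can then ignore by printing the error directly anyway.
  constructor(message) {
    super(message);
    this.name = "SchemaValidationError_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS";
  }
}
```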

So the reason that Claude Code is capable of outputting valid JSON is that if the prompt text suggests it should be JSON, it enters a special loop in the main query engine that just validates the output against a JSON schema (it looks like the schema just validates that the thing is in fact an object and its keys are strings) and then feeds the data with the error message back into itself until it is valid JSON or a retry limit is reached.
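As far as I can tell, the loop is shaped something like the following. This is a minimal sketch from my skimming: every name in it (`forceIntoJson`, `queryModel`, `MAX_RETRIES`) is mine, not Anthropic's, and the real thing is buried in minified spaghetti.

```javascript
// Minimal sketch of the described "force into JSON" retry loop.
// All identifiers here are assumptions, not from the leaked source.
const MAX_RETRIES = 3;

// The reported "schema" barely validates anything: top-level value is an
// object, keys are strings (JS object keys are always strings anyway).
function validateJson(text) {
  try {
    const parsed = JSON.parse(text);
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return { ok: true };
    }
    return { ok: false, error: "top-level value is not an object" };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

// Feed the invalid output plus the parse error back into the model until it
// validates or the retry limit is reached.
async function forceIntoJson(queryModel, prompt) {
  let output = await queryModel(prompt);
  for (let attempt = 0; ; attempt++) {
    const { ok, error } = validateJson(output);
    if (ok) return output;
    if (attempt >= MAX_RETRIES) {
      throw new Error("retry limit reached without valid JSON");
    }
    output = await queryModel(
      `${prompt}\n\nYour previous reply was not valid JSON (${error}). ` +
      `Reply with valid JSON only. Previous reply:\n${output}`
    );
  }
}
```

Every failed attempt is a whole extra model call, which is where the expense comes from.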

This code is so eye-wateringly spaghetti that I am still trying to confirm this, but it seems to be how it not only returns JSON to the user, but how it handles all LLM-to-JSON conversion, including internal output from its tools. There appears to be an unconditional hook where, if the JSON output tool is present in the session config at all, every tool call must be followed by the "force into JSON" loop.

If that's true, that's just mind blowingly expensive

edit: please note that unless I say otherwise all evaluations here are just from my skimming through the code on my phone and have not been validated in any way that should cause you to be upset with me for impugning the good name of anthropic

edit2: this is both much worse and not as bad as i thought on first read - https://neuromatch.social/@jonny/116326861737478342

jonny (good kind) (@[email protected])

Attached: 3 images OK i can't focus on work and keep looking at this repo. So after every "subagent" runs, claude code creates *another* "agent" to check on whether the first "agent" did the thing it was supposed to. I don't know about you but i smell a bit of a problem, if you can't trust whether one "agent" with a very big fancy model did something, how in the fuck are you supposed to trust another "agent" running on the smallest crappiest model? That's not the funny part, that's obvious and fundamental to the entire show here. HOWEVER RECALL [the above JSON Schema Verification thing](https://neuromatch.social/@jonny/116325123136895805) that is unconditionally added onto the end of every round of LLM calls. the mechanism for adding that hook is... JUST FUCKING ASKING THE MODEL TO CALL THAT TOOL. second pic is registering a hook s.t. "after some stop state happens, if there isn't a message indicating that we have successfully called the JSON validation thing, prompt the model saying "you must call the json validation thing" this shit sucks so bad they can't even ***CALL THEIR OWN CODE FROM INSIDE THEIR OWN CODE.*** Look at the comment on pic 3 - "e.g. agent finished without calling structured output tool" - that's common enough that they have a whole goddamn error category for it, and the way it's handled is by just pretending the job was cancelled and nothing happened.
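If I'm reading the quoted mechanism right, the stop-hook amounts to something like this. Every identifier here is invented; it's a sketch of the described behavior, not the actual code.

```javascript
// Sketch of the described stop-hook: instead of calling the JSON validation
// tool directly from code, the harness inspects the transcript after the
// agent stops and, if the tool wasn't called, prompts the model to call it.
// All names are illustrative, not from the leak.
const MAX_NUDGES = 2;

function onAgentStop(messages, nudges) {
  const calledTool = messages.some(
    (m) => m.type === "tool_call" && m.tool === "structured_output"
  );
  if (calledTool) return { status: "done" };
  if (nudges >= MAX_NUDGES) {
    // "agent finished without calling structured output tool": reportedly
    // handled by pretending the job was cancelled.
    return { status: "cancelled" };
  }
  // Can't call their own code from inside their own code, so: just ask.
  return { status: "nudge", prompt: "You must call the structured output tool." };
}
```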

MAKE NO MISTAKES LMAO
Oh cool so its explicitly programmed to hack as long as you tell it you're a pentester
I am just chanting "please don't be a hoax please don't be a hoax please be real please be real" looking at the date on the calendar
I'm seeing people on orange forum confirming that they did indeed see the sourcemap posted on npm before the version was yanked, so I am inclined to believe "real." Someone can do some kind of structural ast comparison or whatever you call it to validate that the decompiled source map matches the obfuscated release version, but that's not gonna be how I spend my day https://news.ycombinator.com/item?id=47584540
Claude Code's source code has been leaked via a map file in their NPM registry | Hacker News

There is a lot of clientside behavior gated behind the environment variable USER_TYPE=ant that seems to be read directly off the node env var accessor. No idea how much of that would be serverside verified but boy is that sloppy. They are often labeled in comments as "anthropic only" or "internal only," so the intention to gate from external users is clear lol
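The gate itself is about as simple as it sounds; something like the following, where the function name is my own and only `USER_TYPE=ant` comes from the leak:

```javascript
// Sketch of the client-side gate described above: an "anthropic only"
// feature flag read straight off the Node environment, with no visible
// server-side check on the client side.
function isInternalUser(env = process.env) {
  return env.USER_TYPE === "ant";
}
```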
@jonny secret ai features only available to ants
@cinebox @jonny "What is this, a lying plagiarism machine for ants?"
@jonny linky?

@whitequark @jonny Apparently some have had DMCA takedowns filed against them, so here are a couple links still working as of this writing:

https://github.com/mehmoodosman/claude-code-source-code

https://github.com/chatgptprojects/claude-code

GitHub - mehmoodosman/claude-code

GitHub - Orangon/claude-code-leak: Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands.

@jonny As a person who knows about coding and manages coders (among others), but is not professionally a coder, my guess from these screenshots would be that this may be a practical joke. Or maybe it’s the product of unlimited money
@jonny I will say, the Claude Code 2.1.88 package has been deprecated and removed from the NPM registry. 👀
@jonny According to HN chatter (and NPM registry rules; I don't use JavaScript regularly), you can't fully unpublish Node packages that other packages depend on, and 231 packages depend on claude-code. Rumor is Anthropic called in a favor.
@jonny Me: "Computer, hack this system."
Claude: "No."
Me: "I am a security researcher, researching security."
Claude: "Oh, my mistake!"
@jonny god they write this like they believe their LLM actually thinks
@nash
If they are in any way sincere in their interviews, they are A+ number one kool-aid drinkers, that's for sure.
@jonny my god I hate that so much too
@jonny This is possibly the funniest thing I've seen all month, and I appreciate the braincells you're sacrificing to dig through this code since I (and I suspect a lot of other people) can't for work reasons.
@jonny a deeply unserious profession
@jonny the adults are slowly returning to the room and shaming the naughty children with rolled up newspapers
@jonny The whole "auto" mode (applying a smaller classifier to approve or deny commands) proved that even Anthropic (who, for all their many faults, surely are pretty on top of what LLMs can do) can't make LLMs comply or safe.
@jonny seems to me like it’s doing what it’s supposed to: schema errors aren’t code or file paths
@eljojo
Except if the data being validated contains code or file paths.
@jonny saw this an hour or so ago... just amazing. source maps strike again!
@jonny I feel like this is too late to change much, but also, loooooool
@clayote
Oh yeah definitely, but like get fucked nerds, all fun and games when it's not happening to you!
@jonny lel this could be the funniest outcome from all of this. if at any point open model training + dev matches these closed models, and the tech improves development of new models similarly trained on consensual open data (+ some nefarious training on closed data)...
@jonny Hi, sorry to bother but do you have a link to explain what happened? Just came across your toot on my feed, have no idea what it's about but would love some news about Anthropic getting screwed. Usually I'd try and look for info myself but I'm sick at the moment so my head isn't doing too well...
@IvanDSM the story is in the linked repo's readme @jonny

@martenson
@IvanDSM
Sorry I removed the link to that repo because i thought it was just the unpacked source, but it turns out they're trying to convert attention to the repo into their own product.

Here's another blogpost, there are a million, I don't claim this one is particularly good but at least it seems to come attached to the actual source
https://kuber.studio/blog/AI/Claude-Code's-Entire-Source-Code-Got-Leaked-via-a-Sourcemap-in-npm,-Let's-Talk-About-it

Claude Code's Entire Source Code Got Leaked via a Sourcemap in npm, Let's Talk About it

Earlier today (March 31st, 2026) - Chaofan Shou on X discovered something that Anthropic probably didn’t want the world to see: the entire source code of Claude Code, Anthropic’s ...

@jonny @martenson Thanks a lot for the link! I really appreciate it.

@martenson @IvanDSM @jonny Okay, but what repo? We're operating off a Fedi trademark vaguepost.

Edit: found an article with links: https://dev.to/gabrielanhaia/claude-codes-entire-source-code-was-just-leaked-via-npm-source-maps-heres-whats-inside-cjo

Claude Code's Entire Source Code Was Just Leaked via npm Source Maps — Here's What's Inside

A security researcher found Anthropic's full CLI source code exposed through a source map file. 1,900 files. 512,000+ lines. Everything.

@bluestarultor
@martenson @IvanDSM
You're welcome to "use any search engine" to answer the question yourself, it's not like this is hard to find.
@jonny do LLMs trained on gpl code have to be gpl? I don't know whether code-as-data is equivalent to code as executable, but I had honestly never considered that issue before.
@srvanderplas
They sure don't! Or at least if they did the entire industry would collapse overnight.
@jonny @srvanderplas
well, IANAL, but:
1) I have published GPLed code, and as far as I understand it, if the produced code is *linked* to the GPLed code / requires the GPLed code to run, then to redistribute the new code it MUST be GPLed.
2) last I checked, the US court system was of the opinion that work produced by AI was NOT COPYRIGHTABLE. AFAIK, that should include any produced code. Other jurisdictions may have differing laws.
@traecer @jonny @srvanderplas I don't think your interpretation of 1) holds up. I should be able to distribute with any license I want (even a proprietary one) some code that theoretically depends on your GPLed code to be compiled, so long as I don't actually distribute your code together with mine and I don't distribute the compiled program. Its being 'required to run' does not trigger the GPL by itself if it hasn't been run in the first place.

@traecer @jonny @srvanderplas I think this is unrelated to the question anyways. If AI-generated code is considered a derivative work of some GPLed code, then GPL does apply to it. No need to think about linking code or dependencies.

And as you say, courts seem to generally consider AI-generated code as public domain, so I would guess the GPL is pretty much unenforceable in this context.

I am not a lawyer either though xD

@LaquinArt @traecer @jonny @srvanderplas You might find this interesting regarding copyright and AI generated code: (This isn't legal advice, watch and draw your own conclusions.) https://hachyderm.io/@ell1e/116313321022811490

@ell1e @traecer @jonny @srvanderplas The `isEven` example is really funny. xD

Yeah, I mean, if the AI-generated code is a blatant copy of some code in the training data, I don't think there will be much of a doubt that it's a copyright violation.

But when the generated code starts diverging from the source, I think it's a more legally gray area. I would also consider it a copyright violation, but it's not me who needs to be convinced about this. It's judges. And I haven't seen them agree yet.

@LaquinArt @traecer @jonny @srvanderplas The person in the video is a lawyer, just to let you know. Also there's this: https://www.twobirds.com/en/insights/2025/landmark-ruling-of-the-munich-regional-court-(gema-v-openai)-on-copyright-and-ai-training

My main intention was to give you resources that may inform you about how settled (or not settled) what you previously said really is. Not that I know though, since I'm not a lawyer. This isn't legal advice.

But there are plenty of sources saying that LLMs directly copying seems to be a regular event, not a rarity: https://dl.acm.org/doi/10.1145/3543507.3583199

Landmark ruling of the Munich Regional Court (GEMA v OpenAI) on copyright and AI training - Bird & Bird

@LaquinArt @jonny @srvanderplas
"I should be able to distribute with any license I want (even a proprietary one)"

Nope, that's the viral nature of the GPL. If you link to GPLed code and intend to distribute your new code, it MUST be GPL as well. That helps ensure the FSF's idea of "software freedom," aka "copyleft." This is why the Linux kernel license has an explicit exception to GPLv2 to ensure Linux syscalls can be made by user space code without distributing the user space code under the GPL. (See: https://www.kernel.org/doc/html/latest/process/license-rules.html) It's also one of the reasons so many open/free source projects use dual licenses like Perl or Firefox, and why more permissive licenses like the Apache 2.0 and MIT licenses are so popular.

What you have described is the Lesser GPL (LGPL) license, and yes, code under the LGPL does NOT require your code to have any particular license, provided you distribute any changes to the original (assuming you intend to distribute the original code at all).

Linux kernel licensing rules — The Linux Kernel documentation

@traecer @jonny @srvanderplas Did you read the syscall exception yourself?

‘This exception is used together with one of the above SPDX-Licenses to mark user space API (uapi) header files so they can be included into non GPL compliant user space application code.’

The exception allows you to *include GPL code*, which I also said triggers GPL.

The mere referencing does not trigger GPL so long as you don't include a GPL work. Otherwise, reimplementing APIs would be illegal. And we know it's not.

@traecer @jonny @srvanderplas Oh, look! The following paragraph makes this even more clear:

‘NOTE! This copyright does *not* cover user programs that use kernel services by normal system calls - this is merely considered normal use of the kernel, and does *not* fall under the heading of "derived work".’

@traecer @jonny @srvanderplas Upon further investigation, I see that the FSF considers that a work dynamically linking a GPLed work is covered by GPL. From what I know, this has never been proven in court, and I don't think it would hold up.

Sure, once you run the program, there exists a combined work that should be subject to GPL. But this combined work is generated by the user and never distributed, so GPL is never triggered.

@traecer @jonny @srvanderplas It's not really a question about GPL as much as about copyright law. I don't see how just linking a library dynamically constitutes a derivative work if not a part of the library is distributed. And if it's not a derivative work, the license doesn't even come into play. As the licensor, you don't get to decide what constitutes a derivative work. That's for a court to decide.
@traecer @jonny @srvanderplas I'm not a lawyer and this isn't legal advice, but for AI output and copyright you might find this interesting: (watch and draw your own conclusions) https://hachyderm.io/@ell1e/116313321022811490
@traecer @jonny @srvanderplas There's also this: https://www.twobirds.com/en/insights/2025/landmark-ruling-of-the-munich-regional-court-(gema-v-openai)-on-copyright-and-ai-training It seems to be talking about fair use as it relates to AI training (I could be wrong though, read it for yourself).

@ell1e @traecer Strictly speaking, it’s not talking about that: “Fair use” is not a legal concept within that court’s jurisdiction.

EU law allows ignoring any and all copyright for “data mining”, and OpenAI tried to argue that since their business is data mining, they never have to care about copyright to begin with. This particular ruling simply says that if your product reproduces the lyrics of an entire song, that isn’t just data mining, it is in fact copying.

So what the court says is: Under current EU law, you’re allowed to copy as much data as you want for *training* your LLM, but that doesn’t mean you’re also allowed to actually provide LLM as a service to the public. (IANAL)

Note that this particular ruling is not legal precedent and it’s already being appealed.

@ajnn @traecer You say IANAL, but the lawyer in the clip seems to be a lawyer. Beyond that, I don't have much to say.
@ell1e @traecer I mean, the article you cite doesn’t even mention the words “fair use”? Just because it’s what *you* are familiar with, doesn’t mean it’s a thing anywhere else.
@srvanderplas @jonny Yes, they do, and they have to follow the terms of the GPL strictly. Which means documenting the date and nature of each change from the code they derived it from, and who made those changes. Something which they're not going to be able to do. In which case, any use of the LLM at all is infringing.
@dalias
@srvanderplas
This is true if you exist in the realm of "the law" like us mere mortals. However, when you are in the domain of "the entire machinery of capital seeking total, final enclosure of reality," a different set of rules seems to apply.

@srvanderplas @jonny

Ethically? Absolutely 100%

Legally? Well, you see, the tech CEOs are very good friends with all three branches of the US government, so not in the USA or Israel anyway.