So Anthropic employees are using Claude Code to contribute AI-generated code to open source repositories and hiding the fact using their own internal “undercover mode”.

Totally trustworthy people.

(Any open source project that at the very least requires disclosure of AI-authored contributions should immediately ban Anthropic employees on principle.)

#AI #Anthropic #ClaudeCode #subterfuge

Mx. Aria Stewart 16h ago

@aral Honestly I don't actually hate this.

It's a tool. The _user_ is responsible for what they're submitting. It's putting code generated by them in their name. I think this is actually good.

Glyph 14h ago

@aredridel @aral I really can’t agree with this, because it’s a question of accurate labeling not of “responsibility” or “authorship”. co-authored-by is perhaps the wrong method for labeling such things, but consider raw milk. ultimately, it is indeed the producer’s responsibility to ensure their product is free of contamination. but disclosure of its method of production is explicitly the kind of requirement that allows consumers of said product to make safe choices

Mx. Aria Stewart 14h ago

@glyph Yeah, I disagree. Code isn't ingredients and it's not “contamination” any more than you should label “I used search and replace on this”.

What you want to know is whether it was well engineered or not.

And in fact, this is almost entirely orthogonal to “safety”. This is an engineering product. The safety comes from processes and whether or not _anyone checked the work done was right_, not the inputs.

Cassandrich 13h ago

@aredridel @glyph It is ingredients. It's not search-and-replace. It's literally incorporating parts of an unknown set of almost-surely-copyrighted works, without license or attribution, into the submission the person is misrepresenting as their own.

Cassandrich 13h ago

@aredridel @glyph What "AI coding tools" *should* be putting in commit messages is:

Co-Authored-By: An unknown and unknowable set of people who did not consent to their work being used this way and to which there is no license for inclusion.

Mx. Aria Stewart 13h ago

@dalias Morally arguable but not actually true under the copyright regime that exists.

At what point does learning from others constitute their authorship?

Cassandrich 13h ago

@aredridel LLM slop is nothing like "learning from others".

But if you recall, we even took precautions against that. FOSS projects reimplementing proprietary things were careful to exclude anyone who might have read the proprietary source, disassembled proprietary code, worked at the companies who wrote or had access to that code, etc.

Mx. Aria Stewart 13h ago

@dalias Yes. Do you know why?

Cassandrich 13h ago

@aredridel So that it would be abundantly clear, in any plausibly relevant jurisdiction, that the work was not derivative and not infringing.

LisPi 6h ago

@dalias @aredridel A test which LLMs fail by the very virtue of their functioning mechanisms.

It's all fundamentally derivative of the training dataset and it has been exposed both to AGPL and to proprietary datasets.

Mx. Aria Stewart 6h ago

@lispi314 Has any legal authority weighed in on that claim yet?

LisPi

@aredridel The legal authority is irrelevant when the source code and empirical evidence are present to back my assertion.

The corruption of the court matters not for the fact that the provenance can be verified as a derivation of input.

Mx. Aria Stewart 6h ago

@lispi314 If you're making claims about copyright law, like whether something is derivative, and about legal documents like the AGPL, legal authority is very much relevant.

Programmers really gotta stop treating licenses like they're code that gets executed. That's not how it works. That's not how any of this works.

LisPi 6h ago

@aredridel

“legal authority is very much relevant.”

The court of today is not the court of tomorrow. Therefore you do not take any risk on the matter if you want to be absolutely safe (you do, especially if your code is infrastructure of any sort and has to be valid everywhere).

That being said, I am also taking the argument from an ideological stance.

One does not simply include Proprietary Malware into Free Software.

It is disrespectful of the users' Freedoms and to oneself.

And in the case of plagiarized Free Software, it is still disrespectful not to provide due reference to the source's original author.