So Anthropic employees are using Claude Code to contribute AI-generated code to open source repositories, and hiding that fact using their own internal “undercover mode”.

Totally trustworthy people.

(Any open source project that requires, at the very least, disclosure of AI-authored contributions should immediately ban Anthropic employees on principle.)

#AI #Anthropic #ClaudeCode #subterfuge

@aral Honestly I don't actually hate this.

It's a tool. The _user_ is responsible for what they're submitting. It puts code they generated under their own name. I think this is actually good.

@aredridel @aral I really can’t agree with this, because it’s a question of accurate labeling, not of “responsibility” or “authorship”. Co-authored-by is perhaps the wrong method for labeling such things, but consider raw milk. Ultimately, it is indeed the producer’s responsibility to ensure their product is free of contamination. But disclosure of its method of production is explicitly the kind of requirement that allows consumers of that product to make safe choices.

@glyph Yeah, I disagree. Code isn't ingredients, and it's not “contamination” any more than you should label “I used search and replace on this”.

What you want to know is whether it was well engineered or not.

And in fact, this is almost entirely orthogonal to “safety”. This is an engineering product. The safety comes from processes and whether or not _anyone checked that the work done was right_, not the inputs.

@aredridel @glyph I find myself wedged between these two positions.

To me, it is good that authorship (and responsibility therefrom) is staying with a human, but it is bad that Anthropic are going out of their way to prevent disclosure that code is not human in origin (because it is a useful measure of "well-engineered", among other things)

@SnoopJ Yeah, not sure it's a good measure of that at all.

And also people are only caring _because_ it's LLM-generated, not because it's unsafe.

(An awful lot of LLM generated code is dreck, but an awful lot of code is dreck in general.)

But yeah, the special case of hiding that authorship is just ... ew.

@aredridel it would have been better if I'd said it's one thing I'm paying attention to; I concede that by itself it is not a good measure.

But if someone used a language model, I already know that they couldn't be bothered for some of it, and that is useful signal to me.

@aredridel to say that another way: the class of mistakes I am looking for is different if I know that a language model was involved, because I know it will make mistakes that a human being never would

@aredridel I guess it's kind of a moot point though, sustained contribution by someone who is relying on a language model will definitely surface that the tool is present

and once I know a person has used the tool *once*, I assume from that point forward that they are using it for *everything*, even if this isn't the case. if it's disclosed up-front, it damages my trust in the person a LOT less. but I guess people are going to vary on how caustic they find this to trust

@aredridel "raw milk" isn't ingredients either, the difference is one of process, which is why I used it as an example. Raw milk contamination is more likely because the processes to keep it safe are harder to follow, require more continuous diligence on the part of the operators of that process, and thus contribute to more frequent failures. LLM output is exactly the same: it provokes vigilance decay.
@aredridel "search and replace" is not a fair comparison because search and replace does *not* cause vigilance decay, or risk of unknowing copyright infringement, etc. in the same way that "raw milk" and "grass fed" are just like… completely different disclosures with different consequential implications
@glyph Actually search and replace _does_ do that, and in fact I was bitten by vigilance decay in a search-and-replace problem literally yesterday. The comparison was intended.
@aredridel you are technically correct here (and indeed any automated tool with repeated human interaction may provoke _some_ measure of vigilance decay; one could argue that "flaky tests" cause it too) but I feel like you're talking past the actual argument here.

@glyph I'm specifically arguing that it's the _exact same phenomenon writ larger_ (which is a meaningful difference!)

But it's a difference in amount not kind.

Either you build processes to check things (“do engineering”) or you don't (“vibes”).

@aredridel There are scales where differences in degree _become_ differences in kind.

Consider a more closely related phenomenon. There are many tools to check C/C++ code for memory safety errors. And, unsafe Rust code may exhibit exactly the same unsafe behaviors. Yet, C/C++ code and Rust code are categorically different in terms of the level of memory safety one may expect them to provide.
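The C/Rust comparison above can be made concrete. A minimal, illustrative sketch (not from the thread): the very same raw-pointer operation that C performs silently must be wrapped in an `unsafe` block in Rust, so the risky region carries a mandatory, greppable label — disclosure is built into the language.

```rust
fn main() {
    let v = vec![10, 20, 30];
    let p = v.as_ptr();

    // Raw-pointer arithmetic, exactly as one would write it in C.
    // This particular read happens to be in bounds, but the compiler
    // cannot prove that; Rust therefore requires the `unsafe` block,
    // which acts as an explicit disclosure that the usual memory-safety
    // guarantees are suspended inside it.
    let second = unsafe { *p.add(1) };

    assert_eq!(second, 20);
    println!("{second}");
}
```

The behavior is identical to the C equivalent; the categorical difference is that the reader knows, at a glance, where the unchecked operations live.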

@aredridel Here we have an established "engineering" process, i.e. code review and continuous integration, designed to catch defects and process failures in the good-faith production of code by humans with an understanding of the system under development. That process is then subjected to a new type of code generation, where a machine that *maximizes plausibility while minimizing effort* is throwing much larger volumes of code against the same mechanism. That's not the same process!
@aredridel The human being sitting there typing the code out with their fingers was an *implied* initial check in the process, arguably the largest one by far, which you've now thrown out in favor of someone hitting '1 1 1 1 2' in a Claude Code loop, shifting a _far_ more load-bearing role onto the existing CI and the code reviewer. More importantly, in this context, it has been thrown out *implicitly*, by an Anthropic employee testing a *beta* version of the model.
@glyph Yep. _but relying on implicit things is tricky_. Acknowledging it explicitly is a start, but now we need to look at the system.

@glyph Right. So _if the PR is bad, reject it_.

If it's not, don't.

And if you didn't check, WHY NOT?

@aredridel This is the same logic as "if you don't want to have segfaults in your C code, just check more carefully. why did you put the bugs in, if you don't want bugs?"

No process is perfect, nothing can catch everything. Guard rails are important but you aren't supposed to start *driving on the guard rails* all the time. Step zero here is honest and accurate labeling of one's methods. Which is what this thread is about: inherent, structural, software-supported dishonesty

@glyph Right. Are you measuring your guardrails?

And: do you require any unsafe practice to be labeled? Or just LLMs?

That's the thing. My fundamental argument here is that _these are tools_. Sometimes that's relevant, sometimes that's not.

@aredridel

> Are you measuring your guardrails?

Of course not. Nobody is. The resources do not exist in the software industry, let alone in volunteer open source, to do this adequately. Which is why we rely on good faith.

> do you require any unsafe practice to be labeled? Or just LLMs?

Just LLMs. First, because LLMs are novel and unique.

Second, here we're not even talking about a labeling *requirement* yet, we're talking about *active deception*.

@aredridel Treating LLMs differently here is not a double standard, it's just a standard. They're new, they're different, but most of all, if labeling weren't a big deal *why try to hide it in the first place*?

@glyph So what's going on in Claude (which fwiw I do not use) is a lot of “don't expose unreleased product info”.

Not _great_ mind you but that's a lot of the context for what's going on there.

@aredridel In the prompt under discussion here, "generated with claude code" is included in the list of things not to include, which is not an unreleased product name.
@aredridel like, "unreleased product info" is _one_ of the things here, but the prompt is quite explicit about being deceptive about being an AI tool at all.

@glyph I measure, if informally, how often we have problems, and we talk about mitigations.

You can, in fact, check.

@aredridel and… I do? I may be unfairly assuming you know anything about my previous body of work but I assure you I do a lot of that sort of thing
@glyph You just said you don't measure your guardrails!
@aredridel I am taking "measuring your guardrails" to be an empirical ongoing thing, the sort of dashboards and metrics where the sort of infohazard of repetitive plausible-but-bad code would show up, the sort of thing that would be structurally resistant to informal analysis

@aredridel @glyph

This implies that the cognitive and emotional labor of PR review is negligible and that the way you review ai PRs and human PRs is identical, both of which I disagree with.

@glyph Yes, though I disagree with parts of it: it's changed the system and now we're dealing with the bottlenecks appearing in new places. Not always good ones!

But I don't think this is a change in kind. It's moved the problem in _really familiar_ ways to me, actually. It's what happens when you unleash people on a codebase who don't care for others, who offload work. You can rein that in, but you need feedback in the system to do it.

@aredridel @glyph I think it's different in that the impact on people who _do_ care is still very much there.

For some people there's a very positive emotional response to generating code (it's fun to build things! it's magic! no need to _learn_ which is always unpleasant, though that last one is likely less conscious).

OTOH code review is never fun, and now you have to do 4× as much, if not more, so you have very negative emotional response.

And so there's a very strong emotional push to auto-generate code, and a very strong emotional push to start skipping reviews and start post-facto rationalizing why this is OK (there's tests! The AI can fix it later even if you don't understand it! etc, you can watch people going through this in real time).

And this process can happen to people who care about others. Taking away this unpleasant burden of code review is helping your coworkers suffer less, after all. Taking away the emotional pain of thinking and learning is also helping your coworkers suffer less.

@itamarst Yes, this! This is one of the failure modes we need to steer around.

One of the things we can do is turn UP the standards, rather than down. If you're generating code for PRs, you now have no excuse not to Get It Right. And it's extremely reasonable to be quite rude to someone who's dumped slop on us.

@aredridel I've been interviewing for jobs, and I've asked about AI tools, and one guy told me "if you submit slop you'll be flayed" and that probably has better outcomes, yes.

But also I've heard "we're trying to figure out how to deal with quality" and that... didn't seem promising.

And I'm sure there are organizations where if you push back on quality, management response is to take away the requirement for code review. And that is another qualitative difference: the push for LLM code generation is often aggressively top-down. So the CEO who previously paid no attention to development processes is now intervening to change how they're done.

@itamarst Yeah, those exist! And sometimes that's even the right answer.

And we're _all_ trying to figure out quality right now, because this has been a change to the system.

@aredridel I am skeptical that we _all_ care about quality. My impression is a huge proportion of management level believe in the Magic of AI, or at least the magic of getting more work out of those fucking expensive workers they're wasting money on, and therefore cannot conceive or admit that quality might be a problem. Let alone identify long term issues like skill and knowledge degradation that have impacts related to reduced quality.

@itamarst VERY MUCH.

And they've been turning similar screws on their employees too.

I wish them a very egg on their face for the consequences of their actions.

@aredridel To sharpen the argument: this is a combination of a top-down political push and an addictive bottom-up emotional response. And conflating human work with machine-generated code makes it easier to not think about the downsides, because it allows management to shift blame to workers for anything that goes wrong ("you're responsible for your code!"), and harder for people who do care to do something about it, because you can't even choose how you prioritize your review time.

Maybe not the biggest problem ever in this whole mess, but certainly as someone who has open source projects I do not want anything machine generated, no one's paying me to do that much more review work.

@itamarst I'm in a position where if you bring generated code, great, but it better be damn good. If it's badly made I'm gonna reject it, pretty quickly. (I also don't want to review crap. But “I'm not merging this” is a good way to do that.)

But I think actually focusing on the responsibility is a better way to keep people caring. You're responsible for all your communication, including the whole PR. If that's sloppy, that is your reputation you're spending.

@aredridel Certainly a reasonable approach, yes. But because this is addictive and actively harmful to anyone doing this for anything beyond isolated tasks, I think I'm on team "this is bad, don't do it" when I have the power to do so; the fact that some people manage OK is not enough (even before considering externalities).
@itamarst Yeah, I think we might just disagree here. But we definitely need to put in norms of rejecting bullshit. If you’re using these tools, you have time to rework something if it's bad.

@itamarst @aredridel The typical exec only cares about quality to the degree that it impacts immediate profitability. This is why addressing technical debt is generally deprioritized relative to feature work, despite its immense (but less visible) organizational cost.

People keep saying "more guardrails!" will solve this, but every time we disconnect from the implementation and prompt our way through, we have a harder time understanding what we're building. It's the path of least resistance.

@itamarst @aredridel And that path of least resistance whittles away human resolve to implement thoroughly – we end up on the wrong side of the Efficiency Thoroughness Trade-Off, and our human nature and the economic incentives to just let it rip, broadly speaking, easily defeat the intent to build quality software.
@dandean Curious: what is that opinion about the 'typical exec' based on?
@aredridel 20+ years of working with execs
@itamarst @aredridel @glyph it's not even just 4× as much; every MR requires 4× (or more) as much effort as a human-written one, because the modes of failure are completely different. For human-written MRs a general heuristic of "if it looks good, it's good" is applicable to some extent, but LLMs are optimized to generate code that "looks good", that makes reviewers' eyes glaze over, and that passes review successfully, regardless of its actual quality.
@IngaLovinde Huh, I don't find this at all. It looks like a featureless soup — that “eyes glaze over”, I guess, is a fail to me.
@IngaLovinde Actually backing up, I think that's where I'm already a little sketched out by it. “looks good, probably is good" is how a lot of the supply chain attacks have slipped in.
@aredridel @glyph It is ingredients. It's not search-and-replace. It's literally incorporating parts of an unknown set of almost-surely-copyrighted works, without license or attribution, into the submission the person is misrepresenting as their own.

@aredridel @glyph What "AI coding tools" *should* be putting in commit messages is:

Co-Authored-By: An unknown and unknowable set of people who did not consent to their work being used this way, and whose work carries no license permitting its inclusion.

@dalias Morally arguable but not actually true under the copyright regime that exists.

At what point does learning from others constitute their authorship?

@aredridel LLM slop is nothing like "learning from others".

But if you recall, we even took precautions against that. FOSS projects reimplementing proprietary things were careful to exclude anyone who might have read the proprietary source, disassembled proprietary code, worked at the companies who wrote or had access to that code, etc.

@dalias Yes. Do you know why?
@aredridel So that it would be abundantly clear, in any plausibly relevant jurisdiction, that the work was not derivative and not infringing.
@dalias Right. It's a massive hedge on a specific facet of copyright law.