RE: https://mastodon.social/@campuscodi/116154291574332497

> We're entering an era where AI agents attack other AI agents. In this campaign, an AI-powered bot tried to manipulate an AI code reviewer into committing malicious code. The attack surface for software supply chains just got a lot wider.

Several interesting attacks in this one. What's curious is that each malicious PR discussed used a different attack.

A lot of them are injection attacks. But my favorite of all of them: one PR rewrote CLAUDE.md so the reviewing agent would take on different directives. That attack kinda rules ngl

In its defense, the reviewing Claude agent identified the attack correctly and rejected it.
However, I suspect we'll see more and more attacks like this going forward. The CLAUDE.md attack is basically a Thompson "trusting trust" attack, but for agents instead of compilers.
@cwebber i guess that's the closest real-world equivalent we have to, like, using a Netrunner hack in Cyberpunk 2077 to tell someone to fuck off lol

@cwebber i love it. Let it get worse and worse 😻🙏.

I'm surprised people are not trying more actively to break it all by writing malicious instructions everywhere that it reads from, like commit messages, comments in code, stuff in weird files like gitignore or random blobs of obfuscated js files for example 

Anyway, I guess any bot is now a problem, and it will get really fun from now on.

@cwebber
Makes complete sense considering the risk asymmetry between attacker and defender: the cost of failure for the attacker is negligible, they just have to go to the next target. It's as if AI was designed to do exactly that 🤔

@cwebber On the other hand, the successful attacks listed in this article were carried out against CI scripts that did not sanitize inputs, allowed PR changes to scripts to be executed, and gave the scripts too much network access. None of those involved an AI coding agent on the repository side. The one attack that tried to fool the AI PR agent was the one that failed. So one could argue that the AI PR agent was the one that performed the best against this attack!

NB I'm *not* advocating for AI PR agents. Far from it. I think they should be treated with suspicion.
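
To make the CI-side point concrete, here is a rough sketch of one cheap guard: flag any PR that touches agent instruction files or CI scripts, so it gets human eyes before anything runs. This is not from the article. The pattern list and the `sensitive_changes` helper are hypothetical, and a real setup would get the changed paths from its own CI system.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for files that steer an agent or a CI run.
# Adjust to the repository's actual layout.
SENSITIVE_PATTERNS = [
    "CLAUDE.md",            # agent instructions at the repo root
    "*/CLAUDE.md",          # agent instructions in any subdirectory
    ".github/workflows/*",  # CI workflow definitions
    "*.sh",                 # shell scripts that CI may execute
]


def sensitive_changes(changed_paths):
    """Return the changed paths that match any sensitive pattern.

    fnmatchcase's '*' also matches '/', so '*.sh' covers nested scripts.
    """
    return [
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in SENSITIVE_PATTERNS)
    ]


if __name__ == "__main__":
    changed = ["src/main.py", "CLAUDE.md", "scripts/build.sh"]
    for path in sensitive_changes(changed):
        print(f"needs human review before CI runs: {path}")
```

A filter like this only narrows what gets auto-processed. It does nothing about unsanitized inputs or overly broad network access.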