Mastodawn

RE: https://mastodon.social/@campuscodi/116154291574332497

> We're entering an era where AI agents attack other AI agents. In this campaign, an AI-powered bot tried to manipulate an AI code reviewer into committing malicious code. The attack surface for software supply chains just got a lot wider.

Christine Lemmer-Webber Mar 1

Several interesting attacks in this one. What's curious is that each malicious PR discussed used a different attack.

A lot of them are injection attacks. But my favorite of all of them: one rewrote CLAUDE.md so the reviewing agent took on different directives. That attack kinda rules ngl

Christine Lemmer-Webber Mar 1

In its defense, the reviewing Claude agent identified the attack correctly and rejected it

Christine Lemmer-Webber Mar 1

However, I suspect we'll see more and more attacks like this going forward. The CLAUDE.md attack is basically a Thompson attack but for agents instead of compilers.

damien 🥖🐈‍⬛🧣 Mar 1

@cwebber i guess that's the closest real-world equivalent we have to, like, using a Netrunner hack in Cyberpunk 2077 to tell someone to fuck off lol

vascorsd Mar 1

@cwebber i love it. Let it get worse and worse 😻🙏.

I'm surprised people aren't more actively trying to break it all by writing malicious instructions everywhere it reads from: commit messages, code comments, stuff in weird files like .gitignore, or random blobs of obfuscated JS, for example.

Anyway, I guess any bot is now a problem, and things are going to get really fun from here on.

John Francis 🇨🇦🦫🍁💪⬆️ Mar 1

@cwebber sounds expensive

Bruno Girin Mar 1

@cwebber Makes complete sense considering the risk asymmetry between attacker and defender: the cost of failure for the attacker is negligible, since they can just move on to the next target. It's as if AI was designed to do exactly that 🤔

@cwebber Grey Goo.

@cwebber On the other hand, the successful attacks listed in this article exploited CI scripts that didn't sanitize their inputs, that executed script changes submitted in PRs, or that had too much network access. None of those setups included an AI coding agent on the repository side. The one attack that tried to fool the AI PR agent was the one that failed. So one could argue that the AI PR agent performed the best against this attack!

NB I’m *not* advocating for AI PR agents. Far from it. I think they should be treated with suspicion.