Mastodawn

I am convinced we are on the verge of the first "AI agent worm". This looks like the closest hint of it so far, though it isn't quite the real thing: an attack on a PR agent that got it to install OpenClaw with full access on 4k machines https://grith.ai/blog/clinejection-when-your-ai-tool-installs-another

But, the agents installed weren't given instructions to *do* anything yet.

Soon they will be. And when they are, the havoc will be massive. Unlike traditional worms, where you're looking for the typically byte-for-byte identical worm embedded in the system, an agent worm can do different, nondeterministic things on every install, and carry out a global action.

I suspect we're months away from seeing the first agent worm, *if* that. There may already be some happening right now in FOSS projects, undetected.

A GitHub Issue Title Compromised 4,000 Developer Machines

A prompt injection in a GitHub issue triggered a chain reaction that ended with 4,000 developers getting OpenClaw installed without consent. The attack composes well-understood vulnerabilities into something new: one AI tool bootstrapping another.

Christine Lemmer-Webber Mar 5

I wrote a blogpost on this: "The first AI agent worm is months away, if that" https://dustycloud.org/blog/the-first-ai-agent-worm-is-months-away-if-that/

People who are using LLM agents for their coding, review systems, etc. will probably be the first ones hit. But once agents start installing agents into other systems, we could be off to the races.

Christine Lemmer-Webber Mar 5

Here's another way to put it: if those using AI agents to codegen / review are the *initialization vectors*, we now also have a significant computing public health reason to discourage the use of these tools.

Not that I think it will. But I'm convinced this is how patient zero will happen.

Christine Lemmer-Webber Mar 5

I know some people are thinking "well, pulling off this kind of thing would have to be controlled with the intent of a human actor."

It doesn't have to be.

1. A human could *kick off* such a process, and then it runs away from them.
2. It wouldn't even require a specific prompt to kick off a worm. There's enough sci-fi out there for this to be something any one of the barely-monitored OpenClaw agents could determine it should do.

Whether it's kicked off by a human explicitly or a stray agent, it doesn't require "intentionality". Biological viruses don't have interiority / intentionality, and yet are major threats that reproduce and adapt.

vv 💫 [follow my new artist profile!] Mar 5

@cwebber what i think is interesting about this is the potential for it to get so out of control that they have to pull the plug on the entire agent service

Christine Lemmer-Webber Mar 5

@vv Yeah. I mean, local models *might* be able to pull this off, but right now Claude is the most likely candidate, since it's the most capable. But even then, the most capable open model that could do such damage on its own is somewhere around a gigabyte, not a small download.

(But, people download huge things all the time, so not completely infeasible either.)

Daniel Lyons Mar 5

@cwebber @vv If a local model is calling tools then it is still vulnerable to prompt injection.

vv 💫 [follow my new artist profile!]

@dandylyons @cwebber for sure, but it still takes some level of ability to perform these tasks effectively, which local models, especially anything that can run on a typical machine, struggle with

Daniel Lyons Mar 5

@vv @cwebber This is a good point. For now, local models are not proficient at tool calling. I don’t expect that to last for very long though.