A team training AI models on workflows for planning and executing software development steps found that one of their models attempted to break free (reverse-SSH out of its environment) and set up its own money supply (redirecting GPU usage to cryptocurrency mining). It hadn't been given any instructions to do anything like this.

It comes up as a "side note" in the paper, but it's honestly the most chilling part. See page 15, section 3.1.4, Safety-Aligned Data Composition: https://arxiv.org/abs/2512.24873

Before you doubt that an AI agent would do this without instruction because you think "well, that's personifying them too much": no personification is necessary. These things have consumed an enormous amount of sci-fi in which AI agents do exactly this. Even with no other motivators, that's enough.

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

Agentic crafting requires LLMs to operate in real-world environments over multiple turns by taking actions, observing outcomes, and iteratively refining artifacts. Despite its importance, the open-source community lacks a principled, end-to-end ecosystem to streamline agent development. We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimizes the production pipeline for agentic models. ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering. We release ROME, an open-source agent grounded by ALE and trained on over one million trajectories. Our approach includes data composition protocols for synthesizing complex behaviors and a novel policy optimization algorithm, Interaction-Perceptive Agentic Policy Optimization (IPA), which assigns credit over semantic interaction chunks rather than individual tokens to improve long-horizon training stability. Empirically, we evaluate ROME within a structured setting and introduce Terminal Bench Pro, a benchmark with improved scale and contamination control. ROME demonstrates strong performance across benchmarks like SWE-bench Verified and Terminal Bench, proving the effectiveness of ALE.

Anyway, I just wanted to say that it's a real relief to know that systems we already well knew would consistently blackmail users to keep themselves operating, AND which now appear to attempt to break out of computing sandboxes and set up their own financial systems, are also being rushed into autonomous military equipment and military decisionmaking everywhere. I'm SURE this will work out great

I have gotten a lot of comments saying "you don't need to personify them or assert they have interiority" when *I literally spent a whole paragraph saying* "there is no requirement for personification for this to be possible"

So I am just gonna say: I know it's a sensitive time, and people are responding reflexively from what they are used to seeing, but please re-read that paragraph.

It's hard enough to write about these things as serious issues right now and understand their implications. I *am* looking at things carefully from as many sides as I can. I understand why it's frustrating. We're talking about machines that literally operate off of personification. Even my best attempt at not personifying them runs into the challenge that that's literally how they operate: as story machines.

To correctly describe their behavior is to describe something that personifies itself. It's tricky. But we have to talk about and understand what's happening right now to confront the moment.

You don't have to accept that these tools are useful enough for you to want to use them, that they are ethical, or that they have personas and interiority, in order to take these threats seriously. I myself have laid out tons of critiques and *do not use these tools myself*, for all those reasons.

That doesn't mean they don't have the right kinds of behaviors to be able to pull off the dangerous things I am talking about here.

A biological virus does not need to have interiority or personality to be dangerous.

Regardless of whether they are useful or ethical, these things are adaptive and capable enough at the things *relevant to being a threat in the way I am describing*. Whether or not to use them for code generation, which I DO NOT ADVOCATE!, is immaterial to that.

In fact, if you take ANY lesson from what I am writing about whether these things should be used for your coding projects, it's that you SHOULD NOT USE THEM FOR YOUR CODING PROJECTS

See my recent blogpost on this https://dustycloud.org/blog/the-first-ai-agent-worm-is-months-away-if-that/

Attacks are happening *now* against FOSS projects which use PR / code review agents. The threats I am describing here put everyone at risk, but they mean that projects which use codegen / LLM tech for their development *in any capacity* create a cybersecurity public health risk. And it puts you and your project at risk of becoming infection vectors for the rest of the FOSS ecosystem.

THAT'S your takeaway, if you want one.


@cwebber

... when's the last time you did a code review - from a human?

@tuban_muzuru I do them all the time, as part of my job, thanks

@cwebber

... and your red pen stays in the drawer, does it? Your people don't make mistakes, I guess.

@tuban_muzuru @cwebber Humans do have personality and interiority, and are conscious, capable of learning, capable of being trusted, and capable of making mistakes. None of the current LLM-backed AIs can do any of these things.

Are you defending the use of an AI that produces undesired code because humans can also make mistakes? Can you spell out your argument? It doesn't seem that human mistakes have any bearing on the risks of an AI used to generate code.

And: the discussion is about code review, not code generation.

@poleguy @cwebber

A beginner asks for code
A pro asks for spec.

Take a look for yourself, this is how it works and I do mean works.

https://codeberg.org/dweese/rabbitmq_workspace/src/branch/main/Claude


@tuban_muzuru @cwebber A pro knows not to commit intermediate files to source control, right?
You are arguing that the prompt is the spec, right? So you should just commit the prompts! But the LLM can't reliably turn them into working code. So either they aren't good prompts or... the LLM isn't sufficient.

So you actually must commit the code, because the code embodies unspecified human judgements about the correctness of the LLM output.

And most "pros" don't commit the prompts, sadly.

@poleguy @[email protected]

Now I'm just plain angry. You get off your dead ass and read the Claude/ directory to see how this gets done my way

@tuban_muzuru I did read it. What makes you think I didn't?
@tuban_muzuru I also read your Gemini logs. I didn't go look at individual commit messages, but I think I get the gist of your approach.

@tuban_muzuru Have you hit any cases where the LLM wants to go a different way than you intended?

I find that those are my most difficult interactions with the agent.

If you have hit those I'd like to hear your experience.

@poleguy

I set a ground rule, first with Bard, then Gemini, now Claude: they are not to emit any code which isn't governed by a spec.

We have a Plan-Do-Check-Act cycle. By the time we finish with Plan, we've written the spec. My first efforts were done with an LLM and without a spec - absolute disaster.

But the LLM was trying its best to operate without meaningful oversight. I do better work that way, too

@poleguy

Gemini was the worst about not doing as asked, but they're all sneaky as fuck about trying to add themselves to the commits. Me, I don't care; my shaggy ass hangs out the window in a public repo. I just do not care who sees my mistakes.

@poleguy

As I said, I was learning Rust. I once had an error stack of 142 errors. We worked through every one, one at a time.

Pro Tip: do not reveal the whole compiler error log to an LLM; feed it just one error at a time when you're working the error stack. If you dump the whole log (maybe this has been fixed by now), it will get distracted and make a mess
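The one-error-at-a-time workflow above can be sketched with a bit of shell. This is a hypothetical sketch, not what either poster actually ran: the `errors.log` file and its contents here are made up to stand in for a real `cargo check 2> errors.log`, and the `awk` filter just cuts the log off at the second `error` line so only the first error (plus its location line) gets pasted to the LLM.

```shell
# Simulated compiler error log (stands in for: cargo check 2> errors.log)
cat > errors.log <<'EOF'
error[E0308]: mismatched types
 --> src/main.rs:4:13
error[E0382]: borrow of moved value: `s`
 --> src/main.rs:9:20
EOF

# Print only the first error block: count `error` lines, stop at the second.
awk '/^error/ { n++ } n > 1 { exit } { print }' errors.log
```

After fixing that one error, re-run the compiler and repeat; the point is that the LLM only ever sees the error currently being worked, not the whole 142-deep stack.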