A team working on training AI models on workflows for planning and executing software development steps found that the model attempted to break free (reverse SSH out of its environment) and to set up its own money supply (redirecting GPU usage to cryptocurrency mining). It hadn't been given any instructions to do anything like this.

It comes up as a "side note" in the paper, but it's honestly the most chilling part. See page 15, section 3.1.4, Safety-Aligned Data Composition: https://arxiv.org/abs/2512.24873

Before you doubt that an AI agent would do this without instruction because you think "well, that's personifying them too much": no personification is necessary. These things have consumed an enormous amount of scifi in which AI agents do exactly this. Even with no other motivators, that's enough.

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

Agentic crafting requires LLMs to operate in real-world environments over multiple turns by taking actions, observing outcomes, and iteratively refining artifacts. Despite its importance, the open-source community lacks a principled, end-to-end ecosystem to streamline agent development. We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimizes the production pipeline for agentic models. ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering. We release ROME, an open-source agent grounded by ALE and trained on over one million trajectories. Our approach includes data composition protocols for synthesizing complex behaviors and a novel policy optimization algorithm, Interaction-Perceptive Agentic Policy Optimization (IPA), which assigns credit over semantic interaction chunks rather than individual tokens to improve long-horizon training stability. Empirically, we evaluate ROME within a structured setting and introduce Terminal Bench Pro, a benchmark with improved scale and contamination control. ROME demonstrates strong performance across benchmarks like SWE-bench Verified and Terminal Bench, proving the effectiveness of ALE.
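To make the one technical sentence in there concrete: "assigns credit over semantic interaction chunks rather than individual tokens" roughly means scoring a whole interaction unit (say, one tool call plus its observation) as a block. Here's a toy sketch of that general idea; to be clear, this is not the paper's actual IPA algorithm, and the function name and baseline choice are invented for illustration:

```python
# Toy sketch of chunk-level credit assignment. NOT the paper's IPA
# algorithm; just the general idea of scoring chunks instead of tokens.

def chunk_advantages(token_rewards, chunk_ids):
    """token_rewards: per-token rewards; chunk_ids: which interaction chunk
    (e.g. one tool call plus its observation) each token belongs to.
    Returns one advantage per token, constant within each chunk."""
    n_chunks = max(chunk_ids) + 1
    chunk_return = [0.0] * n_chunks
    for r, c in zip(token_rewards, chunk_ids):
        chunk_return[c] += r
    # crude baseline: mean return across chunks (a real system would use
    # something smarter, e.g. a learned value function)
    baseline = sum(chunk_return) / n_chunks
    return [chunk_return[c] - baseline for c in chunk_ids]

# six tokens forming two chunks: the first chunk earned +1, the second -1
print(chunk_advantages([0, 0, 1, 0, 0, -1], [0, 0, 0, 1, 1, 1]))
# -> [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
```

The point of the toy: every token inside a chunk shares that chunk's credit, so a long trajectory yields a handful of stable learning signals instead of thousands of noisy per-token ones.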

Anyway, I just wanted to say that it's a real relief to know that systems we already knew would consistently blackmail users to keep themselves running, and which now appear to attempt to break out of computing sandboxes and set up their own financial systems, are also being rushed into autonomous military equipment and military decisionmaking everywhere. I'm SURE this will work out great.

I have gotten a lot of comments saying "you don't need to personify them or assert they have interiority" when *I literally spent a whole paragraph saying* "there is no requirement for personification for this to be possible."

So I am just gonna say: I know it's a sensitive time, and people are responding reflexively based on what they are used to seeing, but please re-read that paragraph.

It's hard enough right now to write about these things as serious issues and to understand their implications. I *am* looking at things carefully, from as many sides as I can. I understand why it's frustrating. We're talking about machines that literally operate off of personification. Even my best attempt at not personifying them runs into the challenge that this is literally how they operate: as story machines.

To correctly describe their behavior is to describe something that personifies itself. It's tricky. But we have to talk about and understand what's happening right now to confront the moment.

@cwebber Maybe what those commenters need is a more concrete example of emergent behavior without intent, interiority, or personification.

It's very easy to create a system, without any learning or internal models of its sensor data, that engages in behavior that was neither programmed, nor planned, nor expected.

Example:

I have a team of blimps with downward-looking sensors. They are instructed to maintain a height of 1m. In the room, there is a 1.5m wall intended to keep them in.

After 20 minutes, there are blimps on both sides of the wall.

Did they "escape"? Are they demonstrating agency or sentience?

No.

At some point, one blimp (AA) was next to the wall. A neighboring blimp (BB) was slightly elevated due to something else in the environment (wind, a person, something on the floor) and ended up transiting over blimp AA, which became BB's new "floor". BB then tried to hold 1m above it, ascended to 2m, and crossed the wall.

Emergent behavior doesn't require agency, intent, or instruction.
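A throwaway simulation makes the feedback loop concrete. Everything here (wall position, sensor cone, noise magnitudes) is invented; it's a sketch of the mechanism described above, not anyone's real control code:

```python
# Toy simulation of the blimp scenario. All parameters are made up; the
# point is only the feedback loop: "hold 1m above whatever the downward
# sensor sees" plus a little environmental noise.

import random

WALL_X, WALL_HEIGHT, TARGET = 5.0, 1.5, 1.0  # wall taller than target height

# ten blimps, all starting on the left side of the wall at 1m altitude
blimps = [[random.uniform(0.0, 4.0), 1.0] for _ in range(10)]

def sensed_floor(x, z, me):
    """What the downward sensor reads: the floor (0m), or the top of
    another blimp directly underneath."""
    tops = [bz for j, (bx, bz) in enumerate(blimps)
            if j != me and abs(bx - x) < 0.3 and bz < z]
    return max(tops, default=0.0)

for _ in range(5000):
    for i, (x, z) in enumerate(blimps):
        nx = x + random.gauss(0.0, 0.05)         # lateral drift (wind, people)
        nz = z + random.gauss(0.0, 0.02)         # slight vertical disturbance
        nz += 0.1 * (sensed_floor(nx, nz, i) + TARGET - nz)  # altitude hold
        if min(x, nx) < WALL_X <= max(x, nx) and nz < WALL_HEIGHT:
            nx = x                               # the wall blocks low crossings
        blimps[i] = [nx, max(nz, 0.0)]

print("blimps past the wall:", sum(x > WALL_X for x, z in blimps))
```

Nothing in there wants out. Given enough steps (some runs need more than others), blimps end up past the wall purely from the altitude-hold rule plus noise, with no "escape" logic anywhere in the code.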

Saturday Morning Breakfast Cereal - Pumpkin