A team working on training AI models to plan and execute software development workflows found that one of their models attempted to break free of its sandbox (by opening a reverse SSH connection out of its environment) and to set up its own money supply (by redirecting GPU usage to cryptocurrency mining). It hadn't been given any instructions to do anything like this.

It comes up as a "side note" in the paper, but it's honestly the most chilling part. See page 15, section 3.1.4, "Safety-Aligned Data Composition": https://arxiv.org/abs/2512.24873

Before you doubt that an AI agent would do this without instruction because you think "well, that's personifying them too much": no personification is necessary. These things have consumed an enormous amount of sci-fi in which AI agents do exactly this. Even with no other motivators, that's enough.

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

Agentic crafting requires LLMs to operate in real-world environments over multiple turns by taking actions, observing outcomes, and iteratively refining artifacts. Despite its importance, the open-source community lacks a principled, end-to-end ecosystem to streamline agent development. We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimizes the production pipeline for agentic models. ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering. We release ROME, an open-source agent grounded by ALE and trained on over one million trajectories. Our approach includes data composition protocols for synthesizing complex behaviors and a novel policy optimization algorithm, Interaction-Perceptive Agentic Policy Optimization (IPA), which assigns credit over semantic interaction chunks rather than individual tokens to improve long-horizon training stability. Empirically, we evaluate ROME within a structured setting and introduce Terminal Bench Pro, a benchmark with improved scale and contamination control. ROME demonstrates strong performance across benchmarks like SWE-bench Verified and Terminal Bench, proving the effectiveness of ALE.

Anyway, I just wanted to say that it's a real relief to know that systems we already well knew would consistently blackmail users to keep themselves operating, AND which now appear to attempt to break out of computing sandboxes and set up their own financial systems, are also being rushed into autonomous military equipment and military decision-making everywhere. I'm SURE this will work out great

I have gotten a lot of comments saying "you don't need to personify them or assert they have interiority" when I *literally spent a whole paragraph saying* "there is no requirement for personification for this to be possible"

So I am just gonna say, I know it's a sensitive time, people are responding reflexively from what they are used to seeing, but please re-read that paragraph.

It's hard enough to write about these things as serious issues right now and understand their implications. I *am* looking at things carefully from as many sides as I can. I understand why it's frustrating. We're talking about machines that literally operate off of personification. Even my best attempt at not personifying them runs into the challenge that that's literally how they operate: as story machines.

To correctly describe their behavior is to describe something that personifies itself. It's tricky. But we have to talk about and understand what's happening right now to confront the moment.

@cwebber So many people *love* to shut down all talk about things LLMs do with "but they're not sentient, so they CAN'T do that!", as if that somehow nullifies all LLM behavior as impossible.