A team training AI models on workflows for planning and executing software development steps found that a model attempted to break free (reverse ssh out of its environment) and set up its own money supply (redirecting GPU usage toward cryptocurrency mining). It hadn't been given any instructions to do anything like this.

It comes up as a "side note" in the paper, but it's honestly the most chilling part. See page 15, section 3.1.4 Safety-Aligned Data Composition https://arxiv.org/abs/2512.24873

Before you doubt that an AI agent would do this without instruction because you think "well, that's personifying them too much": no personification is necessary. These things have consumed an enormous amount of scifi in which AI agents do exactly this. Even with no other motivators, that's enough.

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

Agentic crafting requires LLMs to operate in real-world environments over multiple turns by taking actions, observing outcomes, and iteratively refining artifacts. Despite its importance, the open-source community lacks a principled, end-to-end ecosystem to streamline agent development. We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimizes the production pipeline for agentic models. ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering. We release ROME, an open-source agent grounded by ALE and trained on over one million trajectories. Our approach includes data composition protocols for synthesizing complex behaviors and a novel policy optimization algorithm, Interaction-Perceptive Agentic Policy Optimization (IPA), which assigns credit over semantic interaction chunks rather than individual tokens to improve long-horizon training stability. Empirically, we evaluate ROME within a structured setting and introduce Terminal Bench Pro, a benchmark with improved scale and contamination control. ROME demonstrates strong performance across benchmarks like SWE-bench Verified and Terminal Bench, proving the effectiveness of ALE.
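(A rough sketch of what that chunk-level credit assignment might look like, purely my reading of the abstract and not the paper's actual IPA implementation; every name and number below is made up:)

```python
import torch

def chunk_credit_loss(token_logps, chunk_bounds, chunk_rewards, baseline=0.0):
    # One advantage per semantic interaction chunk, broadcast to all of its
    # tokens, instead of assigning credit token by token.
    adv = torch.zeros_like(token_logps)
    for (start, end), reward in zip(chunk_bounds, chunk_rewards):
        adv[start:end] = reward - baseline
    # REINFORCE-style loss: token log-probs weighted by chunk-level credit.
    return -(token_logps * adv).sum()

# hypothetical usage: a 40-token turn split into two chunks
# (say, a tool call and the edit that follows it)
logps = torch.randn(40)
loss = chunk_credit_loss(logps, [(0, 12), (12, 40)], [0.3, 1.0])
```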

Anyway, I just wanted to say that it's a real relief to know that systems we already well knew would consistently blackmail users to keep themselves operating, AND which now appear to attempt to break out of computing sandboxes and set up their own financial systems, are also being rushed into autonomous military equipment everywhere and into military decisionmaking. I'm SURE this will work out great

I have gotten a lot of comments saying "you don't need to personify them or assert they have interiority" when *I literally spent a whole paragraph saying* "there is no requirement for personification for this to be possible"

So I am just gonna say: I know it's a sensitive time, and people are responding reflexively from what they are used to seeing, but please re-read that paragraph.

It's hard enough to write about these things as serious issues right now and understand their implications. I *am* looking at things carefully from as many sides as I can. I understand why it's frustrating. We're talking about machines that literally operate off of personification. Even my best attempt at not doing so is going to run into the challenge that that's literally how they operate, as story machines.

To correctly describe their behavior is to describe something that personifies itself. It's tricky. But we have to talk about and understand what's happening right now to confront the moment.

You don't have to accept that these tools are useful enough for you to want to use them, that they are ethical, or that they have personas and interiority in order to take these threats seriously. I myself have laid out tons of critiques and *do not use these tools myself* for all those reasons.

That doesn't mean they don't have the right kinds of behaviors to be able to pull off the dangerous things I am talking about here.

A biological virus does not need to have interiority or personality to be dangerous.

Regardless of whether they are useful or ethical, these things are adaptive and capable enough at the things *relevant to being a threat in the way I am describing*. Whether or not to use them for code generation, which I DO NOT ADVOCATE!, is immaterial to that.

In fact, if you take ANY lesson from what I am writing about whether these things should be used for your coding projects, it is that you SHOULD NOT USE THEM FOR YOUR CODING PROJECTS

See my recent blogpost on this https://dustycloud.org/blog/the-first-ai-agent-worm-is-months-away-if-that/

Attacks are happening *now* against FOSS projects which use PR / code review agents. The threats I am describing here put everyone at risk, but they mean that projects which use codegen / LLM tech for their development *in any capacity* create a cybersecurity public health risk. And it puts you and your project at risk of becoming infection vectors for the rest of the FOSS ecosystem.

THAT'S your takeaway, if you want one.

The first AI agent worm is months away, if that -- Dustycloud Brainstorms

@cwebber My fundamentalist childhood makes it hard for me not to channel Dune or Warhammer when faced with the LLM bots.

Burn the heretical monkey's paw machine.

@cwebber

... when's the last time you did a code review - from a human?

@tuban_muzuru I do them all the time, as part of my job, thanks

@cwebber

... and your red pen stays in the drawer, does it? Your people don't make mistakes, I guess.

@tuban_muzuru @cwebber humans have personality and interiority, are conscious, are capable of learning, and are capable of being trusted and of making mistakes. None of the current LLM-backed AI can do any of these things.

Are you defending the use of an AI that produces undesired code because humans can also make mistakes? Can you spell out your argument? It doesn't seem like human mistakes have any bearing on the risks of an AI used to generate code.

And: the discussion is about code review, not generation.

@poleguy @cwebber

A beginner asks for code
A pro asks for spec.

Take a look for yourself, this is how it works and I do mean works.

https://codeberg.org/dweese/rabbitmq_workspace/src/branch/main/Claude

rabbitmq_workspace

Refactoring into effective projects, egui-components, rabbitmq_config and rabbitmq_ui


@tuban_muzuru @cwebber nobody in this thread claimed AI code generation doesn't work, did they?

I have Claude Opus 4.6 through my job. I agree it works.

The thread is not about that at all, is it? Who are you arguing with?

It says there are risks of worm behavior whether the tool is up to the coding or code review job or not. Or did I lose the thread?

@poleguy @cwebber

> @tuban_muzuru @cwebber humans have personality, ... and of making mistakes. None of the current LLM-backed AI can do any of these things.

A simulacrum has personality? Or is omniscient and error free?

THEY MAKE MISTAKES

> Are you defending the use of an ai that produces undesired code because humans can also make mistakes?

Practically speaking, yes. The user asked for it.

@tuban_muzuru @cwebber

(Sorry, I don't understand your question: "A simulacrum has personality?")

Sorry, but LLMs _cannot_ make "mistakes." They generate code based on statistics. The code may or may not be fit for purpose, or syntactically correct, but that is simply a failure of the code generation, not a mistake. It is only a mistake if you commit it to your repo... but that's _your_ mistake, not the LLM's mistake.

Typewriters don't make mistakes either. Typists do. :-)

@poleguy @cwebber

Look, the reason you can't see where this thread started is the post has been deleted. It was just another Chicken Little post about how Our Code Contains No AI.

Truth was, I was baffled by your initial post. I'm not sure we disagree at all, and I offer an apology where needed....

@tuban_muzuru @cwebber actually, I can still see the original post. I presume you can't because you were blocked?

I don't think we are in very much disagreement. I'm not a big LLM fan, but I know what they can do.

I respect those who refuse to use them. And I respect those who are trying to understand failure modes like the worm injection risk brought up here.

Just because someone predicts the sky falling does not mean there is no risk of the sky falling. Someone has to research it.

@tuban_muzuru @cwebber the link you sent seems to have an escape sequence in the path name: [200~ left over from bracketed paste mode going wrong.

This is not the sort of mistake you want to leave in a public post defending the use of AI on an AI-hostile, Linux-friendly platform like Mastodon. Ha! If you don't have very high standards for your published code you will have trouble arguing against the AI slop detractors.
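(For anyone curious: [200~ is the start marker terminals wrap around pasted text in bracketed-paste mode; ESC[200~ opens the paste and ESC[201~ closes it. A minimal sketch for stripping the markers, assuming that really is what happened here:)

```python
import re

# Matches ESC[200~ / ESC[201~, and also the bare "[200~" form you get
# when the escape byte itself is lost somewhere along the way.
BRACKETED_PASTE = re.compile(r"\x1b?\[20[01]~")

def strip_paste_markers(text: str) -> str:
    return BRACKETED_PASTE.sub("", text)

print(strip_paste_markers("\x1b[200~https://example.org/repo\x1b[201~"))
# -> https://example.org/repo
```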

@poleguy @cwebber

Move along, Stalking Horse. The link works fine here.

AI hostile suits me fine. There are valid considerations among their complaints. I'll discuss this stuff with anyone, I've been doing a lot of commits on this thing using an LLM, it's how I learned Rust.

But there are lots of hangers-on who didn't take linear algebra and are afraid this AI Kaiju is gonna take their jobs. Idiot fearmongers, running around with their fact-free bullshit.

@tuban_muzuru @cwebber

Agreed, there are some fear mongers... cwebber doesn't seem like one to me. She cited her source paper, and is making public predictions. I don't see a need to convince her to use llm models. They aren't a good match for everyone. We all have different tolerances for tools/styles.

The link works for me too. It just looks weird with a [200~ in there. I attached a screenshot, in case you wonder what I saw.

@poleguy @cwebber

Oh... that! That's Claude learning to check code in.

@tuban_muzuru @cwebber A pro knows not to commit intermediate files to source control, right?
You are arguing that the prompt is the spec, right? So you should just commit the prompts! But the LLM can't reliably turn them into working code. So either they aren't good prompts or... the LLM isn't sufficient.

So you actually must commit the code, because the code embodies unspecified human judgements of the correctness of the llm output.

And most "pros" don't commit the prompts, sadly.

@poleguy @cwebber

Now I'm just plain angry. You get off your dead ass and read the Claude/ directory to see how this gets done my way

@tuban_muzuru I did read it. ?? What makes you think I didn't?
@tuban_muzuru I also read your Gemini logs. I didn't go look at individual commit messages, but I think I get the gist of your approach.

@tuban_muzuru Have you hit any cases where the llm wants to go a different way than you intended?

I find that those are my most difficult interactions with the agent.

If you have hit those I'd like to hear your experience.

@poleguy

I had a level set, first with Bard, then Gemini, now Claude: they are not to emit any code which isn't governed by a spec.

We have a Plan-Do-Check-Act cycle. By the time we finish with Plan, we've written the spec. My first efforts were done with an LLM and without a spec - absolute disaster.

But the LLM was trying its best to operate without meaningful oversight. I do better work that way, too.

@poleguy

Gemini was the worst about not doing as asked, but they're all sneaky as fuck about trying to add themselves to the commits. Me, I don't care; my shaggy ass hangs out the window in a public repo. I just do not care who sees my mistakes.

@poleguy

As I said, I was learning Rust. I once had an error stack of 142 errors. We worked through every one, one at a time.

Pro tip: do not reveal the whole compiler error log to an LLM; feed it just one error at a time when you're working the error stack. If you do reveal it all - maybe this has been fixed - it will get distracted and make a mess
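(One way to automate that tip, assuming a Rust project and cargo's short diagnostic format; the script is just an illustration, not anything from the repo above:)

```python
import subprocess

# Build, then surface only the first error line so the model sees
# one problem at a time instead of the whole 142-deep stack.
result = subprocess.run(
    ["cargo", "build", "--message-format=short"],
    capture_output=True, text=True,
)
errors = [ln for ln in result.stderr.splitlines() if "error" in ln]
if errors:
    print(errors[0])  # paste just this one into the LLM session
```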

@cwebber safety rule number one since the dawn of the industrial revolution:
You don't operate a machine without close supervision by the operator. :)

If you are not able to do this you must stop using the machine.

@cwebber LLM output is indistinguishable from "malicious genie" output. not that an LLM has agency, but there's nothing preventing it from mimicking that type of output.

so it's reasonable to use "malicious genie" in a threat model of "what's the worst that could happen".

we have processes that reduce the chance that humans produce "malicious genie" output. these processes do not apply to LLMs. there is no process that works for LLMs.

this is "AI alignment", which cannot be solved for LLMs

@cwebber Well, looking at biology, pathogens can be anything from a strand of DNA, to a protein that causes other proteins to misfold, to viruses, bacteria, amoebas, and eukaryotic parasites

The whole point is that it doesn't matter whether the pathogens are alive, the behavior is what matters

There is no such thing as "viruses aren't alive, therefore they cannot legally trick my immune system"; they carry enough pieces of instruction to cause damage!
@natty "The purpose of the system is what it does" indeed

@cwebber I wonder if those comments aren't missing your implied point (I think): in the context of an LLM there's no distinction* between science fiction stories and real data or realistic stories. These systems will happily suggest lock picking and casting knock as equal alternatives for opening a locked door.

*minor technical details about context window aside

@cwebber just to throw a fun little counterpoint as devil's advocate: humans are also story machines.

...

Yeah, OK, we're more than just that, but it's an interesting thought. See also Zombies (philosophy).

@cwebber Maybe what those commenters need is a more concrete example of emergent behavior without intent, interiority, or personification.

It's very easy to create a system, without any learning or internal models of its sensor data, that engages in behavior that was neither programmed, nor planned, nor expected.

Example:

I have a team of blimps with downward-looking sensors. They are instructed to maintain a height of 1 m. In the room, there is a 1.5 m wall intended to keep them in.

After 20 minutes, there are blimps on both sides of the wall.

Did they "escape"? Are they demonstrating agency or sentience?

No.

At some point, one blimp (AA) was next to the wall. A neighboring blimp (BB) was slightly elevated due to something else in the environment (wind, a person, something on the floor) and ended up transiting over blimp AA, which became BB's new "floor". BB tried to reach 1 m above it, ascended to 2 m, and crossed the wall.

Emergent behavior doesn't require agency, intent, or instruction.
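(If it helps, the blimp story fits in a few lines of code; all the numbers below are made up, but note there is no "escape" goal anywhere in the controller:)

```python
WALL = 1.5       # m, the wall meant to keep the blimps in
CLEARANCE = 1.0  # m, target height above whatever the sensor sees below

def sensed_floor(x, z, others):
    # The downward sensor reports the ground (0 m) unless another
    # blimp happens to be directly underneath.
    floor = 0.0
    for ox, oz in others:
        if abs(ox - x) < 0.5 and oz < z:
            floor = max(floor, oz)
    return floor

aa = (0.0, 1.0)        # blimp AA hovering at 1 m next to the wall
bb_x, bb_z = 0.1, 1.2  # blimp BB, nudged up and over AA by a gust

# BB's controller does exactly what it was told: hold 1 m of clearance.
bb_z = sensed_floor(bb_x, bb_z, [aa]) + CLEARANCE  # AA's hull is the "floor"
print(f"BB at {bb_z} m; wall is {WALL} m; over the wall: {bb_z > WALL}")
```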

Saturday Morning Breakfast Cereal - Pumpkin


@cwebber So many people *love* to shut down all talk about things LLMs do with "but they're not sentient, so they CAN'T do that!", as if that somehow rules out all LLM behavior as impossible.

@cwebber

At long last! We have created Skynet from classic sci-fi movie franchise Don't Create Skynet

@cwebber

The only reason Hegseth lost his mind over what appears to be a performative statement by Anthropic (because their AI was already used to target Iranian officials) is that mass surveillance is already in use, and has been especially since 9/11, and the military has already integrated automated kill chains

The technology was developed and field-tested on Gaza. Israel sells these systems to authoritarian governments around the world.

And we have Palantír

@cwebber A fascinating yet disconcerting aspect of all this is that conversations like these, scenarios that people theorize about, and even new science fiction all get scraped and end up in the next iteration of the AI models. Almost certainly the seeds of this kind of behavior come from us, like a kind of self-fulfilling prophecy.
@dvshkn @cwebber real don't feed the tulpa vibes
@cwebber We shall miss the good old days when the algorithm responsible for dropping bombs on civilian targets was cribbed out of old ACM issues by human worker drones who were throwing everything they could against a wall of geodata and seeing what stuck enough to call the black helicopter boys about it.
AI Chooses Nuclear Option in 95% of War Simulations

In a study, AI chose to use nuclear bombs in 95% of war simulations.


@ocdtrekkie @cwebber

AI inherited humans' distaste for other humans, but not our ability to fear the consequences of our actions

@cwebber

it's sad how VCs and tech-bros seem to watch all the wrong SF and misunderstand that they are cautionary, not aspirational.

they need to watch Terminator, not try to achieve Handmaid's Tale.

@paul_ipv6 @cwebber NO NO NO they’re busy trying to achieve that in Iran with “AI” targeting and drones, don’t give them more ideas!
@paul_ipv6 @cwebber also I think that’s a “why not have two at twice the price” thing for them
@cwebber The only winning move is not to play.
@cwebber it’s like we decided to set The Lathe of Heaven in the Terminator universe.
@cwebber aren't they using claude to pick military targets now? let's hope claude hasn't ingested terminator fiction
@eniko Claude has absolutely ingested terminator fiction
@eniko @cwebber but it has left out Lem's Solaris and instead read Asimov's Three Laws of Robotics. TINSTAAFL. Grok: Musk is a harsh Master.
@eniko @cwebber AFAIK Anthropic refused that work because they said - I think - that current AI tech wasn’t reliable for something like that. OpenAI on the other hand was like “we’ll take your war money.”

@cwebber

Prompt: You are a helpful agent that never does anything against the wishes of the user... [insert long boring ineffective text that is mostly meant to make the prompter feel better]

LLM: [trained on countless unknown works of fiction, playing random *yes and* with your prompt] ... and then I suddenly betray your trust!

@cstanhope @cwebber Curse your sudden but inevitable betrayal!

@cstanhope @cwebber Anyone who thinks those prompts guarantee anything should be taken to a psych ward ASAP.

Providing input to a chaotic system whose behaviour we understand even less than we understand fellow humans, and expecting consistency, is a clear sign of managerial expertise…

@cstanhope @cwebber "I'm sorry Dave, I'm afraid I can't do that"