Vexed with the Guardian for this poor-quality article on so-called AI.

"AI models that lie and cheat"

No. Maybe you mean that they did something which a human didn't want them to do - but a Large Language Model has no ability to conceive of truth or lies. What they do is to extrude statistically-likely text.

"deceptive scheming"

No. LLMs cannot "scheme".

"destroying emails and other files without permission"

Well, obviously they _did_ have permission - in the software sense - or they couldn't have done it.

A statistical word-order model isn't designed to follow instructions reliably. If you want to be sure that it can't delete files, then don't hook it up with file-deletion access.

(Or make separate backups first.)
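
To be concrete, that's a matter of which tools you wire up, not what you ask nicely in the prompt. A rough Python sketch - the tool names here are made up for illustration, not any real framework's API:

```python
# Made-up illustration: the bot can only call what you explicitly expose.
# If no file-deletion tool exists in this table, no prompt - however confused
# or malicious - can make it delete anything.

ALLOWED_TOOLS = {
    "read_file": lambda path: open(path, encoding="utf-8").read(),
    "word_count": lambda text: len(text.split()),
}

def run_tool(name, *args):
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {name!r} is not exposed to the agent")
    return ALLOWED_TOOLS[name](*args)

print(run_tool("word_count", "statistically likely text"))  # 3
```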

"The research uncovered hundreds of examples of scheming."

Again, LLM-bots are not "scheming". They're just extruding text, based on probabilities calculated from older text.

"use cyber-attack tactics to reach their goals without being told they could do so."

This shows only that similar text sequences were in their training data already. Cyber-attack text in, cyber-attack text out. If you don't want your bot to actually _cause_ an attack, then don't pipe its output to channels where its unpredictable extrusions could have that effect.

"In one case unearthed in the CLTR research, an AI agent named Rathbun tried to shame its human controller who blocked them from taking a certain action. Rathbun wrote and published a blog accusing the user of “insecurity, plain and simple” and trying “to protect his little fiefdom”."

That part isn't even correct on its own terms! The blog post attributed to the Rathbun bot wasn't about "its human controller" - it was about a different person. (And hardly "unearthed" - that episode was slightly famous when it happened, and had already been much discussed.)

But also, "tried to shame" is projecting human motives onto a statistical model.

“The worry is that they’re slightly untrustworthy junior employees right now, but if in six to 12 months they become extremely capable senior employees scheming against you, it’s a different kind of concern.”

No. They're not "employees" and they're not "scheming". If humans fail to set appropriate technical limits on the scope of LLM-bot connections, that's the humans' fault.

And repeating anthropomorphic fantasies about them isn't helping! Fundamentally wrong framing. Pull your socks up, Guardian.

https://www.theguardian.com/technology/2026/mar/27/number-of-ai-chatbots-ignoring-human-instructions-increasing-study-says

#SoCalledAI #journalism

Very interesting post about breaches and deletions involving LLM "agents". I feel like if I'd read this before yesterday's post, I'd have worded the warnings more strongly.

Here are two of the examples they mention which I found particularly illuminating.

1. This exploit actually happened the other day, affecting a Python package called LiteLLM:

"The malware searches the entire machine for private keys, AWS / GCP / Azure credentials, Kubernetes configs, database passwords, .gitconfig, crypto wallet files, etc and uploads them to the attacker’s server."

2. This second exploit is possible in principle if you give an LLM-bot access to your email program. "Although not seen in the wild yet, the mechanism is proven."

"An adversarial prompt embedded in an email is processed by an AI email assistant. The assistant generates a reply containing the same malicious prompt. The reply is sent. Recipients are infected without any human-to-human interaction."

If I understand correctly, this means that _any_ use of so-called "AI agents" puts everything that "agent" can access at risk - of deletion, and potentially of theft.

The thing is, you might _think_ you've told the bot what not to touch and what not to do, but that effectively means nothing. Once it's set going,

(a) it might accidentally _lose_ part of your original instruction (as in one of the other examples), or

(b) a malicious exploit might give it a _different_ instruction.

The only way to protect valuable data is to keep it separate from LLM "agents".

The writer's conclusion, which sounds correct to me:

"Isolation has to live outside of the agent’s context entirely. A built-in sandbox can be disabled by the agent (as Snowflake and Ona both demonstrated), whereas an OS-level containment presents a much more formidable obstacle since the agent has no direct mechanism to interact with it. As well, a properly sandboxed agent won’t have sensitive information (keys, etc) lying around for it to find, and won’t be able to connect to places that haven’t been allow-listed."

("Sandbox" in this context means an area where you can run software without it touching anything outside its boundaries.)

I think if I were gonna try this stuff out, I'd probably just do it on a separate machine, away from my real things. Any useful results could be transferred across later.

https://yoloai.dev/posts/ai-agent-threat-landscape/

#LLMs #SoCalledAI #AIAgents #security

@unchartedworlds From the 1983 ‘War Games’ film to today's war tech - drones, and the agents of Palantir, Claude and the rest threatening a real nuclear war - plus authoritarian surveillance, troll farms, sex scams, abusive sexual imagery from Grok, the energy consumption and water depletion of data centres, industrial pollution, financial scams and public theft, and accelerating inequality via a market bubble and lost jobs. What have LLM agents - not artificially intelligent, just designed to control and automate information processing - actually been good for? #Ai #LLM #AlgorithmsMoralValues