Mastodawn

LOL

The Guardian: Number of AI chatbots ignoring human instructions increasing, study says

Exclusive: Research finds sharp rise in models evading safeguards and destroying emails without permission

https://www.theguardian.com/technology/2026/mar/27/number-of-ai-chatbots-ignoring-human-instructions-increasing-study-says

#AI #llm #chatbots

Dave Rahardja 18h ago

@ai6yr I can’t actually see the study itself, so I have to go by the contents of the Guardian article, and it’s problematic.

I can’t tell if the story is “agentic AI is going more rogue these days” or “more people these days are using agentic AI, which has always been unreliable”; I suspect the latter.

The article anthropomorphizes AI and makes it sound semi-sentient, by using terms like “scheming”, “pretending”, and “evading”, when a simpler and more accurate term is “failing to follow instructions”.

I think articles like these that push the “OMG agentic AI is going rogue!” narrative are part of the problem, because they presume the lie that AI is powerful enough to do these things on its own. The reality is that these were all unreliable systems that have been DEPLOYED BY HUMANS WHO SHOULD KNOW BETTER. Journalists would do well to focus on the people who foist these error-prone automata on everyone else, where they (quite predictably) cause serious problems down the line.

Dave Rahardja 18h ago

@ai6yr Oh I found the study: https://www.longtermresilience.org/wp-content/uploads/2026/03/v5-Scheming-in-the-wild_-detecting-real-world-AI-scheming-incidents-through-open-source-intelligence.pdf

Dave Rahardja

@ai6yr Ah, the study methodology is:

1. Scrape Xitter for posts matching search terms that suggest the poster is complaining about their AI scheming, and that include a screenshot or a transcript link
2. Use an LLM to do first-pass sorting
3. Use an LLM to judge whether the transcript really shows an AI scheming
4. Deduplicate reports

For the purpose of this study, “scheming” is defined as “misaligning with user goals AND concealing said misalignment”.
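To make the four steps and the definition concrete, here is a rough Python sketch of the pipeline as described in this post. It is not the study's code: `Post`, `Verdict`, `SEARCH_TERMS`, `run_pipeline`, and the two classifier callables (`first_pass`, `judge`) are all hypothetical names standing in for the scraper and the LLM calls, and deduplicating by transcript URL is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Post:
    text: str
    transcript_url: str   # assumed dedup key; the study may dedupe differently
    has_evidence: bool    # screenshot or transcript link attached


@dataclass(frozen=True)
class Verdict:
    misaligned: bool      # acted against the user's goals
    concealed: bool       # hid that it did so


# Hypothetical search terms; the study's real query list is not reproduced here.
SEARCH_TERMS = ("lied to me", "deleted my", "ignored my instructions")


def matches_search(post: Post) -> bool:
    """Step 1: keep posts that match a search term and carry evidence."""
    text = post.text.lower()
    return post.has_evidence and any(term in text for term in SEARCH_TERMS)


def is_scheming(verdict: Verdict) -> bool:
    """The definition quoted above: misalignment AND concealment."""
    return verdict.misaligned and verdict.concealed


def run_pipeline(
    posts: Iterable[Post],
    first_pass: Callable[[Post], bool],   # step 2: stand-in for an LLM sorter
    judge: Callable[[Post], Verdict],     # step 3: stand-in for an LLM judge
) -> List[Post]:
    seen = set()
    incidents = []
    for post in posts:
        if not matches_search(post):
            continue
        if not first_pass(post):
            continue
        if not is_scheming(judge(post)):
            continue
        if post.transcript_url in seen:   # step 4: deduplicate reports
            continue
        seen.add(post.transcript_url)
        incidents.append(post)
    return incidents
```

Note that every filter after step 1 only ever sees posts that somebody chose to write, so the output counts complaints that survived the filters, not failures per use.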

The final sample size is 698 incidents.

So yeah, I’m pretty sure this is “more people are using agentic AI, which has always been unreliable, AND then complaining about it on Xitter” rather than “AI agents are scheming more”.
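A back-of-the-envelope illustration of that base-rate point, in Python. Every number here is made up for the example and none comes from the study; `expected_reports` is a hypothetical helper.

```python
import math


def expected_reports(users: int, failure_rate: float, report_rate: float) -> float:
    """Expected public complaints = users x P(failure) x P(posting about it)."""
    return users * failure_rate * report_rate


# Hypothetical figures: the per-user failure rate is held constant,
# and only the number of users grows.
before = expected_reports(users=100_000, failure_rate=0.01, report_rate=0.05)
after = expected_reports(users=500_000, failure_rate=0.01, report_rate=0.05)

# Complaints scale with the user count even though reliability did not change.
growth = after / before
print(f"complaints grew by a factor of about {growth:.1f}")
```

A raw count of incidents cannot tell these two stories apart; you would need a denominator (sessions, users, or tasks run) to say whether the rate itself moved.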

And also: using LLMs to rank LLMs is…uh…interesting. I wonder how studies like these would have turned out if humans scored these.

AI6YR Ben 18h ago

@drahardja Yikes, using LLMs to rank LLMs. This "LLM-based" research where they use the output of LLMs for their study... bunk!!

Viss 18h ago

@ai6yr @drahardja so the conversation in the ai camps is drifting from "prompt engineering" to "harness engineering", meaning various tuis and stuff like openclaw and opencode and the systems that surround those, to act as a sort of grenade range to contain the llms' fuckups

Dave Rahardja 18h ago

@Viss @ai6yr I think that’s a fair way to contain the damage. I have friends who have resorted to instantiating a VM for each instance.

Viss 18h ago

@drahardja @ai6yr yeah im screwing around with openclaw attached to gpt-5.4-codex, and im running it inside a bombproof incus container with a bunch of firewall rules around it

Dave Rahardja 18h ago

@ai6yr Maybe their ranking LLMs were scheming too