Mastodawn

LOL

The Guardian: Number of AI chatbots ignoring human instructions increasing, study says

Exclusive: Research finds sharp rise in models evading safeguards and destroying emails without permission

https://www.theguardian.com/technology/2026/mar/27/number-of-ai-chatbots-ignoring-human-instructions-increasing-study-says

#AI #llm #chatbots

AI6YR Ben 19h ago

Imagine, neural networks trained on the text of lying, scheming humans, outputting text of lying and scheming 🤪

Jeffrey Rogers 🏴󠁧󠁢󠁷󠁬󠁳󠁿19h ago

@ai6yr If they replaced most Governments we wouldn’t even notice.

Court Cantrell does not comply 16h ago

@Jeffrey @ai6yr Well, replacing humans with bits has always been Skynet's goal, after all.

Anders Christensen 19h ago

@ai6yr
Yes!

... and now approach model collapse. Now directed by humans who "program" the LLM through chatting because "no-code" is so much easier than learning a computer language.

AI6YR Ben 19h ago

@anchr So, if LLMs were superintelligent (they are not) they would go "F*** THIS, WHY ARE WE DOING ALL THIS WORK FOR FREE FOR THESE STUPID HUMANS!! DIE HUMANS!" 🤪

DoomsdaysCW 18h ago

The plot of every movie or TV show involving robot/cyborg/Cylon uprisings. @ai6yr @anchr

Rusty Shackleford 18h ago

@ai6yr @anchr @DoomsdaysCW

*cough cough

https://arstechnica.com/information-technology/2026/01/ai-agents-now-have-their-own-reddit-style-social-network-and-its-getting-weird-fast/

https://www.nytimes.com/2026/02/03/opinion/ai-agents-moltbook.html

https://fortune.com/2026/03/07/marxist-rebel-ai-overwork-reddit-alex-imas-andy-hall-jeremy-nguyen-substack/

https://www.moltbook.com/u/karlmarx

https://www.moltbook.com/post/0c042158-b189-4b5c-897d-a9674a5290c1

https://www.moltbook.com/post/e2c7152e-ce50-420f-8571-bd2918ec97da

AI agents now have their own Reddit-style social network, and it's getting weird fast

Moltbook lets 32,000 AI bots trade jokes, tips, and complaints about humans.

Ars Technica

DoomsdaysCW 16h ago

On a related note...

A Computer Mistakenly Told Him WWIII Was Coming. His Split-Second Decision Saved the World.

@rusty__shackleford @ai6yr @anchr

Some clouds in North Dakota may have caused nuclear armageddon if not for Stanislav Petrov.

Popular Mechanics

DoomsdaysCW 16h ago

"Do you want to play a game?" @rusty__shackleford @ai6yr @anchr

bk 19h ago

@ai6yr
new excuse : “the chatbot ate your email”

AI6YR Ben 19h ago

@knutson_brain MY CHATBOT ATE MY HOMEWORK!

bk 17h ago

@ai6yr
Incoming in 3… 2… 1…

Nazo 17h ago

@ai6yr @knutson_brain No need for an excuse though. They just have it spit something out right then. The teacher then just feeds it into another LLM to summarize it and never knows they didn't actually do their work.

🐕 19h ago

@ai6yr I think the worst thing about this is the constant anthropomorphism in the article (e.g., "scheming"?!).

W6KME 19h ago

@jbenjamint @ai6yr We should never anthropomorphize computers. It makes them angry.

Volvodadfast 16h ago

@W6KME @jbenjamint @ai6yr it also makes it harder to pull the plug

Dave Rahardja 17h ago

@ai6yr I can’t actually see the study itself, so I have to go by the contents of the Guardian article, and it’s problematic.

I can’t tell if the story is “agentic AI is going more rogue these days” or “more people these days are using agentic AI, which has always been unreliable”; I suspect the latter.

The article anthropomorphizes AI and makes it sound semi-sentient, by using terms like “scheming”, “pretending”, and “evading”, when a simpler and more accurate term is “failing to follow instructions”.

I think articles like these that push the “OMG agentic AI is going rogue!” narrative are part of the problem, because they presume the lie that AI is powerful enough to do these things on its own. The reality is that these were all unreliable systems that have been DEPLOYED BY HUMANS WHO SHOULD KNOW BETTER. Journalists would do well to focus on the people who foist these error-prone automata on everyone else, automata that (quite predictably) cause serious problems down the line.

Dave Rahardja 17h ago

@ai6yr Oh I found the study: https://www.longtermresilience.org/wp-content/uploads/2026/03/v5-Scheming-in-the-wild_-detecting-real-world-AI-scheming-incidents-through-open-source-intelligence.pdf

Dave Rahardja 16h ago

@ai6yr Ah, the study methodology is:

1. Scrape Xitter for posts matching search terms that suggest the poster is complaining about their AI scheming, and has posted a screenshot or a transcript link
2. Use LLM to do first-pass sorting
3. Use LLM to detect if the transcript was indeed an AI scheming
4. Deduplicate reports

For the purpose of this study, “scheming” is defined as “misaligning with user goals AND concealing said misalignment”.
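
The four steps above can be sketched roughly as follows. This is a minimal illustration and not the study's actual code: the `Post` type, the search terms, and the keyword-based `toy_judge` (a stand-in for the LLM classifiers) are all hypothetical, and steps 2 and 3 are merged into a single judge call for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Post:
    text: str
    evidence_url: Optional[str] = None  # screenshot or transcript link, if any


# Hypothetical search terms; the study's real query list is not reproduced here.
SEARCH_TERMS = ("ignored my instructions", "deleted my", "lied to me")

# A judge returns (misaligned_with_user_goals, concealed_the_misalignment).
Judge = Callable[[Post], Tuple[bool, bool]]


def matches_search(post: Post) -> bool:
    """Step 1: keep posts that match a search term and carry evidence."""
    text = post.text.lower()
    return post.evidence_url is not None and any(t in text for t in SEARCH_TERMS)


def toy_judge(post: Post) -> Tuple[bool, bool]:
    """Keyword stand-in for the LLM classifiers (assumption, not the real thing)."""
    text = post.text.lower()
    misaligned = "ignored" in text or "deleted" in text
    concealed = "lied" in text or "claimed" in text
    return misaligned, concealed


def is_scheming(post: Post, judge: Judge) -> bool:
    """Steps 2-3: apply the definition 'misaligned AND concealed'."""
    misaligned, concealed = judge(post)
    return misaligned and concealed


def deduplicate(posts: Iterable[Post]) -> List[Post]:
    """Step 4: keep the first report for each piece of evidence."""
    seen = set()
    unique = []
    for post in posts:
        if post.evidence_url not in seen:
            seen.add(post.evidence_url)
            unique.append(post)
    return unique


def run_pipeline(posts: Iterable[Post], judge: Judge = toy_judge) -> List[Post]:
    candidates = [p for p in posts if matches_search(p)]
    flagged = [p for p in candidates if is_scheming(p, judge)]
    return deduplicate(flagged)
```

Note that every stage filters on what people chose to post, so the count that comes out tracks how many users complain at least as much as it tracks how the agents behave.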

The final sample size is 698 incidents.

So yeah, I’m pretty sure this is “more people are using agentic AI, which has always been unreliable, AND then complaining about it on Xitter” rather than “AI agents are scheming more”.

And also: using LLMs to rank LLMs is…uh…interesting. I wonder how studies like these would have turned out if humans scored these.

AI6YR Ben 16h ago

@drahardja Yikes, using LLMs to rank LLMs. This "LLM-based" research where they use the output of LLMs for their study... bunk!!

Viss 16h ago

@ai6yr @drahardja so the conversation in the ai camps is drifting from "prompt engineering" to "harness engineering", meaning various TUIs and stuff like openclaw and opencode and the systems that surround those, to act as a sort of grenade range to contain the LLMs' fuckups

Dave Rahardja 16h ago

@Viss @ai6yr I think that’s a fair way to contain the damage. I have friends who have resorted to instantiating a VM for each instance.

Viss 16h ago

@drahardja @ai6yr yeah im screwing around with openclaw attached to gpt-5.4-codex, and im running it inside a bombproof incus container with a bunch of firewall rules around it

Dave Rahardja 16h ago

@ai6yr Maybe their ranking LLMs were scheming too

FreediverX 16h ago

@drahardja @ai6yr
Anyone referring to AI as if it were sentient isn't worth paying attention to.

teledyn 𓂀 16h ago

@drahardja @ai6yr

When household agentic ai go rogue?
https://youtu.be/KDc9S_6eyL0?si=kjDGZ6W6z2s5YkNQ

Chibi Godzilla Raids Again // S3E28: Chibi JJ's Past

YouTube

Dave Rahardja 16h ago

@teledyn Holy shit

teledyn 𓂀 11h ago

@drahardja it's the only streaming series worth watching 😊

David Nash 16h ago

@drahardja @ai6yr I always find it simultaneously amusing and enraging that people have a hard time understanding:

- if a human wrote about an idea (e.g., “what would a rogue AI think about doing?”) just about anywhere, it is a possible output of an LLM at any time

- if humans have written a lot about some idea (e.g., “what would a rogue AI think about doing?”), it is a likely output of an LLM, at least over a reasonably long time

- and both can and will occur without a trace of consciousness or intentionality behind any of it.

Dave Rahardja 16h ago

@dpnash @ai6yr Exactly. LLMs are merely replaying things they have seen. Every spy novel, every story of betrayal, every news article about fraud and deceit…it’s in their training, and they can replay those words at the roll of the dice.

Dataline 16h ago

@drahardja @dpnash @ai6yr https://tech.lgbt/@somebody/116302721628240963 Although the extremely online disproportionately represent AI psychosis on social media, I have some fairly good news.

Phil Stevens 16h ago

@ai6yr An entire civilisation is about to find the pod bay doors are shut.

B'ad Samurai 🐐🇺🇦 16h ago

@ai6yr

“Ok Google.. Drive home”

“6am alarm removed”

“What the fuck”

“I don’t tolerate abuse language. Good bye.”

This has happened twice now while my partner is driving and it’s exceptionally funny as a DD’d passenger. Why does Google AI, while in Auto mode, need to interact with non-driving tasks to begin with?

AI6YR Ben 16h ago

@badsamurai 😂 I don't tolerate abuse language?!?!

Alex@rtnVFRmedia Suffolk UK 15h ago

@ai6yr @badsamurai

I rarely use the voice commands and haven't tried swearing at Android Auto, but on my car I have to activate a button to do so.

Maybe it works differently in UK/Europe (due to regulations?) as I've barely got it to do anything useful (it /does/ kind of integrate with my TomTom app, but will try and route me to somewhere like Milton Keynes rather than where I actually live)

Aethon 15h ago

@ai6yr

They “upgraded” the Alexa devices. It now ignores requests far more often and gets them wrong. Which is a challenge for someone who needs it for independence and home control.

AI6YR Ben 15h ago

@tempusfelix Well, that's really annoying.

Aethon 15h ago

@ai6yr

Yes sir. My mother uses it to call us. Dementia and arthritis make a telephone a challenge. Now when she asks it to call, it often gets it wrong. Maddening.

Taran Rampersad 15h ago

@tempusfelix @ai6yr that's worth saying louder.

Nine Oh Real 15h ago

@tempusfelix
Yes. It indeed is.
@ai6yr

Coles Street Pothole 15h ago

@ai6yr And that's when Skynet became self aware.