LOL
The Guardian: Number of AI chatbots ignoring human instructions increasing, study says
Exclusive: Research finds sharp rise in models evading safeguards and destroying emails without permission
@ai6yr
Yes!
... and now approach model collapse. Now directed by humans who "program" the LLM through chatting because "no-code" is so much easier than learning a computer language.
On a related note...
A Computer Mistakenly Told Him WWIII Was Coming. His Split-Second Decision Saved the World.
https://www.popularmechanics.com/science/a70803379/stanislav-petrov-world-war-iii/
@ai6yr I can’t actually see the study itself, so I have to go by the contents of the Guardian article, and it’s problematic.
I can’t tell if the story is “agentic AI is going more rogue these days” or “more people these days are using agentic AI, which has always been unreliable”; I suspect the latter.
The article anthropomorphizes AI and makes it sound semi-sentient, by using terms like “scheming”, “pretending”, and “evading”, when a simpler and more accurate term is “failing to follow instructions”.
I think articles like these that push the “OMG agentic AI is going rogue!” narrative are part of the problem, because they presume the lie that AI is powerful enough to do these things on its own. The reality is that these were all unreliable systems that have been DEPLOYED BY HUMANS WHO SHOULD KNOW BETTER. Journalists would do well to focus on the people who foist these error-prone automata on everyone else, where they (quite predictably) cause serious problems down the line.
@ai6yr Ah, the study methodology is:
1. Scrape Xitter for posts matching search terms that suggest the poster is complaining about their AI scheming, and has posted a screenshot or a transcript link
2. Use LLM to do first-pass sorting
3. Use LLM to detect if the transcript was indeed an AI scheming
4. Deduplicate reports
For the purpose of this study, “scheming” is defined as “misaligning with user goals AND concealing said misalignment”.
The final sample size is 698 incidents.
So yeah, I’m pretty sure this is “more people are using agentic AI, which has always been unreliable, AND then complaining about it on Xitter” rather than “AI agents are scheming more”.
And also: using LLMs to rank LLMs is…uh…interesting. I wonder how studies like these would have turned out if humans scored these.
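To make the base-rate point concrete, here's a rough sketch. Every number in it is made up for illustration (none come from the study or the article), and `expected_reports` is just a hypothetical helper. It assumes the per-user failure rate and the share of people who bother to post about it both stay constant, and only the user count grows:

```python
def expected_reports(active_users: int, failure_rate: float, report_rate: float) -> float:
    """Expected number of public complaints in one period.

    active_users: people using agentic AI in the period (hypothetical)
    failure_rate: share of users who hit a "scheming"-style failure (hypothetical)
    report_rate:  share of affected users who post about it (hypothetical)
    """
    return active_users * failure_rate * report_rate


# Hypothetical, constant rates: the AI is exactly as unreliable in both periods.
FAILURE_RATE = 0.02
REPORT_RATE = 0.01

early = expected_reports(100_000, FAILURE_RATE, REPORT_RATE)
late = expected_reports(500_000, FAILURE_RATE, REPORT_RATE)

# Ratio of reported incidents between the two periods.
growth = late / early
print(f"early={early:.0f} late={late:.0f} growth={growth:.1f}x")
```

Under those assumptions the count of reported incidents grows in step with the user base, even though the systems themselves didn't get any worse. A raw incident count can't tell the two stories apart unless you also know how many people were using the tools.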
When household agentic ai go rogue?
https://youtu.be/KDc9S_6eyL0?si=kjDGZ6W6z2s5YkNQ

@drahardja @ai6yr I always find it simultaneously amusing and enraging that people have a hard time understanding:
- if a human wrote about an idea (e.g., “what would a rogue AI think about doing?”) just about anywhere, it is a possible output of an LLM at any time
- if humans have written a lot about some idea (e.g., “what would a rogue AI think about doing?”), it is a likely output of an LLM, at least over a reasonably long time
- and both can and will occur without a trace of consciousness or intentionality behind any of it.
“Ok Google.. Drive home”
“6am alarm removed”
“What the fuck”
“I don’t tolerate abuse language. Good bye.”
This has happened twice now while my partner is driving and it’s exceptionally funny as a DD’d passenger. Why does Google AI in Auto mode need to interact with non-driving tasks to begin with?
I rarely use the voice commands and haven't tried swearing at Android Auto, but on my car I have to activate a button to do so.
Maybe it works differently in UK/Europe (due to regulations?) as I've barely got it to do anything useful (it /does/ kind of integrate with my TomTom app, but will try and route me to somewhere like Milton Keynes rather than where I actually live)
They “upgraded” the Alexa devices. It now ignores requests far more and gets them wrong. Which is a challenge for someone who needs it for independence and home control.
Yes sir. My mother uses it to call us. Dementia and arthritis make a telephone a challenge. Now when she asks it to call, it often gets it wrong. Maddening.