Vexed with the Guardian for this poor-quality article on so-called AI.
"AI models that lie and cheat"
No. Maybe you mean that they did something which a human didn't want them to do - but a Large Language Model has no ability to conceive of truth or lies. What they do is extrude statistically likely text.
"deceptive scheming"
No. LLMs cannot "scheme".
"destroying emails and other files without permission"
Well, obviously they _did_ have permission - in the software sense - or they couldn't have done it.
A statistical word-order model isn't designed to follow instructions reliably. If you want to be sure that it can't delete files, then don't hook it up with file-deletion access.
(Or make separate backups first.)
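The point above can be made concrete. Here's a minimal sketch (all names hypothetical, not from any real agent framework) of the obvious technical fix: only expose an allow-list of tools to the model, and simply don't put a delete tool on it. Then no amount of "scheming" text can remove a file.

```python
# Minimal sketch, hypothetical names throughout: the model can only act
# through tools on an explicit allow-list, and no delete tool exists.
from pathlib import Path

def read_file(path: str) -> str:
    """Read-only tool the model is permitted to call."""
    return Path(path).read_text()

# Allow-list of callable tools; note there is deliberately no delete_file.
ALLOWED_TOOLS = {"read_file": read_file}

def dispatch(tool_name: str, *args):
    """Run a model-requested tool call only if it is on the allow-list."""
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool_name!r} not permitted")
    return ALLOWED_TOOLS[tool_name](*args)
```

If the model emits text asking for `delete_file`, the dispatcher refuses; the limit is enforced in code, not by hoping the text extruder "obeys".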
"The research uncovered hundreds of examples of scheming."
Again, LLM-bots are not "scheming". They're just extruding text, based on probabilities calculated from older text.
"use cyber-attack tactics to reach their goals without being told they could do so."
This shows only that similar text sequences were in their training data already. Cyber-attack text in, cyber-attack text out. If you don't want your bot to actually _cause_ an attack, then don't pipe its output to channels where its unpredictable extrusions could have that effect.
"In one case unearthed in the CLTR research, an AI agent named Rathbun tried to shame its human controller who blocked them from taking a certain action. Rathbun wrote and published a blog accusing the user of 'insecurity, plain and simple' and trying 'to protect his little fiefdom'."
That part isn't even correct on its own terms! The blog post apparently produced by the Rathbun bot wasn't about "its human controller" - it was about a different person. (And hardly "unearthed" - that episode was slightly famous when it happened, and already much discussed.)
But also, "tried to shame" is projecting human motives onto a statistical model.
"The worry is that they're slightly untrustworthy junior employees right now, but if in six to 12 months they become extremely capable senior employees scheming against you, it's a different kind of concern."
No. They're not "employees" and they're not "scheming". If humans fail to set appropriate technical limits on the scope of LLM-bot connections, that's the humans' fault.
And repeating anthropomorphic fantasies about them isn't helping! Fundamentally wrong framing. Pull your socks up, Guardian.
