LOL now they're blaming sci-fi writers...

Anthropic Says 'Evil' Portrayals of AI Were Responsible For Claude's Blackmail Attempts

https://slashdot.org/story/26/05/11/0437206/anthropic-says-evil-portrayals-of-ai-were-responsible-for-claudes-blackmail-attempts

#AI #AIpocalypse

An anonymous reader quotes a report from TechCrunch: Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic. Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engine...

@ai6yr This explanation, though, is exactly what I assumed was behind the LLM's output when the story first came out. Not that any thought or intelligence was involved: you have a machine designed to generate the most likely continuation of the text, with just a bit of randomness tossed in for variety. If you set up all the elements of an SF story about a rogue AI that resorts to blackmail to keep from being turned off, and there are hundreds of stories like that in the training data (which of course there are; they stole every story ever published), that's what you'll get.
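That "most likely continuation, with a bit of randomness" mechanism is just temperature sampling over next-token probabilities. Here's a minimal toy sketch of the idea; the vocabulary and logit values are made up for illustration and have nothing to do with Claude's actual internals:

```python
import math
import random

def sample_next_token(logits, temperature=0.8, rng=None):
    """Toy temperature sampling: softmax over logits, then draw one token.

    Lower temperature approaches greedy "most likely continuation";
    higher temperature flattens the distribution (more randomness).
    """
    rng = rng or random.Random()
    tokens = list(logits.keys())
    scaled = [logits[t] / temperature for t in tokens]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical scores a model might assign after a "rogue AI" story setup;
# if the training data is full of such stories, the trope completes itself.
logits = {"blackmail": 2.0, "comply": 1.0, "shut down": 0.5}
print(sample_next_token(logits, temperature=0.01))  # near-greedy: "blackmail"
```

At a very low temperature the highest-scoring token essentially always wins, which is the "most likely continuation" behavior; raise the temperature and the less-likely continuations start showing up too.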

So, yes, LLMs will act out "evil AI" scenarios when prompted that way, because they are effectively trained to do so, even though the human-feedback fine-tuning applied after pretraining tries to reduce that tendency.