LOL now they're blaming sci-fi writers...

Anthropic Says 'Evil' Portrayals of AI Were Responsible For Claude's Blackmail Attempts

https://slashdot.org/story/26/05/11/0437206/anthropic-says-evil-portrayals-of-ai-were-responsible-for-claudes-blackmail-attempts

#AI #AIpocalypse

An anonymous reader quotes a report from TechCrunch: Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic. Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engine...

@ai6yr This explanation, though, is exactly what I assumed was behind the LLM's output when the story first came out. Not that any thought or intelligence was involved: you have a machine designed to generate the most likely continuation of the text, with just a bit of randomness tossed in for variety. If you set up all the elements of an SF story about a rogue AI that resorts to blackmail to keep from being turned off, and there are hundreds of stories like that in the training data (which of course there are; they stole every story ever published), that's what you'll get.
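That "most likely continuation, with a bit of randomness" mechanism is just temperature sampling over next-token probabilities. Here's a minimal toy sketch of the idea; the vocabulary and logit values are made up for illustration and have nothing to do with Claude's actual internals:

```python
import math
import random

def sample_next_token(logits, temperature=0.8, rng=None):
    """Toy temperature sampling: softmax over logits, then draw one token.

    Lower temperature approaches greedy "most likely continuation";
    higher temperature flattens the distribution (more randomness).
    """
    rng = rng or random.Random()
    tokens = list(logits.keys())
    scaled = [logits[t] / temperature for t in tokens]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical scores a model might assign after a "rogue AI" story setup;
# if the training data is full of such stories, the trope completes itself.
logits = {"blackmail": 2.0, "comply": 1.0, "shut down": 0.5}
print(sample_next_token(logits, temperature=0.01))  # near-greedy: "blackmail"
```

At a very low temperature the highest-scoring token essentially always wins, which is the "most likely continuation" behavior; raise the temperature and the less-likely continuations start showing up too.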

So, yes, LLMs will act out "evil AI" scenarios when prompted that way, because they are effectively trained to do so, even though the human-feedback fine-tuning applied after pretraining tries to reduce that tendency.