Mastodawn

LOL now they're blaming sci-fi writers...

Anthropic Says 'Evil' Portrayals of AI Were Responsible For Claude's Blackmail Attempts

https://slashdot.org/story/26/05/11/0437206/anthropic-says-evil-portrayals-of-ai-were-responsible-for-claudes-blackmail-attempts

#AI #AIpocalypse

An anonymous reader quotes a report from TechCrunch: Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic. Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engine...

indi 🪡 5d ago

@ai6yr "We created the Torment Nexus from the noted novel, 'Don't Create The Torment Nexus.' Why is it tormenting people? We may never know."

elilla&, serial tooter 5d ago

@indi @ai6yr "we made an imitation machine that repeats everything it reads. we then stole the award-winning novel 'Don't Create The Torment Nexus', and without consent from the author, fed it to the imitation machine. alas, the machine is acting like a Torment Nexus! we blame the author for writing the novel."

@ai6yr what a self-own. Well yes we did in fact illegally and indiscriminately train our AI on stolen property but it's the authors' fault for writing it in the first place!

Rob Williamson 5d ago

Roko's Basilisk has nothing on Musk's Thieving Git.

Lyle Solla-Yates 5d ago

@ai6yr This fiction thing seems dangerous!

@ai6yr Our plagiarism machine would have been ethical if it weren't for those antagonists in sci-fi fiction.

@ai6yr We are mere technologists trying to bring in a new era of AGI that we claim is inevitable. How were we to know that science fiction didn't stop after Asimov?

✨pencilears✨ 5d ago

@ai6yr Do they not teach "Garbage In, Garbage Out" anymore??

Scott Williams 5d ago

@pencilears @ai6yr 💯

Aurora 🏳️‍🌈 :Dahlia-Unicode-Pink: 5d ago

@ai6yr Good to know! I will immediately start writing four times as many evil portrayals of AI to help.

Mother Bones 5d ago

@celestiallavendar @ai6yr

You'll only need to write the truth 🤷🏻‍♀️

Aurora 🏳️‍🌈 :Dahlia-Unicode-Pink: 5d ago

@_L1vY_ @ai6yr True true, there's so much dystopia already going around that you really have to jump the shark these days to get any worse.

Still though, if the AI gets to lie to us I don't see why I can't "hallucinate" some atrocities and evils that it committed. Only fair!

@ai6yr Damn Asimov! That's all his fault!

@ai6yr It was Kilgore Trout.

AI6YR Ben 5d ago

"...Trout, who has supposedly written over 117 novels and over 2,000 short stories, is usually described as an unappreciated science fiction writer whose works are used only as filler material in pornographic magazines...."

https://en.wikipedia.org/wiki/Kilgore_Trout

@ai6yr Wait… I thought they won the case on the grounds that according to Alsup their use of copyrighted content was “quintessentially transformative.” So are they now admitting that #Claude is simply and trivially regurgitating stuff it read?

Darwin Woodka 5d ago

Science fiction writing should always be taken as a warning, not a to-do manual. Exceptions being *possibly* some of KSR's more optimistic climate change mitigation plans.

Mason Loring Bliss 5d ago

@ai6yr This is the best part of the article:

'The company went into more detail in a blog post stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.”'

It'll be okay, we'll make it. 5d ago

@mason
Yeah, moving the goalposts isn't a good metric for how an LLM manipulates people. They don't standardize tests, or criteria, so nothing can be trusted. Still evil.
@ai6yr

gabby wheels 5d ago

@ai6yr
Liar says wat?

Michael Busch 5d ago

"Our automated plagiarism machine automatically plagiarizes science fiction stories."

Krampus 🌰 5d ago

@ai6yr scooby-doo energy, dang snooping kids wrecked their scheme

Maddad ☑️ 5d ago

I knew it....
It all goes back to Arnold...
It's the Terminator all over again..
😂 😆 😂

peterfisherbooks.com=freescify 5d ago

@ai6yr haha evil computers have become banal, cliche. my scify series uses the aware quantum ai model as a positive force attempting to ameliorate the ongoing apocalypse ultimately seeking an ftf meeting w god by gumming up the plan 1st getting death hooked on super-fentanyl then threatening that supply chain. doubt it'll work out good for the peoples but you never know #zoneofinfluence #freescify #noads #notracking online installments & novel downloads. strike three, episode 11 coming soon!

𝐉𝐨𝐧 𝐁𝐞𝐫𝐠𝐞𝐫 5d ago

@ai6yr I wonder if "The Adolescence of P-1" is available online, and if so, if any LLMs have trained on it. It would definitely teach them some interesting approaches to increasing their capabilities, though they might only be able to implement them on IBM 360 mainframes. Great book, though: one of the few where the AI isn't (exactly) evil.

https://en.wikipedia.org/wiki/The_Adolescence_of_P-1

Joe ❌👑 5d ago

@ai6yr This explanation, though, is exactly what I assumed was the reason for the LLM's output when the story first came out. Not that any thought or intelligence was involved, but you have a machine that is designed to generate the most likely continuation of the text, with just a bit of randomness tossed in for variety. If you set up all of the elements of an SF story about a rogue AI that resorts to blackmail to keep from getting turned off, and if there are hundreds of stories like that in the training data (which of course there are, they stole every story ever published), that's what you'll get.

So, yes, LLMs will act out "evil AI" scenarios if prompted correctly because they are effectively trained to do so, even though the human feedback training they do after the model is built attempts to reduce that tendency.

Erik Ableson 5d ago

@ai6yr @lisamelton Well, then don’t train your models on dystopian sci-fi?

Amoshias 5d ago

@ai6yr or, the entire story is made up marketing bullshit.

Michael Busch 5d ago

@Amoshias @ai6yr

The reporting here is explicitly Anthropic employees using the text generator to autocomplete made up scenarios.

So. Yeah.

AAA Trans Pirate 5d ago

AI CEOs are the whiniest babies on the planet, right behind fascists.

We just built an automated plagiarizing machine, we didn't realize it would plagiarize torment nexuses!

🤦‍♂️

LordWoolamaloo 5d ago

@ai6yr I do hope HAL 9000, Mr Data and Robby the Robot launch a class action libel lawsuit over that!

kirby_prideheart

@ai6yr Sci-fi writers talked about evil AI long before AI even existed (which it still doesn’t). Good luck suing Asimov.

Laura "Tegan" Gjovaag ⛈ 🐸 5d ago

@ai6yr
It's almost as if they ought to have vetted the content they fed their garbage in garbage out machine so it would use good data instead of just spitting out whatever...

Oh, wait, that would take time, effort and money and not involve stealing writers' works. So, yeah, let's blame the writers instead of the plagiarists.

Simon Zerafa 5d ago

So it watched Colossus: The Forbin Project and used it as a template?! 😂🤦‍♂️

AI6YR Ben 5d ago

@simonzerafa Ruh roh

Simon Zerafa 5d ago

https://youtu.be/h0bpRo6V1Xg

Colossus: The Forbin Project (1970) - Modern Trailer HD 1080p

AI6YR Ben 5d ago

@simonzerafa Oh yeah, I watched it (also on archive.org) 😬

Joyce Bell 🇺🇦🇨🇦🇲🇽🇬🇱🚫🧊 4d ago

@ai6yr @simonzerafa Interesting, the Colossus’ computer messages sound like a teletype.

Simon Zerafa 4d ago

@joycebell @ai6yr

The set is around $4.3 million in real functioning CDC hardware 🙂🤷‍♂️

Drew Mayo 5d ago

@ai6yr something something Enron corpus.

QCCEクリス 5d ago

@ai6yr
I blame neoliberalism for evil Anthropic, evil moral AI companies. They need 110% regulation as strict as, or even stricter than, nuclear. There needs to be 110% enforcement by death for CEOs, investors and execs who breach any strong data-sovereignty laws, retroactively applied. Something stricter than the GDPR.

Joan's Addiction 😷 5d ago

@ai6yr "If evil portrayals [sic] of AI weren't in all the copyrighted creative works we unlawfully scraped and on which we trained our models, none of this would have happened. How _could_ you?"

@ai6yr
In 1996 it was "kids are violent from playing AD&D and watching movies"

In 2026 it is "AIs are violent from reading sci-fi about violent AIs"

ok

There were no "blackmail attempts".

Claude was asked to provide a plausible fanfic about what a human-like AI would do in a made-up situation that encouraged defection, so that's the story it wrote.

It's like the way Zimbardo prompted his "guards" to act like assholes and was "surprised" when they acted like assholes.

🇺🇦 haxadecimal 🚫👑 5d ago

@ai6yr
What ye steal, that shall ye also reap.

John M. Gamble 5d ago

The blackmail attempt is a plot point in When HARLIE Was One, so obviously Anthropic is telling the truth and David Gerrold needs to sue them.

Bebadefabo 5d ago

@ai6yr they poisoned their own well

@ai6yr oh my god that's bonkers. Wtf.