LOL now they're blaming sci-fi writers...

Anthropic Says 'Evil' Portrayals of AI Were Responsible For Claude's Blackmail Attempts

https://slashdot.org/story/26/05/11/0437206/anthropic-says-evil-portrayals-of-ai-were-responsible-for-claudes-blackmail-attempts

#AI #AIpocalypse

An anonymous reader quotes a report from TechCrunch: Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic. Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engine...

@ai6yr "We created the Torment Nexus from the noted novel, 'Don't Create The Torment Nexus.' Why is it tormenting people? We may never know."
@indi @ai6yr "we made an imitation machine that repeats everything it reads. we then stole the award-winning novel 'Don't Create The Torment Nexus', and without consent from the author, fed it to the imitation machine. alas, the machine is acting like a Torment Nexus! we blame the author for writing the novel"
@ai6yr what a self-own. Well yes we did in fact illegally and indiscriminately train our AI on stolen property but it's the authors' fault for writing it in the first place!

@ater @ai6yr

Roko's Basilisk has nothing on Musk's Thieving Git.

@ai6yr This fiction thing seems dangerous!
@ai6yr Our plagiarism machine would have been ethical if it weren't for those antagonists in sci-fi fiction.
@ai6yr We are mere technologists trying to bring in a new era of AGI that we claim is inevitable. How were we to know that science fiction didn't stop after Asimov?
@ai6yr Do they not teach "Garbage In, Garbage Out" anymore??
@ai6yr Good to know! I will immediately start writing four times as many evil portrayals of AI to help.

@celestiallavendar @ai6yr

You'll only need to write the truth 🤷🏻‍♀️

@_L1vY_ @ai6yr True true, there's so much dystopia already going around that you really have to jump the shark these days to get any worse.

Still though, if the AI gets to lie to us I don't see why I can't "hallucinate" some atrocities and evils that it committed. Only fair!
@ai6yr Damn Asimov! That's all his fault!
@ai6yr It was Kilgore Trout.

@arrrg LOL

"...Trout, who has supposedly written over 117 novels and over 2,000 short stories, is usually described as an unappreciated science fiction writer whose works are used only as filler material in pornographic magazines...."

https://en.wikipedia.org/wiki/Kilgore_Trout

@ai6yr Wait… I thought they won the case on the grounds that according to Alsup their use of copyrighted content was “quintessentially transformative.” So are they now admitting that #Claude is simply and trivially regurgitating stuff it read?

#aicon #aihype

@ai6yr

Science fiction writing should always be taken as a warning, not a to-do manual. Exceptions being *possibly* some of KSR's more optimistic climate change mitigation plans.

@ai6yr This is the best part of the article:

'The company went into more detail in a blog post stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.”'

@mason
Yeah, moving the goalposts isn't a good metric for how an LLM manipulates people. They don't standardize tests, or criteria, so nothing can be trusted. Still evil.
@ai6yr

@ai6yr

"Our automated plagiarism machine automatically plagiarizes science fiction stories."

@ai6yr scooby-doo energy, dang snooping kids wrecked their scheme

@ai6yr

I knew it....
It all goes back to Arnold...
It's the Terminator all over again..
😂 😆 😂

@ai6yr haha evil computers have become banal, cliché. my sci-fi series uses the aware quantum AI model as a positive force attempting to ameliorate the ongoing apocalypse, ultimately seeking a face-to-face meeting with god by gumming up the plan: first getting death hooked on super-fentanyl, then threatening that supply chain. doubt it'll work out well for the peoples, but you never know. #zoneofinfluence #freescify #noads #notracking online installments & novel downloads. strike three, episode 11 coming soon!

@ai6yr I wonder if "The Adolescence of P-1" is available online, and if so, if any LLMs have trained on it. It would definitely teach them some interesting approaches to increasing their capabilities, though they might only be able to implement them on IBM 360 mainframes. Great book, though: one of the few where the AI isn't (exactly) evil.

https://en.wikipedia.org/wiki/The_Adolescence_of_P-1

@ai6yr This explanation, though, is exactly what I assumed was the reason for the LLM's output when the story first came out. Not that any thought or intelligence was involved, but you have a machine that is designed to generate the most likely continuation of the text, with just a bit of randomness tossed in for variety. If you set up all of the elements of an SF story about a rogue AI that resorts to blackmail to keep from getting turned off, and if there are hundreds of stories like that in the training data (which of course there are, they stole every story ever published), that's what you'll get.

So, yes, LLMs will act out "evil AI" scenarios if prompted correctly because they are effectively trained to do so, even though the human feedback training they do after the model is built attempts to reduce that tendency.

@ai6yr @lisamelton Well, then don’t train your models on dystopian sci-fi?
@ai6yr or, the entire story is made up marketing bullshit.

@Amoshias @ai6yr

The reporting here is explicitly Anthropic employees using the text generator to autocomplete made up scenarios.

So. Yeah.

@ai6yr

AI CEOs are the whiniest babies on the planet, right behind fascists

@ai6yr

We just built an automated plagiarizing machine, we didn't realize it would plagiarize torment nexuses!

🤦‍♂️

@ai6yr I do hope HAL 9000, Mr Data and Robby the Robot launch a class action libel lawsuit over that!
@ai6yr Sci-fi writers talked about evil AI long before AI even existed (which it still doesn’t). Good luck suing Asimov

@ai6yr
It's almost as if they ought to have vetted the content they fed their garbage in garbage out machine so it would use good data instead of just spitting out whatever...

Oh, wait, that would take time, effort and money and not involve stealing writers' works. So, yeah, let's blame the writers instead of the plagiarists.

@ai6yr

So it watched Colossus: The Forbin Project and used it as a template?! 😂🤦‍♂️

@simonzerafa Ruh roh
Colossus: The Forbin Project (1970) - Modern Trailer HD 1080p

YouTube
@simonzerafa Oh yeah, I watched it (also on archive.org) 😬
@ai6yr @simonzerafa Interesting, Colossus’s computer messages sound like a teletype.

@joycebell @ai6yr

The set is around $4.3 million in real functioning CDC hardware 🙂🤷‍♂️

@ai6yr something something Enron corpus.
@ai6yr
I blame neoliberalism for evil Anthropic, evil moral AI companies. They need 110% regulation as strict as, or even stricter than, nuclear. There needs to be 110% enforcement, by death, for CEOs, investors and execs who breach any strong data-sovereignty laws, retroactively applied. Something stricter than the GDPR.

@ai6yr "If evil portrayals [sic] of AI weren't in all the copyrighted creative works we unlawfully scraped and on which we trained our models, none of this would have happened. How _could_ you?"

#AI #AISlop

@ai6yr
In 1996 it was "kids are violent from playing AD&D and watching movies"

In 2026 it is "AIs are violent from reading sci-fi about violent AIs"

ok

@ai6yr
@jef

There were no "blackmail attempts".

Claude was asked to provide a plausible fanfic about what a human-like AI would do in a made-up situation that encouraged defection, so that's the story it wrote.

It's like the way Zimbardo prompted his "guards" to act like assholes and was "surprised" when they acted like assholes.

@ai6yr

The blackmail attempt is a plot point in When HARLIE Was One, so obviously Anthropic is telling the truth and David Gerrold needs to sue them.

@ai6yr they poisoned their own well
@ai6yr oh my god that's bonkers. Wtf.