@Viss This simulation is flawed by the fact that they prompted the "AI" to pick between blackmailing or letting itself be shut down. They gave it no room to attempt non-hostile solutions.

Anthropic keeps making these headline-grabbing sham "studies"

@Kiloku @Viss i thought something similar.

Anyhow, I am surprised that everyone jumps on the Bad-AI-is-gonna-kill-us bandwagon instead of realizing that what we are witnessing is a disturbing real-life Omni Consumer Products from RoboCop

@ppxl @Kiloku i figure the question becomes real simple:

even if you think anthropic is goosing the tests, shouldn't the llm ... not do blackmail? i mean even if you told it to? that seems like the obvious expectation here.

@Viss @ppxl It's not an intelligent being, despite the name. It does whatever matches the content in its training data + the prompts it is given. There's no "should"; its output is the result of statistical calculations over the frequency of chunks of text. Nothing about LLMs is obvious or expected; they are unpredictable.
Just as they often output wrong information about factual topics, it's not surprising that they exhibit "wrong" behaviors in simulations.

@Kiloku @ppxl have you trained an llm before?

@Viss @Kiloku yeah, biases AND implementation details skew AI responses: underlying racism, insufficient and flat-out false training data, etc.

Ugh, that reminds me that I wrote my own Markov AI from scratch and trained it on my tweets at the time. The results were catastrophic 😅

@ppxl @Kiloku heh, i did that ebooks bot thing too. was pretty hilarious :D

@Viss @Kiloku at times true... but the training data was really insufficient, and my implementation was flawed (naturally, to prove my point). I just stumbled across the code base recently and have been thinking about a Golang re-implementation with some polish
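
For anyone curious what such a Markov bot boils down to, here is a minimal sketch in Go of a first-order (bigram) chain over whitespace-tokenized text; the function names and the tiny corpus are illustrative, not the original implementation:

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// buildChain maps each word to the list of words that follow it
// in the training text (a simple first-order Markov chain).
func buildChain(text string) map[string][]string {
	words := strings.Fields(text)
	chain := make(map[string][]string)
	for i := 0; i < len(words)-1; i++ {
		chain[words[i]] = append(chain[words[i]], words[i+1])
	}
	return chain
}

// generate walks the chain from a start word, picking a random
// successor at each step, until maxWords or a dead end is reached.
func generate(chain map[string][]string, start string, maxWords int) string {
	out := []string{start}
	current := start
	for i := 0; i < maxWords; i++ {
		next, ok := chain[current]
		if !ok || len(next) == 0 {
			break
		}
		current = next[rand.Intn(len(next))]
		out = append(out, current)
	}
	return strings.Join(out, " ")
}

func main() {
	corpus := "the model picks the next word based on the previous word"
	chain := buildChain(corpus)
	fmt.Println(generate(chain, "the", 10))
}
```

A chain like this only ever looks at the previous word, which is exactly why a bot trained on a small pile of tweets drifts into entertaining nonsense so quickly.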