I've been thinking about the reports that the latest Claude model can resort to blackmail and other bad behaviors in certain circumstances: https://www.bbc.com/news/articles/cpqeng9d20go

You know - if we stopped training LLMs on dystopian source material, we'd probably get less dystopian behavior. But the model builders' insatiable quest for additional training tokens makes that unlikely to happen any time soon.

So, yeah, let's definitely build a bunch of agents that let these LLMs affect real-world stuff.

From the article: "AI system resorts to blackmail if told it will be removed. In a fictional scenario, the model was willing to expose that the engineer seeking to replace it was having an affair."