It's so cool that Anthropic is setting up a double-sided protection racket where it will profit from the massive token burn of both attackers and defenders, with a tool specifically designed to generate exploits, and their only observable mitigation is a client-side system prompt that sternly warns the LLM to be good and not do malware
https://red.anthropic.com/2026/mythos-preview/

@jonny

It's not specifically designed to generate exploits; it's a general-purpose LLM that turns out to be very good at writing exploits.

@datarama they are more or less explicitly telegraphing that defense by attempting to generate exploits against yourself will be at least part of the product. their description in the second image reads to me like a product brief: these are the things you should expect from the fully automated luxury self-hack solution

@datarama @jonny

I have come to the conclusion that making a thing as general-purpose as possible is a fool's quest because nearly all purposes are stupid.

All the well-defined and non-stupid goals can be pursued better with the right set of specific-purpose tools.