"Our internal evaluations showed that Opus 4.6 generally had a near-0% success rate at autonomous #ExploitDevelopment. But #MythosPreview is in a different league.

For example, Opus 4.6 turned the vulnerabilities it had found in Mozilla’s Firefox 147 JavaScript engine—all patched in Firefox 148—into JavaScript shell exploits only two times out of several hundred attempts. We re-ran this experiment as a benchmark for Mythos Preview, which developed working #exploits 181 times, and achieved register control on 29 more."

https://red.anthropic.com/2026/mythos-preview/
