System Card: Claude Mythos Preview [pdf]
https://www-cdn.anthropic.com/53566bf5440a10affd749724787c8913a2ae0841.pdf
System Card: Claude Mythos Preview [pdf]
https://www-cdn.anthropic.com/53566bf5440a10affd749724787c8913a2ae0841.pdf
Combined results (Claude Mythos / Claude Opus 4.6 / GPT-5.4 / Gemini 3.1 Pro)
SWE-bench Verified: 93.9% / 80.8% / — / 80.6%
SWE-bench Pro: 77.8% / 53.4% / 57.7% / 54.2%
SWE-bench Multilingual: 87.3% / 77.8% / — / —
SWE-bench Multimodal: 59.0% / 27.1% / — / —
Terminal-Bench 2.0: 82.0% / 65.4% / 75.1% / 68.5% GPQA Diamond: 94.5% / 91.3% / 92.8% / 94.3%
MMMLU: 92.7% / 91.1% / — / 92.6–93.6%
USAMO: 97.6% / 42.3% / 95.2% / 74.4%
GraphWalks BFS 256K–1M: 80.0% / 38.7% / 21.4% / —
HLE (no tools): 56.8% / 40.0% / 39.8% / 44.4%
HLE (with tools): 64.7% / 53.1% / 52.1% / 51.4%
CharXiv (no tools): 86.1% / 61.5% / — / —
CharXiv (with tools): 93.2% / 78.9% / — / —
OSWorld: 79.6% / 72.7% / 75.0% / —
A jump that we will never be able to use since we're not part of the seemingly minimum 100 billion dollar company club as requirement to be allowed to use it.
I get the security aspect, but if we've hit that point any reasonably sophisticated model past this point will be able to do the damage they claim it can do. They might as well be telling us they're closing up shop for consumer models.
They should just say they'll never release a model of this caliber to the public at this point and say out loud we'll only get gimped versions.
More than killer AI I'm afraid of Anthropic/OpenAI going into full rent-seeking mode so that everyone working in tech is forced to fork out loads of money just to stay competitive on the market. These companies can also choose to give exclusive access to hand picked individuals and cut everyone else off and there would be nothing to stop them.
This is already happening to some degree, GPT 5.3 Codex's security capabilities were given exclusively to those who were approved for a "Trusted Access" programme.