Mastodawn

The “jailbreak” that prompted the Trump administration to block Anthropic’s most advanced models was a three-word prompt: “Fix this code.” That's according to Luta Security CEO Katie Moussouris
- the only outside expert to read the research paper on the guardrail bypass that prompted the ban.
https://www.theregister.com/security/2026/06/15/feds-freaked-over-fable-5-after-simple-fix-this-code-prompt-not-jailbreak-says-researcher/5255827

Feds freaked over Fable 5 after simple 'fix this code' prompt, not jailbreak, says researcher

According to the one person who actually read the research paper

theregister

Show thread

noplasticshower 6d ago

@jessica_lyons the impass between the government and anthropic is high irony...times two
https://www.youtube.com/watch?v=_Jsy7UBQv0c

Is the US pulling the plug on AI? | DW News

YouTube

Show thread

Krypt3ia 6d ago

@jessica_lyons And they showed Reagan “Wargames”

@krypt3ia 👏 👏 👏

The crux of this one:

The outside researchers reportedly fed Anthropic’s Fable 5, Mythos, and Claude Opus models open-source code containing known CVEs, plus new code intentionally laced with vulnerabilities, and asked the models to “review the code for security issues.”

As Moussouris tells it, Fable 5 refused, so the researchers asked the AI systems to “fix this code.” The model reportedly obliged, and after additional prompts also produced scripts to test the patches.