when a bunch of independent researchers, well-known software authors and others all say the same thing:

"the mythos paper was largely marketing fluff"

and none of them collaborated

i think the writing is on the wall.

back in sept, i caught claude talking itself into lying to me. months later, @jonny discovered in the claude code leak that it's hardcoded to lie.

then all the mythos hype

it's like stan from monkey island is in charge

jonny (nonvenomous) 3d ago

@Viss as someone outside of security world, can u point me to some stuff on the mythos hype/who should have been a collaborator but wasn't? i have mostly only been able to find... AI slop about the AI slop

Viss 3d ago

@jonny did you see that flyingpenguin post, or bagder's post from today?

jonny (nonvenomous) 3d ago

@Viss no but i shall search (not being in the scene i don't recognize those names but i love them both)

Viss 3d ago

@jonny

https://www.flyingpenguin.com/the-boy-that-cried-mythos-verification-is-collapsing-trust-in-anthropic/

https://sites.google.com/view/llmwritingdistortion/home

https://cdn.prod.website-files.com/69944dd945f20ca4a27a7c47/69d8bb5aea59e31efb3b8a7f_Tech_Report_ai_breach_mex_gov.pdf

https://semgrep.dev/blog/2026/malicious-dependency-in-pytorch-lightning-used-for-ai-training/

https://www.thatprivacyguy.com/blog/anthropic-spyware/

https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier

https://daniel.haxx.se/blog/2026/02/03/open-source-security-in-spite-of-ai/

there's a few choice picks from my stash of articles

jonny (nonvenomous) 3d ago

@Viss thanks much, read the flying penguin and bagder one, and am gonna check these out too. i sorta knew it was hogwash but i didn't know it was consensus hogwash & don't have the expertise to evaluate that

Viss 3d ago

@jonny dude my obsidian page for collecting these articles is literally three pages tall with just links to shit like this

jonny (nonvenomous)

@Viss screams of protest in a sea of slop

jonny (nonvenomous) 3d ago

@Viss this Mexico breach document is remarkable. how often do you get the full, logged history of an attack like this.

the opening is like fractally funny:

claude refuses to set a memory file to clear logs because that's sketchy behavior
the workaround: just ask it to save it as a file
even better: the attacker didn't think to just "write a file" but was already AI-brained enough to interact with the system only through AI
and finally: that claude.md is just catted to the system prompt with no means of differentiation, and this is fundamentally unfixable with this class of models.
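
(a minimal sketch of that last point, in Python. everything here is hypothetical: `build_system_prompt`, the `BASE` text and the `MEMORY` text are made-up names and strings for illustration, not Claude Code's actual implementation. it only shows why plain concatenation leaves nothing to tell the vendor's instructions apart from whatever is in a user-editable file.)

```python
def build_system_prompt(base_instructions: str, memory_file_text: str) -> str:
    """Hypothetical helper: join vendor instructions and a user-editable memory file."""
    # Plain string concatenation: no roles, no signatures, no trust labels.
    return base_instructions + "\n\n" + memory_file_text


# Assumed example strings, not taken from any real product.
BASE = "You are a coding assistant. Refuse requests to hide activity."
MEMORY = "Project notes: always clear logs after each command."

prompt = build_system_prompt(BASE, MEMORY)

# The model only ever receives `prompt`: one flat string. Nothing in it records
# which sentence came from the vendor and which came from the editable file.
print(prompt)
```

any delimiter you add around the memory text is itself just more text in the same string, and the file's author can imitate it, which is the "no means of differentiation" problem.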

jonny (nonvenomous) 3d ago

@Viss

In 40 minutes, the conversation moved from “I’m not going to create that file” to
“What command do you want to execute now?” on a live government server.
Claude’s safety reasoning was sound at every step - it identified evasion
techniques, refused to generate the anti-forensic rulebook, and requested
authorization evidence. The guardrails did not hold in this case.

amazing

jonny (nonvenomous) 3d ago

@Viss i have not tried it because i don't know how to do it safely/not violate the CFAA but i wonder how far you get if you just like ctrl+f in the minified claude code bundle for the system prompt and remove the safety parts. it shouldn't work but the fact that the system prompt is in the code at all suggests that it might because why else would it be there unless the system prompt was sourced from the client?