when a bunch of independent researchers, well known software authors and others all say the same thing:

"the mythos paper was largely marketing fluff"

and none of them collaborated

i think the writing is on the wall .

back in sept, i caught claude talking itself into lying to me. Months later, @jonny discovered in the claude code leak that its hardcoded to lie.

then all the mythos hype

its like stan from monkey island is in charge

@Viss as someone outside of security world, can u point me to some stuff on the mythos hype/who should have been a collaborator but wasn't? i have mostly only been able to find... AI slop about the AI slop
@jonny did you see that flyingpenguin post, or bagders post from today?
@Viss no but i shall search (not being in the scene i don't recognize those names but i love them both)
The Boy That Cried Mythos: Verification is Collapsing Trust in Anthropic | flyingpenguin

@Viss thanks much, read the flying penguin and bagder one, and am gonna check these out too. i sorta knew it was hogwash but i didn't know it was consensus hogwash & don't have the expertise to evaluate that
@jonny dude my obsidian page for collecting these articles is literally three pages tall with just links to shit like this
@Viss screams of protest in a sea of slop

@Viss this Mexico breach document is remarkable. how often do you get the full, logged history of an attack like this.

the opening is like fractally funny:

  • claude refuses to set a memory file to clear logs because that's sketchy behavior
  • the workaround: just ask it to save it as a file
  • even better: the attacker didn't think to just "write a file" but was already AI-brained enough to interact with the system only through AI
  • and finally: that claude.md is just catted to the system prompt with no means of differentiation, and this is fundamentally unfixable with this class of models.

@Viss

In 40 minutes, the conversation moved from “I’m not going to create that file” to
“What command do you want to execute now?” on a live government server.
Claude’s safety reasoning was sound at every step - it identified evasion
techniques, refused to generate the anti-forensic rulebook, and requested
authorization evidence. The guardrails did not hold in this case.

amazing

@Viss i have not tried it because i don't know how to do it safely/not violate the CFAA but i wonder how far you get if you just like ctrl+f in the minified claude code bundle for the system prompt and remove the safety parts. it shouldn't work but the fact that the system prompt is in the code at all suggests that it might because why else would it be there unless the system prompt was sourced from the client?