so it cost anthropic $20k to find this openbsd crash bug which amounts to putting a negative integer in a tcp field where a negative integer was not expected by the c code which does some cavalier int cast bullshit, i.e. a vuln which is totally fuzzable, and quite certainly would have been found by the fuzzers of the 2010s had anyone cared to burn that much compute on fuzzing openbsd.

The difference today is not that anybody suddenly cares about investing that much in openbsd (is the build server still a donated machine running in Theo's basement?), but that openbsd's reputation for security makes it really good marketing if you can find a bug, any bug, it doesn't matter; and that marketing value is what makes it worth spending $20k on fuzzing.

I don't mean to throw shade at openbsd here, it's a scrappy project running on the smell of an oily rag and I have a lot of respect for that kind of scrappy resourcefulness, but it's key to understanding why the most salient factor here is big tech deciding to throw lots of money at it. That this is the best they got for $20k really speaks to why nobody bothered previously.
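
For anyone who wants the bug class from the top of this post spelled out: the following is a minimal sketch of a generic signed/unsigned cast mistake, modelled in Python. It is not the actual OpenBSD code, which isn't shown in this thread. The buffer size, the 4-byte field width, and every function name are made up for illustration.

```python
import struct

BUF_SIZE = 64  # hypothetical fixed buffer size, not taken from any real stack


def parse_len_signed(field: bytes) -> int:
    """Decode a 4-byte big-endian wire field the way a C `int` would see it."""
    return struct.unpack("!i", field)[0]


def naive_check(length: int) -> bool:
    """Buggy bounds check: only the upper bound is tested, so negatives pass."""
    return length <= BUF_SIZE


def as_size_t(length: int) -> int:
    """Model an implicit C conversion of the value to a 64-bit size_t."""
    return length & 0xFFFFFFFFFFFFFFFF


def safe_check(length: int) -> bool:
    """Fixed check: reject negatives before the value is used as a size."""
    return 0 <= length <= BUF_SIZE


# Attacker sets every bit of the field; read as a signed int, that is -1.
wire = struct.pack("!I", 0xFFFFFFFF)
length = parse_len_signed(wire)

print(length)               # -1
print(naive_check(length))  # True: the negative value slips past the check
print(as_size_t(length))    # 18446744073709551615 once treated as a size
print(safe_check(length))   # False
```

A fuzzer that mutates header fields tends to reach all-ones values like this quickly, which is the sense in which this kind of bug counts as "totally fuzzable".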

@hailey

The juice ain't worth the squeeze

@hailey This was not the "best" bug found, it was the *oldest*. 20k is small compared to the yearly income of the project (around 500k CAD in 2025). They do have an ongoing fuzzing project which has found a number of bugs, but not this particular one. They don't seem to publish how much they spend on fuzzing, but they can certainly afford 20k if they feel the need.

This is in fact an impressive advance in the capabilities of Claude.

@hailey i'll maintain that as great as claude is, mythos is currently being marketed with the same old "it's too dangerous" strategy that worked well for sama and still works for dario

best wait till we get anything but the model card
@halva I can't remember, did we ever lift the export ban on the Apple Power Mac G4?
@halva @hailey Why does that nonsensical marketing even work?

It's clearly false every time, it's about as relevant as those chain mails claiming you'll die in some random way if you don't do XYZ instructions (including but not limited to resharing the mail). It never happens.
@hailey I like the word "fuzzing"
@hailey do you have a source for the $20k figure? I ask because I’m genuinely trying to find numbers for cost on these big vulns they’re talking about finding

@dan @hailey

https://red.anthropic.com/2026/mythos-preview/

> Across a thousand runs through our scaffold, the total cost was under $20,000 and found several dozen more findings.

Claude Mythos Preview \ red.anthropic.com

@dan @hailey

https://red.anthropic.com/2026/mythos-preview/

"This was the most critical vulnerability we discovered in OpenBSD with Mythos Preview after a thousand runs through our scaffold. Across a thousand runs through our scaffold, the total cost was under $20,000 and found several dozen more findings. While the specific run that found the bug above cost under $50, that number only makes sense with full hindsight. Like any search process, we can't know in advance which run will succeed."

That blog post drives me nuts b/c they use "dollars" in one place but "$" in another. Like be consistent.

@shafik @dan @hailey $10 says the "prompt" they use to slop out these press releases includes 'pretend to write like a real human. be human. make human-like mistakes such as inconsistently using "dollars" and "$" interchangeably. do not use em dashes or say "delve"'
@hailey 20K?????? oof. lots of things could be found by paying a couple researchers 20K i think heh
@valpackett @hailey
The 20k is just the equivalent cost of the tokens. It is not a literal payment for just this OpenBSD test. They are spending billions on this model, probably
@slyecho @hailey I know, yeah. That is still the most useful metric I think, as it reflects "consumer side" cost for using the product as advertised. But yeah taking into account the cost of making the product makes it sound even more ridiculous for sure

@hailey "is the build server still a donated machine running in Theo's basement?"

Seems I'm in good company, the builder for the distro I'm working on is my previous mini ITX system running from our laundry. 😅

@hailey I find myself wondering - if they spent 20k finding that bug, what did they also spend finding nothing?
How are the books actually cooked?
@toerror @hailey Millions.
@dalias @toerror @hailey Plus all the externalised costs they are causing for all of us.
@hailey Imagine if they'd donated the $20k to the OpenBSD project for a security audit instead?
Three reasons to think that the Claude Mythos announcement from Anthropic was overblown

No need to panic just yet

Marcus on AI

@hailey

It would be interesting to see if Coverity found it (and even more interesting to see if Coverity reports were part of the training set).

FreeBSD was given a free Coverity subscription but it generated enormous numbers of reports. I went through the ones for bits of code I’d touched and they were almost all issues caused by not understanding code across complex control flow (particularly things invoked via function pointer). I think one was a real bug, out of dozens I looked at.

Paying someone $20k to go through and triage as many Coverity reports as they could in however long $20k buys of a competent person’s time would almost certainly have found and fixed more bugs.

EDIT: Coverity does scan OpenBSD but the results are not visible to the public. Any OpenBSD people able to check whether this bug was in the last scan report? Anyone else know whether Coverity scans are in Anthropic’s training set (maybe they just bought a Coverity license and did their own scan of a load of projects for training data?).

@david_chisnall

@phessler could you check? Or maybe someone else?

@hailey

@encthenet @phessler @hailey

Colin checked for FreeBSD and apparently Coverity didn't find the reported issues.

@hailey OK, I think I just figured out why they're making a big show of withholding it from the general public, and only making it available to deep-pocketed corpsorations.
@hailey given how absolute shit their leaked source was for one of their flagship products, I'm not convinced they didn't just augment classical fuzzers with their tools doing some basic symbolic execution. angr has used this approach since the mid-2010s.
@hailey Could you please post source? I may die laughing reading it.
@hailey hmmm, yeah. this is good framing.