So, wait, the whole “Mythos AI is so powerful it can find exploits in any software” thing requires both access to the source code and thousands of runs to find anything remotely actionable? This is the “too dangerous to release” model they’ve been hyping up?

Is that really it?

@baldur

Idk what 0-day exploits are going for these days, but from what I recall it could be north of a million USD depending on the scope and impact.

In comparison: spending 10k USD to find a 0-day RCE in a popular open source program seems like a bargain. I think it's less about the efficiency of the system and more about: "What are the odds an attacker with a credit card could make this your problem?"

@yosh @baldur The market price of a 0-day shouldn't be equated to the cost of finding a 0-day - the gap is the markup, and you can expect that to be astronomical for a "product" that's only getting sold a handful of times to extremely rich malefactors.

IOW, the price of an LLM-found 0-day (which required expert human oversight anyway) might well be the same as, or even greater than, just paying experts with a fuzzer and a decompiler.

And the humans boil fewer lakes

@baldur

Like, I'd really like to point people at this:

https://toot.yosh.is/@yosh/116376054778890780

Anyone saying stuff like "oh well a fuzzer would have found that" is wishcasting. Sure, these things will find the obvious lowest-hanging fruit first. But they can also find sandbox escapes in formally verified code in memory-safe languages written by some of the best to ever do it, hooked up to fuzzers 24/7.

I don't like it either. But that doesn't mean it isn't real.

yosh (@[email protected])

Big new Wasmtime security release today - 11 new CVEs found including 2 critical ones using LLMs. https://bytecodealliance.org/articles/wasmtime-security-advisories If LLMs can find this many critical bugs in a project that is as rigorous about security as Wasmtime, then get ready for projects with weaker security postures to do a lot worse. Like,,, actually.


@yosh @baldur Quoting from https://bytecodealliance.org/articles/wasmtime-security-advisories
"However, there was no fuzzing to check that invalid strings are handled correctly, and each of these issues could have concievably been discovered if such a fuzzing harness had been written."
And furthermore:
"Upon updating the formal model to check against the latest Cranelift lowering rules, verification flags the same bug as was found with the LLM search."

This is not a slam dunk for LLMs over traditional methods.
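For anyone who hasn't written one, the kind of "invalid strings are handled correctly" harness the advisory describes can be very small. Here's a sketch in Python for brevity (Wasmtime itself is Rust, and `decode_name` is a made-up stand-in, not real Wasmtime code):

```python
import random

def decode_name(data: bytes) -> str:
    """Hypothetical stand-in for a string-decoding routine under test.
    A naive decoder: raises UnicodeDecodeError on invalid UTF-8."""
    return data.decode("utf-8")

def fuzz_invalid_strings(iterations: int = 10_000, seed: int = 0) -> int:
    """Throw random byte strings at the decoder and count inputs that
    fail with anything other than the expected clean rejection --
    those are candidate bugs a harness like this would surface."""
    rng = random.Random(seed)
    crashes = 0
    for _ in range(iterations):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 16)))
        try:
            decode_name(data)
        except UnicodeDecodeError:
            pass  # invalid input rejected cleanly: the behavior we want
        except Exception:
            crashes += 1  # unexpected failure mode: worth a closer look
    return crashes
```

Real harnesses (libFuzzer, cargo-fuzz) add coverage feedback instead of pure random bytes, but the point stands: the gap was that nobody had written this harness for those code paths, not that fuzzing couldn't have found the bugs.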

Wasmtime's April 9, 2026 Security Advisories: A new world for security-critical projects (Bytecode Alliance)

@tkissing @[email protected]

You're missing the point. It's not either/or. It's: "How much effort does it take for an attacker to point this at a program and find problems?"

Wasmtime represents a best-case scenario where the maintainers have fuzzed the entire thing as much as they could, and even there it found problems. The maintainers are going to fix those problems and fill the gaps in fuzzing and that's good.

But most projects aren't even close to this, and yeah, I'm not optimistic about how that'll go.

@yosh Attackers could have used fuzzers to find some, if not all, of these. Might require a bit more expertise, but I'm not even sure about that. It seems the people who built the LLM tooling to find these issues are pretty much experts who spent considerable time and effort.
I'm not saying LLMs can't find anything exploitable, but I doubt it's as easy as putting "find me a zero day in Chrome" into a prompt and being done.

@tkissing

I've been told it literally is that easy - that prompt is run in a loop with some additional deduplication and reporting code tacked on. I wouldn't be worried if it weren't.

Read the intro to the post again. "It's a new world" is not hyperbole by a bunch of AI boosters. This is what Team Fuzzer is saying after having been on the receiving end of these tools.
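To make "run in a loop with some deduplication and reporting code tacked on" concrete, the whole apparatus can be this small. A sketch, where `ask_model` is a placeholder for whatever LLM API you'd call, not any real tool's interface:

```python
import hashlib

def llm_audit_loop(ask_model, source_files, rounds=3):
    """Minimal shape of 'a prompt run in a loop': repeatedly ask a
    model to find bugs, deduplicate findings by fingerprint, and
    collect a report. `ask_model(prompt) -> list[str]` is a stand-in;
    nothing here is specific to any real tooling."""
    seen = set()
    report = []
    for _ in range(rounds):
        for path, code in source_files.items():
            prompt = f"Find a memory-safety or logic bug in:\n{code}"
            for finding in ask_model(prompt):
                # Deduplicate on file + normalized finding text so the
                # same bug reported twice only shows up once.
                fp = hashlib.sha256(
                    f"{path}:{finding.strip().lower()}".encode()
                ).hexdigest()
                if fp not in seen:
                    seen.add(fp)
                    report.append((path, finding))
    return report
```

The actual systems presumably add triage and verification steps on top, but the loop itself is not the hard part - which is exactly why "an attacker with a credit card" is the right threat model to worry about.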

@yosh @tkissing "I've been told..."

I'm gonna stop you right there.

Just take the L.

@NosirrahSec @yosh "... at Microsoft". Yeah, I don't think Microslop employees have any credibility left, if they ever had any.
@viccie30 @yosh I am sure a great many there aren't shit people, but that number is probably dwindling as they suck down more "AI" loads.
@yosh Unfortunately the article doesn't give any hints about how many false positives the tooling produced, what level of expertise it took to craft the prompts, what additional tooling the LLM used to verify findings, etc.
Looking back at the hype we saw around Opus and other newer models and comparing that to what I have observed using those both for code generation and PR reviews, I find it unlikely that another model is suddenly *that* good.
@yosh I'm sure there will be some exploits popping up in the next months that were discovered using LLMs, but I am hopeful that most of them will have less than earth shattering impact.

@tkissing @yosh @baldur Is it a slam dunk in the sense of "traditional methods are dead and so is software security"?

Of course not.

But fuzzers are *also* probabilistic algorithms. LLMs add a lot more complexity to the potential analysis, are easier to operate for many, *and* currently made available far below true cost.

Of course this creates at least a temporary wave that is VERY real and not as easily achieved via traditional methods at this point in time.