Mastodawn

dominicq 20h ago

Small models also found the vulnerabilities that Mythos found

https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier

AI Cybersecurity After Mythos: The Jagged Frontier

Why the moat is the system, not the model

AISLE

epistasis 20h ago

> We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens.

Impressive, and very valuable work, but isolating the relevant code changes the situation so much that I'm not sure it's really the same use case.

Being able to dump an entire codebase and have the model scan it is the type of situation that opens up vulnerability scanning to a much larger class of people.

elicash 20h ago

This is from the first of the caveats that they list:

> Scoped context: Our tests gave models the vulnerable function directly, often with contextual hints (e.g., "consider wraparound behavior"). A real autonomous discovery pipeline starts from a full codebase with no hints. The models' performance here is an upper bound on what they'd achieve in a fully autonomous scan. That said, a well-designed scaffold naturally produces this kind of scoped context through its targeting and iterative prompting stages, which is exactly what both AISLE's and Anthropic's systems do.

That's why their point is what the subheadline says, that the moat is the system, not the model.

Everybody so far here seems to be misunderstanding the point they are making.

lelanthran

> That's why their point is what the subheadline says, that the moat is the system, not the model.

I'm skeptical; they provided a tiny piece of code and a hint to the possible problem, and their system found the bug using a small model.

That is hardly useful, is it? In order to get the same result, they had to know both where the bug is and what the bug is.

All these companies in the business of "reselling tokens, but with a markup" aren't going to last long. The only strategy is "get bought out and cash out before the bubble pops".