Adam Shostack  

4.3K Followers
682 Following
11.6K Posts

Author, game designer, technologist, teacher.

Helped to create the CVE and many other things. Fixed autorun for XP. On the Black Hat review board.

Books include Threats: What Every Engineer Should Learn from Star Wars (2023), Threat Modeling: Designing for Security, and The New School of Information Security.

Following back if you have content.

Website: https://shostack.org
Latest book: https://threatsbook.com
Opsec status: Currently clean
YouTube: https://youtube.com/shostack
Why is it so hard to find catalogs of defenses that are organized by "what's useful to engineers building products or systems?"

I have a new blog post with some observations from delivering a class that included threat modeling with LLMs:

https://shostack.org/blog/lessons-from-owasp/

Shostack + Friends Blog > Lessons from Threat Modeling Intensive With AI

Actionable lessons from delivering Threat Modeling with AI, and using AI more generally.

We use AI in our own work and talk with clients about how they're using it in theirs. This week's post shares what we've learned from our day-to-day experiments, discussions, and classroom experiences. If the intersection of threat modeling and AI is on your mind, we've been doing some work there worth reading about. https://shostack.org/blog/lessons-from-owasp/

(cc @andrewnez ; I was thinking of your 'stars are a game' comments.)

@bagder You know that's not completely true. They may have happened, yes.

But with a finite time to write and review the code, all the time spent on avoiding "C mistakes" could have been spent focusing on the logic instead.

(Also, languages with stronger types provide state and type encapsulation that can help avoid some classes of logic mistakes.)
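To make that point concrete, here's a minimal, made-up Rust sketch (none of this is curl code): the typestate pattern encodes protocol state in the type system, so an out-of-order call becomes a compile error rather than a runtime logic mistake.

```rust
// Hypothetical typestate sketch: the connection lifecycle lives in the
// types, so "send before connect" simply does not compile.

struct Disconnected;
struct Connected {
    peer: String,
}

impl Disconnected {
    // Consuming `self` means the old state can't be reused afterwards.
    fn connect(self, peer: &str) -> Connected {
        Connected { peer: peer.to_string() }
    }
}

impl Connected {
    // `send` only exists on the Connected state.
    fn send(&self, msg: &str) -> String {
        format!("to {}: {}", self.peer, msg)
    }
}

fn main() {
    let conn = Disconnected.connect("example.org");
    println!("{}", conn.send("hello"));
    // Disconnected.send("hello"); // compile error: no such method
}
```

The compiler enforces the ordering for free; there's no runtime check to forget.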

*Zero* out of the six pending #curl CVEs are C mistakes. They are all logical mistakes that would have happened anyway even if we had used another language.

Reading the opening sentence of this https://www.strikegraph.com/blog/the-mercor-breach-exposed-silicon-valleys-fragile-ai-supply-chain

"an open-source AI gateway downloaded 95 million times per month," I'm forced to wonder: Are download counts a game?

also, did https://openssf.org/blog/2025/09/23/open-infrastructure-is-not-free-a-joint-statement-on-sustainable-stewardship/ have any impact?

The Mercor breach exposed Silicon Valley's fragile AI supply chain

The Mercor breach highlights vulnerabilities in Silicon Valley's AI supply chain, exposing compliance failures and significant data risks.

Day two of Black Hat and I got a chance to see my friend Ariel Herbert-Voss
do the keynote. It opened a floodgate of thoughts.

Oh and yay, GPT-5.5 is here and it feels like we’re entering another mad period of growth for frontier models and security research.

The big thing I’m seeing is that we need less scaffolding around these models. Give them code, context, a goal and some tools, and they are getting much better at cracking on.

That matters for bug hunting. A lot.

But let’s not pretend the machines have solved vuln research. They haven’t.

The biggest gains are still at the shallow end.

Low-severity bugs, obvious logic mistakes, unsafe patterns, missing checks, boring-but-real issues: that's where models are starting to clean up. If the bug class is well documented and the code is clear, they move quickly.
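As a made-up illustration of that shallow end, here's the kind of missing-check bug that's cheap to spot when the code is clear (hypothetical Rust, not from any real project), next to the boring fix:

```rust
// Buggy: trusts the length byte in the input.
fn read_payload_buggy(buf: &[u8]) -> &[u8] {
    let len = buf[0] as usize;
    &buf[1..1 + len] // panics if buf is shorter than claimed: missing check
}

// With the boring-but-real checks added.
fn read_payload(buf: &[u8]) -> Option<&[u8]> {
    let (&len, rest) = buf.split_first()?; // handles the empty buffer
    rest.get(..len as usize)               // None instead of a panic/overread
}

fn main() {
    assert_eq!(read_payload_buggy(&[2, 10, 20]), &[10u8, 20][..]);
    assert_eq!(read_payload(&[2, 10, 20]), Some(&[10u8, 20][..]));
    assert_eq!(read_payload(&[5, 1]), None); // claims 5 bytes, has 1
}
```

A well-documented bug class, clear code, a one-line diff: exactly the territory the models are hoovering up.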

The low-hanging fruit is getting hoovered up at pace. The easy stuff is becoming cheaper to find. Well, sorta cheaper.

We’re also seeing decent gains on modest bugs. Not deep chains. Not always novel research. But useful findings where the model can read, reason, trace, and join enough dots to help.

Where it still gets spicy is state.

Models can talk about state all day, but they don’t really feel it. They still struggle with temporal bugs, race conditions, lifecycle weirdness, multi-step flows, and those “only happens after you do these seven things in this exact order” bugs.
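A toy Rust sketch (entirely hypothetical) of why sequence matters: each method below looks fine in isolation, and the bug only appears after one specific order of calls.

```rust
// Hypothetical order-dependent logic bug: no single line is "unsafe",
// the flaw only exists across a sequence of operations.

#[derive(Default)]
struct Session {
    authenticated: bool,
    elevated: bool,
}

impl Session {
    fn login(&mut self) {
        self.authenticated = true;
    }
    fn elevate(&mut self) {
        if self.authenticated {
            self.elevated = true;
        }
    }
    // Bug: logout clears authentication but forgets `elevated`,
    // so login -> elevate -> logout leaves a privileged ghost state.
    fn logout(&mut self) {
        self.authenticated = false;
    }
    fn can_admin(&self) -> bool {
        self.elevated
    }
}

fn main() {
    let mut s = Session::default();
    s.login();
    s.elevate();
    s.logout();
    // Still "admin" after logout; only visible across the whole flow.
    println!("admin after logout: {}", s.can_admin());
}
```

Reading any one method won't flag it; you have to hold the lifecycle in your head, which is exactly where models still wobble.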

Spotting something dodgy is not the same as proving it is exploitable.

On the exploit side, validation, exploitability and reliability are improving, but more slowly. This is one of the big areas John and I have been working through with RAPTOR: getting away from “looks interesting, mate” towards “this is real, reachable, and repeatable”.

Because exploit reliability is still a graft. Targeting is still fragile. You still need iterations.

Oh and the human in the loop is still vital.

We all thought fuzzing would solve the bug problem. It didn’t. It changed the economics of bug discovery, but we still needed harnesses, triage, context, exploit dev, judgement, and all the boring engineering bits that make the work useful.

I think frontier models are having a similar moment.

The best results won’t come from throwing a giant model at a repo and hoping it finds magic. They’ll come from layered systems: frontier models for reasoning and code understanding, smaller focused models trained on private data, internal vuln history, remediation patterns, product context, validation loops, and humans who know when the model is chatting crap.

As models get better at writing code, they get better at breaking it. Capability doesn’t scale politely. It compounds.

Better tool use helps validation.

Better context helps reachability.

Better reasoning helps exploit chains.

But it still needs structure.

It still needs evidence.

It still needs humans who know what good looks like.

The shallow bugs are already getting compressed. The interesting bit is what comes next.

How’s your evening going?

Mine is … watching @1Password and Apple Passwords fight over who gets to log in to the Alaska app to the point of locking my account, and I don’t even remember why I last had to change my password, and now I remember why everyone hates security.

Should I stop rating talks when I put stuff like this in:

Pros: momentarily thought-provoking.
Cons: The thoughts provoked are not very polite.