The AI slop security reporting is basically extinct. It almost does not happen anymore. At all.

I want to emphasize this because when I talk about AI security reports now, half my readers seem to believe those are AI slop. They're not. They are found with AI tools but are normally high-quality bug reports.

The weakest part is that they tend to overstress the vulnerability angle. Many of them are well-phrased bug reports that are still "just bugs".

@bagder Yeah, seems like around January things flipped around.

I was hoping the slop would continue to be slop, but alas. Wishful thinking on my part (to make it easier to disregard the fad).

@bagder The other problem with AI bug reports is the verbosity, otherwise I basically agree.
@evilpie true, they are normally way too talkative

@bagder I get this with fwupd too. Everything that's AI found is reported as a CVSS 10.0 CRITICAL vulnerability, and then you find out it's assuming the attacker has write access on /etc or something dumb like that.

At that point it's just a regular old typo bugfix like all the other thousands of unimportant commits.

@bagder "they tend to overstress the vulnerability angle." which I imagine is simply because that's what the prompt suggested.
@utopiah probably, but also because the AIs can't really tell
@bagder sure, ironically enough there is no "I" in AI.
@utopiah @bagder there's no irony at all, it's at minimum a marketing strategy.
@bagder Well, I guess you could quickly convince them otherwise with your "reports/ai-slop ratio" graph.

@bagder I see
- good ones using AI as part of a rigorous process with replication
- mediocre ones where someone asked an AI to "find me a CVE", submitted the report without review or replication, and yet still expects credit

If "have write access to the filesystem" is a prerequisite to an exploit: it's not an exploit. You already have total ownership of the server

@bagder Do reporters share the tools used, or are there strong tool indicators in the reports?

Curious about which tool(s) are most successful, at least for curl research.

I imagine in most cases reporters don't mention the tools used (especially if custom), which is unfortunate.

@bagder you're lucky. I got 30+ yesterday. 1 was kind of credible. The others were effectively documented behaviors of projects.
There are still little to no consequences for wasting maintainers' time. I've been thinking about the "name and shame" approach you have; maybe that helps change the behavior?
@bagder I wonder how much of that is because you eliminated the bounty
@bagder as in all AI security reporting doesn't happen? Or just the low quality reporting?
@flpvsk they're close to 100% AI now. High quality
@bagder @flpvsk do you know which specific tools/models they come from?
@bagder Are they still overly polite?
@bagder What do you think changed? Better tools? Stopping the bug bounty?
@annika the tooling for sure, nothing else
@bagder @annika What was the total time between “this slop is a problem” and “this stuff is pretty good”?
Claude Mythos Preview \ red.anthropic.com

@grayrattus @j_s_j @bagder @annika Mythos isn't even public yet so that can't be the reason.
@nicolas17 Sure it could. curl ships with almost everything, so it’s not unreasonable to think one of the blessed entities with Mythos access scanned for vulnerabilities
@j_s_j And people without Mythos access stopped reporting bugs altogether?
@nicolas17 My bad. You’re right

@nicolas17 @j_s_j well, I can imagine that the expensive AI models really got better. This new one is just a perfect example, but in general LLMs changed a lot at the end of last year.

I have to use Claude at work and it really boosts productivity. It won't code a whole project for you, but if you know what you are doing, these tools really speed up the work.

@bagder @annika

I assume that they also used your free work to create the prompt that rejects a lot of bad reports internally.

@bagder I wish this was my experience 😆. But it's certainly getting better.

@bagder I love how you changed your opinion on this topic when you saw real evidence in the form of good security reports written by AI.

If someone had written this two years ago, I would have said they were delusional, but today it's just reality.

I hope we soon get open models with such capabilities, as for now only the gatekept models from big tech are capable of doing such good work.

#LLMs #genai #anthropic

@grayrattus it was never my opinion as much as my summary of the situation... and the situation has changed quite drastically
@bagder yeah. Sorry. More like summary of the situation.
@bagder Didn't you share one just 2 days ago though? hackerone.com/reports/3669305
curl disclosed on HackerOne: Argument Injection via curl Short-Flag...

This report details how the curl -os command facilitates an Argument Injection vulnerability in applications that wrap the curl command-line tool. The specific command curl -os /etc/passwd --url http://example.com demonstrates a subtle but dangerous behavior. Because -s (silent) follows -o (output), curl expects the very next string to be the filename. In this scenario: the -o flag consumes the...

HackerOne
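For anyone curious about the class of bug that report describes: the risk is in wrappers that splice untrusted text into a curl command line, where a value starting with "-" gets parsed as options rather than as a URL. A minimal sketch of that pattern (the wrapper behavior and the --url defense shown here are my illustration, not taken from the report):

```python
import shlex

# A hypothetical vulnerable wrapper builds the command by string
# concatenation, so an attacker-controlled "URL" splits into extra argv
# entries that curl would interpret as flags.
untrusted = "-os /etc/passwd --url http://example.com"
argv = shlex.split("curl " + untrusted)
print(argv)
# → ['curl', '-os', '/etc/passwd', '--url', 'http://example.com']

# Safer pattern: keep the untrusted value as a single argv element and
# pass it after --url, so it can never be read as an option cluster.
safe_argv = ["curl", "--silent", "--url", untrusted]
print(safe_argv)
```

The point of the second form is that argument injection only works when the untrusted string is re-split into words; handing it to the process as one argv element removes that step entirely.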
@Varpie @bagder 90% of the time it works every time. It probably improved dramatically, but still slop lingers?
@bagder Can't wait for your next graph 🤓

@pozorvlak To me, the most interesting part of that thread was this post.

This person considers AI their enemy. But not because it is wasting Stenberg's time. They wanted it to continue to waste Stenberg's time, so that they could continue to hate it more.

@pozorvlak Now I think a more reasonable interpretation is: they are concerned about copyright violations, environmental damage, etc., and are dismayed that people like me use AI anyway. The fact of its getting better doesn't fix the other problems, and just means that there are fewer arguments against using it.

(“This is terrible” vs. “This is terrible, maybe when people realise that it doesn't work, they will stop.”)

@mjd I think so. But also, if all AI-generated bug reports are useless, you can stop reading as soon as you've decided a bug report came from an AI.
@pozorvlak If that were the reason, wouldn't they want the reports to be as good as possible, and be glad if the reports were all worth reading? But this person says they are disappointed!
@mjd ah, good point. Reliably bad reports waste a small amount of time, but more than zero. The worst case is reports that are only sometimes good, because then you have to read them all carefully.
Yes, it would be nice if we stopped building hell so people can roast a few marshmallows. Marshmallows are nice, but not that nice.

CC: @[email protected]
@[email protected] @[email protected] I mean, it’s terrible for the environment, has loads of ethical and moral concerns, and the companies are completely unsustainable. It’s pretty easy to hate
I wonder how much of that is the tools getting better versus not paying out bounties anymore.
@bagder Unfortunately that hasn't made it to Flask yet, we still get a bunch of AI slop. About 50 reports so far this year, none helpful. Typically we get < 10 per year, some helpful.

@bagder Seems like all you need to do is take away the incentive to get rid of the low-effort reports.

Sad that they had to ruin it for the real reporters, though, who no longer get their (deserved) bounty in exchange for the good work they're doing.