The AI slop security reporting is basically extinct. It almost does not happen anymore. At all.

I want to emphasize this because when I talk about AI security reports now, half my readers seem to believe those are AI slop. They're not. They are found with AI tools but are normally high-quality bug reports.

The weakest part is that they tend to overstress the vulnerability angle. Many of them are well-phrased bug reports that are still "just bugs".

@bagder Yeah, seems like around January things flipped around.

I was hoping the slop would continue to be slop, but alas. Wishful thinking on my part (to make it easier to disregard the fad).

@bagder The other problem with AI bug reports is the verbosity, otherwise I basically agree.
@evilpie true, they are normally way too talkative

@bagder I get this with fwupd too. Everything that's AI found is reported as a CVSS 10.0 CRITICAL vulnerability, and then you find out it's assuming the attacker has write access on /etc or something dumb like that.

At that point it's just a regular old typo bugfix like all the other thousands of unimportant commits.

@bagder "they tend to overstress the vulnerability angle." which I imagine is simply because that's what the prompt suggested.
@utopiah probably, but also because the AIs can't really tell
@bagder sure, ironically enough there is no "I" in AI.
@utopiah @bagder there's no irony at all, it's at minimum a marketing strategy.
@bagder Well, I guess you could quickly convince them otherwise with your "reports/ai-slop ratio" graph.

@bagder I see
- good ones using AI as part of a rigorous process with replication
- mediocre ones where someone asked an AI to "find me a CVE", submitted the report without review or replication, and yet still expects credit

If "have write access to the filesystem" is a prerequisite to an exploit: it's not an exploit. You already have total ownership of the server

@bagder Do reporters share the tools used, or are there strong tool indicators in the reports?

Curious about which tool(s) are most successful, at least for curl research.

I imagine in most cases reporters don't mention the tools used (especially if custom), which is unfortunate.

@bagder you're lucky. I got 30+ yesterday. 1 was kind of credible. The others were effectively documented behaviors of projects.
There are still little to no consequences for wasting maintainers' time. I've been thinking about the "name and shame" approach you have; maybe that helps change the behavior?
@bagder I wonder how much of that is because you eliminated the bounty
@bagder as in all AI security reporting doesn't happen? Or just the low quality reporting?
@flpvsk they're close to 100% AI now. High quality
@bagder @flpvsk do you know which specific tools/models they come from?
@bagder Are they still overly polite?
@bagder What do you think changed? Better tools? Stopping the bug bounty?
@annika the tooling for sure, nothing else
@bagder @annika What was the total time between “this slop is a problem” and “this stuff is pretty good”?
Claude Mythos Preview \ red.anthropic.com

@grayrattus @j_s_j @bagder @annika Mythos isn't even public yet so that can't be the reason.
@nicolas17 Sure it could. curl ships with almost everything, so it’s not unreasonable to think one of the blessed entities with Mythos access scanned for vulnerabilities
@j_s_j And people without Mythos access stopped reporting bugs altogether?
@nicolas17 My bad. You’re right

@nicolas17 @j_s_j well, I can imagine that the expensive AI models really got better. This new one is just a perfect example, but in general LLMs changed a lot at the end of last year.

I have to use Claude at work and it really boosts productivity. It won't code a whole project for you, but if you know what you are doing, these tools really speed up the work.

@bagder @annika

I assume that they also used your free work to create the prompt that rejects a lot of bad reports internally.

@bagder I wish this was my experience 😆. But it's certainly getting better.

@bagder I love how you changed your opinion on this topic when you saw real evidence in the form of good security reports written by AI.

If someone had written this two years ago, I would have said they were delusional, but today it's just reality.

I hope we soon get open models with such capabilities, as for now only the gatekept models from big tech are capable of doing such good work.

#LLMs #genai #anthropic

@grayrattus it was never my opinion as much as my summary of the situation... and the situation has changed quite drastically
@bagder yeah. Sorry. More like summary of the situation.
@bagder Didn't you share one just 2 days ago though? hackerone.com/reports/3669305
curl disclosed on HackerOne: Argument Injection via curl Short-Flag...

This report details how the curl -os command facilitates an Argument Injection vulnerability in applications that wrap the curl command-line tool. The specific command curl -os /etc/passwd --url http://example.com demonstrates a subtle but dangerous behavior. Because -s (silent) follows -o (output), curl expects the very next string to be the filename. In this scenario: the -o flag consumes the...

HackerOne
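For anyone curious about the class of bug that report describes: the risk is in wrappers that splice untrusted text into a curl command line, where a value starting with "-" gets parsed as options rather than as a URL. A minimal sketch of that pattern (the wrapper behavior and the --url defense shown here are my illustration, not taken from the report):

```python
import shlex

# A hypothetical vulnerable wrapper builds the command by string
# concatenation, so an attacker-controlled "URL" splits into extra argv
# entries that curl would interpret as flags.
untrusted = "-os /etc/passwd --url http://example.com"
argv = shlex.split("curl " + untrusted)
print(argv)
# → ['curl', '-os', '/etc/passwd', '--url', 'http://example.com']

# Safer pattern: keep the untrusted value as a single argv element and
# pass it after --url, so it can never be read as an option cluster.
safe_argv = ["curl", "--silent", "--url", untrusted]
print(safe_argv)
```

The point of the second form is that argument injection only works when the untrusted string is re-split into words; handing it to the process as one argv element removes that step entirely.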
@Varpie @bagder 90% of the time it works every time. It probably improved dramatically, but still slop lingers?
@bagder Can't wait for your next graph 🤓

@pozorvlak To me, the most interesting part of that thread was this post.

This person considers AI their enemy. But not because it is wasting Stenberg's time. They wanted it to continue to waste Stenberg's time, so that they could continue to hate it more.

@pozorvlak Now I think a more reasonable interpretation is: they are concerned about copyright violations, environmental damage, etc., and are dismayed that people like me use AI anyway. The fact of its getting better doesn't fix the other problems, and just means that there are fewer arguments against using it.

(“This is terrible” vs. “This is terrible, maybe when people realise that it doesn't work, they will stop.”)

@mjd I think so. But also, if all AI-generated bug reports are useless, you can stop reading as soon as you've decided a bug report came from an AI.
@pozorvlak If that were the reason, wouldn't they want the reports to be as good as possible, and be glad if the reports were all worth reading? But this person says they are disappointed!
@mjd ah, good point. Reliably bad reports waste a small amount of time, but more than zero. The worst case is reports that are only sometimes good, because then you have to read them all carefully.
Yes, it would be nice if we stopped building hell so people can roast a few marshmallows. Marshmallows are nice, but not that nice.

CC: @[email protected]
@[email protected] @[email protected] I mean, it’s terrible for the environment, has loads of ethical and moral concerns, and the companies are completely unsustainable. It’s pretty easy to hate
I wonder how much of that is the tools getting better versus not paying out bounties anymore.
@bagder Unfortunately that hasn't made it to Flask yet, we still get a bunch of AI slop. About 50 reports so far this year, none helpful. Typically we get < 10 per year, some helpful.

@bagder Seems like all you need to do is take away the incentive to get rid of the low-effort reports.

Sad that they had to ruin it for the real reporters, though, who no longer get their (deserved) bounty in exchange for the good work they're doing.