There's one very important thing I would like everyone to try to remember this week, and it is that AI companies are full of shit

Only rarely do their claims actually bear scrutiny, and the ones that do are only the mildest claims they make.

So, Anthropic is claiming that their new, secret, unreleased model is hypercompetent at finding computer security vulnerabilities, and that they're *too scared* to release it into the wild.

Except all the AI companies have been making the same hypercompetence claims about literally every avenue of knowledge work for 3+ years, and it's never been true. So please keep in mind the highly likely possibility that this is mostly or entirely bullshit marketing, meant to distract you from the absolute garbage fire that is the code base of the poster child application for "agentically" developed software.

You may now resume doom scrolling. Thank you

@jenniferplusplus I seriously doubt this is smoke and mirrors; recent models have improved significantly at cybersec, and the industry is noticing:

https://mastodon.social/@bagder/116336957584445742

https://www.theregister.com/2026/03/26/greg_kroahhartman_ai_kernel/

The industry consensus seems to be that there's going to be a torrent of vulnerabilities being found in all sorts of software, and they're not prepared to handle the blast radius. It's not surprising that Anthropic wants to give a select few a head start to tackle them. It would be nice if their token fund were open for all OSS projects to apply to.

I'm also pressing "X doubt" that you spend months coordinating between AWS, Apple, Microsoft, Google, and the Linux Foundation to organise this just because your tool's code leaked online.

AI bug reports went from junk to legit overnight, says Linux kernel czar

Interview: Greg Kroah-Hartman can't explain the inflection point, but it's not slowing down or going away

The Register
@budududuroiu @jenniferplusplus I wouldn't give Anthropic's motives a lot of credit here, but LLMs do make bug hunting much easier.

@mirth That's fair. I do personally believe that Anthropic is more ideologically driven than most frontier AI labs, and that they genuinely believe in the need to gatekeep Mythos. Sometimes that manifests itself as sniffing too many of your own farts.

@jenniferplusplus

@mirth @budududuroiu @jenniferplusplus Tell that to all the open source repo maintainers who get spammed with fake, nonsensical bug reports generated by AI?

@jedimb They can... close submissions? Many projects already have. It's like a 2 second change.

@mirth @jenniferplusplus

@budududuroiu @mirth @jenniferplusplus That makes bug fixing more difficult, because legitimate reports get blocked alongside the noise.

@jedimb and the alternative is?

@mirth @jenniferplusplus

@budududuroiu @mirth @jenniferplusplus What we had just a few years ago.

@jedimb yeah well, that ship sailed long ago.

@mirth @jenniferplusplus

@budududuroiu @mirth @jenniferplusplus "The plague is here. Let's just live with it" does seem to be a recurring sentiment, but it doesn't change that it's a plague.

@jedimb norms are downstream from power. The current power balance is shifted towards frontier labs and hyperscalers, so norms around personal computing (RAM prices) and open source software (AI slop floods) are dictated by them.

Moralising about AI use with no power to back it up is useless. Gatekeeping is power, because it says "if you want to contribute to this project, abide by our rules".

https://www.joanwestenberg.com/the-case-for-gatekeeping-or-why-medieval-guilds-had-it-figured-out/

@mirth @jenniferplusplus

The case for gatekeeping, or: why medieval guilds had it figured out

Every open source maintainer I've talked to in the last six months has the same complaint: the absolute flood of mass-produced, AI-generated, mass-submitted slop requests has turned their repositories into a slush pile. The contributions look like contributions: they have commit messages, they reference issues, and they follow templates.

Westenberg.
@budududuroiu @mirth @jenniferplusplus Goalposts moved into a different dimension, I see.
@budududuroiu @jenniferplusplus some people have published numbers or noticed "a significant increase in quality", but none of it bears any scientific rigor. My guess is that the one huge trick Anthropic pulled was merely a bigger context window. Sure, that tends to give more context-related (not "true" or "accurate") results (duh!), but it's hardly revolutionary. LLMs are still statistical models doing fancy autocomplete; they know nothing about the world. I won't hold my breath.

@dngrs @budududuroiu @jenniferplusplus

People keep getting tricked by framing.
LLM companies frame what the models are doing as something other than what it is (autocomplete), and people whose competence is not in epistemic evaluation then judge the results through that framing, rather than through "this is autocomplete, it has to answer something, so it makes something up".

And then other people take those soundbites and run with them.
"Did you hear? Mr. Big Name said this stuff really works!"

@dngrs Well, you're partly correct, partly wrong. Yes, pretrained transformers are, like all generative models, definitionally modelling a joint probability distribution, and autoregressively generating from it, one conditional token distribution at a time.

Those are the models you're referring to as autocomplete tools, which is why you had to use `[MASK]` with early masked transformers like BERT to get them to fill in the "most probable token".
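To make the distinction concrete, here's a minimal sketch of the two styles using the Hugging Face `transformers` pipelines (the checkpoints are just the stock small ones, picked purely for illustration):

```python
from transformers import pipeline

# Masked LM ("fill in the blank"): BERT predicts the most
# probable token for the [MASK] position.
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("The bug is in the [MASK] handler.")[0]["token_str"])

# Causal LM (autocomplete): GPT-2 samples tokens one at a time
# from the conditional next-token distribution.
gen = pipeline("text-generation", model="gpt2")
print(gen("The bug is in the", max_new_tokens=8)[0]["generated_text"])
```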

Regardless, it doesn't matter what Anthropic did: if it allows for a massive reduction in the cost of finding zero days, it's a problem. It doesn't have to be revolutionary; it doesn't have to be superintelligence, AGI, or whatever other flashy marketing woo. If a reduction in the cost of computing protein folding happens, e.g. the OpenFold implementation of AlphaFold, that wouldn't be revolutionary either, but it would still be dangerous, since you now potentially have lone actors able to make prions at home (I'm using this as an absurd but plausible case).

@jenniferplusplus

@budududuroiu @jenniferplusplus it's funny you bring up AlphaFold, because that has also been way overhyped, according to people working in the field (I don't have links to individual statements anymore, sadly; it's been a few years, but the Wikipedia page also mentions e.g. AF not really understanding folding). Anyway: as long as there is no concrete data showing a severe CVE increase with a causal link to newer LLMs (which, again, are still LLMs that do not understand facts), I still won't hold my breath.

@dngrs @jenniferplusplus I'm sorry, I know thinking conceptually isn't easy for everyone; I tried using AlphaFold because some people have an easier time when presented with examples.

Why would there be an increase in CVEs? If I were an actor with nation-state levels of access to compute, why would I waste all that compute on zero days, only to then publish CVEs about them?

Even the most AI-skeptic maintainers are starting to admit that LLMs are getting good at finding bugs. I understand cynicism is seen as cool nowadays, but I think it's intellectually lazy.

https://mastodon.social/@bagder/116373716541500315

@budududuroiu holy condescension Batman lol, no thank you
@budududuroiu @dngrs you may as well stop; you're not going to convince me to trust them. Only Anthropic can do that, because they have truly earned my distrust.

@budududuroiu the same people would tell you the "industry consensus" among the rest of tech is that chatbots made programming dramatically more productive. The reality is that they mostly automate the creation of those same bugs and vulnerabilities.

So, you know

Maybe wake me up when they're organizing this thing with someone who's not in the same trillion dollar hole as them

@jenniferplusplus Finding problems vs. fixing them are two different bags of burritos. Zero days aren't valuable because they're so complex or unique; they're valuable because there have been zero days to fix them. I think AI coding is pretty trash, but AI debugging is very good.

https://mastodon.social/@bagder/116340130146901164

Anyways, wake up, they're organising this thing with someone not in the same trillion dollar hole as them: https://www.linuxfoundation.org/blog/project-glasswing-gives-maintainers-advanced-ai-to-secure-open-source

Introducing Project Glasswing: Giving Maintainers Advanced AI to Secure the World's Code

Open source maintainers have often lacked the resources and tools of larger organizations. Project Glasswing changes that with AI.

@budududuroiu yes, I noticed when you included them the first time. The Linux Foundation is a clearing house for coordination between everyone else on that list. They don't even consider kernel maintenance or distribution to be within the scope of their interests. They don't do what most people imagine they do.

@jenniferplusplus Yes, of course, no true Scotsman.

We're getting off topic here. RHEL is saying it's a problem; major Linux kernel devs like Greg Kroah-Hartman say AI vuln reports have been getting real; and my own anecdotal experience of trying to stop Claude from leaking `.env` files into its context, and seeing the creative ways in which it still manages to, tells me it's a problem.
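(For reference, this is the kind of constraint I mean; a minimal sketch of a `.claude/settings.json` deny list, assuming the current Claude Code permission-rule format, with the paths being purely illustrative:)

```json
{
  "permissions": {
    "deny": [
      "Read(./.env)",
      "Read(./.env.*)",
      "Read(./secrets/**)"
    ]
  }
}
```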

I get that cynicism is running high right now, but I think it's intellectually dishonest.

EDIT: you don't need super-intelligence, you only need a model that makes researching zero days en masse cheap enough. Exhaustive fuzzing is intractable, but LLMs are great optimisers (i.e. modify a code hyperparameter, rerun, and select the fittest candidates from a population of algorithms; see the sketch at the end of this post).

https://www.redhat.com/en/blog/navigating-mythos-haunted-world-platform-security

Navigating the Mythos-haunted world of platform security

The preview release of Claude Mythos presents a massive challenge for IT security experts, as well as an opportunity. Mythos' capabilities to identify complex memory safety issues and logic flaws hidden in legacy code, as well as exploit them in increasingly sophisticated ways, dramatically compound and expand the outsize role AI scanning plays in open source. As an industry, we cannot react to this seismic shift with panic; instead, we need to reinforce the need for system resilience through context, skill, and, ultimately, using AI ourselves.
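The loop I have in mind looks roughly like this; a toy sketch where a random jitter stands in for the LLM's proposed rewrites, and the fitness function stands in for "coverage gained / crashes found" (every name here is illustrative, not any real tool's API):

```python
import random

def llm_propose_variants(candidate, n):
    # Stand-in for an LLM call that rewrites the harness's tunable
    # parameters; here we just jitter the numbers so the sketch runs.
    return [{k: v + random.randint(-2, 2) for k, v in candidate.items()}
            for _ in range(n)]

def fitness(candidate):
    # Stand-in for re-running the harness and measuring new coverage
    # or crashes; this toy version peaks at depth=7, width=3.
    return -abs(candidate["depth"] - 7) - abs(candidate["width"] - 3)

def optimise(seed, generations=25, population=8):
    best = seed
    for _ in range(generations):
        pool = llm_propose_variants(best, population) + [best]
        best = max(pool, key=fitness)  # keep the fittest candidate
    return best

print(optimise({"depth": 0, "width": 0}))  # converges near depth=7, width=3
```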

@budududuroiu @jenniferplusplus I think it's more likely they're just throwing a ton of compute at this stuff and are using the LLMs as a frontend to existing code analysis tools.

Similar to Google running fuzzers over every FOSS app and library they used and dumping the results on maintainers to fix. Economies of scale help here.
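That's basically the OSS-Fuzz model, and the harnesses really are tiny. A minimal sketch using Atheris, Google's coverage-guided fuzzing engine for Python (fuzzing the stdlib `json` module here purely as an illustration):

```python
import sys
import atheris

with atheris.instrument_imports():
    import json  # the library under test; any parser works here

def TestOneInput(data: bytes):
    # The fuzzer mutates `data` guided by coverage feedback; anything
    # other than a clean parse or a ValueError counts as a finding.
    try:
        json.loads(data)
    except ValueError:
        pass

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
```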

@budududuroiu

Keep chugging that flavor aid.

@budududuroiu @jenniferplusplus Let's talk about JavaScript. Have you ever looked at your browser's developer console? On any major website on the planet, you'll find 8 trillion errors. Two-thirds of them are vulnerabilities, but none of them are exploitable or matter for anything at all. That is what is being found.

Those are the kinds of errors I've been reviewing, and all the ones Daniel's been reviewing too, and I'm seeing it over and over: "Yes, okay, technically that is a buffer overrun, but it doesn't matter because you can't ever get to it!"

@Sempf @budududuroiu @jenniferplusplus

Yes, that is JavaScript culture.

In other cultures, clean builds are mandatory.

Impossible, or way too hard, in the fragmented browser world.

That said: that is a chilling excuse to allow a buffer overrun. The technical term is "famous last words".