Anthropic needs to keep the hype going to ride it to the IPO. That's all this is.
@tante That's, honestly, only partially true. It might indeed even be the primary driver.
(Plus pre-empting competitors, marketing, and explaining the delays in scaling the model to release capacity without coming out and saying "the US' deranged president's fascist wars impede our business operations.")
But what we see in Open Source wrt security reports _does_ suggest that the newer models have real security implications that need mitigating and somewhat coordinated release.
@larsmb Sure. Vulnerabilities can be detected through pattern recognition. The question is: how good are these systems, actually?
Like, how high is the chance of finding a relevant hidden security issue in a larger code base for 1000 USD, 10K USD, 1M USD? How many false positives, each of which you need to check, get generated along the way? Right now it's unclear how the economics work.
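That pricing question can be made concrete with a tiny back-of-the-envelope sketch. Every number below is an invented assumption (these are not anyone's actual rates), but it shows why precision dominates the economics: the triage of false positives, not the scan itself, is where the money goes.

```python
# Back-of-the-envelope sketch with made-up numbers: what one
# *confirmed* vulnerability costs once you account for the engineer
# time spent triaging false positives.

def cost_per_true_positive(scan_cost, reports, precision, triage_cost):
    """scan_cost: USD paid for the scan; reports: findings returned;
    precision: fraction of reports that are real; triage_cost: USD of
    engineer time to validate a single report."""
    true_positives = reports * precision
    if true_positives < 1:
        return None  # expect zero real findings at this spend
    total = scan_cost + reports * triage_cost
    return total / true_positives

# Hypothetical: a 1000 USD scan returns 50 findings at 10% precision,
# and each finding costs 200 USD of engineer time to check.
print(cost_per_true_positive(1000, 50, 0.10, 200))  # 2200.0 per real vuln
```

At those (assumed) numbers the 1000 USD scan fee is noise next to the 10 000 USD of triage, which is exactly why "how many false positives?" is the load-bearing question.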
It's Anthropic
a) claiming that you can pay them to generate attacks (monetize the attackers)
b) claiming that by paying them you can defend yourself and your code base (I mean, if you care; many are already just having claude commit to `main`, so security doesn't matter anymore).
@tante Part of the official goal of the red-teaming limited release, at least, seems to be to apply it to the more critical projects ahead of general release, to identify the high-risk vulnerabilities before malicious actors can easily replicate the feat.
For security vulns, the cost asymmetry between generating a plausible exploit and validating it *does* work in favor of the attacker, so this is possibly the *one* claim behind the staged release that has some merit.