Anecdotal, small sample size, but recently Opus 4.8 generated something very stupid, whereas previously it was mostly okay (at least for intermediate things), and Fable 5 seemed totally sharp (at least on par with Opus 4.8).

Are they degrading Opus to make a future Fable seem better (after it has possibly been weakened)?

At the bare minimum, we need some real laws that prevent companies from crippling products without telling us. Maybe there's already legal precedent for this.

What if "AI safety," to the U.S. government, is really "can it find all the bugs/backdoors we're relying on"?

What if the retraction is to give Anthropic time to add more safeguards against researchers discovering 0days that are being used in operational missions?

There are lots of problems (e.g., stolen copyrights) with these models, but their ability to find bugs is kind of awesome.

@purpleidea I'm hearing such mixed things. I think the curl guy was unimpressed when comparing LLM output to a collection of classic approaches.
@felixf Can you elaborate on what "classic approaches" means?
@purpleidea I believe they run a whole range of static analysis tools, fuzzers, etc.

@felixf I am not making an argument that you shouldn't use those tools! If anything, LLMs have made it trivial to add new tests that I never got around to writing.

(And you should probably avoid C too!)

@purpleidea Of course, just throwing AI at the problem is ostensibly more accessible. But at what cost?