Mastodawn

Anecdotal, small sample size, but recently Opus 4.8 generated something very stupid, whereas previously it was mostly okay (at least for intermediate things), and Fable 5 seemed totally sharp (at least on par with Opus 4.8).

Are they degrading Opus to then make a future Fable seem actually better (after it's possibly been weakened)?

At a bare minimum, we need some real laws that prevent companies from crippling products without telling us. Maybe there's already legal precedent for this.

James Just James 5d ago

What if "AI safety", to the U.S. government, is really "can it find all the bugs/backdoors we're relying on"?

What if the retraction is to give Anthropic time to add more safeguards against researchers discovering 0days that are being used in operational missions?

There are lots of problems (e.g., stolen copyrights) with these models, but their ability to find bugs is kind of awesome.

Felix Frank

@purpleidea I'm hearing such mixed things. I think the curl guy was unimpressed when comparing LLM output to a collection of classic approaches.

James Just James 5d ago

@felixf Can you elaborate what "classic approaches" means?

Felix Frank 5d ago

@purpleidea I believe they run a whole range of static analysis tools, fuzzers, etc.

James Just James 5d ago

@felixf I am not making an argument that you shouldn't use those tools! If anything, LLMs made it trivial to add new tests that I never got around to writing.

(And you should probably avoid C too!)

Felix Frank 5d ago

@purpleidea Of course, just throwing AI at the problem is ostensibly more accessible. But at what cost?