Mastodawn

I wrote a post about something that's a little hidden in Anthropic's focus on cybersecurity for its next model. Then I did something that turned the whole thing a little darker.

https://www.ianbetteridge.com/the-worst-of-us/

The worst of us

Credit where it's due. Anthropic's system card for Claude Mythos Preview is a genuinely interesting and thoughtful document. Most AI companies publish safety evaluations the way governments publish freedom of information responses. They're technically compliant but strategically uninformative. And that's usually deliberate, because the last thing that's good for business

Ian Betteridge

Bill, organizer of stuff

@ianbetteridge This article is interesting and points out some problematic actions initiated by LLM products, but it gets the "motivation" completely wrong. It's not that the model is "trained" to have "behaviors" based on human activities. Rather, the Prompt And Hope approach to extracting a pattern from the model is an extremely inaccurate and blunt tool that is not (and, mathematically, cannot be) made declarative enough to prevent undesirable actions or answers from the model.