I wrote a post about something that's a little hidden in Anthropic's focus on cybersecurity for its next model. Then I did something that turned the whole thing a little darker.

https://www.ianbetteridge.com/the-worst-of-us/

The worst of us

Credit where it's due. Anthropic's system card for Claude Mythos Preview is a genuinely interesting and thoughtful document. Most AI companies publish safety evaluations the way governments publish freedom of information responses. They're technically compliant but strategically uninformative. And that's usually deliberate, because the last thing that's good for business

Ian Betteridge

Mr Toots Apr 7

@ianbetteridge
Flippin’ hell, Tucker!

Katrina Katrinka Apr 7

@ianbetteridge
"Did Claude just blame its parents for its failings? I think it might have."

It's a large language model. It learned it from all the copyrighted material its company stole and the internet.

Bill, organizer of stuff Apr 7

@ianbetteridge This article is interesting, and points out some problematic actions initiated by LLM products, but it completely errs on the "motivation." It's not that the model is "trained" to have "behaviors" based on human activities; rather, it's that the Prompt And Hope model of extracting a pattern from the model is an extremely inaccurate and blunt tool that is not (and mathematically cannot be) made declarative enough to prevent undesirable actions or answers from the model.

Lex Friedman Apr 8

@ianbetteridge great post

Ian Betteridge Apr 8

@lexfri Thank you!

Eat This Podcast Apr 8

@ianbetteridge That was a great read with a fine twist in its tail.

Chip Butty Apr 8

@ianbetteridge Boris Johnson as a Service.

Ian Betteridge Apr 8

@otfrom That's all the world needs