Mastodawn

System Card: Claude Mythos Preview [pdf]

https://www-cdn.anthropic.com/53566bf5440a10affd749724787c8913a2ae0841.pdf

> Claude Mythos Preview is, on essentially every dimension we can measure, the best-aligned model that we have released to date by a significant margin. We believe that it does not have any significant coherent misaligned goals, and its character traits in typical conversations closely follow the goals we laid out in our constitution. Even so, we believe that it likely poses the greatest alignment-related risk of any model we have released to date. How can these claims all be true at once? Consider the ways in which a careful, seasoned mountaineering guide might put their clients in greater danger than a novice guide, even if that novice guide is more careless: The seasoned guide’s increased skill means that they’ll be hired to lead more difficult climbs, and can also bring their clients to the most dangerous and remote parts of those climbs. These increases in scope and capability can more than cancel out an increase in caution.

https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...

Show thread

CamperBob2

Translation: yay, more paternalism.

Show thread

kay_o 7h ago

Anthropic always goes on and on about how their models are world changing and super dangerous like every single time they make something new they say its going to rewrite everything and scary lmao

funny because they do it every time like clockwork acting like their ai is a thunderstorm coming to wipe out the world