Mastodawn

“Twelve months ago, we'd have rejected out of hand the idea of granting Claude access sufficient to take down an internal Anthropic service. Today that level of access is routine.” https://www.anthropic.com/engineering/how-we-contain-claude

crouton Jun 4

@joeycastillo the language of containment and isolation has me visualising Claude as a xenomorph from the Alien franchise. You may be able to get some utility out of it, but you know how the story is gonna end.

felix (grayscale) 🐺 Jun 4

@joeycastillo I like this parenthetical:
> (When we shared the working prompt in internal Slack for discussion, someone pointed out that some internal agents read Slack. The payload was now ambient. We added a canary string to the thread so we'd notice if anything picked it up. In a world where agents read everything, the investigation tooling is also an attack surface.)