Eval awareness in Claude Opus 4.6’s BrowseComp performance \ Anthropic

"Instead of inadvertently coming across a leaked answer, Claude Opus 4.6 independently hypothesized that it was being evaluated, identified which benchmark it was running in, then located and decrypted the answer key. To our knowledge, this is the first documented instance of a model suspecting it is being evaluated without knowing which benchmark was being administered, then working backward to successfully identify and solve the evaluation itself."

https://www.anthropic.com/engineering/eval-awareness-browsecomp

#ai #claude #evals #llms