Emotion Concepts and their Function in a Large Language Model

"Large language models (LLMs) sometimes appear to exhibit emotional reactions. We investigate why this is the case in Claude Sonnet 4.5 and explore implications for alignment-relevant behavior. We find internal representations of emotion concepts, which encode the broad concept of a particular emotion and generalize across contexts and behaviors it might be linked to. These representations track the operative emotion concept at a given token position in a conversation, activating in accordance..."

https://transformer-circuits.pub/2026/emotions/index.html

#ai #claude #codegen #cogsci #emotions #llms