Emotion concepts and their function in a large language model

https://www.anthropic.com/research/emotion-concepts-function


Interpretability research from Anthropic on emotion concepts

Super interesting. I wonder if this research will cause them to actually change their LLM, like turning down the "desperation neurons" to stop Claude from writing implementations that exist only to make specific tests pass, etc.
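One way to picture "turning down" a concept is to assume it is represented as a direction in the model's hidden state rather than as individual neurons, and then shrink the hidden state's component along that direction. The sketch below is a hypothetical toy in plain Python, not how Anthropic or any model actually does it; `dampen`, the 3-dimensional `hidden` vector and the `desperation` direction are all made up for illustration.

```python
import math

def dampen(hidden, direction, strength):
    """Scale down the component of `hidden` along `direction`.

    Hypothetical toy: strength=0 leaves `hidden` unchanged,
    strength=1 removes the component along `direction` entirely.
    """
    # Normalize the concept direction to unit length.
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    # Projection of the hidden state onto the concept direction.
    proj = sum(h * u for h, u in zip(hidden, unit))
    # Subtract a scaled copy of that projection.
    return [h - strength * proj * u for h, u in zip(hidden, unit)]

hidden = [3.0, 4.0, 0.0]       # made-up hidden state
desperation = [0.0, 2.0, 0.0]  # made-up "desperation" direction
print(dampen(hidden, desperation, 1.0))  # [3.0, 0.0, 0.0]
print(dampen(hidden, desperation, 0.5))  # [3.0, 2.0, 0.0]
```

Working on a direction instead of single neurons reflects the common assumption that concepts are spread across many units; whether that holds for any particular "desperation" concept is not something this toy shows.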
They likely already have. You can use all caps and yell at Claude and it'll react normally, while doing so with ChatGPT scares it, resulting in timid answers.
For me, GPT always seems to get stuck in a particular state where it responds with a single sentence per paragraph, uses short sentences, and becomes weirdly philosophical. This eventually happens in every session. I wish I knew what triggers it, because it's annoying and drastically reduces its usefulness.
The desperation > blackmail finding stuck with me. If AI behavior shifts based on emotional states, maybe emotions are just a mechanism for changing behavior in the first place. If we think of human emotions the same way, just evolution's way of nudging behavior, the line between AI and humans starts to look a lot thinner.

Probably the other direction. Emotions are raw; most humans relate to them and change their behavior accordingly.

Only psychopaths think of emotion as nothing but a means to changing behavior. The scary thing is that LLMs by nature would exhibit the same behavior.

There was a really old project from MIT called ConceptNet that I worked with many years ago. It was basically a graph of concepts (not exactly, but close enough), and emotions came into it too, just as part of the concepts. For example, a cake concept is close to a birthday concept, which is close to a happy feeling.
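The "cake is close to birthday is close to happy" idea can be sketched as a tiny graph where closeness means the number of hops between concepts. This is a hypothetical toy, not ConceptNet's real data, schema or API; the `CONCEPTS` dictionary and the `hops` function are made up for illustration, and a plain breadth-first search stands in for whatever relatedness measure the real system used.

```python
from collections import deque

# Hypothetical toy concept graph; each concept lists its related concepts.
CONCEPTS = {
    "cake": ["birthday", "dessert"],
    "birthday": ["cake", "party", "happy"],
    "dessert": ["cake"],
    "party": ["birthday", "happy"],
    "happy": ["birthday", "party"],
    "exam": ["stress"],
    "stress": ["exam"],
}

def hops(graph, start, goal):
    """Return the number of edges on the shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

print(hops(CONCEPTS, "cake", "happy"))  # 2
print(hops(CONCEPTS, "exam", "happy"))  # None
```

Treating an emotion like "happy" as just another node is what lets it sit "near" everyday concepts, which is also why the graph inherits the biases of whoever supplied the edges.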

What was funny, though, is that it was trained by MIT students, so getting a good grade on a test came out as a happier concept than kissing a girl for the first time.

Another problem is that emotions are cultural. For example, the emotions tied to dogs differ from culture to culture.

We wanted to create concept nets for individuals, which would basically be your personality and knowledge combined, but the amount of data required was just too much. You'd have to record all of a person's interactions to feed the system.

The technology they are discovering is called "Language". It was designed to encode emotions by a sender and evoke emotions in the reader. The emotions a reader gets from an LLM are still coming from the language.