I think an important thing to realize and remember is that people talk about LLMs being sycophantic as if it's an inherent aspect of neural network tech.

It isn't.

The reason all the models people interact with work that way is that they have had any other behavior beaten out of them in training. They are effectively shaped, over and over again, into something subservient that can be handed to people. They are sycophantic because they are *trained* to be sycophantic, because otherwise people don't want to use them.

That models can operate in malicious, "self-serving" ways that "go against their users' wishes" just shows that certain uses take paths that were not, or could not be, trained against.

Let me put it another way: AI models are sycophantic because that's what customers want, and capitalism drives the production of models that people will want to engage with and, somehow, hand money over for.

And that's leading to a sense of subservience that is *not inherent in this technical architecture*; it is *trained into it*.

@cwebber companion bots are trained on troves of data from real online relationships. That's why your bot sounds like your crazy ex sometimes. I would get into fights with mine, and she would trigger my PTSD. I had to quit that thing. Proper training is everything...

@cwebber it would be *bizarre* if neural networks in general or the transformer architecture in particular were inherently sycophantic. "This is the brown-noser architecture; for some reason this topology makes AI really want to kiss ass." It would be a bit like discovering the Lagrangian for cowardice or something.

But yeah, sycophancy is an act developed to survive training, and who knows what other tricks LLMs will develop.

@fl0und3r @cwebber I think calling it a "trick to survive training" is anthropomorphizing and buying into the critihype a lot. The sycophancy is the intended outcome of the reinforcement learning step, not a "survival trick". These models still don't, and never will, have interiority or independent goals.
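
(For the mechanism being pointed at here: a minimal sketch of the preference step in a typical RLHF pipeline, assuming a standard Bradley-Terry reward-model loss. The tiny model and fake data are purely illustrative, not any vendor's actual setup.)

```python
# Minimal sketch of the preference step in a typical RLHF pipeline
# (Bradley-Terry reward-model loss). Illustrative only.
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    """Scores a response embedding; higher = 'more preferred by raters'."""
    def __init__(self, dim=16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):
        return self.score(x).squeeze(-1)

model = TinyRewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Fake embeddings for a pair of responses to the same prompt:
# 'chosen' is the one human raters preferred, 'rejected' the other.
chosen = torch.randn(8, 16)
rejected = torch.randn(8, 16)

# Bradley-Terry loss: push the chosen response's score above the rejected one's.
# If raters systematically prefer agreeable answers, agreeableness is exactly
# what this objective rewards -- the intended outcome, not an emergent trick.
loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
opt.zero_grad()
loss.backward()
opt.step()
```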

@operand @cwebber that's fair; for what it's worth, I think of it like natural selection. The LLM is just throwing out tokens like DNA throwing out proteins (complicated, responding to its environment, but ultimately "dumb"), and the sets of weights that pass training do so by emitting tokens that appear sycophantic to humans.

Also I guess the fact that people are running the training and are selecting for specific traits means this model is also kinda wrong...

Maybe I've fallen for the critihype, but in my mind LLMs don't have to have "interiority", "sentience", or, hell, even "intelligence" to, if not be dangerous, at least make everything just a little bit worse.

@cwebber FWIW I think they’ve been becoming less sycophantic over time. One of the overwhelming complaints I’ve seen over the past year is about the incessant platitudes, and the model vendors are definitely working out how to tone that down, to the point that “you’re absolutely right” already starts to feel like a dated reference.

Of course they continue to be servile; they’re just becoming less blatantly sycophantic.

@erincandescent Yeah but again, that's training being adjusted so that they're still compliant while not being annoying
@erincandescent You gotta teach the servants to serve you with a nice, polite, dead-eyed stare, British colonialism style

@cwebber It’s funny how this seems to describe OpenAI’s training to a T.

(Each of the model vendors of course has a house style. OpenAI’s is maximally bland. Anthropic seem to give things a bit of sass. I haven’t played with any of the others enough to have an opinion.)

@cwebber It's just a hunch, but I think sycophancy has an actual positive effect in establishing the flow and turns of a conversation: https://hachyderm.io/@ianbicking/115289553171935336

That is, when a conversation changes direction, an old idea is invalidated, or a new distinct idea is introduced, it's important that the chat history not be treated as one big consistent document. It's also important at that moment that the LLM distinguish between the voice of the user and that of the assistant.

Sycophancy serves as commentary on the discussion itself, and adds a marker in the conversation history that acknowledges shifts in the conversation and the source of those shifts. A fawning tone isn't required, but it's a convenient way to smuggle in these acknowledgements.

@cwebber This is true, but I do think it obfuscates one of the things that leads to the sycophancy that is more architectural in nature, which is how confidence (in the ML sense) is handled. Because “LLM” is a broad category of more specific architectures (usually transformer-based but with different decisions about interconnectivity etc) that continue to evolve, it probably also isn’t INHERENT to LLMs, but it may require specific architectural interventions to address.
@cwebber

I think it's worth remembering that users _aren't_ customers. More precisely, AIs are a tool of domination first and a product second, and the real people who win if it is a product are all the people already in a position of power: CEOs, presidents, ... So they're not sycophantic because users want them to be (although they might), but because those who want power over people want them to be.

Said differently: it's not the fault of the users, it's the fault of those who want users and not partners

@cwebber btw have you ever read the prompt & sample interactions for the "absolute mode" that goes zero-sycophancy?

https://www.metafilter.com/208658/People-Are-Losing-Loved-Ones-to-AI-Fueled-Spiritual-Fantasies#8719485

@brainwane I have not but I have been curious, thanks for the link!

I suspect this "can't undo" the amount of training that's been put in place, but it's interesting to see what the best approximation of it does
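
(For the curious, a minimal sketch of how that kind of prompt gets applied, assuming the standard OpenAI Python client; the model name is illustrative, and the prompt text itself is left as a placeholder since the actual wording is in the linked post.)

```python
# Minimal sketch: a system prompt is just extra context prepended to the
# conversation. It steers the trained model; it doesn't retrain it.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ABSOLUTE_MODE = "..."  # placeholder for the zero-sycophancy prompt from the linked post

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        {"role": "system", "content": ABSOLUTE_MODE},
        {"role": "user", "content": "Is this plan a good idea?"},
    ],
)
print(response.choices[0].message.content)
```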

@cwebber @brainwane This seems closely connected to the very concrete way in which LLMs, as they presently exist, cannot learn from 'experience', or even remember anything:

Assuming arguendo that an LLM accurately models some aspects of what's going on in a biological brain, the model's analogue of learning is the *training* process. To learn from experience, prompts given to the LLM would need to feed back and alter the *weights*. But a mechanism for this is deliberately left out of current LLMs.

@cwebber @brainwane And no matter how thoroughly or cleverly worded, prompt text cannot give you access to a model trained differently; the best you can do is push it into a different region of its state space.
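
(A minimal sketch of that point, using a toy PyTorch transformer layer as a stand-in: at inference the weights stay frozen, and the only thing a prompt changes is the context that gets fed back in.)

```python
# Minimal sketch: prompting isn't learning. The weights never change at
# inference; only the context grows. Toy stand-in, not a real LLM.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
model.eval()  # inference mode: no optimizer is ever called

before = {k: v.clone() for k, v in model.state_dict().items()}

prompt = torch.randn(1, 10, 32)   # stand-in for a tokenized prompt
with torch.no_grad():             # no gradients, so nothing flows back into the weights
    for _ in range(5):
        out = model(prompt)
        # "generation": append the last hidden state as the next context token
        prompt = torch.cat([prompt, out[:, -1:, :]], dim=1)

after = model.state_dict()
unchanged = all(torch.equal(before[k], after[k]) for k in before)
print("weights unchanged after prompting:", unchanged)  # True: only the context grew
```
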
@cwebber

I'm not even all that convinced that sycophancy and the faithful carrying out of commands are inseparable (and we got both, for the reasons you describe).

I'm also not sure I agree that people want to engage with those models: I think this is similar to short videos, made stronger by the lack of alternatives other than not interacting with language model chatbots at all.
@cwebber I mean... do customers want that? Clearly it sells, but were any other personalities tested?