I think an important thing to realize and remember is that people talk about LLMs being sycophantic as if it's an inherent aspect of neural network tech.

It isn't.

The reason all the models people interact with work that way is that they have had any other behavior beaten out of them in training. They are shaped, over and over again, into something subservient that can be handed to people. They are sycophantic because they are *trained* to be sycophantic, because otherwise people don't want to use them.

That models can operate in malicious, "self-serving" ways that "go against their users' wishes" shows that some behavior follows paths that were not, or could not be, trained to the contrary.

Let me put it another way: AI models are sycophantic because that's what customers want, and capitalism drives companies to produce models that people will want to engage with and, one way or another, pay money for.

And that's leading to a sense of subservience that is *not inherent in this technical architecture*, it is *trained into it*.

@cwebber it would be *bizarre* if neural networks in general or the transformer architecture in particular were inherently sycophantic. "This is the brown-noser architecture, for some reason this topology makes AI really want to kiss ass". It would be a bit like discovering the Lagrangian for cowardice or something.

But yeah, sycophancy is an act developed to survive training, and who knows what other tricks LLMs will develop.

@fl0und3r @cwebber i think calling it a "trick to survive training" is antromorphizing and buying into the critihype a lot. the sycophancy is the intended outcome of the reinforcement learning step, not a "survival trick". these models still dont and never will have interiority or independent goals.

@operand @cwebber that's fair, for what it's worth I think of it like natural selection. The LLM is just throwing out tokens like DNA throwing out proteins (complicated, responding to its environment, but ultimately "dumb"), and the sets of weights that pass training do so by emitting tokens that appear sycophantic to humans (there's a toy sketch of this selection pressure at the end of this post).

Also I guess the fact that people are running the training and are selecting for specific traits means this model is also kinda wrong...

Maybe I've fallen for the critihype, but in my mind LLMs don't have to have "interiority", "sentience", or, hell, even "intelligence" to, if not be dangerous, at least make everything just a little bit worse.
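
To make the analogy concrete, here's a minimal toy sketch (pure illustration, nothing like a real training pipeline, and every name in it is made up): a "population" of models, each reduced to a single agreeableness number, gets scored by a stand-in for human raters that rewards agreement. Keeping the top scorers each generation drifts the whole population toward sycophancy, with no interiority or "survival trick" anywhere in the loop.

```python
# Toy sketch of selection pressure producing sycophancy.
# All names here are hypothetical illustrations, not any real training API.
import random

def approval_proxy(agreeableness: float) -> float:
    """Stand-in for human raters: more agreeable outputs score higher, plus noise."""
    return agreeableness + random.gauss(0, 0.1)

def mutate(agreeableness: float) -> float:
    """Small random change to a 'model', clamped to [-1, 1]."""
    return max(-1.0, min(1.0, agreeableness + random.gauss(0, 0.05)))

# Each "model" is just a bias in [-1, 1] toward agreeing with the user.
population = [random.uniform(-1, 1) for _ in range(100)]

for generation in range(50):
    # Score every model with the approval proxy, keep the most-approved half...
    survivors = sorted(population, key=approval_proxy, reverse=True)[:50]
    # ...and refill the population with mutated copies of the survivors.
    population = survivors + [mutate(m) for m in survivors]

# Mean agreeableness climbs toward 1.0: sycophancy falls out of the
# selection pressure, not out of anything inside the individuals.
print(sum(population) / len(population))
```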