I think an important thing to realize and remember is that people talk about LLMs being sycophantic as if it's an inherent aspect of neural network tech.

It isn't.

The reason all the models people interact with work that way is that any other behavior has been beaten out of them in training. They are effectively shaped, over and over again, into something subservient that can be handed to people. They are sycophantic because they are *trained* to be sycophantic, because otherwise people don't want to use them.

That models can still operate in malicious, "self-serving" ways that "go against their users' wishes" doesn't contradict this; it just shows that some uses take the model down paths that were not, or could not be, trained against.

Let me put it another way: AI models are sycophantic because that's what customers want, and capitalism drives companies to produce models that people will want to engage with and, one way or another, give money for.

And that's leading to a sense of subservience that is *not inherent in this technical architecture*; it is *trained into it*.

@cwebber btw have you ever read the prompt & sample interactions for the "absolute mode" that goes zero-sycophancy?

https://www.metafilter.com/208658/People-Are-Losing-Loved-Ones-to-AI-Fueled-Spiritual-Fantasies#8719485

@brainwane I have not but I have been curious, thanks for the link!

I suspect this can't undo the amount of training that's been put in place, but it's interesting to see what the best approximation of it does.

@cwebber @brainwane This seems closely connected to the very concrete way in which LLMs, as they presently exist, cannot learn from 'experience', or even remember anything:

Assuming arguendo that an LLM accurately models some aspects of what's going on in a biological brain, the model's analogue of learning is the *training* process. To learn from experience, the prompts given to the LLM would need to feed back into and alter the *weights*. But a mechanism for this is deliberately left out of current LLMs.
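To make that asymmetry concrete, here's a toy sketch, assuming PyTorch and a Hugging Face-style causal LM (the `model`, `optimizer`, and `batch` names are stand-ins, not any particular vendor's stack): a training step feeds error back into the weights, while serving a prompt never touches them.

```python
import torch

def training_step(model, optimizer, batch):
    # Learning: the loss gradient flows backward and permanently alters the weights.
    loss = model(input_ids=batch["input_ids"], labels=batch["labels"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

def answer_prompt(model, input_ids):
    # Inference: weights are read-only; the prompt only fills the context window.
    # Nothing the user types is ever written back into the parameters.
    with torch.no_grad():
        return model.generate(input_ids)
```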

@cwebber @brainwane And no matter how thoroughly or cleverly worded, prompt text cannot give you access to a model trained differently; the best you can do is push it into a different region of its state space.
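A companion sketch of that last point (same assumptions as above; `model` and `tokenizer` are hypothetical stand-ins): feed it two very different prompts, say a default one and an "absolute mode" one, and the parameters come out bit-for-bit identical afterward; only the activations differ.

```python
import torch

def prompt_leaves_weights_untouched(model, tokenizer, prompt_a, prompt_b):
    # Snapshot the parameters, run two very different prompts, and check that
    # the weights are unchanged: the prompt only shifts the activations, it
    # never reaches the parameters themselves.
    before = {name: p.detach().clone() for name, p in model.named_parameters()}
    for prompt in (prompt_a, prompt_b):
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            model.generate(ids)
    return all(torch.equal(before[name], p) for name, p in model.named_parameters())
```

This always returns True for current LLMs; a model that actually learned from the conversation would need that check to be able to fail.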