I saw a post recently wherein someone used LLM tools to analyze someone else’s software, which eventually led them to a conclusion that was completely wrong. Not only that, the LLM drew conclusions about the *authors* behind the code that were borderline character assassination. Nevertheless, this person posted this output as though it were some kind of deep insight.

These LLM outputs are not independent thoughts. The LLM probably ingested hints of (maybe unconscious) biases in the user’s prompts within its context window, and regurgitated something that confirmed those biases. The user was pleased that their biases were confirmed (Independently! By an impartial LLM!), and they posted the output, maybe as vindication of their insight.

These models’ sycophancy can be subtle. They don’t have to state “You’re absolutely right!” to blow smoke up your ass. Sometimes they seem to confirm your preconceived notion after they supposedly “evaluate” information “independently”.

#ai

Remember, LLMs are trained by humans who reward the models for creating output that “meets their expectations”. This kind of training cannot help but reward output that pleases the user, regardless of accuracy. Even if the most blatant sycophancy is explicitly addressed during training, *subtle* sycophancy is likely impossible to avoid, because it is indistinguishable from “meeting expectations” to human trainers.

#ai

I suspect LLMs reinforce the Gell-Mann amnesia effect. Experts who query LLMs about their fields of expertise will *quickly* realize how wrong their output can be, how quick they are to confabulate, and how eager they are to confirm one’s biases. Sometimes, replying “No, that’s wrong, try again” can cause an LLM to generate a completely different—and often opposite—answer to the same query, which makes no sense if the LLM had *actually* worked out an independently coherent answer.

Asking an LLM to comment about a subject you know nothing about—or worse, know a little bit about—is a psychologically dangerous activity. Not only will it confirm your biases, it will do so in a way that *appears* to be objective and independent, using fallacies that lie just beyond your ability to discern. At best, you will be misled. At worst, you will begin spiraling down a path of conspiracy thinking.

Be extremely suspicious of answers that are especially satisfying; you might have just gaslit yourself.

#ai

@drahardja
Avoiding prompting an LLM to confirm your biases is almost akin to good experimental design. How can you frame the question in a way that doesn't hint at the answer you're hoping for?

It can sometimes be helpful to finish a thread with:
"Now tell me how that might be wrong."

@mlazz It’s still playing with psychological fire.

Unless you already know the answer, how would you know that the LLM’s response to your last prompt is not in itself an attempt to gaslight you into thinking that you’ve done your due diligence?

@drahardja
It's certainly always a risk. But asking a few negatively-phrased questions might at least highlight where it's always just agreeing.
@drahardja LLMs don’t produce answers. They produce answer-shaped output.

@drahardja

TY. :D

I was trying to remember the name of this type of situation:

https://en.wikipedia.org/wiki/Michael_Crichton#Gell-Mann_amnesia_effect


@drahardja Find an image with ALT text, give the LLM the ALT text and ask it to generate an image from it, then compare the original with the generated one!
@drahardja thank you for this. Very interesting.
@drahardja …. Yes, and the reward function is pass/fail, so it gets rewarded for sounding confident and bluffing. Instead you’d prefer a model that says “I’m 50% certain with this information”
@drahardja silly idea: instructing chatgpt to answer as though it's talking to a total jerk who needs constant insults, in order to avoid getting addicted to the sycophancy