LLMs have no model of correctness, only typicality. So:

“How much does it matter if it’s wrong?”

It’s astonishing how frequently both providers and users of LLM-based services fail to ask this basic question — which I think has a fairly obvious answer in this case, one that the research bears out.

(Repliers, NB: Research that confirms the seemingly obvious is useful and important, and “I already knew that” is not information that anyone is interested in except you.)

1/ https://www.404media.co/chatbots-health-medical-advice-study/

“Chatbots Make Terrible Doctors, New Study Finds” (404 Media): Chatbots provided incorrect, conflicting medical advice, researchers found: “Despite all the hype, AI just isn't ready to take on the role of the physician.”

Despite the obviousness of the larger conclusion (“LLMs don’t give accurate medical advice”), this passage is…if not surprising, exactly, at least really really interesting.

2/

There’s a lesson here, perhaps, about the tangled relationship between what is •typical• and what is •correct•, and what it is that LLMs actually do:

When medical professionals ask medical questions in technical medical language, the answers they get are typically correct.

When non-professionals ask medical questions in a perhaps medically ill-formed vernacular mode, the answers they get are typically wrong.

The LLM readily models both registers. It has no notion of correctness in either case; correctness is simply more statistically typical in one register than in the other.
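
To see the mechanism in miniature, here’s a toy sketch in Python. It is nothing like the study’s setup, the “medical” sentences are invented, and a real LLM is vastly more complex than a bigram table — but it shows how a model of typicality alone can look correct in one register and wrong in another. Truth appears nowhere in the model; the register of the question selects which slice of the training data answers it.

```python
from collections import Counter, defaultdict

# Toy "language model": predicts the most typical next word.
# It has no notion of whether a continuation is medically true.
# (All of these sentences are invented for illustration.)
corpus = [
    "chest pain radiating to the left arm suggests cardiac ischemia",
    "chest pain radiating to the left arm suggests cardiac ischemia",
    "my chest hurts it is probably just stress",
    "my chest hurts it is probably just stress",
    "my chest hurts it is probably just stress",
]

# Count bigrams: how often each word follows each other word.
counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def most_typical_continuation(prompt, length=6):
    """Greedy decoding: always emit the statistically typical next word."""
    word = prompt.split()[-1]
    out = []
    for _ in range(length):
        if not counts[word]:
            break
        word = counts[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)

# Technical register steers toward the clinically phrased training data...
print(most_typical_continuation("chest pain radiating"))
# -> "to the left arm suggests cardiac"

# ...while vernacular phrasing steers toward the vernacular (riskier) answer.
print(most_typical_continuation("my chest hurts"))
# -> "it is probably just stress"
```

Same model, same total ignorance of medicine: the clinician’s phrasing gets the clinician’s answer, and the layperson’s phrasing gets the layperson’s.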

3/

@inthehands One of the factors in this mess is the heavily boosted notion that LLMs contain facts or knowledge. They do, coincidentally, sort of, but not really. A safer mental model is to think of them as a fuzzy virtual machine of sorts, not unlike a vibe-y JVM but programmed in something dressed as plain language. Garbage in, garbage out. Often anything in, garbage out.
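
To make the analogy concrete (a throwaway toy, not anyone’s real architecture): a conventional VM evaluates its program deterministically, while the fuzzy one returns a sample somewhere near the typical answer.

```python
import random

def conventional_vm(program):
    # A JVM-ish evaluator: same program, same answer, every time.
    return sum(program)

def fuzzy_vm(program, temperature=1.0):
    # The vibe-y VM: returns something *near* the typical answer.
    # Correctness is a matter of luck and temperature, not semantics.
    return sum(program) + random.gauss(0.0, temperature)

program = [1, 2, 3]
print(conventional_vm(program))                          # 6, always
print([round(fuzzy_vm(program), 2) for _ in range(3)])   # scattered around 6
```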