Meta. OpenAI. Google.

Your AI chatbot is not *hallucinating*.

It's bullshitting.

It's bullshitting, because that's what you designed it to do. You designed it to generate seemingly authoritative text "with a blatant disregard for truth and logical coherence," i.e., to bullshit.

@ct_bergstrom Disagree. They're designed to mimic what a human would write. If they end up bullshitting it's because the models aren't good enough, not because that's what they're designed to do.

@moultano Humans have an underlying knowledge model. They have beliefs about the world, and choose whether to represent those beliefs accurately or inaccurately using language.

LLMs do not have an underlying knowledge model; they don't have a concept of what is true or false in the world. They just string together words they don't "understand" in ways that are likely to seem credible.

It's not a matter of making better LLMs; it'll take a fundamentally different type of model.

@ct_bergstrom LLMs represent whether they "believe something to be true" in a way that you can extract unsupervised. Not disagreeing that their world model isn't good enough to be used without auxiliary retrieval, but there's some evidence they have one. https://arxiv.org/abs/2212.03827
Discovering Latent Knowledge in Language Models Without Supervision

Existing techniques for training language models can be misaligned with the truth: if we train models with imitation learning, they may reproduce errors that humans make; if we train them to generate text that humans rate highly, they may output errors that human evaluators can't detect. We propose circumventing this issue by directly finding latent knowledge inside the internal activations of a language model in a purely unsupervised way. Specifically, we introduce a method for accurately answering yes-no questions given only unlabeled model activations. It works by finding a direction in activation space that satisfies logical consistency properties, such as that a statement and its negation have opposite truth values. We show that despite using no supervision and no model outputs, our method can recover diverse knowledge represented in large language models: across 6 models and 10 question-answering datasets, it outperforms zero-shot accuracy by 4% on average. We also find that it cuts prompt sensitivity in half and continues to maintain high accuracy even when models are prompted to generate incorrect answers. Our results provide an initial step toward discovering what language models know, distinct from what they say, even when we don't have access to explicit ground truth labels.

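The abstract's key move, finding a "truth direction" in activation space from logical consistency alone, is concrete enough to sketch. Below is a minimal, hedged illustration of that idea (the paper calls its method CCS, Contrast-Consistent Search). The tensor sizes, hyperparameters, and random placeholder activations are assumptions for illustration only; in the real method, `act_pos` and `act_neg` would be a language model's hidden states for the same question phrased with "Yes" and with "No" answers.

```python
# Minimal sketch of the unsupervised probe described in the abstract.
# Assumptions: the sizes, hyperparameters, and random "activations" below are
# placeholders; in the paper, act_pos/act_neg come from a language model's
# hidden states for contrast pairs ("Q? Yes" vs. "Q? No").
import torch
import torch.nn as nn

torch.manual_seed(0)
n_examples, hidden_dim = 256, 768                # hypothetical sizes
act_pos = torch.randn(n_examples, hidden_dim)    # stand-in for real activations
act_neg = torch.randn(n_examples, hidden_dim)

# Normalize each set separately so the probe can't just detect "ends in Yes/No".
def normalize(x):
    return (x - x.mean(dim=0)) / (x.std(dim=0) + 1e-8)

act_pos, act_neg = normalize(act_pos), normalize(act_neg)

# The "direction in activation space": a linear probe squashed to a probability.
probe = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

for step in range(1000):
    p_pos, p_neg = probe(act_pos), probe(act_neg)
    # Consistency: a statement and its negation should get opposite probabilities.
    consistency = ((p_pos - (1.0 - p_neg)) ** 2).mean()
    # Confidence: rule out the degenerate solution p_pos = p_neg = 0.5.
    confidence = (torch.min(p_pos, p_neg) ** 2).mean()
    loss = consistency + confidence
    opt.zero_grad()
    loss.backward()
    opt.step()

# Score "true" by averaging the two views of the same statement.
with torch.no_grad():
    truth_score = 0.5 * (probe(act_pos) + (1.0 - probe(act_neg)))
```

The confidence term is what makes this non-trivial: without it, the probe could satisfy the consistency constraint by assigning 0.5 to every statement and its negation.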
@moultano @ct_bergstrom They consume language and then produce language. Their "beliefs" can be about the structure of the English language (when generating text in English), like that adjectives that describe color always go after adjectives that describe size: "the little red hen", not "the red little hen". But they don't have a model of the external world.
@not2b @moultano Precisely. Their "beliefs" have no anchor point outside the world of text.
@ct_bergstrom @not2b Huge fractions of what I know have no anchor point outside of text, like nearly all science and math.
@moultano @ct_bergstrom @not2b Not a whole lot of anchoring for history either. But there are already multimodal models, like Flamingo. If text really has to be grounded in sense experience, we will presumably see that research path take the lead and produce better textual prediction.
@moultano @ct_bergstrom @not2b if multimodal training doesn’t make a model much better at predicting text, then at some point we’ll need to revise our priors and consider the possibility that a functional world model can largely be inferred from text
@TedUnderwood @ct_bergstrom @not2b I think it's plausible that a multimodal model might eventually benefit, but the bandwidth advantage of text over video is just too great: you'd need enough video frames to capture cause and effect, physics, plot, object permanence.
@moultano @ct_bergstrom @not2b I agree. Eventually sense experience will help, but language alone is providing more of a world model than lots of us would have expected. And if that’s true, then the people dismissing predict-the-next-word as inherently just bullshit generation are prob sneering too hastily.

@TedUnderwood @moultano @ct_bergstrom @not2b

People who understand the technology of large language models aren't dismissing it as "inherently just bullshit generation", but they are warning that its output smoothly mixes both fact and falsehood with no distinction or care.

And they warn that the quantity and impact of this #bullshit could likely surpass that of #politics, #consumerism, and other forms of rampant #disinformation for which we humans have demonstrated we are poorly prepared.

@TedUnderwood @moultano @ct_bergstrom @not2b This connects with Angus Fletcher's article in Narrative ("Why Computer AI Will Never Do What We Imagine It Can"), where he says narrative capacity derives from 500 million years of evolutionary practice at "flailing a flagellum or other primitive limb (...) in response to positive and negative reinforcement."
@Ben_Carver @moultano @ct_bergstrom @not2b I remember that essay. Surprised but grateful that people are lining up to make falsifiable predictions about this stuff.