Super frustrated with all the cheerleading over chatbots for search, so here's a thread of presentations of my work with Chirag Shah on why this is a bad idea. Follow threaded replies for:

op-ed
media coverage
original paper
conference presentation

Please boost whichever (if any) speak to you.

Chatbots are not a good replacement for search engines

https://iai.tv/articles/all-knowing-machines-are-a-fantasy-auid-2334

All-knowing machines are a fantasy | Emily M. Bender and Chirag Shah

The idea of an all-knowing computer program comes from science fiction and should stay there. Despite the seductive fluency of ChatGPT and other language models, they remain unsuitable as sources of knowledge. We must fight against the instinct to trust a human-sounding machine, argue Emily M. Bender & Chirag Shah.

IAI TV
Chatbots could one day replace search engines. Here’s why that’s a terrible idea.

Language models are mindless mimics that do not understand what they are saying—so why do we pretend they’re experts?

MIT Technology Review

Chatbots-as-search is an idea based on optimizing for convenience. But convenience is often at odds with what we need to be doing as we access and assess information.

https://www.washington.edu/news/2022/03/14/qa-preserving-context-and-user-intent-in-the-future-of-web-search/

Q&A: Preserving context and user intent in the future of web search

In a new perspective paper, University of Washington professors Emily M. Bender and Chirag Shah respond to proposals that reimagine web search as an application for large language model-driven...

UW News

Using chatbots/large language models for search was a bad idea when Google proposed it, and it's still a bad idea coming from Meta, OpenAI, or You.com

https://dl.acm.org/doi/10.1145/3498366.3505816

Situating Search | Proceedings of the 2022 Conference on Human Information Interaction and Retrieval

ACM Conferences

Language models/automated BS generators only have information about word distributions. If they happen to create sentences that make sense, it's because we make sense of them. But disconnected "information" inhibits the broader project of sense-making.

https://www.youtube.com/watch?v=VY1GHbU_FYs&list=PLn0nrSd4xjjY3E1qxXpWDoF7q-Q3d6g_A&index=17

Situating Search

YouTube
@emilymbender I am no linguist, but I have been toying around with chatGPT to get an impression of its capabilities.
Some replies impressed me and some made me facepalm repeatedly. I remain very skeptical of their actual usefulness.
When you say that LLMs just know word distributions, what do you think of findings like this: Emergent Analogical Reasoning in Large Language Models, https://arxiv.org/abs/2212.09196?
Emergent Analogical Reasoning in Large Language Models

The recent advent of large language models has reinvigorated debate over whether human cognitive capacities might emerge in such generic models given sufficient training data. Of particular interest is the ability of these models to reason about novel problems zero-shot, without any direct training. In human cognition, this capacity is closely tied to an ability to reason by analogy. Here, we performed a direct comparison between human reasoners and a large language model (the text-davinci-003 variant of GPT-3) on a range of analogical tasks, including a novel text-based matrix reasoning task closely modeled on Raven's Progressive Matrices. We found that GPT-3 displayed a surprisingly strong capacity for abstract pattern induction, matching or even surpassing human capabilities in most settings. Our results indicate that large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems.

arXiv.org
Really interesting thread for a layman like me trying to make sense of all the hype surrounding this subject, not least in my field.
@emilymbender
To add to the paper @arildse linked, the following paper made me raise an eyebrow:
https://arxiv.org/abs/2212.03827
Discovering Latent Knowledge in Language Models Without Supervision

Existing techniques for training language models can be misaligned with the truth: if we train models with imitation learning, they may reproduce errors that humans make; if we train them to generate text that humans rate highly, they may output errors that human evaluators can't detect. We propose circumventing this issue by directly finding latent knowledge inside the internal activations of a language model in a purely unsupervised way. Specifically, we introduce a method for accurately answering yes-no questions given only unlabeled model activations. It works by finding a direction in activation space that satisfies logical consistency properties, such as that a statement and its negation have opposite truth values. We show that despite using no supervision and no model outputs, our method can recover diverse knowledge represented in large language models: across 6 models and 10 question-answering datasets, it outperforms zero-shot accuracy by 4% on average. We also find that it cuts prompt sensitivity in half and continues to maintain high accuracy even when models are prompted to generate incorrect answers. Our results provide an initial step toward discovering what language models know, distinct from what they say, even when we don't have access to explicit ground truth labels.

arXiv.org

@ar_lt I try not to spend too much time with preprints, i.e., work that has not been vetted by experts.

But: From the abstract, it sounds like they are trying to inject some external information (the meaning of negation).

Also: "latent knowledge" is an unfortunate overstatement.