Mastodawn

https://dl.acm.org/doi/10.1145/3498366.3505816

Using a large language model as a search engine was a bad idea when it was proposed by a search company. It's still a bad idea now that it's being proposed by a social media company. Fortunately, Chirag Shah and I already wrote the paper laying out all the ways in which this is a bad idea.

#AL #ML #MathyMath #Bullshit #NLP #NLProc #AIhype

Situating Search | Proceedings of the 2022 Conference on Human Information Interaction and Retrieval

ACM Conferences

https://www.technologyreview.com/2022/03/29/1048439/chatbots-replace-search-engine-terrible-idea/

In the popular press/general public-facing Q&A about our paper:

https://www.washington.edu/news/2022/03/14/qa-preserving-context-and-user-intent-in-the-future-of-web-search/

#AL #ML #MathyMath #Bullshit #NLP #NLProc #AIhype

Chatbots could one day replace search engines. Here’s why that’s a terrible idea.

Language models are mindless mimics that do not understand what they are saying—so why do we pretend they’re experts?

MIT Technology Review

And let's reflect for a moment on how they phrased their disclaimer, shall we? "Hallucinate" is a terrible word choice here, suggesting as it does that the language model has *experiences* and *perceives things*. (And on top of that, it's making light of a symptom of serious mental illness.)

Likewise "LLMs are often Confident". No, they're not. That would require subjective emotion.

I went digging in the paper to see if they cite #StochasticParrots or Bender & Koller 2020 or Shah & Bender 2022. That is, did they read about why this is misguided and just press ahead anyway? Apparently not.

https://aclanthology.org/2020.acl-main.485/

They do cite Blodgett et al 2020 (fabulous paper!)

But in the strangest possible way. Are they reflecting on the possible harms their technology might engender? No, of course not. They're striving for TRUTH! And thus worried about "bias".

Language (Technology) is Power: A Critical Survey of “Bias” in NLP

Su Lin Blodgett, Solon Barocas, Hal Daumé III, Hanna Wallach. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020.

ACL Anthology

Narrator voice: LMs have no access to "truth", or any kind of "information" beyond information about the distribution of word forms in their training data. And yet, here we are. Again. /fin (for now)

Leon Derczynski 🏔️✍🏻🌲☀️Nov 16, 2022

@emilymbender surely you're not suggesting that calling a dataset the "ground truth" could be problematic :o

Adam Collins Nov 17, 2022

@emilymbender There are a lot of SWEs and DS engs out there who truly believe that, with enough data, the models will magically know things and be flawless.

Leon Derczynski 🏔️✍🏻🌲☀️Nov 16, 2022

@emilymbender but wait
1. social media's variation of style is a more true representation of language
2. scientific style doesn't mean higher veracity, that's an error of ethos
3. one paper showing low transfer between two styles is indicative of ..nothing.. wrt. veracity
4. science is often wrong, this is intrinsic and on balance A Good Thing
5. truth isn't quite so objective or permanent
6. getting a real "science is the only form of scholarship" vibe here, bad assumption. a False one, even

Aaron Nov 21, 2022

@emilymbender I definitely get the vibe that they trained this model with all these aspirations of it being awesome and helpful and useful and safe because of their choice of data source, then saw that it met none of those goals and just started making excuses. Like they decided to double down because they were heavily invested in it and just wanted it to work *so bad*.

Aaron Nov 21, 2022

@emilymbender "We spent so much time and money on it! It *has* to work!"

Jed Brown Nov 16, 2022

@emilymbender I checked this first and was disappointed. Then I tried some queries expecting to get CoPilot-style plagiarism, but almost every query returned misleading, confidently wrong, and self-contradictory babble, and I came away thinking it's perhaps not so harmful because the most use it can aspire to is as a punchline for jokes.

Clive Thompson Nov 16, 2022

@emilymbender

omg I laughed so hard when I got to the "hallucinate" line

It's a terrible bit of anthropomorphization, but *damn* it's funny to see the CREATOR saying that

If I were helping them copyedit it I might have suggested the line "Language Models Can Pull Stuff Out Of Their Hats That Makes, Like, *Zero* Sense, So YMMV k thx bai"

William Gunn Nov 21, 2022

@emilymbender Perhaps I'm misunderstanding something, but this feels a bit like a strawman. AI generated results will have issues with "relevance, usefulness, and trustworthiness" but no one's saying they won't. None of these products are being presented as infallible oracles.

Some people, when the products are more well polished and no longer have the GIANT DISCLAIMERS they currently have, will over-trust, but answers from people also have these problems, right?

Prof. Emily M. Bender(she/her)Nov 21, 2022

@williamgunn How about reading the paper that I linked to (or bare minimum the popular press coverage of it in the next post) before arguing with me?

Prof. Emily M. Bender(she/her)Nov 21, 2022

@williamgunn I promise you, CHIIR, the peer reviewed venue where we published it, is not interested in publishing arguments against strawmen.

William Gunn Nov 21, 2022

@emilymbender Sorry, I thought you'd have noticed that I quoted from your paper in my response. I have more than an academic interest here. My company is using AI models to do these things & I'm genuinely interested in good criticism.

Maybe it was the ethics angle that didn't work for me. Who else is writing good criticisms?

Prof. Emily M. Bender(she/her)Nov 21, 2022

@williamgunn My consulting fee is $1200 per hour. And if the ethics angle doesn't work for you, then that's definitely a company I wouldn't work for.

William Gunn Nov 21, 2022

@emilymbender Sorry to bother you.

Jack Mott Dec 9, 2022

@emilymbender I've carefully considered the arguments in the paper, and while I agree it would be a bad idea for a public company to use a large language model for search at the moment, *I* still want to use one sometimes. And some of the shortcomings you point out seem relatively simple to mitigate (like including source material links etc)

Taylor Beauvais Nov 16, 2022

@emilymbender "Hallucinate" is a funny way to say doesn't actually work.

@taylorbeauvais I've got some words about that, too, later in the thread. (Still working on it.)

Kee Hinckley Nov 16, 2022

@emilymbender My first attempt to use it generated something that brilliantly mimicked a paper written by someone who had forgotten the assignment was due today and hadn’t read the reading material.

And as silly as that sounds, it highlights the core problem. It doesn’t know what it doesn’t know. So it will just plug garbage into the gaps that sounds real, but isn’t. And as a reader, you have no idea which parts are based on “facts” (aka unauthenticated or attributed stuff ingested somewhere unknown) and which are made up to sound right.

@nazgul Dude, please check out the papers (**that I wrote**) that I cite in my thread. I know. I don't need the "core problem" mansplained to me.

Kee Hinckley Nov 16, 2022

@emilymbender My sincere apologies! I haven’t had a chance to go any deeper on your links yet. I was just replying with my initial reaction based on my brief experience with it yesterday. I’ve bookmarked your thread for later reading.

@nazgul Thank you for apologizing. In the future please consider that the women you are replying to might just be speaking from their own expertise ... and then use that to guide when you try to "inform". You too can make the world a better place.

Kee Hinckley Nov 16, 2022

@emilymbender
That was a good reminder of how toxic my last job was. I was being encouraged to make decisions without enough info, for a group of people who had far more experience than I did. That ran very counter to my usual technique of asking lots of questions, finding the people with good ideas, and supporting them.

You just gave me a wake-up that I may have left there, but I’m still using that “I know what I’m talking about” style instead of couching questions and off-the-cuff theories as what they actually are. Ugh.

A gut punch that I needed. Thanks for taking the time to respond.

@nazgul That's got to be the most positive response I've ever gotten to calling out mansplaining! Thank you.

Dr. Dan Killam Nov 16, 2022

@emilymbender I looked up my place of work (Bay area nonprofit, ~80 people) and it was entirely incorrect, from the city, to headcount, year of founding, organization, etc. It was like every fact available on our about page was ad libbed in with an incorrect one lol

David Jayatillake Nov 16, 2022

@emilymbender @epsilon it was supposed to be the Librarian from Snow Crash but they must have given it trending posts from their platforms as "facts" and "knowledge" 😂

Conor O'Neill Nov 16, 2022

@emilymbender
I like the way that they describe weird results from an AI as 'hallucination'.

@tpuddle I don't. Please read the rest of my thread.

ginevra Nov 16, 2022

@emilymbender Is 'Hallucinate' a technical term? It feels ... I'm not sure what to think in this context

Anthony (he/him)Nov 17, 2022

@emilymbender I can also provide a service where I give the wrong answer to questions. That's super easy.

https://mastodon.social/@amyhoy/109355444166205985

Maarten Nov 17, 2022

@emilymbender Amy Hoy proposed

> we should call it “artificial mansplaining,” always confident, rarely correct

kaybeeque 🍁💪💪🍁Nov 17, 2022

@emilymbender
I'm somehow working "language models can hallucinate" into something! Anything!

Just have no idea of what yet.

This is a job for #UnderDweller (my subconscious). 😄

I haven't set up #HomeAssistant voice input at this point and have the hardware to do it.

Gustavo Franco Nov 17, 2022

@emilymbender @pluralistic this works just fine... to promote several directors all the way up to VPs.

Dark Sage Torunka

Nov 17, 2022

@emilymbender Oh boy I can't wait for *this* to backfeed into itself like every other language model eventually does

Someone's going to try writing Wikipedia articles with this, then they get scraped back into v2 of the AI, and the loop closes in on itself.

Bernard Monsun Nov 17, 2022

@emilymbender Confident but wrong.
A perfect entry on human species for encyclopedia. Nothing more, nothing less.
Okay, hallucinations maybe.

Mike Taylor 🦕Nov 17, 2022

@emilymbender This is just Dissociated Press on a bigger scale. See https://en.wikipedia.org/wiki/Dissociated_press

Dissociated press - Wikipedia