Facebook (sorry: Meta) AI: Check out our "AI" that lets you access all of humanity's knowledge.

Also Facebook AI: Be careful though, it just makes shit up.

This isn't even a case of "they were so busy asking if they could" --- they failed to spend even five minutes asking if they should.

#AI #ML #MathyMath #Bullshit #NLP #NLProc #AIhype

Using a large language model as a search engine was a bad idea when it was proposed by a search company. It's still a bad idea now that it's being proposed by a social media company. Fortunately, Chirag Shah and I already wrote the paper laying out all the ways in which this is a bad idea.

https://dl.acm.org/doi/10.1145/3498366.3505816

#AI #ML #MathyMath #Bullshit #NLP #NLProc #AIhype

Situating Search | Proceedings of the 2022 Conference on Human Information Interaction and Retrieval (ACM Conferences)

Chatbots could one day replace search engines. Here's why that's a terrible idea. Language models are mindless mimics that do not understand what they are saying, so why do we pretend they're experts? (MIT Technology Review)

And let's reflect for a moment on how they phrased their disclaimer, shall we? "Hallucinate" is a terrible word choice here, suggesting as it does that the language model has *experiences* and *perceives things*. (And on top of that, it's making light of a symptom of serious mental illness.)

Likewise "LLMs are often Confident". No, they're not. That would require subjective emotion.

#AI #ML #MathyMath #AIhype #NLP #NLProc

I went digging in the paper to see if they cite #StochasticParrots or Bender & Koller 2020 or Shah & Bender 2022. That is, did they read about why this is misguided and just press ahead anyway? Apparently not.

#AI #ML #MathyMath #AIhype #NLP #NLProc

They do cite Blodgett et al 2020 (fabulous paper!)

https://aclanthology.org/2020.acl-main.485/

But in the strangest possible way. Are they reflecting on the possible harms their technology might engender? No, of course not. They're striving for TRUTH! And thus worried about "bias".

#AI #ML #MathyMath #AIhype #NLP #NLProc

Language (Technology) is Power: A Critical Survey of “Bias” in NLP. Su Lin Blodgett, Solon Barocas, Hal Daumé III, Hanna Wallach. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020. (ACL Anthology)

Narrator voice: LMs have no access to "truth", or any kind of "information" beyond information about the distribution of word forms in their training data. And yet, here we are. Again. /fin (for now)

#AI #ML #MathyMath #AIhype #NLP #NLProc

@emilymbender surely you're not suggesting that calling a dataset the "ground truth" could be problematic :o
@emilymbender There are a lot of SWEs and DS engs out there who truly believe that, with enough data, the models will magically know things and be flawless.
@emilymbender but wait
1. social media's variation of style is a more true representation of language
2. scientific style doesn't mean higher veracity, that's an error of ethos
3. one paper showing low transfer between two styles is indicative of ..nothing.. wrt. veracity
4. science is often wrong, this is intrinsic and on balance A Good Thing
5. truth isn't quite so objective or permanent
6. getting a real "science is the only form of scholarship" vibe here, bad assumption. a False one, even
@emilymbender I definitely get the vibe that they trained this model with all these aspirations of it being awesome and helpful and useful and safe because of their choice of data source, then saw that it met none of those goals and just started making excuses. Like they decided to double down because they were heavily invested in it and just wanted it to work *so bad*.
@emilymbender "We spent so much time and money on it! It *has* to work!"
@emilymbender I checked this first and was disappointed. Then I tried some queries expecting CoPilot-style plagiarism, but almost every query returned misleading, confidently wrong, self-contradictory babble. I came away thinking it's perhaps not so harmful, since the best use it can aspire to is as a punchline.

@emilymbender

omg I laughed so hard when I got to the "hallucinate" line

It's a terrible bit of anthropomorphization, but *damn* it's funny to see the CREATOR saying that

If I were helping them copyedit it I might have suggested the line "Language Models Can Pull Stuff Out Of Their Hats That Makes, Like, *Zero* Sense, So YMMV k thx bai"

@emilymbender Perhaps I'm misunderstanding something, but this feels a bit like a strawman. AI generated results will have issues with "relevance, usefulness, and trustworthiness" but no one's saying they won't. None of these products are being presented as infallible oracles.

Some people, when the products are more polished and no longer carry the GIANT DISCLAIMERS they currently have, will over-trust them, but answers from people have these problems too, right?

@williamgunn How about reading the paper that I linked to (or bare minimum the popular press coverage of it in the next post) before arguing with me?
@williamgunn I promise you, CHIIR, the peer reviewed venue where we published it, is not interested in publishing arguments against strawmen.

@emilymbender Sorry, I thought you'd have noticed that I quoted from your paper in my response. I have more than an academic interest here. My company is using AI models to do these things & I'm genuinely interested in good criticism.

Maybe it was the ethics angle that didn't work for me. Who else is writing good criticisms?

@williamgunn My consulting fee is $1200 per hour. And if the ethics angle doesn't work for you, then that's definitely a company I wouldn't work for.
@emilymbender I've carefully considered the arguments in the paper, and while I agree it would be a bad idea for a public company to use a large language model for search at the moment, *I* still want to use one sometimes. And some of the shortcomings you point out seem relatively simple to mitigate (like including source material links etc)
@emilymbender "Hallucinate" is a funny way to say doesn't actually work.
@taylorbeauvais I've got some words about that, too, later in the thread. (Still working on it.)

@emilymbender My first attempt to use it generated something that brilliantly mimicked a paper written by someone who had forgotten the assignment was due today and hadn’t read the reading material.

And as silly as that sounds, it highlights the core problem. It doesn’t know what it doesn’t know. So it will just plug garbage into the gaps that sounds real, but isn’t. And as a reader, you have no idea which parts are based on “facts” (aka unauthenticated or unattributed stuff ingested from somewhere unknown) and which are made up to sound right.

@nazgul Dude, please check out the papers (**that I wrote**) that I cite in my thread. I know. I don't need the "core problem" mansplained to me.
@emilymbender My sincere apologies! I haven’t had a chance to go any deeper on your links yet. I was just replying with my initial reaction based on my brief experience with it yesterday. I’ve bookmarked your thread for later reading.
@nazgul Thank you for apologizing. In the future please consider that the women you are replying to might just be speaking from their own expertise ... and then use that to guide when you try to "inform". You too can make the world a better place.

@emilymbender
That was a good reminder of how toxic my last job was. I was being encouraged to make decisions without enough info, for a group of people who had far more experience than I did. That ran very counter to my usual technique of asking lots of questions, finding the people with good ideas, and supporting them.

You just gave me a wake-up that I may have left there, but I’m still using that “I know what I’m talking about” style instead of couching questions and off-the-cuff theories as what they actually are. Ugh.

A gut punch that I needed. Thanks for taking the time to respond.

@nazgul That's got to be the most positive response I've ever gotten to calling out mansplaining! Thank you.
@emilymbender I looked up my place of work (Bay area nonprofit, ~80 people) and it was entirely incorrect, from the city to the headcount, year of founding, organization, etc. It was like every fact available on our about page had been swapped for an incorrect one lol
@emilymbender @epsilon it was supposed to be the Librarian from Snow Crash but they must have given it trending posts from their platforms as "facts" and "knowledge" 😂
@emilymbender
I like the way that they describe weird results from an AI as 'hallucination'.
@tpuddle I don't. Please read the rest of my thread.
@emilymbender Is 'Hallucinate' a technical term? It feels ... I'm not sure what to think in this context
@emilymbender I can also provide a service where I give the wrong answer to questions. That's super easy.

@emilymbender Amy Hoy proposed

> we should call it “artificial mansplaining,” always confident, rarely correct

https://mastodon.social/@amyhoy/109355444166205985

@emilymbender
I'm somehow working "language models can hallucinate" into something! Anything!

Just have no idea of what yet.

This is a job for #UnderDweller (my subconscious). 😄

I haven't set up #HomeAssistant voice input yet, but I have the hardware to do it.

@emilymbender @pluralistic this works just fine... to promote several directors all the way up to VPs.

@emilymbender Oh boy I can't wait for *this* to backfeed into itself like every other language model eventually does

Someone's going to try writing Wikipedia articles with this, then they get scraped back into v2 of the AI, and the loop closes in on itself.

@emilymbender Confident but wrong.
A perfect entry on human species for encyclopedia. Nothing more, nothing less.
Okay, hallucinations maybe.
@emilymbender This is just Dissociated Press on a bigger scale. See https://en.wikipedia.org/wiki/Dissociated_press
Dissociated press - Wikipedia

@emilymbender Is it just me, or does this problem with large language model output remind anyone else of recent crypto and venture capital mega-shysters like SBF’s FTX and Elizabeth Holmes’ Theranos? Are we moving so fast we’re making larger, more significant mistakes? Or is there really a knowledge vacuum out there where supposedly informed investors and researchers are blinded by over-trust in technology?