Here's the bottom line on LLM AI systems. If you can't trust that their answers are accurate, they are essentially worthless. It doesn't matter if parts of their answers are correct and parts are incorrect -- the incorrect parts "contaminate" the entire response and render it useless. Actually, even worse than useless, because this becomes a perfect vehicle for spreading misinformation that is combined and given gravitas by accurate information -- a horrific and dangerous toxic brew.
@lauren That kinda just sounds like the internet with extra steps. ...
@lauren And as a well-formed narrative in well-formed sentences.
@lauren Yup. If you don't know which parts of it you can trust, you can't trust ANY of it.
@lauren The mistake is that you're only assigning value to LLMs based on how factual they can be. That is not their strength. LLMs are great at two things: as a toy you play with for things like creative writing, role-play, etc., and as an assistant for things like programming or writing, where you are already an expert on the matter and can repair any mistakes but don't have to write everything from scratch.
@db0 I stand by my statement. Even in the more limited contexts you note, errors creeping in can have horrific results. Note the lawyer who got into big trouble using an LLM for court filings that were full of reasonable-sounding fantasies, and that the LLM assured him were accurate and had been checked! Or errors in code that sneak in because the person using the LLM has been lulled into thinking it is always correct. I don't worry much about experts. Experts should be able to take care of themselves. I worry about ordinary people being screwed by these half-baked systems being deployed long before they are ready. Your view I would describe as "elitist".

@lauren The first example, about the lawyers, is exactly an attempt to use an LLM for factual purposes. Naturally it will lead to comical results.

And yes, using LLMs for code without knowing programming is a risky proposition. You might get something usable, but it would be about as reliable as copy-pasting the first answer on Stack Overflow.

I don't understand why you disregard the creative use case, which is massive atm. Or why you consider my saying that experts can use AI "elitist".

@lauren I program as a hobby. I use GenAI to speed up my coding by an order of magnitude. Likewise, expert digital artists can speed up their drawing by an order of magnitude by utilizing GenAI in their process, much as they benefited from using digital art tools to make their art faster. There's nothing elitist about using a tool to do your work more efficiently.

@db0 I believe my statements here (and past writings on this subject, which should be easy to find) sufficiently detail my views on this topic so that I need not detail them again here.

I think AI has enormous promise, and LLMs can be very useful. I also feel that the companies pushing these out now in an insane arms race know full well that the vast majority of users don't understand the limitations, are not going to test or verify or check answers (even if they had the skills to do so) and are being horribly misled.

@lauren On that I agree. The companies pushing LLMs, especially for anything factual, are being completely irresponsible. I do agree that they are useless for that purpose and that this is an unfixable problem.

I merely wanted to point out that LLMs are not worthless when they're not factual. Their use case is elsewhere.

@db0 @lauren then why are they being promoted as sources of fact if they can’t be relied upon?
Oh yeah, I remember… money…
@knightlie @lauren yep. Like always, Capitalism ruins everything
@db0 I don't worry about the good stuff tech can do -- I worry about the problems. The good stuff can take care of itself. And to date, the good stuff is a drop in the bucket compared to the damage that can be done by the bad stuff, especially in that "factual" realm of concern. My use of the term elitist is shorthand for a common view I've long found among techies (going all the way back to my early ARPANET days at UCLA, at the dawn of the Internet): that because the techies know how to use this tech properly, somehow ordinary, busy, nontechnical people will too, and if they don't, it's their fault, not the fault of the techies who designed the systems.
@lauren In this case, it's the corpo techbros who are explicitly promoting this tech for misuse, mostly because of capitalist incentives, as they need to keep the tech bubbles going. There are plenty of us who are trying to use this tech as it's meant to be used.

@db0 @lauren We’ve had programs for a long time that produce code for you. They’re called compilers.

If you need an LLM to write code for you, you’re not working in a language that is high-level enough for your task. The text you write should be the source code, whatever language you’re using; otherwise you’re just prohibiting people from editing your code.

And finding subtle errors that are introduced by an LLM is not possible if you weren’t able to write the code yourself in the first place.

@ahltorp @lauren Don't be condescending.

@db0 @lauren What a great counterargument. Did I say anything that is not true?

And I don’t think I’m condescending, I’m just not reverent towards LLMs.

@ahltorp I don't respond well to smuglords. Learn to interact respectfully next time.

@db0 My reply was not mainly directed to you, but to other people reading the thread and mistaking your post for something useful.

And, it’s not disrespectful to point out that someone is wrong. That’s what you tried to do in your post. And I tried to point out that you were wrong.

And please don’t use language like “Learn to interact respectfully next time”. That, if anything, is condescending.

@ahltorp You barging into a discussion, and then using it as a platform to talk to "the audience", is the height of condescension. You clearly think I don't know what I'm talking about and that you're better than me, not even worth replying to directly, which is why you felt confident enough to start with such a smug declaration and then continue in the same spirit. Your whole post is disrespectful as fuck. You won't just gaslight your way out of this one.

@db0 “You clearly think I don't know what I'm talking about”

Yes. You clearly know very little about computer science, and you misunderstand how Mastodon works.

You were not having a private conversation with the original poster, and my replying to you is not any more “barging in” than what you were doing in the first place. My reply was the third in the tree. There wasn’t any “discussion” happening, just your reply.

@ahltorp Right. So I am accurate in calling you a condescending smuglord. I don't know what you're whining about then.

There's no point in replying seriously to blowhards like you. Learn to behave.

@lauren It doesn't, though. When I ask an LLM to describe an image I otherwise wouldn't be able to see (for example, the picture of a device and which lights are on or off), the description is both useful and open to testing.
@modulux Except most people do not "do testing." They accept responses as correct and do no further verification. I am frankly appalled at how little so many techies seem to understand about real-world users.
@lauren Oh, that's a serious problem for sure. And I don't want to just say it's an issue of personal responsibility, because that doesn't work very well. I guess what I'm concerned about is losing capabilities that make my life significantly better because other people are not careful or misuse the technology. There's a lot of pushback against AI/LLMs, some of it justified, but multimodal AI is the biggest thing to happen to accessibility in a very long time.

@lauren if a pocket calculator says "10x10={<¥™÷>" it's obvious that it's broken. If my calculator says "10x10=10,000" it's broken in a *much worse* way.

A wrong answer that can pass for right is much more destructive than one that's obviously faulty.

@lauren @cstross The old science fiction cliché of the AI that cannot lie was usually played as a weakness that the hero could use to save the day. In fact, it was their saving grace, and probably the result of hard-won regulation.
@lauren I keep trying to tell my students this. It’s like, “No, really, don’t use it to do your homework.” It’s exhausting.
@lauren To be fair, humans and the internet do the same, and with RAG, LLMs are at least able to start citing their sources.
@lauren Securitised Subprime Search Engines.
@lauren Agree. The danger is adding misinformation to available information! At what point does the misinformation become accepted information? Newspeak!
@lauren You can say the same for web search. Yet here we are.

@Smrki No. The big difference is that traditional search requires users to go to those sites to get actual information. This automatically exposes them to far more detail, and the SERP puts a range of choices up front, impossible to ignore.

LLM responses present a "prepackaged" single response with a false air of authenticity.

The situations are entirely different.

@lauren I get the same false sense of authenticity when I do a web search. So much falsehood drawing attention. Yet there is some utility there. Same with these models. Lots of crap yet something exciting and useful to be found there. As always, buyers beware.
@lauren Well, everything you say also applies to human experts or even organisations.
@lauren That is not true. Human answers are often incorrect, too, but seldom worthless. I use ChatGPT sometimes just to find a word I have forgotten, or to find the right search terms. The answers are often partly wrong, but I can use them to find the right answer. It is the same as with all other media: the question is how to use them, not whether to use them at all.
@benni Most people are not checking the answers. They assume they are true. This has already caused problems for lawyers, doctors, and vast numbers of ordinary people. Most people are busy and assume that these systems are giving correct responses, and the firms encourage this view despite their disclaimers. They are a disaster in their current incarnations because they are being rushed out in crude form and the firms know it, treating everybody like worthless guinea pigs.
@lauren Most people believe what they see on TV. It is often incorrect, too.
@lauren @benni that partial-ballpark-guide use case is what i was using google for a lot back around 2015-2019

i find the modern-day LLM application doesn't add much to the utility of that - in fact it makes it a lot *harder* to use it that way, because it takes things to the extreme and starts giving me results that have nothing to do with what i wanted, steering toward the more commonly sought subject matter and the more clichéd phrasing

often i'm swamped in the specific thing i was trying to avoid getting

@lauren

It is not good when bots talk to each other.

@lauren This reminds me of an argument I had with someone regarding accessibility. The argument was that 85% selectable text was 'good enough'. Well, I found the perfect document for my point. The PDF had everything on the page selectable except for one bolded word in a sentence... "not".

@lauren

Nah, I’d rather have a 90% functioning script that I have to do a little correction on than write 100% of it myself.

Most of the time, the whole script it spits out is correct on the first try, though.

@plasma4045 And if you weren't skilled enough to know that there were 10% errors, you of course would still happily use it. Uh huh.

@lauren

I mean, the IDE usually tells you where the issue is. Would still save time if you don’t know the language.

In fact, I know it does because I’ve used it for languages I don’t write.

But maybe this is a different use case than you were describing in the OP.

@lauren It depends on the application. Suppose, for a given application, that the answer I get is 80% likely to be correct and 20% likely to be wrong, and let's further assume that, knowing this, I intend to test the answer before using it. In this case, such a tool isn't useless, far from it. Even if the probability of being correct is less than 80%, it could still help to produce the right answer faster. But yes, it means you can't use the output of the thing without further testing.
@not2b The key words there are "knowing this." And for most users, that's not the case.

@lauren The way to figure it out is by testing. If you don't have a way to test, yes, you're out of luck. For example, if I ask a colleague for suggestions on how to debug a difficult problem and get back some feasible ideas, they might be wrong, but I can try them out. The same could be the case if I ask something like Copilot how to use some API I'm not familiar with: I expect that there might be problems with the answer, but it's a starting point.

I agree that it's a very bad idea to blur search and LLM, but even search suffers because so many of the results are from content farms filled with low quality crud.

@not2b Most people don't have time to test, don't know how to test, don't have the expertise to test, and don't see any reason to test. That's the short list.
@lauren "But it'll get better!" I know, doesn't mean I trust it more knowing who is pulling the strings. If anything, that it'll get better only concerns me further.
@lauren i'm reminded of when donald trump was president and i'd developed a policy of treating everything he said as unintelligible noise coming out of a defective machine in the background, and that any statement about what the administration was actually doing that could be relied on in any capacity had to come from someone else

didn't think we'd be seeing that applied at scale so soon in a totally different context