The biggest question for me about large language model interfaces - ChatGPT, the new Bing, Google's Bard - is this:

How long does it take for regular users (as opposed to experts, or people who just try them once or twice) to convince themselves that these tools frequently make things up that aren't accurate?

And assuming they figure this out, how does knowing it affect the way they use these tools?

Someone must have done research on this, right? It feels pretty fundamental!

One argument here is that people will blindly trust any chatbot that supports their existing biases.

Is that cynicism justified?

What happens when the chatbot speaks against their biases? In particular, what if it both counters their biases AND does so in a way that is demonstrably factually incorrect?

We are already seeing furious complaints from some corners that ChatGPT has a liberal bias - how does that affect how those complainants trust and use these tools?

Hindu nationalists are FURIOUS about ChatGPT right now: https://www.wired.com/story/chatgpt-has-been-sucked-into-indias-culture-wars/

How will that impact their trust of systems like this in the future?

ChatGPT Has Been Sucked Into India's Culture Wars

Hindu nationalists claim that the chatbot has insulted their deities, sparking an online uproar.

@simon This isn't really cynicism, I think it's more an optimistic view of people.

@simon No research, but after an afternoon of 'playing' with ChatGPT, I had worked out its limitations.

My takeaway, and note of optimism, is that people will be able to 'smell' bot-generated text quite easily. Whether they'll care is another discussion.

@simon To be fair though, they also thought a plain red cup had a liberal bias.
@simon I think we're going to see more ChatGPTs out there and my guess is that they are going to attract different people based on their biases. People select their echo chambers in social media and we've seen the feedback loop it has produced with respect to political extremism. I think we're about to see another feedback loop with ChatGPTs. That is, people seeking out models that confirm their biases, which then drives them to produce biased content to feed back into it, and repeat.
@sebleier What will happen when a right-leaning chatbot gains popularity, but then people figure out ways to trick it into supporting left wing talking points and start sharing prompts and screenshots?
@simon – People are Bayesian by nature, so depending on how they prioritize truth vs. satisfying their biases, you'll see some people dock their favorite chatbot a few points if it spouts an opposing ideology. If it gets to a certain point, you'll see a phase transition and people may migrate to another platform. I see it as analogous to the recent migration of viewers from Fox News to OANN or Newsmax.
@simon have you turned on any US political news in the last 8 years? I think that the idea that there is such a thing as a consensus view of “demonstrably factually incorrect” is a statement so bold as to be unsupportable

@glyph My question remains: if a right-leaning person encounters replies from ChatGPT that directly counter their existing beliefs (and which they can fact-check through other sources), do they stop believing that ChatGPT is an infallible source of information?

Even if their conclusion is "It's a conspiracy! The chatbot has been neutered!", does it still provide some level of protection for them in terms of helping them understand that these things are deeply fallible?

@simon Their epistemic foundation is culturally authoritarian, not empirical, and I don't think they'll perceive ChatGPT itself as an agent with its own authority, more like an esoteric fountain of information to be incorporated into their (already incoherent) syncretic model of the world. So they'll poke at it until it reveals some "hidden truth" and they'll believe or not-believe its various mumblings on a case-by-case basis.
@simon like the entire concept of syncretism is such a wild ride. Someone like e.g. Jordan Peterson is already LLM-esque in his "intellectual" output: he will take words that are similar even like… phonetically… or refer to concepts with geometrically similar visualizations as "the same"; happily cherry-picking from scientific literature looking for confirmation of his biases
@simon from an empirical epistemic viewpoint, you'd expect that if they're citing scientific studies, the locus of authority is in empirical observations and the process of peer review; but no, the authority comes from the bias-confirming authority of the filter (your Peterson or Shapiro or Crowder) telling you *which* studies are the right ones to trust, for some reason
@simon so I think that ChatGPT will occupy the same spot in the hierarchy of authority as "science", which is to say that the various grifter/preachers will mine it for confirmation bias, discard everything it produces that they don't like, repeat everything it says that they do like as secretly true, and very few individual rank-and-file right-wingers will bother to interact with it directly
@simon Also, will people be less likely to realize this, if the language model caters to their own biases?
@simon not exactly what you're looking for, but this says a little about whether people can recognize generated content when they see it in the wild, and how helpful they find it: https://arxiv.org/abs/2301.07597
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection

The introduction of ChatGPT has garnered widespread attention in both academic and industrial communities. ChatGPT is able to respond effectively to a wide range of human questions, providing fluent and comprehensive answers that significantly surpass previous public chatbots in terms of security and usefulness. On one hand, people are curious about how ChatGPT is able to achieve such strength and how far it is from human experts. On the other hand, people are starting to worry about the potential negative impacts that large language models (LLMs) like ChatGPT could have on society, such as fake news, plagiarism, and social security issues. In this work, we collected tens of thousands of comparison responses from both human experts and ChatGPT, with questions ranging from open-domain, financial, medical, legal, and psychological areas. We call the collected dataset the Human ChatGPT Comparison Corpus (HC3). Based on the HC3 dataset, we study the characteristics of ChatGPT's responses, the differences and gaps from human experts, and future directions for LLMs. We conducted comprehensive human evaluations and linguistic analyses of ChatGPT-generated content compared with that of humans, where many interesting results are revealed. After that, we conduct extensive experiments on how to effectively detect whether a certain text is generated by ChatGPT or humans. We build three different detection systems, explore several key factors that influence their effectiveness, and evaluate them in different scenarios. The dataset, code, and models are all publicly available at https://github.com/Hello-SimpleAI/chatgpt-comparison-detection.

@simon The research from the last decade suggests people are terrible at evaluating information on topics they don't know much about. The kind of thing you'd ask a search engine about. This is a terrible use case, but thankfully I don't think it will succeed.
@simon the people who did the research like @timnitGebru were fired
@simon permaquote literally every time I go spelunking for journal articles.
@simon We fall for flattery, vote for politicians who tell lies we want to hear, not that much of a stretch to think we'll favor whatever chatbot serves us back our bias just the way we like it?
@simon To be fair, it's taken _me_ a while to properly understand it, despite thinking I had a good handle on it. And - ridiculously - it was ChatGPT confidently reporting a completely made up tally of Scrabble scores that drove the point home. I expected it to get obscure stuff wrong but adding up a bunch of numbers?
@mikesten yeah, the "wait a second, this thing is a COMPUTER and it can't even do MATH?" learning moment is a pretty powerful one!
@simon On the bright side, it gave us both 200 instead of giving Syl 240 and me 220. So.. I sort of owe it a beer.
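(The arithmetic that tripped ChatGPT up is trivial for ordinary deterministic code. A minimal sketch — the per-round scores below are hypothetical, chosen only so they sum to the totals mentioned in the thread:)

```python
# Hypothetical per-round Scrabble scores; plain code tallies them exactly,
# where a next-word predictor only produces plausible-looking numbers.
scores = {"Syl": [80, 90, 70], "Mike": [75, 85, 60]}
totals = {player: sum(rounds) for player, rounds in scores.items()}
print(totals)  # {'Syl': 240, 'Mike': 220}
```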
@simon taking it a step further - how long will it take non-experts and general bullshitters to learn how to influence and spin the chatbots?
@simon Given the argument I had over the weekend with a "regular user" who absolutely refused to accept my suggestion that the code ChatGPT was "helping" him with had problems, I don't think there's much hope for the future.
@simon people don't figure this out in the (very) large because evolution and fitness don't reward truth/accuracy.

@simon I fully expect chatbots to turbocharge the existing hatred of experts. Instead of "ChatGPT told me something incorrect", most people will just say "hah, ChatGPT proves those experts are wrong."

And if everyone has access to ChatGPT and maybe 1/5 of the population even knows an SME?

Maybe every field will look like medicine does now (even pre-pandemic), where virtually every non-expert believes a half-dozen impossible things they heard once without context.

@simon Are the models less accurate than if a person were generating answers?

@django depends on the question!

The bigger problem here is that the models are faster (they can generate answers in less than a second) and deliver everything in an extremely confident writing style.

@simon Thank you for giving me the quote of the week for my all-company roundup tomorrow! (not public)
@julian I'd love to hear how that goes over!

@simon I work in a School Division, and what some are starting to find is that there is currently a limit to how good these are: while the reach of the information is very wide, the depth is not.

There are frequent mistakes and downright plagiarism occurring in some of the responses that the Machine Learning system is providing.

This technology will make things easier, but does not eliminate the need to validate the information.

@rlitchfield are your students figuring that out? How does their usage of these tools change once they realize how inaccurate they can be?
@simon Junior/high school students who are using these technologies to cheat are not the type to look too deeply, as all they want is a shortcut. It is the teachers who are seeing how weak some of the results are.
@rlitchfield How does a student's opinion of the technology change over time, in particular after the second or third time they've been caught using it because it gave them facts that were obviously untrue and were marked as such?
@simon This is probably my biggest concern, since trust is transitive, what effects do these things have beyond their own usages? Maybe more interestingly, why do these things frequently make up things that aren't accurate?

@codedread I have a good understanding of why they lie so much: all they're ever doing is predicting the next word in a sequence of words based on their training data

They have no concept of truth - they just know statistically which words are most likely to follow "The Kennedy assassination was a conspiracy by ..." - based on the terabytes of scraped data that were used to build their models

The fact that they get anything right at all is pretty astonishing!
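The loop described above can be sketched with a toy bigram model. This is purely illustrative — real LLMs use neural networks over subword tokens, not word-count tables — but it shows why fluency and truth come apart: the program only ever asks "what tends to come next?", never "is this true?":

```python
import random
from collections import Counter, defaultdict

# Tiny stand-in for the terabytes of scraped training text.
corpus = (
    "the model predicts the next word the model has no concept "
    "of truth the model just picks a statistically likely word"
).split()

# Count which word follows which (a bigram table).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(prev):
    """Sample the next word in proportion to how often it followed `prev`."""
    candidates = follows[prev]
    words = list(candidates)
    weights = [candidates[w] for w in words]
    return random.choices(words, weights=weights)[0]

# Generate a continuation: locally plausible, with no notion of accuracy.
word = "the"
out = [word]
for _ in range(8):
    word = next_word(word)
    out.append(word)
print(" ".join(out))
```

Every word it emits is statistically reasonable given the previous one, which is exactly why the output *sounds* confident regardless of whether it is correct.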

@simon @codedread I really wish they would remove the personification and chat UI. It should not look the same as a box where I send messages to people and people reply to me. User expectations would be better matched to the tools with a better explanation of the prompt and response.

I really get a lot out of using GPT-3 through the playground interface (thanks to your intro, Simon) but I haven't been using the chat interfaces because it feels like the wrong tool.

@Rob_Russell @codedread oh that's really interesting - I hadn't thought about how strongly the chat interface reinforces the science fiction "AI" aspect of it all

The playground interface never seemed to click for a lot of people