The biggest question for me about large language model interfaces - ChatGPT, the new Bing, Google's Bard - is this:

How long does it take for regular users (as opposed to experts, or people who just try them once or twice) to convince themselves that these tools frequently make things up that aren't accurate?

And assuming they figure this out, how does knowing it affect the way they use these tools?

Someone must have done research on this, right? It feels pretty fundamental!

One argument here is that people will blindly trust any chatbot that supports their existing biases.

Is that cynicism justified?

What happens when the chatbot speaks against their biases? In particular, what if it both counters their biases AND does so in a way that is demonstrably factually incorrect?

We are already seeing furious complaints from some corners that ChatGPT has a liberal bias - how does that affect how those complainants trust and use these tools?

Hindu nationalists are FURIOUS about ChatGPT right now: https://www.wired.com/story/chatgpt-has-been-sucked-into-indias-culture-wars/

How will that impact their trust of systems like this in the future?

ChatGPT Has Been Sucked Into India's Culture Wars

Hindu nationalists claim that the chatbot has insulted their deities, sparking an online uproar.

@simon This isn't really cynicism; I think it's more an optimistic view of people.

@simon No research, but after an afternoon of 'playing' with ChatGPT, I had worked out its limitations.

My takeaway, and note of optimism, is that people will be able to 'smell' bot-generated text quite easily. Whether they'll care is another discussion.

@simon To be fair though, they also thought a plain red cup had a liberal bias.
@simon I think we're going to see more ChatGPTs out there and my guess is that they are going to attract different people based on their biases. People select their echo chambers in social media and we've seen the feedback loop it has produced with respect to political extremism. I think we're about to see another feedback loop with ChatGPTs. That is, people seeking out models that confirm their biases, which then drives them to produce biased content to feed back into it, and repeat.
@sebleier What will happen when a right-leaning chatbot gains popularity, but then people figure out ways to trick it into supporting left wing talking points and start sharing prompts and screenshots?
@simon People are Bayesian by nature, so depending on how they prioritize truth vs. satisfying their biases, you'll see some people dock their favorite chatbot a few points if it spouts an opposing ideology. If it gets to a certain point, you'll see a phase transition and you may see people migrate to another platform. I see it as analogous to the recent migration of people moving from Fox News to OANN or Newsmax.
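The "dock their favorite chat bot a few points" intuition can be made concrete with a toy Bayesian model. This sketch is an editorial illustration, not something from the thread: the function name `update_trust` and the likelihood values (a "reliable" bot answers correctly 90% of the time, an "unreliable" one 50%) are assumed for the example.

```python
# Toy model (not from the thread): a user's trust that a chatbot is
# "reliable", updated by Bayes' rule after each answer they fact-check.
# The 0.9 / 0.5 likelihoods are illustrative assumptions.

def update_trust(prior, accurate,
                 p_correct_if_reliable=0.9,
                 p_correct_if_unreliable=0.5):
    """Posterior probability the bot is reliable after one observation."""
    if accurate:
        l_reliable, l_unreliable = p_correct_if_reliable, p_correct_if_unreliable
    else:
        l_reliable = 1 - p_correct_if_reliable
        l_unreliable = 1 - p_correct_if_unreliable
    numerator = l_reliable * prior
    return numerator / (numerator + l_unreliable * (1 - prior))

trust = 0.8  # start out fairly trusting
for accurate in [True, False, False, False]:  # one good answer, then errors
    trust = update_trust(trust, accurate)
print(round(trust, 3))  # prints 0.054
```

A few demonstrably wrong answers collapse trust quickly under this model; the thread's open question is whether real users update this way, or weight ideological agreement more heavily than accuracy.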
@simon have you turned on any US political news in the last 8 years? The idea that there is such a thing as a consensus view of “demonstrably factually incorrect” is a claim so bold as to be unsupportable

@glyph My question remains: if a right-leaning person encounters replies from ChatGPT that directly counter their existing beliefs (and which they can fact-check through other sources), do they stop believing that ChatGPT is an infallible source of information?

Even if their conclusion is "It's a conspiracy! The chatbot has been neutered!", does it still provide some level of protection for them in terms of helping them understand that these things are deeply fallible?

@simon Their epistemic foundation is culturally authoritarian, not empirical, and I don't think they'll perceive ChatGPT itself as an agent with its own authority, more like an esoteric fountain of information to be incorporated into their (already incoherent) syncretic model of the world. So they'll poke at it until it reveals some "hidden truth" and they'll believe or not-believe its various mumblings on a case-by-case basis.
@simon like the entire concept of syncretism is such a wild ride. Someone like Jordan Peterson, for example, is already LLM-esque in his "intellectual" output: he will take words that are similar even like… phonetically… or refer to concepts with geometrically similar visualizations as "the same"; happily cherry-picking from scientific literature looking for confirmation of his biases
@simon from an empirical epistemic viewpoint, you'd expect that if they're citing scientific studies, the locus of authority is in empirical observations and the process of peer review; but no, the authority comes from the bias-confirming authority of the filter (your Peterson or Shapiro or Crowder) telling you *which* studies are the right ones to trust, for some reason
@simon so I think that ChatGPT will occupy the same spot in the hierarchy of authority as "science", which is to say that the various grifter/preachers will mine it for confirmation bias, discard everything it produces that they don't like, repeat everything it says that they do like as secretly true, and very few individual rank-and-file right-wingers will bother to interact with it directly
@simon Also, will people be less likely to realize this, if the language model caters to their own biases?
@simon not exactly what you're looking for, but this says a little about whether people can recognize generated content when they see it in the wild, and how helpful they find it: https://arxiv.org/abs/2301.07597
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection

The introduction of ChatGPT has garnered widespread attention in both academic and industrial communities. ChatGPT is able to respond effectively to a wide range of human questions, providing fluent and comprehensive answers that significantly surpass previous public chatbots in terms of security and usefulness. On one hand, people are curious about how ChatGPT is able to achieve such strength and how far it is from human experts. On the other hand, people are starting to worry about the potential negative impacts that large language models (LLMs) like ChatGPT could have on society, such as fake news, plagiarism, and social security issues. In this work, we collected tens of thousands of comparison responses from both human experts and ChatGPT, with questions ranging from open-domain, financial, medical, legal, and psychological areas. We call the collected dataset the Human ChatGPT Comparison Corpus (HC3). Based on the HC3 dataset, we study the characteristics of ChatGPT's responses, the differences and gaps from human experts, and future directions for LLMs. We conducted comprehensive human evaluations and linguistic analyses of ChatGPT-generated content compared with that of humans, where many interesting results are revealed. After that, we conduct extensive experiments on how to effectively detect whether a certain text is generated by ChatGPT or humans. We build three different detection systems, explore several key factors that influence their effectiveness, and evaluate them in different scenarios. The dataset, code, and models are all publicly available at https://github.com/Hello-SimpleAI/chatgpt-comparison-detection.

@simon The research from the last decade suggests people are terrible at evaluating information on topics they don't know much about. The kind of thing you'd ask a search engine about. This is a terrible use case, but thankfully I don't think it will succeed.
@simon the people who did the research, like @timnitGebru, were fired
@simon permaquote literally every time I go spelunking for journal articles.