The biggest question for me about large language model interfaces - ChatGPT, the new Bing, Google's Bard - is this:

How long does it take for regular users (as opposed to experts, or people who just try them once or twice) to convince themselves that these tools frequently make things up that aren't accurate?

And assuming they figure this out, how does knowing it affect the way they use these tools?

Someone must have done research on this, right? It feels pretty fundamental!

One argument here is that people will blindly trust any chatbot that supports their existing biases.

Is that cynicism justified?

What happens when the chatbot speaks against their biases? In particular, what if it both counters their biases AND does so in a way that is demonstrably factually incorrect?

We are already seeing furious complaints from some corners that ChatGPT has a liberal bias - how does that affect how those complainants trust and use these tools?

Hindu nationalists are FURIOUS about ChatGPT right now: https://www.wired.com/story/chatgpt-has-been-sucked-into-indias-culture-wars/

How will that impact their trust of systems like this in the future?

ChatGPT Has Been Sucked Into India's Culture Wars

Hindu nationalists claim that the chatbot has insulted their deities, sparking an online uproar.

@simon This isn't really cynicism, I think it's more an optimistic view of people.

@simon No research, but after an afternoon of 'playing' with ChatGPT, I had worked out its limitations.

My takeaway, and note of optimism, is that people will be able to 'smell' bot-generated text quite easily. Whether they'll care is another discussion.

@simon To be fair though, they also thought a plain red cup had a liberal bias.
@simon I think we're going to see more ChatGPTs out there and my guess is that they are going to attract different people based on their biases. People select their echo chambers in social media and we've seen the feedback loop it has produced with respect to political extremism. I think we're about to see another feedback loop with ChatGPTs. That is, people seeking out models that confirm their biases, which then drives them to produce biased content to feed back into it, and repeat.
@sebleier What will happen when a right-leaning chatbot gains popularity, but then people figure out ways to trick it into supporting left wing talking points and start sharing prompts and screenshots?
@simon – People are Bayesian by nature, so depending on how they prioritize truth vs. satisfying their biases, you'll see some people dock their favorite chatbot a few points if it spouts an opposing ideology. If it gets to a certain point, you'll see a phase transition and people may migrate to another platform. I see it as analogous to the recent migration of viewers from Fox News to OANN or Newsmax.
@simon have you turned on any US political news in the last 8 years? I think that the idea that there is such a thing as a consensus view of “demonstrably factually incorrect” is a statement so bold as to be unsupportable

@glyph My question remains: if a right-leaning person encounters replies from ChatGPT that directly counter their existing beliefs (and which they can fact-check through other sources), do they stop believing that ChatGPT is an infallible source of information?

Even if their conclusion is "It's a conspiracy! The chatbot has been neutered!", does it still provide some level of protection for them in terms of helping them understand that these things are deeply fallible?

@simon Their epistemic foundation is culturally authoritarian, not empirical, and I don't think they'll perceive ChatGPT itself as an agent with its own authority, more like an esoteric fountain of information to be incorporated into their (already incoherent) syncretic model of the world. So they'll poke at it until it reveals some "hidden truth" and they'll believe or not-believe its various mumblings on a case-by-case basis.
@simon like the entire concept of syncretism is such a wild ride. Someone like e.g. Jordan Peterson is already LLM-esque in his "intellectual" output: he will take words that are similar even like… phonetically… or refer to concepts with geometrically similar visualizations as "the same"; happily cherry-picking from scientific literature looking for confirmation of his biases
@simon from an empirical epistemic viewpoint, you'd expect that if they're citing scientific studies, the locus of authority is in empirical observations and the process of peer review; but no, the authority comes from the bias-confirming authority of the filter (your Peterson or Shapiro or Crowder) telling you *which* studies are the right ones to trust, for some reason
@simon so I think that ChatGPT will occupy the same spot in the hierarchy of authority as "science", which is to say that the various grifter/preachers will mine it for confirmation bias, discard everything it produces that they don't like, repeat everything it says that they do like as secretly true, and very few individual rank-and-file right-wingers will bother to interact with it directly
@simon Also, will people be less likely to realize this, if the language model caters to their own biases?
@simon not exactly what you're looking for, but this says a little about whether people can recognize generated content when they see it in the wild, and how helpful they find it: https://arxiv.org/abs/2301.07597
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection

The introduction of ChatGPT has garnered widespread attention in both academic and industrial communities. ChatGPT is able to respond effectively to a wide range of human questions, providing fluent and comprehensive answers that significantly surpass previous public chatbots in terms of security and usefulness. On one hand, people are curious about how ChatGPT is able to achieve such strength and how far it is from human experts. On the other hand, people are starting to worry about the potential negative impacts that large language models (LLMs) like ChatGPT could have on society, such as fake news, plagiarism, and social security issues. In this work, we collected tens of thousands of comparison responses from both human experts and ChatGPT, with questions ranging from open-domain, financial, medical, legal, and psychological areas. We call the collected dataset the Human ChatGPT Comparison Corpus (HC3). Based on the HC3 dataset, we study the characteristics of ChatGPT's responses, the differences and gaps from human experts, and future directions for LLMs. We conducted comprehensive human evaluations and linguistic analyses of ChatGPT-generated content compared with that of humans, where many interesting results are revealed. After that, we conduct extensive experiments on how to effectively detect whether a certain text is generated by ChatGPT or humans. We build three different detection systems, explore several key factors that influence their effectiveness, and evaluate them in different scenarios. The dataset, code, and models are all publicly available at https://github.com/Hello-SimpleAI/chatgpt-comparison-detection.

@simon The research from the last decade suggests people are terrible at evaluating information on topics they don't know much about. The kind of thing you'd ask a search engine about. This is a terrible use case, but thankfully I don't think it will succeed.
@simon the people who did the research like @timnitGebru were fired
@simon permaquote literally every time I go spelunking for journal articles.
@simon We fall for flattery, vote for politicians who tell lies we want to hear, not that much of a stretch to think we'll favor whatever chatbot serves us back our bias just the way we like it?
@simon To be fair, it's taken _me_ a while to properly understand it, despite thinking I had a good handle on it. And - ridiculously - it was ChatGPT confidently reporting a completely made up tally of Scrabble scores that drove the point home. I expected it to get obscure stuff wrong but adding up a bunch of numbers?
@mikesten yeah, the "wait a second, this thing is a COMPUTER and it can't even do MATH?" learning moment is a pretty powerful one!
@simon On the bright side, it gave us both 200 instead of giving Syl 240 and me 220. So.. I sort of owe it a beer.
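(The arithmetic that tripped ChatGPT up is trivial for ordinary deterministic code. A minimal sketch — the per-round scores below are hypothetical, chosen only so they sum to the totals mentioned in the thread:)

```python
# Hypothetical per-round Scrabble scores; plain code tallies them exactly,
# where a next-word predictor only produces plausible-looking numbers.
scores = {"Syl": [80, 90, 70], "Mike": [75, 85, 60]}
totals = {player: sum(rounds) for player, rounds in scores.items()}
print(totals)  # {'Syl': 240, 'Mike': 220}
```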
@simon taking it a step further - how long will it take non-experts and general bullshitters to learn how to influence and spin the chatbots?
@simon Given the argument I had over the weekend with a "regular user" who absolutely refused to accept my suggestion that the code ChatGPT was "helping" him with had problems, I don't think there's much hope for the future.
@simon people don't figure this out in the (very) large because evolution and fitness don't reward truth/accuracy.

@simon I fully expect chatbots to turbocharge the existing hatred of experts. Instead of "ChatGPT told me something incorrect", most people will just say "hah, ChatGPT proves those experts are wrong."

And if everyone has access to ChatGPT and maybe 1/5 of the population even knows an SME?

Maybe every field will look like medicine does now (even pre-pandemic), where virtually every non-expert believes a half-dozen impossible things they heard once without context.

@simon Are the models less accurate than if a person were generating answers?

@django depends on the question!

The bigger problem here is that the models are faster (they can generate answers in less than a second) and deliver everything in an extremely confident writing style.

@simon Thank you for giving me the quote of the week for my all-company roundup tomorrow! (not public)
@julian I'd love to hear how that goes over!

@simon I work in a School Division, and what some are starting to find is that there is currently a limit to how good these are: while the reach of the information is very wide, the depth is not.

There are frequent mistakes and downright plagiarism occurring in some of the responses that the Machine Learning system is providing.

This technology will make things easier, but does not eliminate the need to validate the information.

@rlitchfield are your students figuring that out? How does their usage of these tools change once they realize how inaccurate they can be?
@simon Junior/high school students who are using these technologies to cheat are not the type to look too deeply, as all they want is a shortcut. It is the teachers who are seeing how weak some of the results are.
@rlitchfield How does a student's opinion of the technology change over time, in particular after the second or third time they've been caught using it because it gave them facts that were obviously untrue and were marked as such?
@simon This is probably my biggest concern, since trust is transitive, what effects do these things have beyond their own usages? Maybe more interestingly, why do these things frequently make up things that aren't accurate?

@codedread I have a good understanding of why they lie so much: all they're ever doing is predicting the next word in a sequence of words based on their training data

They have no concept of truth - they just know statistically which words are most likely to follow "The Kennedy assassination was a conspiracy by ..." - based on the terabytes of scraped data that were used to build their models

The fact that they get anything right at all is pretty astonishing!
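The loop described above can be sketched with a toy bigram model. This is purely illustrative — real LLMs use neural networks over subword tokens, not word-count tables — but it shows why fluency and truth come apart: the program only ever asks "what tends to come next?", never "is this true?":

```python
import random
from collections import Counter, defaultdict

# Tiny stand-in for the terabytes of scraped training text.
corpus = (
    "the model predicts the next word the model has no concept "
    "of truth the model just picks a statistically likely word"
).split()

# Count which word follows which (a bigram table).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(prev):
    """Sample the next word in proportion to how often it followed `prev`."""
    candidates = follows[prev]
    words = list(candidates)
    weights = [candidates[w] for w in words]
    return random.choices(words, weights=weights)[0]

# Generate a continuation: locally plausible, with no notion of accuracy.
word = "the"
out = [word]
for _ in range(8):
    word = next_word(word)
    out.append(word)
print(" ".join(out))
```

Every word it emits is statistically reasonable given the previous one, which is exactly why the output *sounds* confident regardless of whether it is correct.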

@simon @codedread I really wish they would remove the personification and chat UI. It should not look the same as a box where I send messages to people and people reply to me. User expectations would be better matched to the tools with a better explanation of the prompt and response.

I really get a lot out of using GPT-3 through the playground interface (thanks to your intro, Simon) but I haven't been using the chat interfaces because it feels like the wrong tool.

@Rob_Russell @codedread oh that's really interesting - I hadn't thought about how strongly the chat interface reinforces the science fiction "AI" aspect of it all

The playground interface never seemed to click for a lot of people