A fresh problem with #AI is what might be called Artificial Gullibility.

According to a Bluesky poster, an academic who was found guilty of plagiarism has waged an extensive astroturfing campaign to rewrite the record. The goal was probably to game conventional search engines, but the texts have now been ingested by Google's AI. Google's “AI Overview” presents her (apparently false) version of events, backing it with the supposed authority of Google and “AI”.

1/

https://bsky.app/profile/laurenginsberg.bsky.social/post/3mhnxv2swok2g

Lauren Donovan Ginsberg (@laurenginsberg.bsky.social)

The return of ReceptioGate to the news is a useful moment to think about the role AI is having in creating truth for a lot of internet users. I posted this update - the clear plagiarism verdict against Rossi - on another platform… /1

Whatever the hypesters may tell you, LLMs do NOT reason. Given two conflicting versions of a story, they'll go with the one that appears more often in their training data. The sequence of tokens representing a false narrative is – if the astroturfers have done their job right – statistically more probable than the sequence representing the factual account, so it's the false narrative that gets encoded in the model and trotted out on demand.
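
As a toy illustration of that dynamic (nothing like a real transformer, and the "corpus" below is entirely invented): even a crude bigram model built from raw frequency counts will reproduce whichever version it has seen more often.

```python
# Toy sketch, NOT a real LLM: a bigram model built from raw
# frequency counts, to show why sheer repetition wins.
from collections import Counter, defaultdict

# Invented corpus: the astroturfed narrative appears three times,
# the factual account only once.
corpus = [
    "the professor was cleared of plagiarism",
    "the professor was cleared of plagiarism",
    "the professor was cleared of plagiarism",
    "the professor was found guilty of plagiarism",
]

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for a, b in zip(words, words[1:]):
        follows[a][b] += 1

# Greedy generation: always pick the most frequent continuation.
word, output = "the", ["the"]
while word in follows:
    word = follows[word].most_common(1)[0][0]
    output.append(word)

print(" ".join(output))
# -> "the professor was cleared of plagiarism"
# The false version wins purely because its token sequence is
# statistically more probable.
```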

2/

Imagine the opportunities for people pushing pseudoscience like Creationism or vaccine denial, or political propaganda, or corporate FUD.

In some ways, it's an extension of conventional SEO, which has always aimed to "put your story first", but now the untruths are delivered with the authority of "AI" (argumentum ab roboto), not just on search results pages, but in any other context where a naive user interacts with an LLM, e.g. with a chatbot.

3/

Model training often weights certain sources as more authoritative than others, so volume isn't the only thing that counts, and that weighting is reflected in the model. But what happens when “authoritative” sources are themselves biased?
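
Here's a toy sketch of what that can look like at the data-sampling stage (the source names and weights below are invented for illustration; real training mixtures aren't public):

```python
# Toy sketch: sampling training documents by source weight.
# Names and weights are invented for illustration only.
import random

source_weights = {
    "gov_site": 3.0,    # historically treated as authoritative
    "newspaper": 2.0,
    "blog_farm": 0.5,   # high volume, low trust
}

names = list(source_weights)
weights = list(source_weights.values())

# Draw 10,000 training documents according to the weights.
sample = random.choices(names, weights=weights, k=10_000)
for name in names:
    print(f"{name}: {sample.count(name)}")
# Roughly 55% gov_site, 36% newspaper, 9% blog_farm: upweighted
# sources dominate what the model learns, regardless of raw volume.
```

Upweighting defends against sheer volume, but it only moves the trust problem onto whoever chooses the weights.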

4/

For instance, US government websites have presumably long been regarded as reliable and given additional weight. That's emphatically no longer the case now that government sites are publishing propaganda, promoting pseudoscience, and suppressing or rewriting history.

Our deference to the presumed authority and impartiality of government communiqués or ‘serious’ news media is itself a problem, of course, but it's one that is multiplied a hundredfold by LLM regurgitation.

5/

LLMs are essentially gullible. And many people, even otherwise smart people, are gullible enough to believe that "AI" distillations of facts are trustworthy. It's a problem of gullibility compounded. But there's also an entire industry that's devoted to trying to convince us NOT to be skeptical of AI, not to see it for what it is -- an often-naive statistical model that can and will increasingly be gamed by bad actors.

6/

I once described the US as a complex distributed system with an attack surface of 300 million people. Gullible LLMs are a new vector for attacking that system, one that targets the weakest links in the chain: the people who don't know enough to distrust those handy-dandy “AI Overview” boxes in their favorite search engine.

7/

It's also the case that the more untrustworthy LLM output becomes, the harder the people who have invested hundreds of billions in the tech will try to convince us that we must Trust the Superintelligent Machine That Knows Everything, and, indeed, to cut us off from competing knowledge sources. So we have that to look forward to.

Anyway, TL;DR: artificial gullibility is a problem that's only going to get worse, so brace yourselves.

/END

@angusm Time to go get an "internet in a box" ... box.

(Yeah, couldn't figure out a good way to end that.)

https://internet-in-a-box.org/

Internet-in-a-Box - Mandela's Library of Alexandria

Internet-in-a-Box is a tiny, powerful 'Digital Library of Alexandria' that can be set up by any school, medical clinic or community worldwide.

@angusm What I hear you saying is... garbage in, garbage out. No amount of rehashing or reprocessing will overcome this limitation.

At best, LLMs can average out their source material, and if most of it is garbage, then, well, the results are predictable.

Great thread, BTW!

@angusm

.... I have put up with human liars for long enough to know this entire argument is Special Pleading.

@angusm A professional programmer told me that there is no learning with AI when I referred to training and target sets. After much more research, I would go further: the use of statistics and probabilities has created a culture that favours mimicry and reinforces bias. There is no moral decision-making taking place, only a fascist rationale to increase shareholder profits. That is why those like Thiel, who depend on AI projects like Titan having inflated share values, are so dangerous. Despite that, there is no reason I can think of why Algorithmic Heuristics cannot be designed to make morally rational decisions. Ask: what does it mean to be intelligent? #AI #ImmoralAI #MoralAI

@angusm Many people seem to have forgotten the meaning of the word "model", and I think that's where they go wrong.

@angusm Does this mean that the training is based on stats? So can an AI be trained on a training set with only one example of each target case?