Here's a fun AI story: a security researcher noticed that AI-generated code, including code published by large companies, repeatedly referenced a nonexistent software package (an AI "hallucination"). So he created a (defanged) malicious package with that name and uploaded it, and thousands of developers automatically downloaded and incorporated it into their builds:

https://www.theregister.com/2024/03/28/ai_bots_hallucinate_software_packages/

1/
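
To make the mechanics concrete, here's a minimal, hypothetical Python sketch of the kind of sanity check that would catch this attack: vetting each AI-suggested dependency against PyPI's public JSON API before installing it. The script and its heuristics are illustrative only, not the researcher's actual tooling:

```python
"""Hypothetical sketch: vet AI-suggested dependencies against PyPI
before piping them straight into pip. Illustrative heuristics only,
not the researcher's actual method."""
import json
import sys
import urllib.error
import urllib.request

PYPI_JSON = "https://pypi.org/pypi/{name}/json"  # PyPI's public JSON API


def vet(name: str) -> str:
    """Return a rough verdict for one package name."""
    try:
        with urllib.request.urlopen(PYPI_JSON.format(name=name), timeout=10) as resp:
            meta = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return "MISSING: not on PyPI -- a squatter could register this name"
        return f"ERROR: PyPI returned HTTP {err.code}"
    # A package that appeared yesterday with one release is a red flag.
    if len(meta.get("releases", {})) <= 1:
        return "SUSPECT: exists, but only one release -- possibly freshly squatted"
    return "OK-ISH: has a release history (still check the maintainer)"


if __name__ == "__main__":
    # e.g. python vet_deps.py requests flask some-hallucinated-name
    for pkg in sys.argv[1:]:
        print(pkg, "->", vet(pkg))
```

The point isn't this exact script; it's that "does this package even exist, and since when?" is a question nobody in the loop was asking.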

If you'd like an essay-formatted version of this thread to read or share, here's a link to it on pluralistic.net, my surveillance-free, ad-free, tracker-free blog:

https://pluralistic.net/2024/04/01/human-in-the-loop/#monkey-in-the-middle

2/

These "hallucinations" are a stubbornly persistent feature of large language models, because these models only give the illusion of understanding; in reality, they are just sophisticated forms of autocomplete, drawing on huge databases to make shrewd (but reliably fallible) guesses about which word comes next:

https://dl.acm.org/doi/10.1145/3442188.3445922

3/
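
If "sophisticated autocomplete" sounds glib, here's a toy illustration: a bigram model that extends a prompt with whichever word most often follows the previous one. Real LLMs are vastly larger neural networks, but the training objective is the same (predict the next token), which is why fluent output needn't imply understanding. The tiny corpus here is made up:

```python
"""Toy "autocomplete" sketch: a bigram model with a made-up corpus.
Not how production LLMs work internally, but the same objective:
guess the next word from statistics of the training text."""
from collections import Counter, defaultdict

corpus = ("the cat sat on the mat and the cat ate "
          "the fish and the dog sat on the rug").split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1


def autocomplete(word: str, length: int = 6) -> str:
    """Greedily extend a prompt with the most frequent next word."""
    out = [word]
    for _ in range(length):
        candidates = following.get(out[-1])
        if not candidates:
            break
        out.append(candidates.most_common(1)[0][0])  # shrewd but fallible guess
    return " ".join(out)


print(autocomplete("the"))  # "the cat sat on the cat sat"
```

It produces grammatical-looking strings, and it has no idea what a cat is. Scale the statistics up enormously and you get a much, much shrewder guesser, still with no model of truth.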

Guessing the next word without understanding the meaning of the resulting sentence makes unsupervised LLMs unsuitable for high-stakes tasks. The whole AI bubble is based on convincing investors that one or more of the following is true:

I. There are low-stakes, high-value tasks that will recoup the massive costs of AI training and operation;

II. There are high-stakes, high-value tasks that can be made cheaper by adding an AI to a human operator;

4/

III. Adding more training data to an AI will make it stop hallucinating, so that it can take over high-stakes, high-value tasks without a "human in the loop."

5/

These are dubious propositions. There's a universe of low-stakes, low-value tasks - political disinformation, spam, fraud, academic cheating, nonconsensual porn, dialog for video-game NPCs - but none of them seems likely to generate enough revenue to justify the billions spent on models or the trillions in valuation attributed to AI companies:

https://locusmag.com/2023/12/commentary-cory-doctorow-what-kind-of-bubble-is-ai/

6/

The proposition that increasing training data will decrease hallucinations is hotly contested among AI practitioners. I confess that I don't know enough about AI to evaluate opposing sides' claims, but even if you stipulate that adding lots of human-generated training data will make the software a better guesser, there's a serious problem.

7/

All those low-value, low-stakes applications are flooding the internet with botshit. After all, the one thing AI is unarguably *very* good at is producing bullshit at scale. As the web becomes an anaerobic lagoon for botshit, the quantum of human-generated "content" in any internet core sample is dwindling to homeopathic levels:

https://pluralistic.net/2024/03/14/inhuman-centipede/#enshittibottification

8/
