OpenAI says its new model GPT-2 is too dangerous to release (2019)
https://slate.com/technology/2019/02/openai-gpt2-text-generating-algorithm-ai-dangerous.html
Now imagine all that low-quality AI slop being posted online. A new generation of AI will "learn" from it and output its own version of AI slop, which eventually ends up online again for yet another generation of AI to "learn" from.
Something, something, Idiocracy comes to mind.
This leads to a well-documented phenomenon known as model collapse. You know how if you blur and sharpen an image repeatedly you eventually end up with just a rectangle of creepy, wormy spaghetti lines? You lose information on each blur, and then ask it to reconstitute the image with less information on each sharpen, until there's nothing recognizable left.
Training a model is like the blur and generating from that model is like the sharpen. Repeat enough times and enough information is lost that you're just left with "wormy spaghetti lines"—in an LLM's case, meaningless gibberish that actually pretty closely resembles the glitchy stuff said by the cores that fall off GLaDOS in Portal. I dunno, you read the paper and be the judge:
https://www.nature.com/articles/s41586-024-07566-y
To jump to the last output sample, Ctrl-F "Gen 9".
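If you want the blur/sharpen analogy in a few lines of code, here's a toy sketch (my own illustration, not the paper's actual experiment; the sample size and generation count are arbitrary): fit a Gaussian to some data, sample a new "training set" purely from the fit, refit, repeat. Each fit loses a little tail information, and the estimated spread tends to drift toward zero.

  import numpy as np

  rng = np.random.default_rng(0)

  # Generation 0: "real" data from a standard normal distribution.
  data = rng.normal(loc=0.0, scale=1.0, size=100)

  for gen in range(31):
      # "Train" (the blur): fit a Gaussian to the current dataset (MLE).
      mu, sigma = data.mean(), data.std()
      if gen % 5 == 0:
          print(f"gen {gen:2d}: mu = {mu:+.3f}, sigma = {sigma:.3f}")
      # "Generate" (the sharpen): the next generation trains only on
      # samples drawn from the previous generation's fitted model --
      # no fresh real data ever re-enters the loop.
      data = rng.normal(loc=mu, scale=sigma, size=100)

Run it and watch sigma wander downward across generations: the distribution narrows until the "model" can only reproduce an ever-shrinking slice of what it started with. Real LLM collapse is messier, but that's the basic information-loss loop.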
Of course you may be talking about the human aspect of this. Gods willing, we'll realize that our LLMs are spewing gibberish and think twice about putting them in all the things, all the time. But the scenario I fear isn't Idiocracy—it's worse: a community of humans who treat the gibberish as sacred writ, Zardoz style.

Analysis shows that indiscriminately training generative artificial intelligence on real and generated content, usually done by scraping data from the Internet, can lead to a collapse in the ability of the models to generate diverse high-quality output.