This framing is so gross. To see (human!) generated (ahem: English) text as a "vital resource," you have to be deeply committed to the project of building AI models, and in this particular way.

Link to original tweet:
https://twitter.com/emollick/status/1605756428941246466

Link to paper:
https://arxiv.org/pdf/2211.04325.pdf

#NLP #ethNLP #sustainability

Ethan Mollick on Twitter

“We are running out of a vital resource: words! There are “only” 5 to 10 trillion high-quality words (papers, books, code) on the internet. Our AI models will have used all of that for training by 2026. Low-quality data (tweets, fanfic) will last to 2040. https://t.co/hm1EaJ6Enu”


@emilymbender There is a demand for low-background steel — steel produced before the mid-century nuclear tests — for use in Geiger counters. They produce it by scavenging ships sunk during World War One, as that's the only way they can be sure it carries no radiation.

The same is going to happen with internet data: only pre-2022 archives will be usable for sociology research and the like, as the rest will be contaminated by AI nonsense. Absolute travesty.

Low-background steel - Wikipedia