What will LLM owners do when they've destroyed or hollowed out all human sources of knowledge and places of education/research?

What happens when they have no more human blood to suck on to prevent model collapse?

More importantly, what will society do if it builds its structures around stealing knowledge from people until the people give up producing knowledge?

(And this is ignoring LLMs' impact on our climate due to their vast energy use.)

#AI #LLMs

@FediThing FWIW model collapse isn't real. At this point the models are being trained on ~90% AI generated data.

@JigenD

I don't know where you're reading that from, but model collapse is definitely real. (https://en.wikipedia.org/wiki/Model_collapse)

Why would LLM companies be putting so many resources into lobbying governments to let them train on ever-larger amounts of human data, even if it's against the will of the creators of that data?

Why are they paying platform holders for access to human data?

And how exactly does a statistical model of language even gather data on the real world without feeding on human knowledge? It's effectively just a spreadsheet of language and popular human responses; there is no actual intelligence or sensory system.


@FediThing What I stated isn't something I've read, it's what I've learned from working in the field. Model collapse from AI-generated training data has only ever been shown in theoretical scenarios and doesn't happen in real-world ones.

I'm saying you can't count on model collapse to alter AI training in any way.

That's separate from the original use of human data, etc. Models now are often trained on the outputs of other models that were themselves trained on human data.

@JigenD How the hell does a language model gather data about the real physical world without human input?

A language model is not some kind of magic being; it's just a spreadsheet of popular responses by humans, and it always will be.

(I also notice you're not answering why LLM companies are putting so many resources into stealing human-made content.)

Maybe one day in the future there will be actual AI, but language models are not it.