What will LLM owners do when they've destroyed or hollowed out all human sources of knowledge and places of education/research?

What happens when they have no more human blood to suck on to prevent model collapse?

More importantly, what will society do if it builds its structures around stealing knowledge from people until the people give up producing knowledge?

(And this is ignoring LLMs' impact on our climate due to their vast energy use.)

#AI #LLMs

@FediThing
Do they care what happens beyond this quarter?
Microsoft CTO Kevin Scott on how AI can save the web, not destroy it

One of Microsoft's top AI leaders on the benefits AI can bring to web search.

The Verge

@yoasif

🤮

This would be the same Microsoft that is partnering with Elon Musk:

https://cyberplace.social/@GossiTheDog/114539912259712139

Kevin Beaumont (@GossiTheDog@cyberplace.social)

Microsoft SLT should be ashamed of themselves for this one. Microsoft knowingly platforms hate.

Cyberplace
@FediThing Was Satya greeted with a "Roman salute"?

@yoasif

I guess Microsoft want to be the new IBM (https://en.wikipedia.org/wiki/IBM_and_the_Holocaust).

IBM and the Holocaust - Wikipedia

@FediThing FWIW model collapse isn't real. At this point the models are being trained on ~90% AI generated data.

@JigenD

I don't know where you're reading that from, but model collapse is definitely real. (https://en.wikipedia.org/wiki/Model_collapse)

Why would LLM companies be putting so many resources into lobbying governments to let them train on ever-larger amounts of human data, even if it's against the will of the creators of that data?

Why are they paying platform holders for access to human data?

And how exactly does a statistical model of language even gather data on the real world without feeding on human knowledge? It's effectively just a spreadsheet of language and popular human responses; there is no actual intelligence or sensory system.

Model collapse - Wikipedia
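The dynamic behind the model-collapse argument can be sketched with a toy simulation (this is an illustration of the statistical idea, not of actual LLM training; the Gaussian model, sample size, and generation count are arbitrary choices): repeatedly fit a distribution to samples drawn from the previous fit, and the fitted distribution's diversity shrinks over generations as the tails get resampled away.

```python
import random
import statistics

random.seed(0)

# Generation 0: "human" data, drawn from a standard normal distribution.
data = [random.gauss(0.0, 1.0) for _ in range(20)]

variances = []
for generation in range(500):
    # Fit a Gaussian "model" to the current data...
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)  # maximum-likelihood std. deviation
    variances.append(sigma ** 2)
    # ...then produce the next generation's "web" purely from model output,
    # with no fresh human samples mixed in.
    data = [random.gauss(mu, sigma) for _ in range(20)]

print(f"variance at generation 1:   {variances[0]:.4f}")
print(f"variance at generation 500: {variances[-1]:.6f}")
```

With no new human data entering the loop, the variance decays toward zero: each refit slightly underestimates the spread and drifts, and the errors compound. The counter-argument in this thread is that real pipelines do mix in curated human data and outputs of stronger models, which is exactly the mitigation this toy loop omits.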

@FediThing What I stated isn't something I've read, but what I've learned from working in the field. Model collapse from AI-generated training data has only ever been shown in theoretical scenarios and doesn't happen in real-world ones.

I'm saying you can't count on model collapse to alter AI training in any way.

That's apart from the original use of human data etc. Models now are often trained upon the backs of other models that were trained on human data.

@JigenD How the hell does a language model gather data about the real physical world without human input?

A language model is not some kind of magic being; it's just a spreadsheet of popular responses by humans, and it always will be.

(I also notice you're not answering why LLM companies are putting so many resources into stealing human-made content.)

Maybe one day in the future there will be actual AI, but language models are not it.

@FediThing

People will not just be unwilling to produce knowledge because it will be stolen, they won't even be able to produce knowledge.

If their qualifications are based on getting "AI" to write their answers to teachers' questions, they will not have learned how to think.

If by "#LLM owners" you mean the owners of the corporations that produce those models @FediThing, I think they won't notice nor much care about that threshold.

Their primary internal directive is evidently "jam as much data into the training funnel as possible, doesn't matter where it comes from". Given this directive, their underlings are already deploying aggressive botnets to scrape the entire web, repeatedly, without heed for any resource limits.

https://lwn.net/Articles/1008897/

So I think they'll just keep pressing that accelerator, and not really notice nor much care when the web is dead.

Fighting the AI scraperbot scourge

There are many challenges involved with running a web site like LWN. Some of them, such as fin [...]

LWN.net

@bignose

What do they do when they run out of stuff to scrape because they've destroyed it? Or if the quality of what is left is negligible?

The death of the web won't mean the *absence* of the web; it just means it'll predominantly be auto-generated slop. So, in that scenario, they won't run out of stuff to scrape.

As for the quality of what's on that web? They show no sign at all of caring about the quality of what's on *today's* web, they just scrape all of it continually. I don't think they'll notice nor care when it's mostly slop; they'll just endlessly scrape the dead web and feed their machine on it.

Will that make a difference? They don't seem to care that today's #LLM output is mostly crap, so it's hard to say why they'd care if it declined.

@FediThing

@bignose

"The death of the web won't mean the *absence* of the web; it just means it'll predominantly be auto-generated slop. So, in that scenario, they won't run out of stuff to scrape."

Then you get model collapse though, if there's no human input any more?

@FediThing, yes. And the people who can see that coming have (by intentional arrangement) no connection to the people with their foot on the accelerator. So the awareness that they're heading for model collapse will not be sufficient to stop them.

So, given that the #LLM corporate owners have nothing riding on the quality of output, I think you're right @FediThing that this:

> what will society do if it builds its structures around stealing knowledge from people until the people give up producing knowledge?

is the important question. The existing incentives will not stop the #GenAI death of the web, so it's up to us to alter those incentives (muscular regulation, for one) to prevent them from killing the web.