“ChatGPT Has Already Polluted the Internet So Badly That It's Hobbling Future AI Development”

https://futurism.com/chatgpt-polluted-ruined-ai-development

Also, a recurring security concern (and bandwidth abuse issue) with training data sets is that they generally don't store copies of the data and have few safeguards checking if it's been changed

So, y'know, if a pre-2020 domain expires and is replaced entirely with slop, many datasets will mark the slop as pre-2020 data
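
One possible safeguard against this, as a minimal sketch: record a content hash next to each URL when the dataset is built, then compare it whenever the URL is fetched again. The `DatasetEntry` record and the `make_entry` / `is_unchanged` helpers are hypothetical names for illustration, not the format of any real dataset.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """Hypothetical index record: a URL plus a digest of what it served at capture time."""
    url: str
    captured_on: str  # ISO date the page was crawled
    sha256: str       # hex digest of the bytes seen on that date


def make_entry(url: str, captured_on: str, body: bytes) -> DatasetEntry:
    # Hash the content at capture time so later fetches can be checked against it.
    return DatasetEntry(url, captured_on, hashlib.sha256(body).hexdigest())


def is_unchanged(entry: DatasetEntry, refetched_body: bytes) -> bool:
    # True only if the bytes served now match the bytes seen at capture time.
    return hashlib.sha256(refetched_body).hexdigest() == entry.sha256


original = b"<p>Hand-written article from 2019</p>"
entry = make_entry("https://example.com/post", "2019-06-01", original)

print(is_unchanged(entry, original))
print(is_unchanged(entry, b"<p>Generated filler on an expired domain</p>"))
```

An exact byte hash would also flag harmless changes such as a new footer or rotated ads, so a real pipeline would likely hash extracted text instead, or simply keep its own copy of the data.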

An added complication is that generative models are eminently exploitable through training data poisoning, so all the same vectors that might trigger model collapse are also attack vectors for state actors looking to control model output. That is a major potential hazard for code-generating models.

Scientists create the 'world's smallest violin'

A team of researchers at Loughborough University have created the 'world's smallest violin'.

BBC Newsround
@baldur Even the #AI companies themselves are calling their tools' output junk. Saying that the internet is now so 'polluted' with AI crap that it makes it damaging to scan the Internet today & feed that data into their models. 🤦‍♀️

@baldur this was destined to happen. I’m no expert and even I could tell this was the future of AI. It was born a lossy format, without discernment, and was only going to get lossier and less discerning as it fed on itself. It is the epitome of decadence. It is morally repugnant, culturally declining, and knowledge decaying.

@JoBlakely @baldur The Industrial Age was always bound to destroy itself in the end and leave nothing but a toxic wasteland filled with mutant weeds and pests. It doesn't matter whether it's Liberal Democratic Capitalist Industrialism, Authoritarian or Fascist Capitalist Industrialism, Cyber-Feudalist Industrialism, or Soviet or Maoist Socialist Industrialism. The whole project of turning everything into an industry and mass-producing anything that can be mass-produced can only end in collapse, because every single industrial process is only designed to solve one particular problem while disregarding all the unintended side effects that happen when that process is implemented at a really massive scale. It's the reductionism of engineering gone wild: we have been acting as if nothing really happens if we don't measure it, made worse by the fact that in any hierarchical system, you're always trying not to tell your boss things that might get you fired.

@JoBlakely @baldur I have no tech savvy whatsoever, and I saw that writing on the wall a long time back too. If sci-fi taught me one thing, it's that if you clone a clone of a clone of a clone, what you end up with ain't gonna be functional.

¯\_(ツ)_/¯

@baldur Maybe this will finally give someone a reason to pour loads of money into The Internet Archive / Wayback Machine. 🤞

@lo_fye @baldur yeah, that was my thought... followed by the thought that rather than putting money into it, they might try to buy it & shut it down to corner the market.

@baldur
For years, I have had an ongoing tendency to hoard books. To rationalize this, I always thought it would be good to have books in case the internet or websites disappeared.

But what I didn't expect is the information overload and pollution

@baldur I fail to see the problem, this is the best possible outcome

@baldur

GIGO

garbage in garbage out

@baldur

AI is like a parachute. In the hands of experts with deep analytical skills (aka "scientists") it can be a very helpful tool.

But as a labor-saving device for the masses that can also make a few people lots of money, it's gonna cause the kind of destruction that all bad advice mindlessly followed will tend to do.

Also, scientists check their work.

J.Q. Public generally doesn't, as countless anecdotes on social media will attest to.

Parachutes for the masses? Nuh-uh.

@baldur so what to do with an unused domain that is 15 years old?

@utf_7 @baldur go get a prostate check or a mammogram

@baldur

Very unbiased article.
Fun read.

I like the bit where they are not sure it's a problem, but if they agree it's a problem, it will be a problem 🥹

Dead internet theory confirmed.
It's all slop.

@baldur Finally a good use-case for ChatGPT

@baldur oh... so that's why the crawlers just keep hammering over and over and over again

@baldur we called this recurrent pollution in our work. It leads to model collapse. The math is very bad indeed.

https://berryvilleiml.com/results/BIML-LLM24.pdf
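
To give a feel for why recursive training goes wrong, here is a toy sketch. It is not the analysis from the linked paper, just an assumed stand-in: each "model" is a Gaussian fitted by maximum likelihood to a small sample drawn from the previous generation's fit. The function name and parameters (`simulate_collapse`, `sample_size`, and so on) are made up for illustration. With this estimator the expected variance shrinks by a factor of (n-1)/n every generation, so the distribution tends to narrow toward a point.

```python
import random
import statistics


def simulate_collapse(generations: int = 200, sample_size: int = 10, seed: int = 0) -> list[float]:
    """Refit a Gaussian to its own samples repeatedly; return the variance per generation."""
    rng = random.Random(seed)
    mean, std = 0.0, 1.0          # generation 0: the "real" data distribution
    variances = [std ** 2]
    for _ in range(generations):
        # Each new generation only ever sees output from the previous one.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)  # maximum-likelihood (population) estimate
        variances.append(std ** 2)
    return variances


variances = simulate_collapse()
print(f"generation 0 variance: {variances[0]:.3f}")
print(f"generation {len(variances) - 1} variance: {variances[-1]:.3e}")
```

Real language models are far more complicated than a single Gaussian, so this only illustrates the general mechanism of lost diversity, not how quickly an actual model would degrade.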

@baldur

An "Ouroboros of Code"

@baldur

> training data sets is that they generally don't store copies of the data and have few safeguards checking if it's been changed

my understanding is that is similar to how the medical summarizing ones operate too. so they "listen" to your appointment/conversation with the doctor, "summarize" what was said, then don't keep the transcripts.

if the summaries could be relied upon not to hallucinate, that's one thing, but with the original transcript gone, that can lead to some scary outcomes.

@baldur
Maybe someone could create artificial discrimination?

@baldur We are entering a Dark Age and the only wisdom that we will pass down to future generations will be in the data recorded by the Internet Archive before 2023.

@baldur Who is posting the output of chatgpt on the internet, and why?

@baldur oh great, this is now the second right-wing agenda i've seen in the past week where the underlying philosophy guarantees their goals are impossible to achieve

...so, they actually did it! they built the singularity (the kind of singularity that contains no information but mass and charge)

@baldur validated pre-2020 data is going to be like pre-nuclear steel soon

@baldur

This is how bad actors change History.

#History #Education