Mastodawn

Instead of blocking AI crawlers, sites with good, legitimate content should serve up garbage to the crawlers and poison the training models

John Minnihan Jul 18, 2023

@anildash BEEN ON IT FOR 19 YEARS, MAN

Dreaming of dad jazz. Jul 19, 2023

@jbminn @anildash ai tarpits.

m0xEE Jul 19, 2023

@jbminn
And I'm not even doing anything to serve it to the crawlers in particular, I'm just being my usual self 😂
@anildash

Brian Perry Jul 18, 2023

@anildash Stop making sense

🇺🇦 Sean Prophet ⚛ Jul 18, 2023

@anildash

Why? This is just pointless sabotage.

And...any decent AI training model already has to filter out garbage so would be unaffected.

What are you trying to accomplish?

AI models will provide huge benefits. (They already are). Why not focus on the real problems, which are the same old enemies of capitalism and corrupt governance?

James M. Jul 18, 2023

@sean @anildash so-called "AI" could possibly provide huge benefits if the gains were for the public, but as with almost all capitalist endeavors, the gains are concentrated in the hands of the very wealthy. And remember, the ultimate goal of the very wealthy is simply to do away with human workers, because robots are less troublesome than humans demanding rights, etc. I think the reaction against AI is at least partially from a growing realization of this fact.

Also, AI is already causing real harms, not just theoretical. I mean things like racially biased facial recognition systems, people's reputations being destroyed because of false ChatGPT results, etc. People shouldn't rely on results from AI, but they do, and the AI systems are certainly marketed as being reliable.

🇺🇦 Sean Prophet ⚛ Jul 18, 2023

@jamesmarshall @anildash

So you admit the problem is capitalism and governance, not AI itself.

We should WANT to do away with human labor. Fully automated luxury communism isn't a pipe dream anymore. The threat is that the wealthy will simply buy machines to make everything, fire human workers and leave them to starve. But then who will they be making products and services FOR?

Eventually it's going to have to come around to paying people for not working, because otherwise no one will be able to buy the work product of the machines. And there will be no purpose for running the machines. And nothing to keep making money for the wealthy.

This is a political, not technological problem and same as it ever was, it still all goes back to politics and the need to control wealth.

James M.Jul 18, 2023

@sean @anildash the problem is both capitalism and the development of any technology that furthers wealth/power disparity. I already listed some specific problems with AI today. Blindly plowing ahead with all tech, without regard to the current power structures, can just concentrate power even further.

Sure, I'd love to provide all humans with a life of leisure, if that were the direction we're going. But it's not. Once all work is automated, the controlling class won't need anyone to sell anything to, or have any other need for anyone else. That's how they seem to see it, anyway. I think such a view is mistaken, for one thing since the tech they rely on will inevitably have problems that need to be fixed by capable workers, but I really get the sense they're shooting for automating everything so they don't have to deal with human workers. Sure, they could share the leisurely life with everyone (by letting them survive), but they have no incentive to, and have no past record of doing so.

🇺🇦 Sean Prophet ⚛ Jul 18, 2023

@jamesmarshall @anildash

It's the desert island problem: You can't ever build wealth without workers to hire. And once you build wealth it does you no good without workers to hire.

I agree wealth is a HUGE political problem. But trying to suppress technology won't solve it. Someone is going to create this tech, no matter how many people oppose it.

So we're back where we started, solving the political problem of oligarchy.

mmby Jul 19, 2023

@sean @jamesmarshall @anildash I don't think it will necessarily lead to UBI - I think the most probable end of climate change + automation is a global Brasil: a comfortable upper class that still has access to all amenities despite the climate and a destitute, surveilled and controlled lower class, with tiny shreds of what was considered middle class

if there is any UBI it's only there to quell bread riots, might just be bread instead of money

maya Jul 19, 2023

@sean @anildash I really don’t think protecting your work from misuse constitutes “sabotage.” We don’t have a collective responsibility to ensure someone’s plagiarism-machine is receiving top-quality data. If they want good data, they should *ask* to use people’s material. Being online is not de-facto consent to any and all appropriation, and “you didn’t say I couldn’t” to morally-questionable unforeseen usage is not legitimate imo.

Also “blacksun”? What’s up with that..?

🇺🇦 Sean Prophet ⚛ Jul 19, 2023

@mayadev @anildash

If AI training is "plagiarism," then so is any research, or use of online data as reference material.

I think it's tough for people to wrap their heads around the fact that language models and generative art are actually writing and painting--creating original work.

I've made this argument repeatedly, and people seem to continue to mischaracterize what AI is doing as mere copying.

Authors and artists continually use prior work as reference. AI is doing the same thing humans have done since the beginning of civilization.

The whole thing is a misdirection. The left has failed to successfully take on capitalistic oligarchy. So now people are focused on the proximate cause of technology as a scapegoat, while ignoring the ultimate cause of our troubles-- the worldwide breakdown of Democratic governance and the rise of a new feudal class of billionaires.

John Francis Jul 18, 2023

@anildash once these find data, there's no way to keep them using it in the agreed way.

Data poisoners should be everywhere.

My favourite is AdNauseam. It clicks ALL the ads it sees, poisoning the meaning of ad data collection, and costing advertisers $.

https://adnauseam.io/

AdNauseam - Clicking Ads So You Don't Have To

A browser extension that clicks on every blocked ad to fight advertising surveillance.

Jannis Schakarian ✅ Jul 18, 2023

@anildash oh that's what Elon is doing. Just for everybody.

Tane Piper ⁂ Jul 18, 2023

@anildash Yea, but now you have bad actors poisoning the well with even more propaganda and fake news, and companies determined to use these AI services so humanity will continue to slide ever closer to oblivion.

Blocking is much better for the environment and humanity

Joe Mansfield Jul 18, 2023

@anildash This is cool but at a single site level it will probably only accelerate the emergence of AIs smart enough to recognise your chosen level of garbage.

However, coordinated community level synchronised garbage could be used to thoroughly poison the training potential of data hoovered up via random crawling.

Leftnotracks Jul 18, 2023

@anildash That appears to be Twitter’s strategy.

ኢራ עירא Ira عيرا 🍓🎗️ Jul 18, 2023

@anildash I'm not sure what the harm is exactly, but how does creating garbage help here? If it's a bot that disobeys robots.txt then just give it a 404 or maybe a 403. As for myself I don't mind my blog being scraped. Makes my text effect actual change in the world for a change...
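
A minimal sketch of the "just give it a 403" idea, in Python, assuming the server can look up its own robots.txt rules per request. The robots.txt content, the `GPTBot` token, and the helper names `build_parser` and `status_for` are illustrative assumptions, not something from the post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for this sketch; "GPTBot" is only an example token.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a reusable rule set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def status_for(parser: RobotFileParser, user_agent: str, path: str) -> int:
    """Return 403 if robots.txt disallows this agent on this path, else 200."""
    # robotparser matches on the product token before the first "/" of the
    # user agent, so "GPTBot/1.0" is matched as "gptbot".
    return 200 if parser.can_fetch(user_agent, path) else 403
```

This only catches bots that identify themselves honestly; a crawler sending a browser-like user agent passes straight through.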

Stressed Out Maikel 🇪🇺 Jul 18, 2023

@anildash agreed. And I actually know how to do it. There's a Node Express middleware that detects crawlers, and every other programming language has a similar package/module to do the same.

We should feed them random words with the suffix "porn" and destroy Google's empire.

Mwahahahah!
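
A rough, language-agnostic sketch of that detect-then-serve-garbage step, written in Python rather than as Express middleware. The token list, the filler words, and the function names are all assumptions made for illustration; a real crawler-detection package would ship its own, much longer list.

```python
import random

# Illustrative substrings only; not an authoritative list of crawler agents.
CRAWLER_TOKENS = ("gptbot", "ccbot")
FILLER_WORDS = ["lorem", "walrus", "teapot", "quasar", "marmalade"]


def is_crawler(user_agent: str) -> bool:
    """Case-insensitive substring match against the known crawler tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def garbage(rng: random.Random, n_words: int = 20) -> str:
    """Build a string of n_words randomly chosen filler words."""
    return " ".join(rng.choice(FILLER_WORDS) for _ in range(n_words))


def body_for(user_agent: str, real_body: str, rng: random.Random) -> str:
    """Serve garbage to detected crawlers and the real page to everyone else."""
    return garbage(rng) if is_crawler(user_agent) else real_body
```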

Stressed Out Maikel 🇪🇺 Jul 18, 2023

@anildash hmmm and I've just built a mental model of what is required to do this in a NextJS-based website. Since there is the Head component AND server-side rendering functions, you could poison them without even having to detect whether they are crawlers.

Just add random word + porn to all the fields of the meta tags with Head.
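
The meta-tag idea could look roughly like the following, sketched in Python instead of a Next.js Head component. The function name, the field names, and the word list are hypothetical, and the suffix is left as a parameter.

```python
import html
import random


def poisoned_meta_tags(fields, words, suffix, rng):
    """Render one <meta> tag per field, each with a random word plus suffix."""
    tags = []
    for name in fields:
        content = f"{rng.choice(words)} {suffix}".strip()
        # html.escape also escapes quotes, so the attribute values stay valid.
        tags.append(
            f'<meta name="{html.escape(name)}" content="{html.escape(content)}">'
        )
    return "\n".join(tags)
```

Because every visitor receives the same tags, this variant needs no crawler detection, but it also degrades the metadata that search engines and link previews read.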

ShadSterling Jul 18, 2023

@anildash if it were possible to enforce licensing terms, I’d be inclined to put a clause in the license saying that including anything from the site in an ML training set incurs a large fee for each result generated by anything based on that set

Doug Bowman Jul 18, 2023

@anildash Reminds me of the way some of us used to set up mod_rewrite to swap out any image with one that read “This image is stolen” when they were linked to from other domains.
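
The decision such a mod_rewrite rule makes can be sketched in Python as a Referer check. This is an illustration of the logic only, not Apache configuration, and the host names, file names, and function name are made up.

```python
from urllib.parse import urlparse


def image_to_serve(referer, own_host, requested, placeholder="stolen.png"):
    """Return the requested image, or a placeholder when hotlinked from elsewhere."""
    if not referer:
        # Direct visits and privacy-conscious browsers send no Referer.
        return requested
    host = urlparse(referer).hostname
    if host is None:
        # Unparseable Referer: give the request the benefit of the doubt.
        return requested
    if host == own_host or host.endswith("." + own_host):
        return requested
    return placeholder
```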

your auntifa liza 🇵🇷 🦛 🦦 Jul 18, 2023

@anildash am loving the new and improved Anarchist Dash. 10/10 would buy his cookbook

Trevor Flowers Jul 18, 2023

@anildash I don't know that it would have a good effect but it sounds like fun. A federation of sites serving crawlers with an internally consistent wave of BS would be a neat way to spread a fictional narrative.

Trevor Flowers Jul 18, 2023

@anildash I wonder whether we could find an author who'd allow use of their fictional works to feed an LLM to generate the text for a federation of shite.

Richard Rathe Jul 18, 2023

@anildash

No need… have you seen #Twitter recently?! 😜

Paul Cantrell Jul 18, 2023

@anildash @misc
I am increasingly interested in this as a general strategy:

Browser plugins that “click” and invisibly open-but-don’t-really open random Google ads.

Burner accounts on Facebook filled with garbage that randomly follow, like, and message each other.

Spam the machine learning.

Jon Jul 18, 2023

@inthehands @anildash @misc there is already the worry that AI generated content creates a negative feedback loop as that content in turn naturally makes its way into training data, so this is sort of like slightly accelerating that entropy :)

Tim Lavoie Jul 19, 2023

@inthehands @anildash @misc For the first part, does AdNauseam fit the bill?

https://addons.mozilla.org/en-US/firefox/addon/adnauseam/

AdNauseam – Get this Extension for 🦊 Firefox (en-US)

Download AdNauseam for Firefox. Blocking ads and fighting back against advertising surveillance.

Paul Cantrell Jul 19, 2023

@tim_lavoie @anildash @misc
That’s exactly the sort of thing I’m thinking of, yes!

Brian Grinter Jul 19, 2023

@anildash @lisamelton evil…

Janne Moren Jul 19, 2023

@anildash
I just got my data dump from Reddit. My next step will be to replace all my comments with random bits of ChatGPT output.

That is not just garbage; it's garbage that we know disrupts training in a real way.

Vi Jul 19, 2023

@anildash This is what Kodeco does!

Mark Jul 19, 2023

@anildash First you have to catch them at it.

James Purser Jul 19, 2023

@anildash how do you deal with the bad actors? The ones who don't use unique UserAgent strings?

Martino Mosna Jul 19, 2023

@anildash Yes! This!

Bill Plein🌶 Jul 19, 2023

@anildash I see a possible Wordpress plug-in as a start.

1) relatively hidden links (like those used in honeypots)
2) links point to garbage pages. Maybe the garbage was even written by an LLM designed to write obviously garbage text.
3) the plug-in creates numerous backdated posts and pages, not easy for a human to see or navigate to, that are easily scraped by bots
4) no SEO on the garbage pages, to reduce the chances of search engines finding them.
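
The four points above could be prototyped along these lines, in Python rather than as an actual WordPress plug-in. Everything here is an assumption for illustration: the function names, the hidden-link markup, the one-to-ten-year backdating window, and the choice of a `noindex` robots meta tag as one possible reading of point 4.

```python
import html
import random
from datetime import date, timedelta


def hidden_link(href: str) -> str:
    """Point 1: a link present in the markup but not shown to human visitors."""
    return (
        f'<a href="{html.escape(href)}" style="display:none" '
        'aria-hidden="true" tabindex="-1">archive</a>'
    )


def garbage_page(slug, words, today, rng, n_words=50):
    """Points 2 to 4: a backdated page of filler text, marked noindex."""
    # Backdate between one and ten years (an arbitrary range for this sketch).
    backdated = today - timedelta(days=rng.randint(365, 3650))
    text = " ".join(rng.choice(words) for _ in range(n_words))
    return {
        "slug": slug,
        "date": backdated.isoformat(),
        "html": (
            "<html><head>"
            '<meta name="robots" content="noindex">'
            f"</head><body><p>{html.escape(text)}</p></body></html>"
        ),
    }
```

A crawler that honors `noindex` may also skip the page entirely, so this setup mostly targets bots that ignore such hints.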

Jason Punyon Jul 19, 2023

@anildash i think this is a good “enhanced defederation” strategy too.