My #Wikipedia request for comment just closed, finally banning #AI content in articles! "The use of LLMs to generate or rewrite article content is prohibited"

Kudos to all who participated in writing the guideline (especially Kowal2701) and the whole WikiProject AI Cleanup team, this was very much a group effort!

https://en.wikipedia.org/wiki/Wikipedia:Writing_articles_with_large_language_models/RfC

My genuine hope is that this can spark a broader change. Empower communities on other platforms, and see this become a grassroots movement of users deciding whether AI should be welcome in their communities, and to what extent. On their own terms.

A pushback against the #enshittification and forceful push of AI by so many companies in these last few years.

@quarknova I would say that the battle was lost when Wikipedia allowed big tech to buy access to copyleft content without needing to share alike.

Your new policy simply enforces "fresh meat" for the models, without any requirement for reciprocity back to the commons.

Wikipedians, then, are signing up to work for free to feed the models, while people downstream from the models can use their labor entirely for free without giving back.

@yoasif @quarknova sadly, this is the reason I won't donate anymore to Wikipedia, instead I'll donate to Internet Archive.
@DrPen @yoasif @quarknova well guess you'll no longer be donating to them either since they're encouraging and signing deals to let them scrape wayback to get around other sites blocking the slop peddlers?
And you'll be taking a stand against the EFF who insists that slop peddlers scraping the entire Internet is 'fair use.'

@rootwyrm @DrPen @quarknova Scraping Wayback makes way more sense to me than scraping live sites, FWIW.

I don't know if you need to take a stand against the EFF -- you might just want to start off with a stand against that position. 🤷

@yoasif @rootwyrm @quarknova Scraping open content is going to happen. What we do about it needs to be much more robust, technically and legally.
a) AI scraping is not fair use bc of vast profit made by AI Co's, and amount of content scraped/copied.
b) fair use is a US construct, interpreted differently elsewhere. EFF know this. (https://www.eff.org/deeplinks/2016/02/murky-waters-international-copyright-law)
c) to me, EFF decision seems contrary to their own standards.
d) AI scraping goes against most CC licences.

#AI #fairuse #academia #copyright

@yoasif @quarknova

I think the truth is that it is a total mess when it comes to AI and copyright regulation. We know big tech companies have literally used pirated books to train their AI and nothing was done about it. So I don't know if saying "Wikipedia allowed it" makes any sense; to me it seems like they would've scraped the data anyway, like they did with the books (and the entire internet for the most part).

@futureisfoss @quarknova Yes, but somehow Disney is able to demand that Google stop pirating Disney works for its LLM: https://arstechnica.com/google/2025/12/disney-says-google-ai-infringes-copyright-on-a-massive-scale/

Wikipedia could have presented a legal challenge to the LLM providers, or simply stated that "you are indexing our servers, we can see it - if you don't stop, we will sue to protect our community".

Instead, they got paid to sell out the community.

@yoasif @quarknova

It would be interesting to see how a license like Creative Commons would apply in the case of LLMs: does it mean all of the LLM's output must also be licensed under the same terms? Yeah, it would've been nice if Wikipedia had fought back legally; even if they failed, it would've led to some interesting discussions about copyright laws and LLMs.

@futureisfoss @quarknova Unfortunately (and I am blogging about this in a few days, so follow me if you are interested), since LLM outputs are uncopyrightable, I don't think there is any legal way for LLMs to train on share-alike and in turn to produce share-alike contributions.

Copyright can only be assigned to human authors.

See the monkey selfie dispute for some prior context: https://en.wikipedia.org/wiki/Monkey_selfie_copyright_dispute

Open to more thoughts here!

@yoasif

This assumes the businesses training those models wouldn't scrape that data anyway, though.

That… doesn't strike me as particularly realistic, tbh.
After all, the whole mess began because they were doing exactly that.

Now, at least they're paying Wikimedia, and, thanks to the policy, not making it worse…

@yenndc That doesn't assume that. Wikipedia could have sued to protect the license and community, rather than making a deal to opt the community out of share alike.

You are right that the policy doesn't make it worse, if you think the role of community contributors is to provide free human labor to power the big tech slop bots. If you do, the policy is perfect.

Instead, we should demand that the slop generators generate their own knowledge.

@yoasif

The nonprofit barely making ends meet should've sued several of the richest companies on earth based on an overly complex industry regulation designed and maintained exclusively for the benefit of businesses..?

Excuse me if I'm sceptical that that'd have helped…

And the second paragraph completely misses the point.
I'd go as far as saying it links the value uniquely to what AI scrapers can get out of it, which is the exact opposite of what anyone would want (and I'm pretty sure you wouldn't want it either).
Wikipedia articles remaining as AI-free as possible does have value for every person who looks at it.

I'm ok with pushing the AI and scrapers as far away as possible, no discussion there. But just ditching Wikipedia because it's not throwing them far enough for one's tastes doesn't seem a good way to go forward.

@yenndc The nonprofit is making plenty.

Your comment about industry regulation made for business is odd, and I don't see how it is different here - the deal still works just dandy for businesses - it is the contributors who are left out in the cold.

@yoasif
To poison the well you share with your foe is one way to go. However enthralled I feel seeing AI companies train LLMs on LLM outputs, I think we still need our commons, such as Wikipedia, more than they do.
@quarknova
@yoasif Can I get a citation on there being a way to buy your way out of CC-BY SA? I would agree that there is basically no enforcement mechanism for copyright that the Wikimedia community can lean on other than shame, and that capitalism is shameless.
@yoasif Are you promoting the interesting take that Wikimedia Enterprise selling API access is somehow also relicensing the Wikimedia movement’s CC-BY SA content?
@bd808 yessir.
@yoasif Can we agree that when $COMPANY downloads an XML dump or scrapes the website, they are receiving content subject to the CC-BY SA license if they republish significant excerpts of that content? What evidence can you present that an alternative license covers content that $COMPANY obtains from APIs after signing a contract with Wikimedia Enterprise? Or am I misunderstanding your claims?