My #Wikipedia request for comment just closed, finally banning #AI content in articles! "The use of LLMs to generate or rewrite article content is prohibited"

Kudos to all who participated in writing the guideline (especially Kowal2701) and the whole WikiProject AI Cleanup team, this was very much a group effort!

https://en.wikipedia.org/wiki/Wikipedia:Writing_articles_with_large_language_models/RfC

My genuine hope is that this can spark a broader change: empower communities on other platforms, and see this become a grassroots movement of users deciding whether AI should be welcome in their communities, and to what extent. On their own terms.

A pushback against the #enshittification and forceful push of AI by so many companies in these last few years.

@quarknova I would say that the battle was lost when Wikipedia allowed big tech to buy access to copyleft content without needing to share alike.

Your new policy simply enforces "fresh meat" for the models, without any requirement for reciprocity back to the commons.

Wikipedians, then, are signing up to work for free to feed the models, while people downstream of the models can use their labor entirely for free without giving anything back.

@yoasif @quarknova sadly, this is the reason I won't donate anymore to Wikipedia, instead I'll donate to Internet Archive.
@DrPen @yoasif @quarknova well, guess you'll no longer be donating to them either, since they're encouraging the slop peddlers and signing deals letting them scrape the Wayback Machine to get around other sites' blocks?
And you'll be taking a stand against the EFF, which insists that slop peddlers scraping the entire Internet is 'fair use.'

@rootwyrm @DrPen @quarknova Scraping Wayback makes way more sense to me than scraping live sites, FWIW.

I don't know if you need to take a stand against the EFF -- you might just want to start off with a stand against that position. 🤷

@yoasif @rootwyrm @quarknova Scraping open content is going to happen. What we do about it needs to be much more robust, technically and legally.
a) AI scraping is not fair use, because of the vast profits made by AI companies and the amount of content scraped/copied.
b) fair use is a US construct, interpreted differently elsewhere. The EFF knows this. (https://www.eff.org/deeplinks/2016/02/murky-waters-international-copyright-law)
c) to me, the EFF's decision seems contrary to their own standards.
d) AI scraping goes against most CC licences.

#AI #fairuse #academia #copyright

@yoasif @quarknova

I think the truth is that it's a total mess when it comes to AI and copyright regulation. We know big tech companies have literally used pirated books to train their AI, and nothing was done about it. So I don't know if saying "Wikipedia allowed it" makes any sense; to me it seems like they would've scraped the data anyway, like they did with the books (and the entire internet, for the most part).

@futureisfoss @quarknova Yes, but somehow Disney is able to demand that Google stop pirating Disney works for its LLM: https://arstechnica.com/google/2025/12/disney-says-google-ai-infringes-copyright-on-a-massive-scale/

Wikipedia could have presented a legal challenge to the LLM providers, or simply stated that "you are indexing our servers, we can see it - if you don't stop, we will sue to protect our community".

Instead, they got paid to sell out the community.

@yoasif @quarknova

It would be interesting to see how a license like Creative Commons would apply in the case of LLMs. Does it mean all of the LLM's output must also be licensed under the same terms? Yeah, it would've been nice if Wikipedia had fought back legally; even if they might have failed, it would've led to some interesting discussions about copyright law and LLMs.

@futureisfoss @quarknova Unfortunately (and I am blogging about this in a few days, so follow me if you are interested), since LLM outputs are uncopyrightable, I don't think there is any legal way for LLMs to train on share-alike and in turn to produce share-alike contributions.

Copyright can only be assigned to human authors.

See the monkey selfie dispute for some prior context: https://en.wikipedia.org/wiki/Monkey_selfie_copyright_dispute

Open to more thoughts here!

@yoasif

This assumes the businesses training those models wouldn't scrape that data anyway, though.

That… doesn't strike me as particularly realistic, tbh.
After all, the whole mess began because they were doing exactly that.

Now, at least, they're paying Wikimedia, and thanks to the policy, not making things worse…

@yenndc That doesn't assume that. Wikipedia could have sued to protect the license and community, rather than making a deal to opt the community out of share alike.

You are right that the policy doesn't make it worse, if you think the role of community contributors is to provide free human labor to power the big tech slop bots. If you do, the policy is perfect.

Instead, we should demand that the slop generators generate their own knowledge.

@yoasif

The nonprofit barely making ends meet should've sued several of the richest companies on earth, based on an overly complex industry regulation designed and maintained exclusively for the benefit of businesses?

Excuse me if I'm sceptical that that would've helped…

And the second paragraph completely misses the point.
I'd go as far as saying it ties the value solely to what AI scrapers can get out of it, which is the exact opposite of what anyone would want (and I'm pretty sure you wouldn't want it either).
Wikipedia articles remaining as AI-free as possible has value for every person who reads them.

I'm ok with pushing the AI and scrapers as far away as possible, no discussion there. But just ditching Wikipedia because it's not throwing them far enough for one's tastes doesn't seem a good way forward.

@yenndc The nonprofit is making plenty.

Your comment about industry regulation made for business is odd, and I don't see how it is different here - the deal still works just dandy for businesses - it is the contributors who are left out in the cold.

@yoasif
To poison the well you share with your foe is one way to go. However enthralled I feel seeing AI companies train LLMs on LLM outputs, I think we still need our commons, such as Wikipedia, more than they do.
@quarknova
@yoasif Can I get a citation on there being a way to buy your way out of CC BY-SA? I would agree that there is basically no enforcement mechanism for copyright that the Wikimedia community can lean on other than shame, and that capitalism is shameless.
@yoasif Are you promoting the interesting take that Wikimedia Enterprise selling API access is somehow also relicensing the Wikimedia movement’s CC BY-SA content?
@bd808 yessir.
@yoasif Can we agree that when $COMPANY downloads an xml dump or scrapes the website, they are receiving content subject to the CC BY-SA license if they republish significant excerpts of that content? What evidence can you present that an alternative license covers content that $COMPANY obtains from APIs after signing a contract with Wikimedia Enterprise? Or am I misunderstanding your claims?

@bd808 I think the money that is paid to Wikipedia is an implicit opt-out of share-alike, since the content produced by LLMs is always public domain.

The LLM acts as a copyright removal machine, and big tech is paying Wikipedia to look the other way.

@yoasif Nothing in your concept that any information passed through an LLM becomes public domain is related to usage of a particular API to obtain that information. These companies all already have the content produced by the Wikimedia movement, or can have it with trivial effort. This is possible because the movement is about Open Knowledge: knowledge without licensing fees and borders and resource hoarding. We gave it all away because that was the point from the start.

@bd808 The payment makes a difference.

https://www.avclub.com/wikipedia-ai-partnerships-meta-amazon-microsoft

Wikipedia could have chosen to defend its contributors. Instead, they are taking payment to allow for piracy of contributor works. Given that we know that the big tech LLMs don't respect copyright and derivative works are produced as public domain, the license is clearly being violated, and Wikipedia is being paid for the privilege.

We give it away - but share-alike, not to be enclosed. Public domain works can be enclosed.

@yoasif I’m still not understanding how the payment changes anything other than the funds in the Wikimedia Foundation’s balance sheet. Show me the new license terms please. The link you provided is a blog post that looks to be sourced from a Reuters article that in turn was sourced from a press release from Wikimedia Enterprise. The facts I can see there are about a handful of companies having signed up to use APIs where higher rate limits can be purchased. Where is the enclosure?

@bd808 It forecloses the possibility that Wikipedia will defend its contributors.

The license is being wantonly violated, and Wikipedia has co-signed the theft. That is what changes when the funds are exchanged.

Yes, the license text has not changed - and yet, effectively it has, since Wikipedia is selling an opt-out and choosing NOT to defend contributors, even as big tech produces derivative works not protected by the license that contributions were granted under.

What enclosure are you talking about?

@yoasif I’m asking about the enclosure you are claiming. You have, I think, finally gotten to your thesis: if the WMF is enriched by $COMPANY, then they will change their future behavior to protect that enrichment above the rights of the Wikimedia communities. Stating from the start that this is your personal speculation would have been helpful. I disagree with your thesis, but neither of us has a provable position in the near term.

@bd808 I frankly think that they already have. Being paid for access to (newly guaranteed to be human) contributions to feed the models, rather than asking the big tech bots to stop scraping while violating the license, is a de facto opt-out of the license terms, even as the terms exist de jure.

I am not a lawyer, so it would be handy to consult with one to get their opinion on whether contributors who contribute with the knowledge of these deals are also dual licensing their contributions.

@bd808 That would clearly, absolutely, be speculation - the view that courts might see a contributor's knowledge of the deals, and continued contribution, as an acknowledgement that those contributions were made under terms not covered by CC BY-SA. That is not something I am claiming, but it seems to me a real possibility for wresting control of contributions away from the community.

I think the damage already wrought is enough, frankly - Wikipedia has ALREADY granted the big LLMs an opt-out.

@bd808 I don't think this is my personal speculation - it is my interpretation of what is happening.

Speculation would be if I imagined that big tech was paying for contributor content - that isn't in question, it is open and announced by all parties.

What are they paying for?

Human contributions.

What license are they licensed under?

Beats me - and that is the point. It isn't share-alike, since the outputs produced by the LLMs are not share-alike.

Again, what are they paying for?

@yoasif > Again, what are they paying for?

Rate of data transfer and depending on the API some pre-processing/aggregation done centrally rather than at the graph edge by the consumer.

@bd808 Unfortunately, that is a sophistic conclusion; Wikipedia fully knows that the works are being consumed by the LLMs to produce works that violate the © of the content being transferred.

It reminds me a bit of AllOfMP3, which claimed to be able to sell © music in Russia, but happily sold music worldwide, which the RIAA and others considered a violation of copyright.

AllOfMP3 also charged on volume of data - that didn't make the © infringement any more legal.

@bd808 I will grant that this is a convenient place for Wikipedia to be: since they don't actually own the copyrights to work that contributors to Wikipedia make, they can't legally opt big tech out of the terms.

Instead, they can continue to ride on their reputation as a trusted host of the corpus, and continue to garner contributions - that they turn around and sell *access* to, directly to the big tech vendors that they know will immediately violate their [the contributors'] copyrights.

@bd808 The theft is so grand that it is hard to find an equivalent analog in history - I suppose this comes close: https://archive.ph/UIhUu

People surrendered guns to be destroyed so that they could no longer contribute to gun violence.

The guns were taken and resold as guns.

People contribute to Wikipedia to contribute to a corpus of human knowledge, shared alike. Wikipedia turns around and sells it to vendors who turn it into ©-free digital slop, free of any restrictions of reciprocity.

@yoasif I know this is tangential to your thesis, but if it does turn out that large language models are the magic trick that invalidates all intellectual property rights, or even just copyright, then you have discovered a beneficial use of the technology. I am personally attracted to copyleft as a use of copyright to prevent enclosure of my labor by capitalists. The point of copyright in the practice of copyleft is not upholding intellectual property rights as a benefit.

RE: https://mastodon.social/@yoasif/116301328058936154

@bd808 How is your labor not enclosed by LLMs?

I have seen this sentiment floating around - but I really don't get it. Perhaps you could explain it to me?

I know that when Reddit announced that works were going to be fed to LLMs, I simply left the site - I have no interest in my works being enclosed by LLMs.

Ensuring that my contributions are guaranteed to be enclosed by LLMs motivated me to stop sharing - not feel glad that my works were now even more free.

@yoasif As I am intending to use it, enclosure means the removal of property from the commons.

I participate in Open Knowledge movements to share in a gratis and libre pool of information. When I undertake contract for hire work without retaining a share-alike license for my own labors, my output which would have contributed to the commons is enclosed. If passing that work through an LLM eliminates copyright then I can use that magic to return my labor to the commons.

@quarknova

I completely agree. This is not just excellent news for en.Wikipedia, it's also a symbol of holding to a high intellectual standard for other communities, many of which are tempted by #TyrannyOfConvenience arguments.

@quarknova Now some of us hope for a petition against the use of generative AI in maintaining Wikimedia FOSS projects.
@quarknova
Thank you! Sane policy. Hope it will be used as good example / precedent elsewhere.
@quarknova good job and many thanks!
@quarknova apart from the sadness of having the need to produce these documents in the first place, I'm really happy about it. Kudos to all participating, indeed!
@quarknova Thank you for your service! 💜
@quarknova Now, on to the hard part - detection and enforcement.
@k @quarknova there are article changelogs.
@quarknova thank you :3 wikipedia editors and policy-/guideline-setters are doing incredibly important work
@quarknova congratulations! Looks like due to accidental timing I've started similar effort, but for Node.js core!
@indutny Please keep us posted! I would love for this to become a greater movement, and I'm here if you need any support!

@quarknova will do!

The vote is in two weeks, and I'm doing all I can to gather support for AI opposition with the petition I created.

Thank you for the support!

@indutny @quarknova Can you please reply with a link, would love to sign and support. 😁
GitHub - indutny/no-ai-in-nodejs-core: A petition to disallow acceptance of LLM assisted Pull Requests in Node.js core