My #Wikipedia request for comment just closed, finally banning #AI content in articles! "The use of LLMs to generate or rewrite article content is prohibited"

Kudos to all who participated in writing the guideline (especially Kowal2701) and the whole WikiProject AI Cleanup team, this was very much a group effort!

https://en.wikipedia.org/wiki/Wikipedia:Writing_articles_with_large_language_models/RfC

My genuine hope is that this can spark a broader change. Empower communities on other platforms, and see this become a grassroots movement of users deciding whether AI should be welcome in their communities, and to what extent. On their own terms.

A pushback against the #enshittification and forceful push of AI by so many companies in these last few years.

@quarknova I would say that the battle was lost when Wikipedia allowed big tech to buy access to copyleft content without needing to share alike.

Your new policy simply enforces "fresh meat" for the models, without any requirement for reciprocity back to the commons.

Wikipedians, then, are signing up to work for free to feed the models, while people downstream from the models can use their labor entirely for free without giving back.

@yoasif Can I get a citation on there being a way to buy your way out of CC-BY SA? I would agree that there is basically no enforcement mechanism for copyright that the Wikimedia community can lean on other than shame, and that capitalism is shameless.
@yoasif Are you promoting the interesting take that Wikimedia Enterprise selling API access is somehow also relicensing the Wikimedia movement’s CC-BY SA content?
@bd808 yessir.
@yoasif Can we agree that when $COMPANY downloads an xml dump or scrapes the website they are receiving content subjected to the CC-BY SA license if they republish significant excerpts of that content? What evidence can you present that an alternative license covers content that $COMPANY obtains from APIs after signing a contract with Wikimedia Enterprise? Or am I misunderstanding your claims?

@bd808 I think the money that is paid to Wikipedia is an implicit opt-out of share-alike, since the content produced by LLMs is always public domain.

The LLM acts as a copyright removal machine, and big tech is paying Wikipedia to look the other way.

@yoasif Nothing in your concept that any information passed through an LLM becomes public domain is related to usage of a particular API to obtain that information. These companies all already have the content produced by the Wikimedia movement or can have it with trivial effort. This is possible because the movement is about Open Knowledge, knowledge without licensing fees and borders and resource hoarding. We gave it all away because that was the point from the start.

@bd808 The payment makes a difference.

https://www.avclub.com/wikipedia-ai-partnerships-meta-amazon-microsoft

Wikipedia could have chosen to defend its contributors. Instead, they are taking payment to allow for piracy of contributor works. Given that we know that the big tech LLMs don't respect copyright and derivative works are produced as public domain, the license is clearly being violated, and Wikipedia is being paid for the privilege.

We give it away - but alike, not to be closed. Public domain works can be closed.

@yoasif I’m still not understanding how the payment changes anything other than the funds in the Wikimedia Foundation’s balance sheet. Show me the new license terms please. The link you provided is a blog post that looks to be sourced from a Reuters article that in turn was sourced from a press release from Wikimedia Enterprise. The facts I can see there are about a handful of companies having signed up to use APIs where higher rate limits can be purchased. Where is the enclosure?

@bd808 It forecloses the possibility that Wikipedia will defend its contributors.

The license is being wantonly violated, and Wikipedia has co-signed the theft. That is what changes when the funds are exchanged.

Yes, the license has not changed - yet, it clearly has, since Wikipedia is selling an opt-out to NOT defend contributors - even as big tech produces derivative works not protected by the license that contributions were granted under.

What enclosure are you talking about?

@yoasif I’m asking about the enclosure you are claiming. You have I think finally gotten to your thesis: if the WMF is enriched by $COMPANY then they will change their future behavior to protect that enrichment above the rights of the Wikimedia communities. Stating from the start that this is your personal speculation would have been helpful. I disagree with your thesis, but neither of us has a provable position in the near term.

@bd808 I don't think this is my personal speculation - it is my interpretation of what is happening.

Speculation would be if I imagined that big tech was paying for contributor content - that isn't in question, it is open and announced by all parties.

What are they paying for?

Human contributions.

What license are they licensed under?

Beats me - and that is the point. It isn't share-alike, since the outputs produced by the LLMs are not share alike.

Again, what are they paying for?

@yoasif > Again, what are they paying for?

Rate of data transfer and depending on the API some pre-processing/aggregation done centrally rather than at the graph edge by the consumer.

@bd808 Unfortunately, that is a sophistic conclusion; Wikipedia fully knows that the works are being consumed by the LLMs to produce works that violate the © of content that is being transferred.

It reminds me a bit of AllOfMP3, who claimed to be able to sell © music in Russia, but happily sold music worldwide, which the RIAA and others considered to be a violation of copyright.

AllOfMP3 also charged on volume of data - that didn't make the © infringement any more legal.

@bd808 I will grant that this is a convenient place for Wikipedia to be: since they don't actually own the copyrights to work that contributors to Wikipedia make, they can't legally opt big tech out of the terms.

Instead, they can continue to ride on their reputation as a trusted host of the corpus, and continue to garner contributions - that they turn around and sell *access* to, directly to the big tech vendors that they know will immediately violate their [the contributors'] copyrights.

@bd808 The theft is so grand that it is hard to find an analog in history - I suppose this comes close: https://archive.ph/UIhUu

People surrendered guns to be destroyed so that they could no longer contribute to gun violence.

The guns were taken and resold as guns.

People contribute to Wikipedia to contribute to a corpus of human knowledge, shared alike. Wikipedia turns around and sells it to vendors who turn it into ©-free digital slop, free of any restrictions of reciprocity.

@yoasif I know this is tangential to your thesis, but if it does turn out that large language models are the magic trick that invalidates all intellectual property rights, or even just copyright, then you have discovered a beneficial use of the technology. I am personally attracted to copyleft as a use of copyright to prevent enclosure of my labor by capitalists. The point of copyright in the practice of copyleft is not upholding intellectual property rights as a benefit.

RE: https://mastodon.social/@yoasif/116301328058936154

@bd808 How is your labor not enclosed with LLMs?

I have seen this sentiment floating around - but I really don't get it. Perhaps you could explain it to me?

I know that when Reddit announced that works were going to be fed to LLMs, I simply left the site - I have no interest in my works being enclosed by LLMs.

Knowing that my contributions were guaranteed to be enclosed by LLMs motivated me to stop sharing - not to feel glad that my works were now even more free.

@yoasif As I am intending to use it, enclosure means the removal of property from the commons.

I participate in Open Knowledge movements to share in a gratis and libre pool of information. When I undertake contract for hire work without retaining a share-alike license for my own labors, my output which would have contributed to the commons is enclosed. If passing that work through an LLM eliminates copyright then I can use that magic to return my labor to the commons.

@bd808 Okay. Unfortunately, I don't understand what would motivate people who are not being paid to continue to contribute to the commons with no reciprocity expected.

Clearly, in your scenario, in a sense you are double-dipping - pay yourself, and pay everyone else - no need to ensure reciprocity as you are personally profiting.

I know that many are altruistic and will continue, but there is clearly a contingent that prefers reciprocity - what will motivate THEM to continue (unpaid)?

@yoasif If I have been following, the lost right as you envision it is virality of the share-alike license. CC-BY SA goes in; CC0/PD comes out. Reusers taking the CC0 output are not required to allow reuse themselves or tell anyone whose ideas they built on. Workers contributing to the commons solely for recognition or reciprocity would be alienated. I don’t know what percentage that covers, but I’m relatively certain that today the enclosed work is larger. Overall a net win for CC0.
@yoasif This is why I am skeptical that tokenization will be held as transformative in the long term. The capital class will not stand for the wholesale elimination of intellectual property rights.

@bd808 So you don't think that this will happen, but you expect contributors to contribute to Wikipedia?

Why?

EDIT: Ah, right, you think that reciprocity is ancillary.

@yoasif I am postulating that your thesis of LLM tokenization removing all copyright by being ruled non-human transformative work product will not stand. I’m not sure how that outcome has any direct impact on contributions to Wikimedia projects positively or negatively.
@bd808 You assume that LLMs will be assigned copyright?
@yoasif I actually assume they will be ruled non-transformative and therefore subject to the license terms of the input corpus. I follow the argument that all non-human outputs are non-copyrightable. I have monkey selfie pinned in my feed. But I don’t follow the argument that re-encoding with lossy compression (tokenization and neural net training) is transformative fair use. I know there is some new case law that says this; I don’t think it will last long.
@bd808 Ah, that is an interesting angle, thank you for sharing your thoughts. 👍

@bd808 If paid works are contributed to the commons, sure. That isn't what is happening in this case.

The question also arises - why would people pay for new works when you can generate one via an LLM? New works would become vanishingly rare if unprotected.

Think new drugs, for example.