Wikipedia has banned AI-generated text, with two exceptions

https://infosec.pub/post/43865778

Saved you a click:

After much debate, the new policy is in effect: Wikipedia authors are not allowed to use LLMs for generating or rewriting article content. There are two primary exceptions, though.

First, editors can use LLMs to suggest refinements to their own writing, as long as the edits are checked for accuracy. In other words, it’s being treated like any other grammar checker or writing assistance tool. The policy warns that “LLMs can go beyond what you ask of them and change the meaning of the text such that it is not supported by the sources cited.”

The second exception is translation assistance. Editors can use AI tools for the first pass at translating text, but they still need to be fluent enough in both languages to catch errors. As with regular writing refinements, anyone using LLMs also has to check that no incorrect information has been introduced.

Wikipedia probably wants to sell LLM developers access to its content for training. That content is only valuable if Wikipedia remains a high-quality, slop-free source.

I think even AI zealots think there should be silos of content to train from that are fully human generated. Training slop on slop makes the slop even worse.

AI already trains on Wikipedia.

commoncrawl.org

Sell licenses of what? It’s all already under Creative Commons, iirc.

The content is CC licensed, but they are trying to block AI scraping because it overloads their servers. They have a paid API that uses a lot less compute for both Wikipedia and the AI companies, and it is also a revenue source for Wikipedia.

Yes, but…

en.wikipedia.org/…/Wikipedia%3ADatabase_download

That’s because viewing the page uses server resources, as does API access. If you want the data, you can download the database directly.

This was only done because the editors pushed to minimize AI involvement. There’s a comment here already mentioning that: lemmy.world/comment/22826863