Well, this is a step in the right direction:
https://www.theverge.com/news/841222/rsl-licensing-ai-spec-launch
A licensing standard aimed at making AI companies pay for the content they scrape across the web is now official. With the publication of the RSL 1.0 spec, publishers can dictate licensing rules and indicate whether they want their content to appear in AI search.
In this Diff article, we explain a recent update we made to Wikipedia's user traffic data, the trends this data reveals, how the Foundation is responding, and how you can help.
https://diff.wikimedia.org/fr/2025/11/07/nouvelles-tendances-chez-les-utilisateurs-de-wikipedia/ #Scraping, #ScrapingBots
Wikimedia infrastructure is being mass-scraped for AI use: the content is free, the infrastructure is not. https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/ #AI, #Crawlers, #Infrastructure, #KnowledgeAsAService, #KnowledgeContent, #Operations, #Scraping, #ScrapingBots, #Traffic, #WikimediaFoundation, #WikimediaProjects
(original repost on lobsters: https://lobste.rs/s/autpsf/how_crawlers_impact_operations)
If #Cloudflare is to be believed, #Lemmy instances have a built-in AI scraping bot operating beneath the covers. Do you think the developers have snuck it in?
Looking through my logs, these requests have all been blocked by Cloudflare because they are identified as "AI Bots". There are many more requests by Lemmy instances blocked in the logs. This is just a sample. Other Lemmy requests from these servers get through. Only a few are blocked as AI Bots.
Cloudflare says it uses AI to decide whether a request is legitimate or comes from an AI bot trying to scrape.
207.204.58.144
AS19045 DIRECTCOM
United States
User agent: Lemmy/0.19.5; +https://lemmy.cryonex.net
23.127.223.238
AS7018 ATT-INTERNET4
United States
User agent: Lemmy/0.19.3; +https://lemux.minnix.dev
2a01:cb19:f85:ec00:82fa:5bff:fe51:ed4a
AS3215 France Telecom - Orange
France
User agent: Lemmy/0.19.5; +https://lemmy.sidh.bzh
50.247.53.42
AS7922 COMCAST-7922
United States
User agent: Lemmy/0.19.5; +https://toast.ooo
69.42.19.234
AS11404 AS-WAVE-1
United States
User agent: Lemmy/0.19.5; +https://lemmy.schlunker.com
155.138.226.183
AS20473 AS-CHOOPA
United States
User agent: Lemmy/0.19.5; +https://lemmy.mbl.social
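For anyone who wants to check their own logs the same way, here is a rough Python sketch that pulls the Lemmy version and instance URL out of user agent strings shaped like the ones above. The regex, the `LemmyAgent` type and the `parse_lemmy_user_agent` helper are my own illustration, not anything from Lemmy or Cloudflare, and the pattern assumes the `Lemmy/<version>; +<url>` shape seen in this sample. A match only tells you what the client claims to be, since a user agent can be spoofed.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape, taken from the sample above: "Lemmy/<version>; +<instance URL>"
LEMMY_UA = re.compile(
    r"^Lemmy/(?P<version>\d+(?:\.\d+)*); \+(?P<instance>https?://\S+)$"
)


class LemmyAgent(NamedTuple):
    """Hypothetical container for the two fields in the user agent."""
    version: str
    instance: str


def parse_lemmy_user_agent(user_agent: str) -> Optional[LemmyAgent]:
    """Return the declared version and instance, or None if it does not match."""
    match = LEMMY_UA.match(user_agent.strip())
    if match is None:
        return None
    return LemmyAgent(match.group("version"), match.group("instance"))


if __name__ == "__main__":
    # User agents copied from the blocked requests listed above.
    samples = [
        "Lemmy/0.19.5; +https://lemmy.cryonex.net",
        "Lemmy/0.19.3; +https://lemux.minnix.dev",
        "Lemmy/0.19.5; +https://toast.ooo",
    ]
    for ua in samples:
        print(parse_lemmy_user_agent(ua))
```

Grouping blocked requests by the declared instance this way makes it easier to see whether the blocks line up with ordinary federation traffic from a handful of servers.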
#MastoAdmin #AIBots #Scrapers #Scraping #ScrapingBots #privacy
😤 #Scraperbots are automating data theft, extracting your website's content without permission! 🌐
💣 Learn about the impact of scraper bots and how to prevent them: https://bit.ly/3RiXgya
#contentscraping #bots #webscrapers #webcrawlers #scraping #waf #botmanagement #waap #scrapingbots #apptrana #indusface