@tarmil @gabrielesvelto perhaps. Does the existence of AI incentivize more entities to deploy spidering than in the past? I suppose the premise is that grabbing content is a more compelling prospect if one can employ processing with greater ROI than simply indexing and/or mirroring.
It makes some sense but is hardly assured.
@Qbitzerre @tarmil @gabrielesvelto https://www.theregister.com/2025/08/29/ai_web_crawlers_are_destroying/
Not just perhaps. Not just many more entities. Far, far more aggressive crawling. Crawling that doesn't take everything and leave; it takes everything and then starts again. Crawling that doesn't respect robots.txt in the slightest.
Seems like a lot of site admins can show spectacular increases in frequency and volume of scrapers as well as in disregard for previously established norms (e.g. robots.txt) since scrapers are looking for LLM training material.
@grumpyoldtechie @gabrielesvelto I had the impression that a robots.txt file is hit or miss with regard to compliance regardless of the purpose for scraping. Beyond that, the rapaciousness of scraping is at least somewhat subjective, existing on a spectrum that is in part dependent on the application and purpose.
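For context, the norm being disregarded is the Robots Exclusion Protocol. A minimal robots.txt sketch (GPTBot and CCBot are user-agent tokens published by OpenAI and Common Crawl respectively; the thread's complaint is that many scrapers ignore these directives entirely):

```
# Ask known AI-training crawlers to stay out of the whole site.
# Compliance is voluntary -- nothing enforces these rules.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Everyone else: no restrictions.
User-agent: *
Disallow:
```

The file lives at the site root (e.g. `/robots.txt`), and a well-behaved crawler fetches it before anything else; a scraper that skips that step never even sees the rules.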
The qualitative difference with AI is that its derivative products pose a novel challenge to what might have been deemed fair use in the past.
Good point!

@gabrielesvelto
Another stupid power-related argument:
"Humans also consumer power" or "Humans consume more power per task" or anything of that stripe.
Humans consume energy at a near-constant rate.
They keep consuming energy when you take away their tasks to feed to the autocomplete machine.
The only way to "save" that energy is to kill those humans, but killing humans is extremely energy intensive (not to mention fucking evil).
So how exactly do these TESCREAL ratfucks expect this "but humans also..." argument to work?
The humans still exist, hence the energy spent on the autocomplete machine == wasted.
@gabrielesvelto YES. This is having a daily impact on the usability of the ENTIRE world wide web, as we repeatedly have to prove we are human, or see sites go dark under the constant DDoS-like load that web-scraping LLM bots bring.