"we have noticed an increase in abusive site crawling, mainly from AI products and services. These products are recklessly crawling many sites across the web, and we've already had to block several sources of abusive traffic."

"One crawler downloaded 73 TB of zipped HTML files in May 2024, with almost 10 TB in a single day."

"By blocking these crawlers, bandwidth for our downloaded files has decreased by 75%"

https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/

via @stefan

AI crawlers need to be more respectful

We talk a bit about the AI crawler abuse we are seeing at Read the Docs, and warn that this behavior is not sustainable.

Read the Docs

AI bots crawling Read the Docs makes me not want to publish online about an obscure technology called #stackMachines and #forth.

There is so little data about these topics; let us keep it that way.

@ethanwhite @stefan

@ethanwhite @stefan

"One crawler downloaded 73 TB of zipped HTML files in May 2024, with almost 10 TB in a single day. This cost us over $5,000 in bandwidth charges, and we had to block the crawler. We emailed this company, reporting a bug in their crawler"

Well, their AI is shit then, if they can't even write a crawler properly...

@ethanwhite @stefan „If these companies wish to be good actors in the space, they need to start acting like it, instead of burning bridges with folks in the community.“
To be honest, I don’t think they want to be good actors. They want to make money as fast as possible. Burning bridges? Who cares, YOLO!