The AI bots that desperately need OSS for code training are now slowly killing OSS by overloading every site.

The curl website is now at 77TB/month, or 8GB every five minutes.
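As a rough sanity check of that conversion, here is a minimal sketch. It assumes a 30-day month and decimal units (1 TB = 1000 GB); neither assumption is stated in the post, and the function name is made up for illustration.

```python
def gb_per_interval(total_tb: float, days: int = 30, interval_minutes: int = 5) -> float:
    """Average traffic per interval in GB, given a monthly total in TB.

    Assumptions (not from the post): a 30-day month and 1 TB = 1000 GB.
    """
    intervals = days * 24 * 60 // interval_minutes  # 8640 five-minute slots in 30 days
    return total_tb * 1000 / intervals


if __name__ == "__main__":
    print(f"{gb_per_interval(77):.1f} GB per five minutes")
```

Under these assumptions the average comes out a little under 9 GB per five minutes, the same order as the figure quoted above.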

https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/

Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries

AI bots hungry for data are taking down FOSS sites by accident, but humans are fighting back.

Ars Technica

@bagder What is the use of them hammering the website over and over again? They do the same for the Fedora wiki... It is not like they need to be near real-time.

Are you considering an IP block?

@gbraad @bagder You'd have to block entire data centers and many of those are also used for public hosting. So blocking the IP ranges is often not an option if you want legitimate users to be able to access your site.

At least the big ones have proper user agents which you can black hole if they don't respect robots.txt. Honestly most of them do. But even a year ago I didn't have enough traffic from crawlers that it was even worth looking into.
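For illustration, the check described here (does a request from a given crawler hit a path that robots.txt disallows for it?) might look something like the sketch below, using Python's standard `urllib.robotparser`. The robots.txt content, the crawler tokens, and the `should_black_hole` helper are all hypothetical; this is not anyone's actual setup from this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def should_black_hole(parser: RobotFileParser, agent_token: str, path: str) -> bool:
    """True when this crawler requested a path robots.txt disallows for it.

    agent_token is assumed to be the crawler's short product token
    (e.g. "GPTBot"), extracted from the full User-Agent header beforehand.
    """
    return not parser.can_fetch(agent_token, path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for token, path in [("GPTBot", "/docs/"), ("curl", "/docs/"), ("curl", "/private/x")]:
        print(token, path, should_black_hole(parser, token, path))
```

A crawler that shows up in the logs requesting disallowed paths is one that ignores robots.txt, so its user agent is a candidate for dropping at the web server or firewall. This only helps against bots that send an honest user agent.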

@truh This is actually being done by one of the actors in the opposite direction (on a national level). Even if blocked, they will eventually acquire VMs across the pond and continue. It is more about their behaviour and the how.

@gbraad Should have clarified that I'm only talking from my limited experience. I'm sure others experience far nastier things than what I get on my little home lab that isn't even really linked anywhere.

Even if you can do something it's just so tiring that you even have to do something. This is not what I want to spend my evenings on.

@truh I have been in hosting for 'decades' (dang, that sounds bad). And yes, I have seen increased traffic, especially from a specific geo... though some of that has moved as a spike to known cloud providers. So far, the CDNs I use have not complained and have taken the brunt... as also mentioned by the OP. Ugh... I just hope they would respect robots.txt.
@gbraad I don't really want Cloudflare to have my traffic.
The 8 Best Cloudflare CDN Alternatives in 2025

In this post, we have provided a list of the best Cloudflare CDN alternatives for websites. Ultimately, the choice of a Cloudflare alternative CDN should

RunCloud Blog
@gbraad Cloudflare and its competitors all decrypt your traffic... I don't really understand why people think that's ok.

@gbraad It's the infrastructure equivalent of inviting Jeffrey Goldberg into your Signal group.

@truh @gbraad not "all", depends what sort of protection you're using. I've used Akamai Prolexic before which does not.

@bracken @truh

Right, I also do not terminate (re-encrypt) at them... this is not as effective, but allows me to use my own certs, and have them merely be the entry point.