The AI bots that desperately need OSS for code training are now slowly killing OSS by overloading every site.
The curl website is now at 77TB/month, or 8GB every five minutes.
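For reference, a quick back-of-the-envelope check (a sketch of mine, assuming a 30-day month and decimal terabytes) shows how the two figures line up:

```python
# Rough sanity check: 77 TB/month expressed per five-minute interval.
tb_per_month = 77
five_minute_slots = 30 * 24 * 12          # 8640 five-minute intervals in 30 days
gb_per_slot = tb_per_month * 1000 / five_minute_slots
print(f"{gb_per_slot:.1f} GB every five minutes")  # ~8.9 GB
```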
@bagder What is the use of them hammering the website over and over again? They do the same to the Fedora wiki... It is not like they need to be near real-time.
Are you considering an IP block?
@gbraad @bagder You'd have to block entire data centers and many of those are also used for public hosting. So blocking the IP ranges is often not an option if you want legitimate users to be able to access your site.
At least the big ones have proper user agents, which you can black hole if they don't respect robots.txt. Honestly, most of them do. But even a year ago I didn't have enough traffic from crawlers for it to even be worth looking into.
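For anyone curious what black holing by user agent can look like, here is a minimal sketch (my own example, not anything the curl site runs): a plain WSGI middleware that refuses requests whose User-Agent matches a few assumed crawler strings. The names in the blocklist are assumptions; adjust to whatever actually shows up in your logs.

```python
# Assumed user-agent substrings to block; check your own access logs first.
BLOCKED_AGENTS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")

def blackhole_middleware(app):
    """Wrap a WSGI app and return an empty 403 for blocked crawler user agents."""
    def wrapper(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if any(bot in ua for bot in BLOCKED_AGENTS):
            # Don't serve (and pay for) the content; just send an empty 403.
            start_response("403 Forbidden", [("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)
    return wrapper
```

In practice you would do this at the CDN or web-server layer rather than in the application, but the idea is the same: cheap rejection before any real work happens.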
@gbraad Should have clarified that I'm only talking from my limited experience. I'm sure others experience far nastier things than what I get on my little home lab, which isn't even really linked anywhere.
Even if you can do something, it's just so tiring that you even have to do something. This is not what I want to spend my evenings on.
@truh there is Fastly...