Attention server admins! Yesterday I read a post by @simon_brooke about how nasty AI scraper bots are attacking his self-hosted @forgejo instance. Soon after, I started seeing unusual, periodic traffic spikes on mine, again dominated by OpenAI, but with some other freeloaders too:

20.171.207.41 GPTBot/1.2
85.208.96.211 SemrushBot/7~bl
54.36.148.64 AhrefsBot/7.0
114.119.139.53 PetalBot

With GPTBot and SemrushBot attacking the hardest.

They've been hammering my little server periodically today as well, slowing down my instance dramatically, as if I were experiencing a malicious DDoS attack. Well, in a sense it is one.

Watch out - it seems corporate AI techbros have learned to scrape content and started doing it on a massive scale. Remember when @Codeberg was (and repeatedly is) hit?

For now I've blocked IP ranges and User-Agent combinations; not sure how long that will be enough.
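
For anyone wanting to do the same, here is a minimal sketch of that kind of check in Python. It is not my actual config: the names `BLOCKED_NETWORKS`, `BLOCKED_AGENT_TOKENS` and `is_blocked` are made up for illustration, the /32 entries cover only the single addresses from the log lines above (the real ranges are wider and would have to be filled in), and treating a match on either the IP or the User-Agent as a block is an assumption. In practice a rule like this would more likely live in a reverse proxy or firewall.

```python
import ipaddress

# Hypothetical blocklist: only the single addresses quoted in the post,
# written as /32 networks. Real scraper ranges are wider than this.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "20.171.207.41/32",
        "85.208.96.211/32",
        "54.36.148.64/32",
        "114.119.139.53/32",
    )
]

# Lowercase substrings matched against the User-Agent header.
BLOCKED_AGENT_TOKENS = ("gptbot", "semrushbot", "ahrefsbot", "petalbot")


def is_blocked(remote_ip: str, user_agent: str) -> bool:
    """Return True if the IP is in a blocked network or the UA names a blocked bot."""
    try:
        address = ipaddress.ip_address(remote_ip)
    except ValueError:
        # Unparseable address: skip the IP check and rely on the User-Agent.
        address = None
    if address is not None and any(address in net for net in BLOCKED_NETWORKS):
        return True
    agent = user_agent.lower()
    return any(token in agent for token in BLOCKED_AGENT_TOKENS)
```

Calling `is_blocked("20.171.207.41", "GPTBot/1.2")` would report a block, while an ordinary browser from an unlisted address would pass. User-Agent strings are trivially spoofed, so the IP part does the real work once a bot starts lying about who it is.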

Please boost for visibility and be prepared!

#forgejo #developerlife #coding #attack #techbros #aislop #openai #bots #ddos

@gytisrepecka @simon_brooke @forgejo @Codeberg Wot I don't get is why these bots burn carbon and waste money REPETITIVELY SCANNING THE SAME UNCHANGING CONTENT.

@TimWardCam Because they have enough resources to burn - I'd guess it looks good to report to investors how many GBs the scrapers ate. You know, KPIs, milestones and all that crap.

@simon_brooke @forgejo @Codeberg

@gytisrepecka @simon_brooke @forgejo @Codeberg The word "unique" should be in the KPI somewhere. These companies are bright enough to have several members of staff who realise this.

@TimWardCam There is hardly any sanity when companies are seeking investment - anything goes, and bright minds may have no say in it at all.

@simon_brooke @forgejo @Codeberg

@gytisrepecka @simon_brooke @forgejo @Codeberg That sort of reminds me of that period in the 1980s when word processors were sold on the number of words in the spelling dictionary - the more the better, according to the marketdroids.

But do you really want the word "formicate" in your spelling dictionary (I see it's in this one)? Isn't it far more likely to be an error for "fornicate"?

More isn't always better.

@gytisrepecka @TimWardCam @forgejo @Codeberg I'm guessing it's cheaper for them to repeatedly hit my poor little server than to just keep their own local clones of my repositories.