I have been trying for hours to debug what I thought was a #Traefik regression causing massive CPU usage, to the point that my own (very small) dockerized services I host for my family are getting slow and hard to reach.

But no, my meager virtual server is just being DDoSed by stupid #AI bots downloading pieces of my webpage over and over and over again. It looks like the steps described in https://www.mayrhofer.eu.org/post/defenses-against-abusive-ai-scrapers/ are no longer working, and I need to start looking into actual IP blocking. However, as AI fraudsters are resorting to using massive client pools for downloading, that will also become difficult.

This is new. Until a couple of weeks ago, the AI scraper bots accounting for over 90% of all traffic were annoying, but my services still worked. Now it has reached the level of active denial of service.

I declare partial success: much more aggressive connection, request, and transfer-rate throttling in the embedded #nginx instance that serves my static page (plus the dynamic link maze that caught the stupid "AI" scraper bots...). That instance sits behind #traefik, whose TLS termination part was being overloaded, blocking authenticated users from legitimate access.

https://www.mayrhofer.eu.org/post/defenses-against-abusive-ai-scrapers/nginx-default.conf is the current rate-limiting config; https://www.mayrhofer.eu.org/post/defenses-against-abusive-ai-scrapers has the explanations.
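To illustrate the kind of throttling involved, here is a minimal sketch using nginx's standard rate-limiting directives. The zone names and limit values are made-up placeholders, not the actual config linked above:

```nginx
# Token-bucket request limiting, keyed by client IP (hypothetical values).
limit_req_zone $binary_remote_addr zone=perip_req:10m rate=2r/s;
# Cap the number of concurrent connections per client IP.
limit_conn_zone $binary_remote_addr zone=perip_conn:10m;

server {
    listen 80;

    location / {
        limit_req zone=perip_req burst=5 nodelay;  # allow short bursts, then return 503
        limit_conn perip_conn 3;                   # at most 3 parallel connections per IP
        limit_rate 64k;                            # throttle transfer rate per connection
        root /usr/share/nginx/html;
    }
}
```

The point of combining all three is that scrapers hammering many URLs in parallel hit the connection and request caps, while slow-drip bulk downloads hit the transfer-rate cap, without noticeably affecting a human browsing the site.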

Something like https://blog.lrvt.de/configuring-crowdsec-with-traefik/ will probably have to be the next level of escalation to deal with the issue on a global level.
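Wiring CrowdSec into Traefik, as the linked post describes, roughly amounts to putting a bouncer behind Traefik's forwardAuth middleware, so every request is checked against CrowdSec's ban decisions before it reaches a service. A sketch of the dynamic configuration; the bouncer hostname, port, and path are placeholder assumptions and the post's actual setup may differ:

```yaml
# Traefik dynamic configuration (file provider) - hypothetical names and ports
http:
  middlewares:
    crowdsec-bouncer:
      forwardAuth:
        # Every request is forwarded here first; a non-2xx response from the
        # bouncer (i.e. the client IP is on a CrowdSec decision list) blocks it.
        address: "http://crowdsec-bouncer:8080/api/v1/forwardAuth"
        trustForwardHeader: true
  routers:
    my-service:
      rule: "Host(`example.org`)"
      middlewares:
        - crowdsec-bouncer
      service: my-service
```

The appeal over per-service rate limiting is that bans apply globally across all routed services, and CrowdSec's shared blocklists can preemptively block known-abusive IP ranges.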

@rene_mobile I see a class action lawsuit... :-(
@rene_mobile someone has created a proxy to fight the AI bots: https://anubis.techaro.lol/
Anubis: Web AI Firewall Utility | Anubis

Weigh the soul of incoming HTTP requests to protect your website!

@rene_mobile In the end, we may need ip-range traffic-shaping stuff to just slow all users down to _user_ level usage. 🙁
@rene_mobile
To make the AI bot life miserable, install one of:
- Anubis proof of work: https://anubis.techaro.lol/
- Checkpoint cryptographic challenge: https://github.com/vaxerski/checkpoint
- Nepenthes infinite maze: https://zadzmo.org/code/nepenthes/
- Quixotic nonsense generator: https://marcusb.org/hacks/quixotic.html

@gunstick I am using Quixotic right now instead of Nepenthes (because it was easier to get to run in my setup and seems quite a bit more efficient). Not sure if Anubis or Checkpoint work without Javascript support - I am trying to keep my (static) webpage completely usable without client-side code execution so far. If a site doesn't have the constraint, they will probably work very well...

@rene_mobile I'm curious: do they at least have a proper user agent, or just Chrome, as it's probably a puppeteer/selenium thing?

My suggestion: you embed a script that gets blocked by EasyList - and anyone who requests it gets blocklisted for 10 minutes and forwarded to Google via JS.

I'd assume even crawlers that don't execute JS would still try to fetch the script - unlike browsers running JS blockers or ad blockers, which never request it.
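That trap could be sketched in nginx terms like this. The decoy path is a hypothetical example of something EasyList-style filter rules match; the "blocklist for 10 minutes" part would need fail2ban or similar watching the honeypot log:

```nginx
# Serve a decoy script under a path that ad-blocker filter lists block,
# so browsers with ad/JS blockers never request it but naive crawlers do.
location = /ads/adframe.js {
    access_log /var/log/nginx/honeypot.log;  # fail2ban can ban IPs that appear here
    default_type application/javascript;
    return 200 "window.location='https://www.google.com/';";
}
```

One caveat with this approach: browsers without any blocker installed would also fetch the script, so the redirect payload (rather than the fetch itself) has to be what distinguishes bots from regular visitors.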

@rnbwdsh User agent strings seem to be intentionally misleading. Some examples I see:
- "Mozilla/5.0 (X11; U; Linux armv7l; en-GB; rv:1.9.2.3pre) Gecko/20100723 Firefox/3.6.11"
- "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4"
- "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/418 (KHTML, like Gecko) Safari/417.9.2"
- "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.134 Safari/534.16"
- "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b6pre) Gecko/20100903 Firefox/4.0b6pre"
- etc.

That's part of why I am calling them out as illegal, malicious DDoS.