Giant Corporations™ are scraping my little git server to feed their ever-hungry, planet-destroying plagiarism machines.

So now, instead of getting my code, they get a 10GB treat.

Fucking THIEVES.

edit: This was inspired-by-and-based-on this post https://rknight.me/blog/blocking-bots-with-nginx/

Blocking Bots with Nginx

How I've automated updating the bot list to block access to my site

@j Amazon bot is very persistent isn't it, weeks and weeks of telling it to eff off and it's still scraping like it's being held at gunpoint
@dee @j I had to threaten Amazon on the GDPR and breach of copyright front to get them to stop. They even made that more of a hassle than their idiot bot

@kc @j do you know if they follow the 307 ?

at the moment my nginx conf is:

if ($http_user_agent ~* (Amazon|facebook|GoogleBot|AhrefsBot|Baiduspider|SemrushBot|SeekportBot|BLEXBot|Buck|magpie-crawler|ZoominfoBot|HeadlessChrome|istellabot|Sogou|coccocbot|Pinterestbot|moatbot|Mediatoolkitbot|SeznamBot|trendictionbot|MJ12bot|DotBot|PetalBot|YandexBot|bingbot|ClaudeBot|imagesift|FriendlyCrawler|barkrowler)) {
    return 403;
}
if ($http_user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36") {
    return 403;
}
if ($http_user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:94.0) Gecko/20100101 Firefox/94.0 BB SC/1.0.0.0") {
    return 403;
}
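
(For anyone copying this: a rough sketch of how the same checks could be collapsed into a single `map`, so there is one list to maintain instead of several `if` blocks. The variable name `$blocked_agent` is made up for illustration, and the regex is a shortened subset of the list above, not the full thing. Untested against my actual setup.)

```nginx
# In the http { } context.
# $blocked_agent is a hypothetical variable name, not from the original config.
map $http_user_agent $blocked_agent {
    default 0;

    # Case-insensitive regex match (shortened subset of the bot list above)
    "~*(Amazon|ClaudeBot|SemrushBot|AhrefsBot)" 1;

    # Exact-match user agent strings
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36" 1;
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:94.0) Gecko/20100101 Firefox/94.0 BB SC/1.0.0.0" 1;
}

# In the server { } context.
server {
    listen 80;

    # Any non-empty value other than "0" counts as true
    if ($blocked_agent) {
        return 403;
    }
}
```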

but hey, if I can return garbage successfully and at low / no cost to myself, then I would like to

also... for those who need to know this, Cloudflare's speed test allows you to define the number of bytes on the query string https://speed.cloudflare.com/__down?bytes=100000000000

@dee @kc @j I prefer to return 444 instead of 403 on nginx. It then simply drops the connection.
If they persist, my fail2ban setup uses Cloudflare’s API to add the IP address to the Cloudflare firewall, and it gets blocked for all my sites at Cloudflare.
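
(A minimal sketch of the 444 variant, assuming it sits inside a `server { }` block. The pattern is a shortened, illustrative subset of the list quoted earlier in the thread, not my real one.)

```nginx
# Inside a server { } block.
if ($http_user_agent ~* (Amazon|ClaudeBot|SemrushBot)) {
    # 444 is a non-standard, nginx-specific code: nginx closes the
    # connection without sending any response to the client.
    return 444;
}
```

The request is still written to the access log with status 444, which is presumably what a fail2ban filter would match on.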
@grumpyoldtechie @kc @j I'm trying not to use Cloudflare, I used to work there (I wrote the Firewall api amongst other things)
@dee @grumpyoldtechie @kc @j then I guess you can achieve the same functionality with some bash scripts