grebedoc served its highest share of garbage requests yet yesterday (a wave peaking at 150 req/sec)
these waves are getting bigger and bigger, which is somewhat concerning. it's nowhere near the hardware capacity yet, but i'm hitting some software bottlenecks that i never thought would be relevant

git-pages has a sophisticated multilayer cache system which fails to perform well in exactly one case: if someone sends a lot of requests to domains that don't even have valid sites deployed

because i figured that nobody would do this. certainly not that anyone would do it regularly and at incredibly high speed

well. fucking scrapers

i'm going to have to add a Bloom filter and another cache invalidation mechanism, which i'm not enthusiastic about, but it seems prudent to do before it results in an outage (grebedoc has never had a scraper-induced outage so far, and neither has the codeberg git-pages instance)
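a rough sketch of the idea in Go (everything here is hypothetical, not actual git-pages code): a Bloom filter over the hostnames that actually have deployed sites, so a request for an unknown domain can be bounced before it ever touches the cache layers.

```go
package main

import "hash/fnv"

// hostFilter is a minimal Bloom filter over deployed hostnames
// (illustrative names and sizes, not git-pages internals).
type hostFilter struct {
	bits []uint64
	k    int // number of hash functions
}

func newHostFilter(mBits, k int) *hostFilter {
	return &hostFilter{bits: make([]uint64, (mBits+63)/64), k: k}
}

// indexes derives k bit positions from two FNV hashes (double hashing).
func (f *hostFilter) indexes(host string) []uint64 {
	h1 := fnv.New64a()
	h1.Write([]byte(host))
	a := h1.Sum64()
	h2 := fnv.New64()
	h2.Write([]byte(host))
	b := h2.Sum64() | 1 // keep the second hash odd/nonzero
	m := uint64(len(f.bits)) * 64
	idx := make([]uint64, f.k)
	for i := 0; i < f.k; i++ {
		idx[i] = (a + uint64(i)*b) % m
	}
	return idx
}

// Add marks a hostname as having a deployed site.
func (f *hostFilter) Add(host string) {
	for _, i := range f.indexes(host) {
		f.bits[i/64] |= 1 << (i % 64)
	}
}

// MayExist returns false only if the hostname definitely has no site,
// so garbage requests can be rejected without hitting any cache or storage.
func (f *hostFilter) MayExist(host string) bool {
	for _, i := range f.indexes(host) {
		if f.bits[i/64]&(1<<(i%64)) == 0 {
			return false
		}
	}
	return true
}
```

the catch is that the filter has to be rebuilt (or at least have new hostnames added) whenever a site is deployed, which is exactly the extra cache invalidation mechanism mentioned above.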

@whitequark nginx has an amazing limit_req_module that can easily throttle IPs that do some nasty shit, like making a lot of 404 requests. You can just tell it to spit out 1 byte/s to connections that fall into a given zone.

It’ll cost you keeping connections open, but otherwise it’s a cheap way to solve this without adding another layer of caching.
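Roughly like this (a sketch only; the zone name, rate, and upstream address are placeholders, and this keys on per-IP request rate rather than on 404s specifically, since limit_req decides before the response status is known):

```nginx
# in the http{} block: a shared 10 MB zone keyed by client IP, 1 request/second
limit_req_zone $binary_remote_addr zone=scrapers:10m rate=1r/s;

server {
    listen 80;
    server_name _;                         # catch-all, including domains with no site

    location / {
        limit_req zone=scrapers burst=5;   # queue a small burst, reject the rest
        limit_req_status 429;              # status for rejected requests
        limit_rate 1;                      # 1 byte/s for whatever does get through
        proxy_pass http://127.0.0.1:8080;  # placeholder upstream
    }
}
```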

But given you have caddy for TLS provisioning, it’s not immediately obvious how to front it with nginx.
