grebedoc had its highest share yet of serving garbage requests yesterday (a wave peaking at 150 req/sec)
these waves are getting bigger and bigger, which is somewhat concerning. it's nowhere near the hardware capacity yet but i'm hitting some software bottlenecks that i never thought would be relevant

git-pages has a sophisticated multilayer cache system which fails to perform well in exactly one case: if someone sends a lot of requests to domains that don't even have valid sites deployed

because i figured that nobody would do this. certainly that nobody would do it regularly and at incredibly high speed

well. fucking scrapers

i'm going to have to add a Bloom filter and another cache invalidation mechanism, which i'm not enthusiastic about, but it seems prudent to do it before it results in an outage (grebedoc has never had a scraper-induced outage so far, and neither has the codeberg git-pages instance)
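
For readers unfamiliar with the idea: a Bloom filter answers "definitely not present" or "maybe present", so requests for domains with no deployed site can be rejected before they reach the cache layers. Below is a minimal sketch in Go, not the git-pages implementation. The names `Bloom`, `NewBloom`, `Add`, and `MayContain`, the FNV-based double hashing, and the sizes used in `main` are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Bloom is a fixed-size Bloom filter (hypothetical type, for illustration).
// It never reports a false negative, but may report false positives.
type Bloom struct {
	bits []uint64
	m    uint64 // total number of bits
	k    uint64 // number of probes per key
}

// NewBloom allocates a filter with m bits and k probes per key.
func NewBloom(m, k uint64) *Bloom {
	return &Bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes derives a base position and a step for double hashing.
func hashes(key string) (uint64, uint64) {
	h1 := fnv.New64a()
	h1.Write([]byte(key))
	h2 := fnv.New64()
	h2.Write([]byte(key))
	return h1.Sum64(), h2.Sum64() | 1 // force a nonzero (odd) step
}

// Add sets the k bits that correspond to key.
func (b *Bloom) Add(key string) {
	base, step := hashes(key)
	for i := uint64(0); i < b.k; i++ {
		pos := (base + i*step) % b.m
		b.bits[pos/64] |= 1 << (pos % 64)
	}
}

// MayContain returns false only if key was definitely never added.
func (b *Bloom) MayContain(key string) bool {
	base, step := hashes(key)
	for i := uint64(0); i < b.k; i++ {
		pos := (base + i*step) % b.m
		if b.bits[pos/64]&(1<<(pos%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	known := NewBloom(1<<16, 4) // sizes chosen arbitrarily for the demo
	for _, domain := range []string{"example.org", "docs.example.net"} {
		known.Add(domain)
	}
	// A deployed domain always passes the filter.
	fmt.Println(known.MayContain("example.org"))
	// An unknown domain is usually rejected here, before any cache lookup.
	fmt.Println(known.MayContain("no-such-site.invalid"))
}
```

A "maybe" answer still has to go through the normal lookup path, so false positives cost only a wasted lookup, never a wrong response.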
@whitequark Is the additional cache invalidation to handle removals from the bloom filter? Are you just planning to rebuild the bloom filter periodically or ...?
@e_nomem rebuild whenever a domain is added or removed (or on a superset of those operations, ideally a small superset to avoid waste of resources) but not more often than e.g. 60s
@whitequark do you need to rebuild when a domain is removed? Given that there'll be false positives anyway... (and inserting a domain should be relatively cheap with a Bloom filter, until the false positive rate gets higher than you want it to)
@Taneb yeah now that you mention it, not really
@whitequark bloom filter was also the first thing that came to my mind when reading this thread. It’s the Wild West out there, apparently.

@whitequark nginx has an amazing limit_req_module that can easily throttle IPs that do some nasty shit, like making a lot of 404 requests. You can just tell it to spit 1 bps to connections that fall in a given zone.

It’ll cost you open connections, but otherwise it’s a cheap way to solve this without adding another layer of caching.

But given you have Caddy for TLS provisioning, it’s not immediately obvious how to front it with nginx.

Module ngx_http_limit_req_module