grebedoc had its highest share of garbage requests yet yesterday (a wave peaking at 150 req/sec)
these waves are getting bigger and bigger, which is somewhat concerning. it's nowhere near the hardware capacity yet, but i'm hitting some software bottlenecks that i never thought would be relevant

git-pages has a sophisticated multilayer cache system which fails to perform well in exactly one case: if someone sends a lot of requests to domains that don't even have valid sites deployed

because i figured that nobody would do this. certainly not that anybody would do it regularly, and at incredibly high speed

well. fucking scrapers

i'm going to have to add a Bloom filter and another cache invalidation mechanism, which i'm not enthusiastic about, but it seems prudent to do it before it results in an outage (grebedoc has never had a scraper-induced outage so far, and neither has the codeberg git-pages instance)
@whitequark Is the additional cache invalidation to handle removals from the bloom filter? Are you just planning to rebuild the bloom filter periodically or ...?
@e_nomem rebuild whenever a domain is added or removed (or on a superset of those operations, ideally a small superset to avoid waste of resources) but not more often than e.g. 60s
@whitequark do you need to rebuild when a domain is removed? Given that there'll be false positives anyway... (and inserting a domain should be relatively cheap with a Bloom filter, until the false positive rate gets higher than you want it to)
@Taneb yeah now that you mention it, not really
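the plan the thread converges on could be sketched roughly like this (a minimal illustration in python, not git-pages code; the class names, sizing parameters, and the 60s rebuild throttle are assumptions taken from the discussion above): a Bloom filter over deployed domains answers "definitely not deployed" without touching the cache layers, insertion handles domain additions cheaply, and removals just leave a stale bit behind until the next throttled rebuild — which is fine, since false positives are expected anyway.

```python
import hashlib
import math
import time


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative sketch only)."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01):
        # standard sizing formulas for the target false-positive rate
        self.m = max(8, int(-expected_items * math.log(fp_rate) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # derive k bit positions from two halves of one sha256 digest
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # no false negatives; false positives possible by design
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class DeployedDomains:
    """Hypothetical wrapper: fast 'definitely not deployed' checks for incoming requests."""

    REBUILD_INTERVAL = 60.0  # seconds; the "not more often than 60s" throttle from the thread

    def __init__(self, domains):
        self.domains = set(domains)
        self._last_rebuild = 0.0
        self._rebuild(force=True)

    def _rebuild(self, force: bool = False) -> None:
        now = time.monotonic()
        if not force and now - self._last_rebuild < self.REBUILD_INTERVAL:
            return  # throttled: a slightly stale filter only means extra false positives
        fresh = BloomFilter(max(len(self.domains), 64))
        for d in self.domains:
            fresh.add(d)
        self.filter = fresh
        self._last_rebuild = now

    def add_domain(self, domain: str) -> None:
        self.domains.add(domain)
        self.filter.add(domain)  # insertion is cheap, no rebuild needed

    def remove_domain(self, domain: str) -> None:
        self.domains.discard(domain)
        # per the thread: no eager rebuild on removal; the stale bit is just
        # another false positive until the next (throttled) rebuild fires
        self._rebuild()

    def definitely_not_deployed(self, domain: str) -> bool:
        return not self.filter.might_contain(domain)
```

a request for a never-deployed domain would hit `definitely_not_deployed` and get rejected before reaching the cache, which is exactly the path the scrapers are hammering.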