grebedoc had its highest share of garbage requests yet yesterday (a wave peaking at 150 req/sec)
these waves are getting bigger and bigger, which is somewhat concerning. it's nowhere near the hardware capacity yet, but i'm hitting some software bottlenecks that i never thought would be relevant

git-pages has a sophisticated multilayer cache system which fails to perform well in exactly one case: if someone sends a lot of requests to domains that don't even have valid sites deployed

because i figured that nobody would do this. certainly not that anybody would do it regularly, and at incredibly high speed

well. fucking scrapers

i'm going to have to add a Bloom filter and another cache invalidation mechanism, which i'm not enthusiastic about, but it seems prudent to do it before it results in an outage (grebedoc has never had a scraper-induced outage so far, and neither has the codeberg git-pages instance)
@whitequark Is the additional cache invalidation to handle removals from the bloom filter? Are you just planning to rebuild the bloom filter periodically or ...?
@e_nomem rebuild whenever a domain is added or removed (or on a superset of those operations, ideally a small superset to avoid waste of resources) but not more often than e.g. 60s
@whitequark do you need to rebuild when a domain is removed? Given that there'll be false positives anyway... (and inserting a domain should be relatively cheap with a Bloom filter, until the false positive rate gets higher than you want it to)
@Taneb yeah now that you mention it, not really
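the plan the thread converges on could be sketched roughly like this (a minimal illustration in python, not git-pages code; the class names, sizing parameters, and the 60s rebuild throttle are assumptions taken from the discussion above): a Bloom filter over deployed domains answers "definitely not deployed" without touching the cache layers, insertion handles domain additions cheaply, and removals just leave a stale bit behind until the next throttled rebuild — which is fine, since false positives are expected anyway.

```python
import hashlib
import math
import time


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative sketch only)."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01):
        # standard sizing formulas for the target false-positive rate
        self.m = max(8, int(-expected_items * math.log(fp_rate) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # derive k bit positions from two halves of one sha256 digest
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # no false negatives; false positives possible by design
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class DeployedDomains:
    """Hypothetical wrapper: fast 'definitely not deployed' checks for incoming requests."""

    REBUILD_INTERVAL = 60.0  # seconds; the "not more often than 60s" throttle from the thread

    def __init__(self, domains):
        self.domains = set(domains)
        self._last_rebuild = 0.0
        self._rebuild(force=True)

    def _rebuild(self, force: bool = False) -> None:
        now = time.monotonic()
        if not force and now - self._last_rebuild < self.REBUILD_INTERVAL:
            return  # throttled: a slightly stale filter only means extra false positives
        fresh = BloomFilter(max(len(self.domains), 64))
        for d in self.domains:
            fresh.add(d)
        self.filter = fresh
        self._last_rebuild = now

    def add_domain(self, domain: str) -> None:
        self.domains.add(domain)
        self.filter.add(domain)  # insertion is cheap, no rebuild needed

    def remove_domain(self, domain: str) -> None:
        self.domains.discard(domain)
        # per the thread: no eager rebuild on removal; the stale bit is just
        # another false positive until the next (throttled) rebuild fires
        self._rebuild()

    def definitely_not_deployed(self, domain: str) -> bool:
        return not self.filter.might_contain(domain)
```

a request for a never-deployed domain would hit `definitely_not_deployed` and get rejected before reaching the cache, which is exactly the path the scrapers are hammering.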