grebedoc saw its highest share of garbage requests yet yesterday (a wave peaking at 150 req/sec)
these waves are getting bigger and bigger, which is somewhat concerning. it's nowhere near hardware capacity yet, but i'm hitting software bottlenecks that i never thought would be relevant

git-pages has a sophisticated multilayer cache system which fails to perform well in exactly one case: if someone sends a lot of requests to domains that don't even have valid sites deployed

because i figured that nobody would do this. certainly not regularly, and not at incredibly high speed
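
(rough sketch of the failure mode in go — hypothetical code, not the actual git-pages implementation. the cache only remembers domains that have a site deployed, so every garbage request for a made-up domain falls through to the slow lookup path:)

```go
package main

import (
	"sync"
	"time"
)

type Site struct{ /* rendered pages, certs, etc. */ }

type Cache struct {
	mu    sync.Mutex
	sites map[string]*Site // only populated for domains that resolved to a real site
}

// lookupSite stands in for the slow path: repo metadata, filesystem, whatever.
func lookupSite(domain string) *Site {
	time.Sleep(50 * time.Millisecond) // placeholder for the expensive work
	return nil                        // scraper-invented domains never resolve
}

func (c *Cache) Get(domain string) *Site {
	c.mu.Lock()
	s, ok := c.sites[domain]
	c.mu.Unlock()
	if ok {
		return s // fast path only helps domains that actually have sites
	}
	s = lookupSite(domain) // garbage requests land here every single time
	if s != nil {
		c.mu.Lock()
		c.sites[domain] = s
		c.mu.Unlock()
	}
	// a nil result is never cached, so the next garbage request pays the
	// full cost again — negative caching would close this gap
	return s
}

func main() {
	c := &Cache{sites: map[string]*Site{}}
	// a scraper hammering invented domains: every call is a cache miss
	for _, d := range []string{"a.example.page", "b.example.page", "a.example.page"} {
		_ = c.Get(d)
	}
}
```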

well. fucking scrapers

@whitequark how does one even mess up scraping that badly?

@truh @whitequark Believing that LLMs are in fact AI.

I help manage a site where 'deep' URLs follow obvious patterns. The elements are predictable & one can build millions of possible URLs for the site using public info, most of which don’t exist.

The so-called "AI Scrapers" have been asking for thousands of such invented URLs all at once, and even the ones that turn out to be valid take a few seconds to construct from mostly-archived data. The scrapers don’t even wait.
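
One common way to blunt that kind of enumeration (a hypothetical sketch in Go, not our actual setup) is to cache the negative results too: remember for a short TTL that a pattern-built URL has no page, so repeated requests for it skip the expensive construction.

```go
package main

import (
	"fmt"
	"time"
)

// notFoundCache remembers URLs that were recently confirmed to have no page.
type notFoundCache struct {
	entries map[string]time.Time // URL -> time the negative result expires
	ttl     time.Duration
}

func newNotFoundCache(ttl time.Duration) *notFoundCache {
	return &notFoundCache{entries: map[string]time.Time{}, ttl: ttl}
}

// knownMissing reports whether the URL was recently confirmed missing.
func (c *notFoundCache) knownMissing(url string) bool {
	exp, ok := c.entries[url]
	return ok && time.Now().Before(exp)
}

// markMissing records a negative result with a TTL, so a later legitimate
// page at that URL becomes visible once the entry expires.
func (c *notFoundCache) markMissing(url string) {
	c.entries[url] = time.Now().Add(c.ttl)
}

func main() {
	c := newNotFoundCache(5 * time.Minute)
	url := "/archive/2019/07/item-123456" // pattern-built URL that doesn't exist
	if !c.knownMissing(url) {
		// ... expensive construction from archived data would happen here ...
		c.markMissing(url) // cache the 404 so repeats are cheap
	}
	fmt.Println("second hit served from negative cache:", c.knownMissing(url))
}
```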