It's actually daylight now and I'll need to get ready for work, so maybe I'll shut up soon. Anyway, pondering a bit further...
Spigot generates page content and links using Python's random number generator. To make it deterministic (i.e. the same URL will always give the same page), it seeds the random number generator just before creating the content, with the seed being a 64-bit hash of the page URL.
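For anyone who hasn't seen this trick, here's a minimal sketch of per-URL seeding in Python. It isn't Spigot's actual code: the hash (SHA-256 truncated to 64 bits), the word list and the `url_seed`/`generate_page` names are all assumptions for illustration. One thing that is real: Python's built-in `hash()` is randomised per process for strings, so it wouldn't give the same page across restarts, which is why the sketch uses `hashlib` instead.

```python
import hashlib
import random

# Hypothetical word list; Spigot's real vocabulary is not shown here.
WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing", "elit"]


def url_seed(url: str) -> int:
    # Assumption: SHA-256 truncated to its first 8 bytes stands in for Spigot's 64-bit URL hash.
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def generate_page(url: str, n_words: int = 12, n_links: int = 3) -> dict:
    # A private Random instance, seeded per request, so the same URL always yields the same page.
    rng = random.Random(url_seed(url))
    text = " ".join(rng.choices(WORDS, k=n_words))
    links = [
        "/" + "-".join(rng.choices(WORDS, k=2)) + f"-{rng.getrandbits(32):08x}.html"
        for _ in range(n_links)
    ]
    return {"text": text, "links": links}


if __name__ == "__main__":
    first = generate_page("/some/page.html")
    second = generate_page("/some/page.html")
    assert first == second  # deterministic: same URL, same page
    print(first["links"])
```

The sketch uses its own `random.Random` instance rather than reseeding the module-level generator; both are deterministic, but a private instance won't interfere with anything else in the process that uses `random`.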
Effectively, this means that Spigot's entire "page space" is 2^64, or around 1.8E19, pages. In terms of trapping crawlers, that's near enough infinite - at a million requests per second, it would take over half a million years to exhaust all possible pages.
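The back-of-envelope sum, for anyone who wants to check it (assuming a 365.25-day year):

```python
PAGE_SPACE = 2 ** 64                   # one page per possible 64-bit seed
REQUESTS_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

years = PAGE_SPACE / REQUESTS_PER_SECOND / SECONDS_PER_YEAR
print(f"{PAGE_SPACE:.2e} pages, about {years:,.0f} years to exhaust")
```

That comes out at roughly 585,000 years.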
My problem, right now, is that the crawlers have made around 1.2 billion requests to Spigot, which means their (aggregate) index probably holds around 30 billion Spigot URLs, most of which are going to be in a backlog for later scanning (hurrah). I can't get away from that, and I guess I'll need to live with it until the AI bubble bursts.
I don't really want to get rid of Spigot completely - if for no other reason than I've enjoyed tinkering with it.
And it's struck me that I could have a tunable "site size" value. When the random number generator gets seeded, rather than using the 64-bit hash directly, it would use that hash modulo the "site size" value. So if the site size were only 100, we'd only ever seed the RNG with one of 100 values, which would mean that only 100 distinct pages could ever be created. I'd need to restructure things a bit so that internal links remained internal: when generating an internal link, Spigot would need to choose a random number, take that number modulo the site size, and run the page generator seeded with the result to produce the page title and URL. That's fiddly, but not a major problem.
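Here's a rough sketch of how the "site size" idea might look, again in Python and again not Spigot's real code. `SITE_SIZE`, `page_identity`, `generate_page`, `crawl`, the word list and the SHA-256-based hash are all hypothetical. The served page is picked by hashing the URL and reducing it modulo the site size; internal links pick a random number, reduce it modulo the site size, and derive the link's title and URL from that page number. The toy crawler at the end checks the point of the exercise: however long it runs, it can only ever find a bounded set of URLs.

```python
import hashlib
import random
from collections import deque

SITE_SIZE = 100  # hypothetical tunable "site size"
WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing", "elit"]


def url_hash64(url: str) -> int:
    # Assumption: SHA-256 truncated to 64 bits stands in for Spigot's URL hash.
    return int.from_bytes(hashlib.sha256(url.encode("utf-8")).digest()[:8], "big")


def page_identity(page_id: int) -> tuple:
    """Title and URL for internal page number page_id, seeded by page_id alone."""
    rng = random.Random(page_id)
    words = rng.choices(WORDS, k=3)
    title = " ".join(words).title()
    url = "/" + "-".join(words) + f"-{rng.getrandbits(32):08x}.html"
    return title, url


def generate_page(url: str, site_size: int = SITE_SIZE, n_links: int = 5) -> dict:
    # Fold the 64-bit hash into [0, site_size): only site_size distinct pages can exist.
    page_id = url_hash64(url) % site_size
    rng = random.Random(page_id)
    body = " ".join(rng.choices(WORDS, k=20))
    links = []
    for _ in range(n_links):
        # Internal link: random number -> modulo site_size -> title and URL for that page.
        target = rng.getrandbits(64) % site_size
        links.append(page_identity(target))
    return {"page_id": page_id, "body": body, "links": links}


def crawl(start_url: str, site_size: int = SITE_SIZE) -> set:
    """Breadth-first crawl that never refetches a URL; returns every URL it saw."""
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        for _title, link in generate_page(url, site_size)["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


if __name__ == "__main__":
    urls = crawl("/")
    # Links only ever come from page_identity(0..SITE_SIZE-1), so this is at most SITE_SIZE + 1.
    print(len(urls), "distinct URLs reachable")
```

One wrinkle in this sketch: a link's title comes from its target page number, but the page served at that URL is chosen by hashing the URL, so the two generally won't match. Encoding the page number in the URL itself would be one way round that.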
It's a reasonable assumption that crawlers don't go round and round and round hitting the exact same URL thousands of times per day, so a Spigot with (say) a million possible pages could be useful in poisoning models without exposing me to ongoing load.
Food for thought. Talking of food: time for breakfast!