Working on some poison-as-a-service (PaaS). Looking to launch in the next few days.
Working on some poison-as-a-service (PaaS). Looking to launch in the next few days.
Also working on a zip bomb, to randomly scatter in among the links.
Thanks to @anaiscrosby I came across this excellent method, using LZ77:
https://natechoe.dev/blog/2025-08-04.html
TBH I was just going to `dd if=/dev/urandom` my way to a titanic RAM flooding *.gz, but am getting great results with the above, and with bonus site data honey inside to keep bots on the chase.
@anaiscrosby After seeing ChatGPTBot blow 123 seconds on my drip-feed poison tarpit and then never come back, I got reading on how modern LLM scrapers might employ mechanisms to detect tarpits and blacklist.
During research I came across this tarpit evading scraper that provides some interesting insights into how modern LLM scrapers might do this.
https://github.com/Draconiator/Ipema
This gives me pause and has me looking at other solutions for counter-detection.
The GeoCities CSS is going nowhere.
@anaiscrosby Running a non-Markov tarpit for half an hour on one public link, and already have Claude lost in my swamp. Waiting to see if it runs into my ZIP bomb
---
216.73.216.124 - - [07/Apr/2026:03:28:49 +0200] "GET /tarpit/until/same/drive/harmattan_leftmost_intranscalency_few_ministries_few_between HTTP/2.0" 200 10132 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; [email protected])" "-"
---
@anaiscrosby It hit it, but I guess decompressed in a thread. It's a 127M archive that decompresses to 128GB. The bot kept scraping for a bit and then dropped off. Difficult to know if it was a discouragement.
Strange is that soon after other IPs were reaching statistically non-guessable randomly generated URL paths, without touching the webroot or another other tarpit URL prior. They all had iOS UA strings (readily forged).
@baillehache_pascal @anaiscrosby
In fact I started with Markov but learned later that since the infamy of Nepenthes (which I tested), some researchers have urged teaching crawlers to detect Markov-generated text, and also look out for text whose words are all in the dictionary (no typos, esp). This is seemingly not difficult to do. And so I moved to a different solution (Pyison), that I then modified to produce a mix of dictionary sourced and random words, with images and in a blog-like format