Working on some poison-as-a-service (PaaS). Looking to launch in the next few days.

#AI #enjoythinking

Also working on a zip bomb, to scatter randomly among the links.

Thanks to @anaiscrosby I came across this excellent method, using LZ77:

https://natechoe.dev/blog/2025-08-04.html

TBH I was just going to `dd if=/dev/urandom` my way to a titanic, RAM-flooding `*.gz`, but I'm getting great results with the method above, with the bonus of site-data honey inside to keep bots on the chase.

@anaiscrosby After seeing ChatGPTBot blow 123 seconds on my drip-feed poison tarpit and then never come back, I started reading up on how modern LLM scrapers might detect tarpits and blacklist them.

While researching, I came across this tarpit-evading scraper, which offers some interesting insight into how modern LLM scrapers might do this.

https://github.com/Draconiator/Ipema

This gives me pause and has me looking at other solutions for counter-detection.

The GeoCities CSS is going nowhere.

@anaiscrosby I've been running a non-Markov tarpit for half an hour on one public link and already have Claude lost in my swamp. Waiting to see if it runs into my ZIP bomb.

---
216.73.216.124 - - [07/Apr/2026:03:28:49 +0200] "GET /tarpit/until/same/drive/harmattan_leftmost_intranscalency_few_ministries_few_between HTTP/2.0" 200 10132 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)" "-"
---
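For anyone wanting to sift their own logs for hits like the one above, here is a minimal sketch. It assumes nginx's "combined" log format plus one extra trailing quoted field (as in the line above); the regex, `parse_line`, `crawler_name` and the `KNOWN_BOTS` list are all hypothetical names for illustration, not part of any tool mentioned in this thread.

```python
import re
from typing import Optional

# Hypothetical pattern: nginx "combined" format plus one optional extra
# trailing quoted field (assumed here, e.g. an X-Forwarded-For value).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
    r'(?: "(?P<extra>[^"]*)")?'
)

# Substrings of self-identifying crawler user agents (illustrative, not exhaustive).
KNOWN_BOTS = ("ClaudeBot", "GPTBot", "ChatGPT-User", "CCBot")


def parse_line(line: str) -> Optional[dict]:
    """Return the named fields of one access-log line, or None if it doesn't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


def crawler_name(ua: str) -> Optional[str]:
    """Return the first known crawler token found in the UA string, if any.

    UA strings are trivially forged, so this is a hint, not an identification.
    """
    for bot in KNOWN_BOTS:
        if bot in ua:
            return bot
    return None
```

As noted further down the thread, UA strings are readily forged, so matching on them only catches crawlers that choose to announce themselves.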

@anaiscrosby It hit it, but I guess it decompressed the archive in a separate thread. It's a 127M archive that decompresses to 128GB. The bot kept scraping for a bit and then dropped off. Hard to know whether it acted as a deterrent.
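As a rough sanity check on those figures, assuming binary units (127 MiB in, 128 GiB out) and a single layer of plain gzip, i.e. DEFLATE, whose maximum compression ratio is about 1032:1, the archive sits right at that ceiling, so a one-layer gzip can't get much smaller for that output size:

$$
\frac{128\ \text{GiB}}{127\ \text{MiB}} = \frac{128 \times 1024\ \text{MiB}}{127\ \text{MiB}} = \frac{131072}{127} \approx 1032
$$

With decimal units (127 MB, 128 GB) the ratio comes out at roughly 1008:1, still close to the same limit.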

What's strange is that soon after, other IPs were requesting randomly generated URL paths that are statistically unguessable, without first touching the webroot or any other tarpit URL. They all had iOS UA strings (readily forged).
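Spotting these "cold" hits can be automated. Here is a minimal sketch, assuming requests arrive in chronological order as `(ip, path)` pairs and that tarpit URLs share the `/tarpit/` prefix seen in the log line above; `find_cold_tarpit_hits` is a hypothetical helper, not part of any tarpit project mentioned here.

```python
from typing import Iterable, List, Tuple


def find_cold_tarpit_hits(
    requests: Iterable[Tuple[str, str]],
    tarpit_prefix: str = "/tarpit/",
) -> List[Tuple[str, str]]:
    """Return (ip, path) pairs where an IP's *first* request is already
    inside the tarpit, i.e. it never fetched the webroot or any other page
    first. Such a client presumably got the randomly generated URL from
    somewhere other than this site's own pages.
    """
    seen_ips = set()
    cold_hits = []
    for ip, path in requests:
        # Only the first request per IP can be "cold".
        if ip not in seen_ips and path.startswith(tarpit_prefix):
            cold_hits.append((ip, path))
        seen_ips.add(ip)
    return cold_hits
```

Keying on IP alone is a simplification: a crawler spread across many addresses will look "cold" on every new IP, which is arguably exactly the pattern worth flagging.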

@JulianOliver @anaiscrosby
I'd learnt about poison fountains and zip bombs before but never deployed either. It's interesting to read about your results.
Can I ask why you use a Markov chain? Wouldn't it be enough, and simpler, to spit out words at random? Is that because totally random text would be easier to detect as a poison fountain? Also, have you considered adding images (just coherent noise with irrelevant alt text) to the text?
About the zip bomb, how do you filter to avoid accidentally bombing an innocent person? Checking the user agent isn't an option, right? Bad bots forge it anyway.
And do you know anything about the legal side of these two methods?

@baillehache_pascal @anaiscrosby

In fact, I started with Markov but later learned that, since Nepenthes became infamous (I tested it), some researchers have urged teaching crawlers to detect Markov-generated text, and also to look out for text whose words are all in the dictionary (especially with no typos). This is apparently not difficult to do. So I moved to a different solution (Pyison), which I then modified to produce a mix of dictionary-sourced and random words, with images, in a blog-like format.
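To make the contrast concrete, here is a minimal sketch of both approaches: a first-order, word-level Markov chain, and a generator that mixes dictionary words with random pseudo-words. This is not Pyison's or Nepenthes' actual code; `build_chain`, `markov_text`, `random_word`, `mixed_text` and the `random_ratio` parameter are hypothetical, and the corpus needs at least two words.

```python
import random
import string
from collections import defaultdict
from typing import Dict, List


def build_chain(corpus: str) -> Dict[str, List[str]]:
    """First-order, word-level Markov chain: word -> observed successor words."""
    words = corpus.split()
    chain: Dict[str, List[str]] = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return dict(chain)


def markov_text(chain: Dict[str, List[str]], n_words: int, rng: random.Random) -> str:
    """Walk the chain; every emitted word is a real word from the corpus."""
    word = rng.choice(list(chain))
    out = [word]
    for _ in range(n_words - 1):
        successors = chain.get(word)
        # Dead end (word only seen at the end of the corpus): restart randomly.
        word = rng.choice(successors) if successors else rng.choice(list(chain))
        out.append(word)
    return " ".join(out)


def random_word(rng: random.Random, min_len: int = 3, max_len: int = 10) -> str:
    """A non-dictionary pseudo-word of random lowercase letters."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def mixed_text(dictionary: List[str], n_words: int, rng: random.Random,
               random_ratio: float = 0.3) -> str:
    """Mix dictionary words with random pseudo-words, so that not every
    token is a correctly spelled dictionary word."""
    return " ".join(
        random_word(rng) if rng.random() < random_ratio else rng.choice(dictionary)
        for _ in range(n_words)
    )
```

The Markov output only ever contains words from its source corpus, which is exactly the "every word is in the dictionary" signal described above; the mixed generator deliberately breaks that property.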

@baillehache_pascal As for the ZIP bomb, if you click on it in the tarpit I have in staging, it will just start a 128MB download for now. That is the worst of it: there's no automatic decompression as gzip'd HTML right now.

As for legality, no, there's nothing illegal about it. They also choose to ignore robots.txt.

On the other hand, it is arguably quite illegal for companies to steal, mine and profit from content without asking, without compensation, and without abiding by stated licensing terms.

@JulianOliver
> On the other hand, it is arguably quite illegal for companies to steal, ...
I know, and I'm perfectly in line with you on that matter.
@JulianOliver
Thanks for the pointers to Nepenthes and Pyison. I didn't know them, but they look pretty much like how I would have done it myself...