Mastodawn

Julian Oliver Apr 4

Working on some poison-as-a-service (PaaS). Looking to launch in the next few days.

#AI #enjoythinking

Julian Oliver Apr 5

Also working on a zip bomb to scatter randomly among the links.

Thanks to @anaiscrosby I came across this excellent method, using LZ77:

https://natechoe.dev/blog/2025-08-04.html

TBH I was just going to `dd if=/dev/urandom` my way to a titanic, RAM-flooding *.gz, but I'm getting great results with the above, with bonus site-data honey inside to keep the bots on the chase.

natechoe.dev - A googol byte zip bomb that's also valid HTML
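The naive "compress a lot of nothing" bomb can be sketched in Python as follows. One adjustment: it streams zero bytes instead of `/dev/urandom` output, because random bytes are essentially incompressible (their gzip is about as big as the input), whereas long runs of zeros shrink roughly a thousandfold. This is a minimal sketch with hypothetical function names, not the LZ77/HTML technique from natechoe.dev.

```python
import gzip
import os


def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by gzip (level 9) compressed size."""
    return len(data) / len(gzip.compress(data, compresslevel=9))


def write_zero_bomb(path: str, uncompressed_bytes: int, chunk_size: int = 1 << 20) -> int:
    """Stream `uncompressed_bytes` zero bytes through gzip into `path`.

    Only one chunk is held in memory at a time, so the target size can far
    exceed available RAM. Returns the compressed file size in bytes.
    """
    zeros = bytes(chunk_size)  # a chunk of zero bytes, reused for every write
    remaining = uncompressed_bytes
    with gzip.open(path, "wb", compresslevel=9) as gz:
        while remaining > 0:
            n = min(chunk_size, remaining)
            gz.write(zeros[:n])
            remaining -= n
    return os.path.getsize(path)


if __name__ == "__main__":
    sample = 8 << 20  # 8 MiB
    print(f"zeros : {compression_ratio(bytes(sample)):.0f}:1")
    print(f"random: {compression_ratio(os.urandom(sample)):.3f}:1")
```

For example, `write_zero_bomb("bomb.gz", 1 << 30)` should yield a file of roughly 1 MiB that expands to 1 GiB. Served with `Content-Encoding: gzip`, a client that decompresses transparently and buffers the whole body pays for the full uncompressed size.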

Julian Oliver Apr 5

@anaiscrosby After seeing ChatGPTBot blow 123 seconds on my drip-feed poison tarpit and then never come back, I started reading up on how modern LLM scrapers might detect tarpits and blacklist them.

During that research I came across this tarpit-evading scraper, which offers some interesting insight into how modern LLM scrapers might do it.

https://github.com/Draconiator/Ipema

This gives me pause and has me looking at other solutions for counter-detection.

The GeoCities CSS is going nowhere.

GitHub - Draconiator/Ipema: A script designed to counter the Nepenthes tarpit - designed with the help of A.I. itself.
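A "drip-feed" tarpit like the one mentioned above holds each crawler connection open by sending a page slowly, a little at a time. Below is a minimal sketch as a plain WSGI app; it assumes nothing about the author's actual setup, and `drip_feed_app`, `serve`, the `chunks`/`delay` values and the link layout are all hypothetical.

```python
import html
import time
from typing import Callable, Iterable, Iterator
from wsgiref.simple_server import make_server


def drip_feed_app(chunks: int = 20, delay: float = 2.0) -> Callable:
    """Return a WSGI app that trickles out a page of links, pausing between chunks.

    `chunks` and `delay` are illustrative defaults, not tuned values.
    """

    def app(environ, start_response) -> Iterable[bytes]:
        # Links are built relative to the requested path so a crawler keeps descending.
        # The path is crawler-controlled, so escape it before putting it in an attribute.
        base = html.escape(environ.get("PATH_INFO", "/").rstrip("/"), quote=True)
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])

        def body() -> Iterator[bytes]:
            yield b"<!doctype html><html><body>\n"
            for i in range(chunks):
                yield f'<p><a href="{base}/{i}">continue</a></p>\n'.encode()
                time.sleep(delay)  # the "drip": keep the connection open between chunks
            yield b"</body></html>\n"

        return body()

    return app


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Run the app with the stdlib reference server (local demo only)."""
    with make_server(host, port, drip_feed_app()) as httpd:
        httpd.serve_forever()
```

Calling `serve()` starts it on port 8080. Because `wsgiref` handles one request at a time, a real deployment would want an async or threaded server so that stalled responses don't block each other.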

Julian Oliver 5d ago

@anaiscrosby Running a non-Markov tarpit on one public link for half an hour, and I already have Claude lost in my swamp. Waiting to see if it runs into my zip bomb.

---
216.73.216.124 - - [07/Apr/2026:03:28:49 +0200] "GET /tarpit/until/same/drive/harmattan_leftmost_intranscalency_few_ministries_few_between HTTP/2.0" 200 10132 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; [email protected])" "-"
---
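Lines like the one above are in the common "combined" access-log format (with one extra trailing field), so finding out which crawlers are wandering the tarpit is a small parsing job. A sketch, assuming the `/tarpit/` prefix from the logged path; the function names are hypothetical, the bot token list is illustrative, and, as noted later in the thread, user-agent strings are easy to forge.

```python
import re
from typing import Iterable, Optional

# Apache/nginx "combined" log format; anything after the user-agent field is ignored.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

# Illustrative, not exhaustive; UA strings are trivially forged.
BOT_TOKENS = ("ClaudeBot", "GPTBot", "CCBot")


def parse_line(line: str) -> Optional[dict]:
    """Parse one access-log line into a dict, or return None if it doesn't match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["size"] = None if rec["size"] == "-" else int(rec["size"])
    return rec


def bot_name(user_agent: str) -> Optional[str]:
    """Return the first known crawler token found in the UA string, if any."""
    for token in BOT_TOKENS:
        if token in user_agent:
            return token
    return None


def tarpit_hits(lines: Iterable[str], prefix: str = "/tarpit/") -> dict:
    """Count tarpit requests per (ip, bot) pair; unparseable lines are skipped."""
    counts: dict = {}
    for line in lines:
        rec = parse_line(line)
        if rec and rec["path"].startswith(prefix):
            key = (rec["ip"], bot_name(rec["ua"]))
            counts[key] = counts.get(key, 0) + 1
    return counts
```

For instance, passing an open `access.log` (hypothetical file name) to `tarpit_hits` returns a request count per (IP, crawler) pair, with `None` for user agents that match no known token.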

Julian Oliver 5d ago

@anaiscrosby It hit it, but I guess it decompressed it in a separate thread. It's a 127 MB archive that decompresses to 128 GB. The bot kept scraping for a bit and then dropped off. Hard to know whether the bomb discouraged it.
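Those numbers sit right at DEFLATE's ceiling. DEFLATE (the algorithm inside gzip) encodes at most a 258-byte match in about 2 bits, which caps the ratio at 258 × 8 / 2 = 1032:1. A quick check, assuming "127M" and "128GB" are binary units (MiB and GiB) and ignoring header overhead:

```python
# DEFLATE encodes at most a 258-byte match in about 2 bits: 258 * 8 / 2 = 1032.
DEFLATE_MAX_RATIO = 258 * 8 // 2

MiB = 2**20
GiB = 2**30


def min_deflate_size(uncompressed_bytes: int) -> float:
    """Approximate smallest DEFLATE output for a given input (headers ignored)."""
    return uncompressed_bytes / DEFLATE_MAX_RATIO


print(f"{min_deflate_size(128 * GiB) / MiB:.1f} MiB")  # 127.0 MiB
```

So a 127 MiB file is about as small as a single-layer gzip of 128 GiB can get, which is consistent with a plain, maximally compressed stream.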

Strangely, soon afterwards other IPs were reaching statistically unguessable, randomly generated URL paths without first touching the webroot or any other tarpit URL. They all had iOS UA strings (readily forged).
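For a sense of why such paths are statistically unguessable: a path of k words drawn independently and uniformly from a list of N words has N^k equally likely values, i.e. k · log2(N) bits of entropy. The sketch below assumes that model (the thread doesn't say how the tarpit actually picks words), mirrors the shape of the logged path (three directory words plus a seven-word slug), and uses Python's `secrets` module; the function names and the 100,000-word list size are hypothetical.

```python
import math
import secrets
from typing import Sequence


def random_tarpit_path(words: Sequence[str], n_dirs: int = 3, n_slug: int = 7,
                       prefix: str = "/tarpit") -> str:
    """Build a path like /tarpit/w1/w2/w3/w4_w5_..., every word drawn independently."""
    dirs = [secrets.choice(words) for _ in range(n_dirs)]
    slug = "_".join(secrets.choice(words) for _ in range(n_slug))
    return "/".join([prefix, *dirs, slug])


def path_entropy_bits(vocab_size: int, n_words: int) -> float:
    """Entropy of n_words uniform, independent draws from vocab_size words."""
    return n_words * math.log2(vocab_size)


if __name__ == "__main__":
    demo_words = ["until", "same", "drive", "harmattan", "leftmost", "few", "between"]
    print(random_tarpit_path(demo_words))
    print(f"{path_entropy_bits(100_000, 10):.0f} bits")  # 166 bits
```

At roughly 166 bits, blind guessing is hopeless, which suggests IPs landing directly on such paths got the URLs from somewhere else. Independent draws also allow repeats, which fits "few" appearing twice in the logged slug.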

Baillehache Pascal 5d ago

@JulianOliver @anaiscrosby
I'd learned about poison fountains and zip bombs before but never deployed either. It's interesting to read about your results.
Can I ask why you use a Markov chain? Wouldn't it be enough, and simpler, to spit out words at random? Is that because totally random text would be easier to detect as a poison fountain? Also, have you considered adding images (just coherent noise with irrelevant alt text) to the text?
About the zip bomb, how do you filter to avoid accidentally bombing an innocent person? Checking the user agent isn't an option, right? Bad bots forge it anyway.
What about the legal side of these two methods, do you know?

Julian Oliver 5d ago

@baillehache_pascal @anaiscrosby

In fact I started with Markov, but later learned that since the infamy of Nepenthes (which I tested), some researchers have urged teaching crawlers to detect Markov-generated text, and also to look out for text whose words are all in the dictionary (especially with no typos). This is seemingly not difficult to do. So I moved to a different solution (Pyison), which I then modified to produce a mix of dictionary-sourced and random words, with images, in a blog-like format.
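One possible reading of "a mix of dictionary-sourced and random words" is sketched below: most tokens come from a word list, some are replaced by random letter strings, and a few are real words with a one-letter typo, so the text is neither all-dictionary nor Markov-shaped. This is not Pyison's code; the function names and the mixing probabilities are assumptions.

```python
import random
import string
from typing import List, Sequence


def typo(word: str, rng: random.Random) -> str:
    """Swap one letter of `word` for a different random lowercase letter."""
    if not word:
        return word
    i = rng.randrange(len(word))
    replacement = rng.choice([c for c in string.ascii_lowercase if c != word[i]])
    return word[:i] + replacement + word[i + 1:]


def pseudo_word(rng: random.Random, min_len: int = 3, max_len: int = 10) -> str:
    """A random lowercase letter string that is almost never a real word."""
    n = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(n))


def mixed_sentence(words: Sequence[str], n: int, rng: random.Random,
                   p_random: float = 0.1, p_typo: float = 0.05) -> str:
    """n tokens: random strings with prob p_random, typos with p_typo, else dictionary words."""
    out: List[str] = []
    for _ in range(n):
        r = rng.random()
        if r < p_random:
            out.append(pseudo_word(rng))
        elif r < p_random + p_typo:
            out.append(typo(rng.choice(words), rng))
        else:
            out.append(rng.choice(words))
    sentence = " ".join(out)
    return sentence[0].upper() + sentence[1:] + "."


if __name__ == "__main__":
    demo_words = ["harmattan", "leftmost", "ministries", "drive", "between", "same"]
    rng = random.Random()
    print(" ".join(mixed_sentence(demo_words, 10, rng) for _ in range(3)))
```

It uses a seedable `random.Random` rather than `secrets`, since filler text only needs variety, not unpredictability.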

Baillehache Pascal

@JulianOliver
Thanks for the pointers to Nepenthes and Pyison. I didn't know them, but they look pretty much like how I would have done it myself...