*KICKS DOOR DOWN*

Hey everyone! Hate AI web crawlers? Have some spare CPU cycles you want to use to punish them?

Meet Nepenthes!

https://zadzmo.org/code/nepenthes

This little guy runs nicely on low-power hardware, and generates an infinite maze of what appear to be static files, with no exit links. Web crawlers will merrily hop right in and just... get stuck in there! There's an optional randomized delay to waste their time and conserve your CPU, and optional Markov babble to poison large language models.
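(For the curious, the maze trick is simple enough to sketch. This is a hypothetical Python illustration of the idea, not Nepenthes' actual code, which is Lua: seed an RNG from the request path so every URL deterministically serves the same page of Markov babble plus links deeper into the maze, making it look like a huge tree of real static files.)

```python
import random
from collections import defaultdict

# Tiny stand-in corpus; a real tarpit would feed in a large text dump.
CORPUS = ("the quick brown fox jumps over the lazy dog "
          "while the lazy dog naps under the quick brown tree").split()

# Order-1 Markov transition table built from the corpus.
CHAIN = defaultdict(list)
for a, b in zip(CORPUS, CORPUS[1:]):
    CHAIN[a].append(b)

def babble(rng, n_words=40):
    """Walk the Markov chain to produce plausible-looking filler text."""
    word = rng.choice(CORPUS)
    out = [word]
    for _ in range(n_words - 1):
        word = rng.choice(CHAIN[word] or CORPUS)  # restart at dead ends
        out.append(word)
    return " ".join(out)

def page(path):
    """Render the maze page for a given URL path.

    Seeding the RNG from the path means the same URL always returns the
    same content, so the maze looks like static files. Every page links
    only deeper into the maze -- there is no exit. A real server would
    also sleep a random interval here to waste the crawler's time.
    """
    rng = random.Random(path)
    links = "".join(
        f'<a href="{path.rstrip("/")}/{rng.randrange(10**8):08x}.html">more</a>\n'
        for _ in range(8)
    )
    return f"<html><body><p>{babble(rng)}</p>\n{links}</body></html>"
```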

@aaron
This sounds like a lot of fun! I have a 56 core blade sitting around here somewhere...
@sb Oh let's. Fucking. GO! Sadly it can't saturate more than one CPU yet.

Yet.

....

I just posted a list of other projects I want to make headway on this year, and bam, now I see I missed one. Was halfway through something that'd easily max out that blade!

@aaron
I've just been working with multithreaded HTTP requests in #python. Alas, my #lua is limited, or I'd offer to help make it so.

It would really be fun to set that thing up with a handful of VMs, each running some of these projects and collect some data.

@sb Nepenthes already aggregates statistics on IP addresses and User-Agent strings, and it is indeed interesting to pick through.

Google is the only crawler smart enough to escape - but it keeps coming back eventually.

Facebook is the only one that seems to use IPv6.

There are a lot of shadow crawlers with fake Chrome user agents to stay hidden, mostly in China.