*KICKS DOOR DOWN*

Hey everyone! Hate AI web crawlers? Have some spare CPU cycles you want to use to punish them?

Meet Nepenthes!

https://zadzmo.org/code/nepenthes

This little guy runs nicely on low-power hardware, and generates an infinite maze of what appear to be static files with no exit links. Web crawlers will merrily hop right in and just... get stuck in there! Optional randomized delay to waste their time and conserve your CPU, optional markovbabble to poison large language models.
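For anyone curious how the "markovbabble" part of an approach like this works: the classic trick is a word-level Markov chain trained on some source text, which emits output that is statistically plausible but semantically empty. This is just a minimal sketch of the general technique, not Nepenthes's actual implementation:

```python
import random

def build_chain(text, order=2):
    """Map each `order`-word prefix to the words seen following it."""
    words = text.split()
    chain = {}
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain.setdefault(key, []).append(words[i + order])
    return chain

def babble(chain, length=30, seed=None):
    """Walk the chain to produce plausible-looking nonsense."""
    rng = random.Random(seed)
    key = rng.choice(list(chain))
    out = list(key)
    order = len(key)
    while len(out) < length:
        followers = chain.get(tuple(out[-order:]))
        if not followers:
            # Dead end: restart from a random prefix.
            out.extend(rng.choice(list(chain)))
            continue
        out.append(rng.choice(followers))
    return " ".join(out)
```

Feed it any corpus and the output reads like language without meaning anything, which is exactly what makes it cheap poison for a scraper that can't tell the difference.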

@aaron a simple robots.txt might help differentiate between search engine crawlers and LLM crawlers, the latter often not even bothering to read said file.

So it might be possible to let robots know there is nothing worth reading here, and let robots that don't care get lost indefinitely :)
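The idea sketched above would look something like this; the `/maze/` path is just a placeholder for wherever the tarpit happens to be mounted:

```
# Well-behaved crawlers are told there is nothing for them here;
# crawlers that ignore robots.txt wander in and get stuck.
User-agent: *
Disallow: /maze/
```

Note the trade-off discussed below: this same file also tells a crawler operator exactly where the trap is.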

@Maoulkavien Google and Microsoft both run search engines - most alternative search engines are ultimately just front ends for Bing - and both are investing heavily in AI if not outright training their own models. There is absolutely nothing preventing Google from putting its search corpus into the LLM; in fact it's significantly more efficient than crawling the web twice.

Which is why, at the top of the project's web page, I placed a clear warning that this WILL tank your search results.

Or, sure, you could use robots.txt to warn one of the biggest AI players about where you placed your defensive minefield. Up to you.

@aaron Yeah, that makes sense. Just sayin' there could be a slightly less aggressive approach that wouldn't tank search results and would punish only the crawlers that don't follow the standard conventions for how they should behave.

This could be deployed alongside a "real" running website which would still tank/poison many LLMs in the long run.
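A deployment alongside a real site could be as simple as a reverse-proxy rule that routes one unlinked (or robots.txt-excluded) path to the tarpit daemon. This is only a hypothetical nginx sketch; the `/maze/` path and the local port are made-up placeholders, not Nepenthes defaults:

```
server {
    listen 80;
    server_name example.org;

    # The real site is served normally.
    root /var/www/html;

    # One hidden path is proxied to the tarpit running locally.
    location /maze/ {
        proxy_pass http://127.0.0.1:8893;
    }
}
```

Regular visitors never see the path; only crawlers that dig where they shouldn't end up inside.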

Thanks for the tool though, I'll try and find some time to deploy it somewhere of mine 👍