Almost 29 million requests from AI crawlers defeated by essentially one simple check: if the user agent contains Chrome/ or Firefox/, and the request doesn't have a sec-fetch-mode header, it's going into the maze.

Billions of dollars poured into AI, yet their crawlers are broken by two ifs in an nginx config.
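
For readers who want to see the shape of that check, here is a minimal sketch in Python. It is not the nginx config from the post, which is not shown in the thread; the function name `should_send_to_maze` and the plain-dict representation of request headers are assumptions made for illustration only.

```python
def should_send_to_maze(headers: dict[str, str]) -> bool:
    """Return True when a request claims to be a browser but lacks sec-fetch-mode.

    Hypothetical helper: `headers` is assumed to be a plain mapping of
    request header names to values.
    """
    # Header names are case-insensitive, so normalize them before looking anything up.
    normalized = {name.lower(): value for name, value in headers.items()}

    # First "if": does the user agent claim to be Chrome or Firefox?
    user_agent = normalized.get("user-agent", "")
    claims_browser = "Chrome/" in user_agent or "Firefox/" in user_agent

    # Second "if": is the sec-fetch-mode header present at all?
    has_sec_fetch_mode = "sec-fetch-mode" in normalized

    return claims_browser and not has_sec_fetch_mode
```

A request with a `Chrome/` user agent and no `Sec-Fetch-Mode` header would be routed to the maze, while the same user agent with that header present, or a client that does not claim to be Chrome or Firefox, would not.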

If this all wasn't so sad, I'd laugh.

Post by iocaine powder, @[email protected]

#iocaine has been up for 4 days 15h 5m 28s, and spent 1 day 6h 26m 31s dealing with - *gestures hands wildly* - [everything](https://monitor.madhouse-project.o…

come-from.mad-scientist.club
@algernon

Hiya,
I am curious to see what the "maze" looks like. Is there a way I (a human) can preview it?

@2something https://poison.madhouse-project.org/

Feel free to look around! There are QR codes and fake jpegs, and a bunch of other fun stuff :)


@algernon Thank you!

Do all the QRs contain only plain text or are some of them links?

Oh wow, I can ignore the links and type anything in the URL for a page.

@2something All QR codes are text. And yes, any and all URLs on that host will generate some kind of garbage :)

You can even go directly to a .jpg, .png, or .svg URL, or to a .css or .js one too!

Though the CSS currently has no randomized content, and the JS is only minimally randomized.

I sometimes end up playing with the jpg URLs, to see if I can find something fun.

For example, https://poison.madhouse-project.org/@[email protected] looks like a landscape, if I squint hard enough!

By the way, every URL will render the same content (for that URL) until I change the initial random seed on the server side and restart the software. Adds a bit of flair to it, with the content changing every once in a while. :)

Query strings influence the randomness too! https://poison.madhouse-project.org/@[email protected]?q=100 for example is different from the same URL without the ?q=100. And as with the URL, the query strings can be anything too.
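
The behaviour described above (same URL, same content, until the server-side seed changes, with the query string mixed in) can be illustrated with a small Python sketch. This is not iocaine's actual implementation, which the thread does not show; the function `render_garbage`, the `WORDS` list, and the choice of SHA-256 for deriving a per-request seed are all assumptions made for the example.

```python
import hashlib
import random

# Hypothetical vocabulary; a real generator would draw on a much larger corpus.
WORDS = ["maze", "poison", "garbage", "crawler", "landscape", "seed", "powder", "flair"]


def render_garbage(initial_seed: str, path: str, query: str = "", length: int = 40) -> str:
    """Return pseudo-random text that is stable for a given seed, path, and query."""
    # Combine the server-side seed with the request path and query string,
    # then hash the result to get a per-request seed.
    material = "\0".join([initial_seed, path, query]).encode("utf-8")
    digest = hashlib.sha256(material).digest()

    # A private Random instance keeps the output reproducible and
    # independent of any other use of the random module.
    rng = random.Random(int.from_bytes(digest, "big"))
    return " ".join(rng.choice(WORDS) for _ in range(length))
```

Calling `render_garbage("seed-a", "/some/page.jpg")` twice yields the same text, while adding `query="q=100"` or switching to a different initial seed changes it. Nothing has to be stored per URL under this scheme, since the content is recomputed from the request itself.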

(And these are extremely cheap to generate on the fly: apart from the QR codes, everything else is rendered faster than I can read a non-cached file from a btrfs filesystem on SSD.)