Clive Thompson Jan 23

A hacker developed an "infinite maze" to trap web-crawlers/scrapers from AI companies

basically, if the server code detects that a web crawler from an AI firm is trying to scrape the site ...

... the code begins spinning up an infinite, nesting warren of new sham pages, filled with random text

so the crawler gets stuck crawling and scraping endless and meaningless pages

fun @jasonkoebler piece at @404mediaco

https://www.404media.co/email/7a39d947-4a4a-42bc-bbcf-3379f112c999/?ref=daily-stories-newsletter

Developer Creates Infinite Maze That Traps AI Training Bots

"Nepenthes generates random links that always point back to itself - the crawler downloads those new links. Nepenthes happily just returns more and more lists of links pointing back to itself."

404 Media
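To make the quoted mechanism concrete, here is a minimal sketch of a page generator whose links always point back into the same maze. It is not Nepenthes' actual code: the function name `maze_page`, the word list, and the link format are all assumptions made up for illustration. The page is seeded from the requested path, so the same URL always returns the same filler text and links.

```python
import hashlib
import random

# Hypothetical filler vocabulary; a real tarpit would likely use a larger corpus.
WORDS = ["amber", "basin", "cedar", "delta", "ember", "fjord", "grove", "harbor"]


def maze_page(path: str, n_links: int = 8, n_words: int = 40) -> str:
    """Return an HTML page for `path` whose links all lead deeper into the maze."""
    # Seed the RNG from the path so a given URL always yields the same page.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))

    # Meaningless filler text for the crawler to scrape.
    text = " ".join(rng.choice(WORDS) for _ in range(n_words))

    # Every link is a child of the current path, so the maze never ends.
    base = path.rstrip("/")
    links = [
        f"{base}/{rng.choice(WORDS)}-{rng.randrange(10**6)}"
        for _ in range(n_links)
    ]
    items = "\n".join(f'<li><a href="{href}">{href}</a></li>' for href in links)
    return f"<html><body><p>{text}</p><ul>\n{items}\n</ul></body></html>"
```

Calling `maze_page("/maze")` returns a page with eight links under `/maze/`, and each of those paths produces another such page, with nothing stored on the server.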

2xfo Jan 23

@clive @jasonkoebler @404mediaco

I've seen stories about people hosting sites that got hit by robots and they had to pay a bunch of money in data costs. I wonder how this works, if it can help in that regard when the whole point is to keep them pointed at your site.

I'm all for wasting their time, I just wonder how much it costs.

Lord Tom Klopf of CZ Jan 23

@RnDanger @clive @jasonkoebler @404mediaco yeah, you’d have to host this on a service that doesn’t charge by network traffic

OCTADE

@RnDanger@infosec.exchange @clive@saturation.social @jasonkoebler@mastodon.social @404mediaco@mastodon.social

Employ bandwidth throttling at about 16K with a few hundred thousand link trees to follow. That will really teach them and save your bandwidth bill.
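A rough sketch of what that throttling could look like, assuming "16K" means roughly 16 KiB per second (the post does not give units). The function name `throttled_chunks` and its parameters are hypothetical; the idea is simply to hand out the response in small pieces with a pause after each one, so a crawler stays busy while the server sends very little data.

```python
import time
from typing import Callable, Iterator


def throttled_chunks(
    payload: bytes,
    bytes_per_second: int = 16 * 1024,  # assumed reading of "about 16K"
    chunk_size: int = 1024,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield `payload` in chunks, pausing so the average rate stays at the cap."""
    # Time that one full chunk is allowed to take at the target rate.
    delay = chunk_size / bytes_per_second
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]
        sleep(delay)  # hold the connection open before sending more
```

At these defaults a 1 MiB page takes about 64 seconds to deliver. The `sleep` parameter exists only so the pause can be swapped out in tests; a server handling many connections would want a non-blocking equivalent.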

Luna chan Jan 23

@octade @thomas_klopf @RnDanger @clive @jasonkoebler @404mediaco Even better.