A hacker developed an "infinite maze" to trap web-crawlers/scrapers from AI companies

basically, if the server code detects that a web crawler from an AI firm is trying to scrape the site ...

... the code begins spinning up an infinite, nesting warren of new sham pages, filled with random text

so the crawler gets stuck crawling and scraping endless and meaningless pages
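
a rough sketch of the idea in Python, for the curious. this is not the actual code from the tool in the article, just a minimal illustration under assumptions: the user-agent substrings, the word list, and the names `is_ai_crawler` and `maze_page` are all made up for the example. each page is seeded from its own URL path, so the same URL always returns the same junk, and every link on it points one level deeper into the same maze

```python
import hashlib
import random

# Illustrative user-agent substrings only (an assumption for this sketch);
# a real deployment would maintain its own detection rules.
AI_CRAWLER_TOKENS = ("GPTBot", "CCBot", "ClaudeBot")

# Hypothetical filler vocabulary for the sham pages.
WORDS = ("pitcher", "nectar", "lattice", "hollow", "ember", "quartz", "murmur", "thicket")


def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a known crawler token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


def maze_page(path: str, n_links: int = 5, n_words: int = 40) -> str:
    """Build a sham HTML page whose links all lead deeper into the maze."""
    # Seed the generator from the path so a given URL is stable across requests.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))

    # Random filler text.
    text = " ".join(rng.choice(WORDS) for _ in range(n_words))

    # Every link is a child of the current path, so the maze never ends.
    base = path.rstrip("/")
    links = [f"{base}/{rng.getrandbits(32):08x}" for _ in range(n_links)]
    items = "\n".join(f'<li><a href="{href}">{href}</a></li>' for href in links)

    return f"<html><body><p>{text}</p><ul>\n{items}\n</ul></body></html>"
```

a server would call `is_ai_crawler` on the request's User-Agent header and, on a match, answer any URL under the maze prefix with `maze_page(request_path)`; no pages are ever stored, they are generated on demand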

fun @jasonkoebler piece at @404mediaco

https://www.404media.co/email/7a39d947-4a4a-42bc-bbcf-3379f112c999/?ref=daily-stories-newsletter

Developer Creates Infinite Maze That Traps AI Training Bots

"Nepenthes generates random links that always point back to itself - the crawler downloads those new links. Nepenthes happily just returns more and more lists of links pointing back to itself."

404 Media
@clive what a waste from both sides

@gagliardi_vale

yep, I think that's basically the point of it

@clive @gagliardi_vale
Job creation for data annotators in India, Nigeria, Vietnam, etc., who have been given the microtask of removing junk like this from AI training data.