AI companies are violating a basic social contract of the web and ignoring robots.txt
Put a path in robots.txt that nothing legitimate should ever request and that a human is unlikely to stumble onto. Log and ban every IP that hits it.
Imperfect, but can't think of a better solution.
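A minimal sketch of how that could look, assuming a Flask app (the trap path, the log message, and the in-memory blocklist are all made-up illustrations, not anyone's actual setup): serve a robots.txt that disallows a trap URL, then ban any IP that requests it anyway.

    from flask import Flask, abort, request

    app = Flask(__name__)

    # In a real setup this would be a shared store (Redis, fail2ban, a
    # firewall); an in-memory set is just for illustration.
    BANNED_IPS = set()

    @app.before_request
    def reject_banned():
        # Refuse every request from an IP that has already tripped the trap.
        if request.remote_addr in BANNED_IPS:
            abort(403)

    @app.route("/robots.txt")
    def robots():
        # Well-behaved crawlers read this and never touch /trap/.
        body = "User-agent: *\nDisallow: /trap/\n"
        return body, 200, {"Content-Type": "text/plain"}

    @app.route("/trap/<path:anything>")
    def trap(anything):
        # Only a client ignoring robots.txt ever ends up here:
        # log it and ban the IP.
        ip = request.remote_addr
        BANNED_IPS.add(ip)
        app.logger.warning("robots.txt violator: %s requested /trap/%s", ip, anything)
        abort(403)

In a real deployment the logged IPs would more likely feed fail2ban or a firewall rule than an in-process set, but the principle is the same: no legitimate client has any reason to reach the trap route.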
If it doesn’t get queried, that’s the fault of the web scraper. You don’t need JS built into the robots.txt file either. Just add a line like:

    Disallow: /here-there-be-dragons.html

Any client that hits that page (and maybe doesn’t pass a captcha check) gets banned. Or even better, they get a long stream of nonsense.
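The stream-of-nonsense variant is just as easy to sketch, again assuming Flask; the route mirrors the hypothetical robots.txt entry above, and the chunk size and delay are arbitrary choices:

    import random
    import string
    import time

    from flask import Flask, Response

    app = Flask(__name__)

    @app.route("/here-there-be-dragons.html")
    def tarpit():
        # Stream random text forever; the scraper's connection stays open
        # while our own bandwidth cost per second stays tiny.
        def nonsense():
            while True:
                yield "".join(random.choices(string.ascii_letters + " \n", k=512))
                time.sleep(1)
        return Response(nonsense(), mimetype="text/html")

The sleep keeps the cost on your side near zero while the violator's connection hangs open indefinitely, so a tarpit like this punishes the scraper instead of just turning it away.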