@simondassow As @ainmosni mentioned, AI scrapers are using proxy services to come from residential IP space when you block their ASN/IP blocks. There are companies like Zscaler that provide access to residential proxies under the guise of legitimacy.
I had to take a layered approach. The robots.txt tells them to go away. If they don't, but they identify themselves, they get fed to iocaine. If they fake their UA (Anthropic and OpenAI do) and come in over HTTP/1*, they get asked to upgrade to HTTP/2+. If they manage that (most do not), then Anubis does a PoW. (I am considering alternatives to Anubis that are behavioral based). Finally, all my sites inject kill string headers into every request.
Unfortunately, I can't find a curated list of AI kill strings and in testing, not every AI platform sees the headers. It looks like Anthropic and OpenAI have a non-AI layer that does fetching over HTTP/1.1 and returns just the body of the request to the agent. If I were designing that layer, I would strip all non-visible content from the body, so I'm not sure adding the fake content into HTML comments will make it through to agents themselves.
I think it would be interesting to extend iocaine to perform prompt injections at that layer. AFAICT, most AI companies _try_ to scrape the site honestly first. If that fails, they gradually add more and obfuscation because they need that sweet, sweet, content.