Proof-of-work challenges have become the current hotness for defeating AI scrapers. I think it’s great we have these and that they’re getting deployed to great effect. But I’ve also seen a lot of people claim the “AI scrapers” problem is now solved and I’m sorry to tell you this but no it’s not.
The reason it looks solved right now is that most of these scrapers don’t execute JavaScript. But with enough people deploying PoW proxies, the economics shift enough to make it worthwhile for AI companies to start executing JavaScript. AI companies have more money than you. Yes it’ll cost them, but that cost is worth it to them because otherwise they don’t have a business.
(Also, Anubis and other solutions default to only triggering when the User-Agent header contains “Mozilla”, so guess what! That check is trivial to circumvent (see the sketch below), which means the challenge will soon need to be enabled regardless of that header’s value. Then the cost goes up for the operator too, as more and more legitimate users get hit by it.)
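For illustration, here’s roughly what that circumvention looks like from the scraper’s side. This is a minimal sketch, not any specific scraper’s code: the URL and User-Agent string are made up, and it assumes the default “only challenge Mozilla user agents” behaviour.

```python
# Sketch: a scraper that dodges the default trigger by simply not claiming
# to be a browser. Assumes the proxy only challenges requests whose
# User-Agent contains "Mozilla"; URL and UA string are placeholders.
import requests

resp = requests.get(
    "https://example.org/some-page",
    headers={"User-Agent": "totally-not-a-browser/1.0"},  # no "Mozilla" -> no challenge
)
print(resp.status_code, len(resp.text))
```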
The JS needed for the PoW stuff isn’t complicated; a small JS interpreter can handle it. What mostly remains is the cost of the hash itself. Right now most of these tools use SHA-256, for which servers have dedicated CPU instructions (SHA extensions) and AVX implementations to speed things up. Constantly increasing the PoW difficulty doesn’t solve this either: eventually the experience degrades too much for real users, whereas the scrapers’ servers literally don’t care. Nobody there is sitting around waiting for the page to render; all they want is the content to train on.
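To make that concrete: once the challenge format is known, a scraper doesn’t even need the JS interpreter, it can solve the puzzle natively. A minimal sketch, assuming a typical “leading zero hex digits of SHA-256(challenge + nonce)” scheme; the exact format differs per tool and version, so treat this as illustrative only.

```python
# Sketch of solving an Anubis-style SHA-256 challenge without a browser.
# Assumes the puzzle is "find a nonce so that SHA-256(challenge + nonce)
# starts with `difficulty` zero hex digits"; details vary per deployment.
import hashlib
import itertools

def solve(challenge: str, difficulty: int) -> int:
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce

# At a difficulty of 4-5 hex digits this finishes in well under a second
# on a single server core -- and with SHA extensions, far faster still.
print(solve("example-challenge-string", 4))
```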
PoW proxies are a stopgap, and a very useful one. But a stopgap nonetheless. We’re buying ourselves time. But we’re going to need more than this. Including legislation that outlaws some of this shit entirely.
AI is a technology, but the root of the problem we’re facing is a societal and political one. We cannot ignore those aspects and exclude them from a solution.