Proof-of-work challenges have become the current hotness for defeating AI scrapers. I think it's great we have these and that they're getting deployed to great effect. But I've also seen a lot of people claim the "AI scrapers" problem is now solved, and I'm sorry to tell you this, but no, it's not.
The reason it works right now is that most of these scrapers don't execute JavaScript. But with enough people deploying PoW proxies, the economics change enough to make it worthwhile for AI companies to start executing it. AI companies have more money than you. Yes, it'll cost them, but that cost is worth it to them, because otherwise they don't have a business.
(Also, Anubis and other solutions default to only triggering if the User-Agent header contains "Mozilla", so guess what: it'll soon need to be enabled regardless of the value of that header, because it's trivial to circumvent. Then the cost goes up for the operator too, as more and more real users get affected.)
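To show just how trivial: here's a sketch of a scraper sidestepping a UA-gated challenge entirely. The URL and User-Agent string are made-up placeholders, not any real deployment; the point is that the client fully controls this header, so an "only challenge things that look like Mozilla" rule only catches bots that are honest about pretending to be browsers.

```go
// Hypothetical sketch: dodging a User-Agent-gated PoW challenge.
// The URL and UA string are placeholders.
package main

import (
	"fmt"
	"net/http"
)

func main() {
	req, err := http.NewRequest("GET", "https://example.org/", nil)
	if err != nil {
		panic(err)
	}
	// No "Mozilla" in the value, so a default-configured challenge
	// proxy waves this request straight through to the origin.
	req.Header.Set("User-Agent", "definitely-not-a-browser/1.0")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```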
The JS needed for the PoW stuff isn't complicated; a small JS interpreter can handle it. What mostly remains is the cost of the hash itself. Right now most implementations use SHA-256, for which modern CPUs have dedicated instructions (SHA extensions, AVX) that speed it up enormously. Constantly cranking up the PoW difficulty doesn't solve this: eventually the experience degrades too much for real users, whereas servers literally don't care. Nobody is sitting there waiting for the output to be rendered. All they want is the content to train on.
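For the sake of argument, here's roughly what such a solve loop looks like when a scraper runs it natively instead of in a browser. The challenge format, token, and difficulty below are assumptions, not any specific proxy's actual protocol; what matters is that Go's crypto/sha256 uses the CPU's SHA extensions where available, so this grinds through the puzzle far faster than interpreted JS would.

```go
// Minimal sketch of an Anubis-style PoW solve loop, run natively.
// Challenge format and difficulty are assumptions for illustration.
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/bits"
)

// leadingZeroBits counts the leading zero bits of a 32-byte digest.
func leadingZeroBits(sum [32]byte) int {
	n := 0
	for _, b := range sum {
		if b == 0 {
			n += 8
			continue
		}
		n += bits.LeadingZeros8(b)
		break
	}
	return n
}

// solve brute-forces a nonce so that SHA-256(challenge || nonce)
// has at least `difficulty` leading zero bits.
func solve(challenge []byte, difficulty int) uint64 {
	buf := make([]byte, len(challenge)+8)
	copy(buf, challenge)
	for nonce := uint64(0); ; nonce++ {
		binary.BigEndian.PutUint64(buf[len(challenge):], nonce)
		if leadingZeroBits(sha256.Sum256(buf)) >= difficulty {
			return nonce
		}
	}
}

func main() {
	// Difficulty 20 means ~2^20 (about a million) hashes on average,
	// which hardware-accelerated SHA-256 chews through in no time.
	nonce := solve([]byte("example-challenge-token"), 20)
	fmt.Println("nonce:", nonce)
}
```

Each additional bit of difficulty doubles the expected work for everyone, but a server hashing natively starts from a much lower cost baseline than a real user's browser tab, so the experience degrades for humans long before the scraper's compute bill becomes something it notices.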
PoW proxies are a stopgap, and a very useful one. But a stopgap nonetheless. We're buying ourselves time. But we're going to need more than this, including legislation that outlaws some of this shit entirely.
AI is a technology, but the root of the problem we're facing is societal and political. We cannot ignore those aspects or exclude them from a solution.