@codepo8 in fact, the right mechanism for refusing data mining is TDMRep (the TDM Reservation Protocol):
https://www.w3.org/community/reports/tdmrep/CG-FINAL-tdmrep-20240510/
But just as Meta and ByteDance officially refused to follow robots.txt, they also refused to follow TDMRep. Because both are "good faith from scrapers" solutions, nothing can actually stop them, and we are screwed.
This specification defines a simple and practical Web protocol, capable of expressing the reservation of rights relative to text & data mining (TDM) applied to lawfully accessible Web content, and to ease the discovery of TDM licensing policies associated with such content.
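To make "expressing the reservation of rights" concrete: the TDMRep report describes several channels a site can use, including a JSON file served at `/.well-known/tdmrep.json` and the HTTP response headers `tdm-reservation` and `tdm-policy`. The Python sketch below only generates those two server-side signals. The policy URL (`example.com`) and the helper names are hypothetical, and it is worth checking the report itself for the exact field semantics before relying on this.

```python
import json
from typing import Dict, Optional

# Hypothetical licensing-policy URL; replace with your own (assumption).
POLICY_URL = "https://example.com/policies/tdm-policy.json"


def tdmrep_well_known(location: str = "/", reserved: bool = True,
                      policy: Optional[str] = POLICY_URL) -> str:
    """Build a body for /.well-known/tdmrep.json (sketch).

    One rule: content under `location` has TDM rights reserved (1) or not (0).
    """
    rule = {"location": location, "tdm-reservation": 1 if reserved else 0}
    # Our reading of the report: a policy only accompanies a reservation.
    if reserved and policy:
        rule["tdm-policy"] = policy
    return json.dumps([rule], indent=2)


def tdmrep_headers(reserved: bool = True,
                   policy: Optional[str] = POLICY_URL) -> Dict[str, str]:
    """HTTP response headers carrying the same reservation (sketch)."""
    headers = {"tdm-reservation": "1" if reserved else "0"}
    if reserved and policy:
        headers["tdm-policy"] = policy
    return headers


if __name__ == "__main__":
    print(tdmrep_well_known())
    print(tdmrep_headers())
```

As with robots.txt, these are only declarations: they tell a miner what the site reserves, but honoring them is still up to the miner.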
@adisonverlice AI scrapers are drastically increasing the costs of hosting things on the Internet by generating astronomical bills for site administrators. This antisocial behavior has driven many admins into the arms of Cloudflare, a solution that ruins websites for people on older devices, more limited devices, or who need assistive technology (such as screen readers). If AI scrapers don’t stop, it soon won’t matter if people have their stuff on the Internet or not because other people won’t be able to get to it. Those people who can’t afford either the attacks from AI scrapers or the “solutions” to fight them won’t be able to share their stuff on the Internet anyway.
@codepo8 then how about we test that theory?
i'm gonna put up (for test purposes) a robots.txt file and i'll DM you the results. and if it's wrong, i'm gonna call you out on it. funny story bro, but that's not how robots.txt works.
and i'm gonna do an experiment to show this is true since you clearly think it is.
but read this
"The instructions in robots.txt files cannot enforce crawler behavior to your site; it's up to the crawler to obey them. While Googlebot and other respectable web crawlers obey the instructions in a robots.txt file, other crawlers might not."
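The quoted point, that robots.txt cannot enforce anything, is easy to see in code: the check happens entirely inside the crawler. The sketch below uses Python's standard `urllib.robotparser` with a made-up robots.txt and made-up crawler names (`ExampleAIBot`, `OtherBot`). A crawler that skips the check simply fetches everything, and the file on the server has no say in it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask one (made-up) AI crawler to stay out entirely.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def urls_to_fetch(user_agent, urls, obey_robots):
    """Return the URLs a crawler would request.

    The robots.txt rules are consulted only if the crawler chooses to;
    nothing on the server side enforces them.
    """
    if not obey_robots:
        return list(urls)
    return [u for u in urls if parser.can_fetch(user_agent, u)]


if __name__ == "__main__":
    urls = ["https://example.com/", "https://example.com/blog/post"]
    print(urls_to_fetch("ExampleAIBot", urls, obey_robots=True))
    print(urls_to_fetch("ExampleAIBot", urls, obey_robots=False))
```

The polite run returns an empty list and the impolite one returns every URL; the only difference is a flag inside the crawler.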
I am very concerned!
@nlnet -
an EU-funded institution shared this joke without explaining it!!!
cc @YlvaJohansson @EUCommission