If it were up to me, Apple would create the world’s first “Do Not Train” registry. Enter your domain or web page, get a TXT record or meta tag to prove ownership, prevent Apple from using your content to train any LLM.

Only a small percentage of sites would do it, I think, so the training impact would be low, but it’d be extremely meaningful for those people. And it’d be a valuable PR tool, brand-booster, and competitor-shamer. (Bonus: make it open and encourage competitors to follow it.)
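
As a rough illustration of the "TXT record or meta tag to prove ownership" step, here is a minimal Python sketch of how a registry might issue a per-domain token and check it. Everything named here is an assumption made for illustration: the `do-not-train-verification` tag name, the `REGISTRY_SECRET` key, the HMAC-based token format, and the helper functions are hypothetical, and no such registry exists. Fetching the page and looking up DNS records are left out, so the functions take the already-fetched HTML and TXT strings as input.

```python
import hashlib
import hmac
from html.parser import HTMLParser

# Hypothetical values: the secret, tag name, and token format are
# illustrative assumptions, not part of any real registry.
REGISTRY_SECRET = b"example-registry-secret"
META_NAME = "do-not-train-verification"


def issue_token(domain):
    """Derive a per-domain token the owner publishes to prove control."""
    normalized = domain.strip().lower().encode("utf-8")
    return hmac.new(REGISTRY_SECRET, normalized, hashlib.sha256).hexdigest()


def _matches(candidate, expected):
    # Compare as bytes in constant time; bytes avoid errors on non-ASCII input.
    return hmac.compare_digest(candidate.encode("utf-8"), expected.encode("utf-8"))


class _MetaFinder(HTMLParser):
    """Collect the content attribute of every matching <meta> tag."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == META_NAME:
            self.tokens.append(attr_map.get("content") or "")


def verify_meta_tag(domain, html):
    """Return True if the page carries the token issued for this domain."""
    finder = _MetaFinder()
    finder.feed(html)
    expected = issue_token(domain)
    return any(_matches(token, expected) for token in finder.tokens)


def verify_txt_records(domain, txt_records):
    """Return True if any already-fetched TXT record string holds the token."""
    expected = f"{META_NAME}={issue_token(domain)}"
    return any(_matches(record.strip(), expected) for record in txt_records)
```

In this sketch the owner would paste the issued token into either a `<meta>` tag on the page or a DNS TXT record, and the registry would run the matching check before listing the domain. Deriving the token from a secret means the registry does not need to store one per domain, though a real design could just as well store random tokens.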

@cabel I saw an article the other day where someone found that one of the "AI" companies ignored robots.txt, faked a standard browser UA (i.e. no "bot" indicator), and masked its IP addresses so it couldn't be filtered. Given that the entire industry is built on blatant intellectual-property theft, I have no expectation any of them would follow it without legislative penalties (and even then, if caught, I'm sure they'd have a blog post saying "whoops, we had a minor bug, we've fixed it now").