If it were up to me, Apple would create the world’s first “Do Not Train” registry. Enter your domain or web page, get a TXT record or meta tag to prove ownership, prevent Apple from using your content to train any LLM.

Only a small percentage of sites would do it, I think, so the training impact would be low, but it’d be extremely meaningful for those people. And it’d be a valuable PR tool, brand-booster, and competitor-shamer. (Bonus: make it open and encourage competitors to follow it.)
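
For anyone curious what the "TXT record or meta tag to prove ownership" step could look like, here is a minimal sketch. No such registry exists: the record name `do-not-train-verification`, the token scheme, and every function name below are assumptions made up for illustration. It only checks values that have already been fetched, so the DNS lookup and page download are left out.

```python
import hashlib
import hmac
from html.parser import HTMLParser

# Hypothetical names: no such registry or record format exists.
META_NAME = "do-not-train-verification"
TXT_PREFIX = "do-not-train-verification="


def issue_token(domain: str, registry_secret: bytes) -> str:
    """Derive a per-domain token the owner must publish to prove control."""
    normalized = domain.strip().lower().encode("utf-8")
    return hmac.new(registry_secret, normalized, hashlib.sha256).hexdigest()[:32]


def txt_record_proves_ownership(txt_records: list[str], token: str) -> bool:
    """Check already-fetched DNS TXT values for the expected token."""
    expected = TXT_PREFIX + token
    return any(record.strip() == expected for record in txt_records)


class _MetaTokenParser(HTMLParser):
    """Collect the content of every <meta name="do-not-train-verification">."""

    def __init__(self) -> None:
        super().__init__()
        self.tokens: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if attr_map.get("name") == META_NAME and attr_map.get("content"):
            self.tokens.append(attr_map["content"])


def meta_tag_proves_ownership(html: str, token: str) -> bool:
    """Check already-downloaded page HTML for the expected token."""
    parser = _MetaTokenParser()
    parser.feed(html)
    return token in parser.tokens
```

The token is derived from the domain with a secret only the registry holds, so a site owner cannot reuse one domain's token to claim another.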

@cabel (a) why would this be different from saying they’d honor a robots.txt?

(b) this sounds like a Do Not Track thing, but without even the attempts at legislation around it (which themselves failed, and using the header at all is pointless)

@jason Somewhat cynically, with my Apple hat on: much stronger PR. “Apple follows robots.txt” is a weak headline, and robots.txt files are notoriously loosely and optionally followed, so trust is very low. Apple is all about building trust, but that can’t be built on a shaky platform. New initiatives get attention.
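
For reference, honoring robots.txt is entirely up to the crawler, which is why it is only as trustworthy as whoever runs the bot. A minimal sketch of what a well-behaved crawler would do, using Python's standard `urllib.robotparser`; the user agent `ExampleTrainingBot` and the helper name are assumptions for illustration, not any real crawler's identity.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user agent; real crawlers publish their own names.
TRAINING_BOT = "ExampleTrainingBot"

# Example robots.txt that blocks the training bot but allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch_for_training(robots_txt: str, url: str) -> bool:
    """Return True only if the site's robots.txt permits the training bot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(TRAINING_BOT, url)
```

Nothing enforces this check: a crawler that skips the call simply fetches the page anyway.
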
@cabel @jason I hate to be a jerk here, but it sounds like your primary concern is Apple's PR look, and that respecting people's rights is a side effect.
@tankgrrl @jason why not both dot jpeg
@cabel @tankgrrl @jason FWIW I took it as “Apple should do this for good reasons and in case anybody needs convincing: good PR.”
@danielpunkass @cabel @jason I'd still prefer a PR look like "AI company refuses to steal people's content without their permission" over "we won't steal your stuff if you tell us not to and sign up for our registry"
@danielpunkass @cabel @jason On a related note, though not part of this question: none of what these companies are doing qualifies as fair use, yet no one seems to want to argue for the content creators' rights. Everyone just throws up their hands like "Well, what are you going to do? You shouldn't have put it on the internet."