If it were up to me, Apple would create the world’s first “Do Not Train” registry. Enter your domain or web page, get a TXT record or meta tag to prove ownership, prevent Apple from using your content to train any LLM.

Only a small percentage of sites would do it, I think, so the training impact would be low, but it’d be extremely meaningful for those people. And it’d be a valuable PR tool, brand-booster, and competitor-shamer. (Bonus: make it open and encourage competitors to follow it.)
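
For the curious, the "prove ownership with a meta tag" step could be sketched roughly like this. Everything here is an assumption for illustration: no such registry exists, and the meta tag name `do-not-train-verification`, the `make_token` helper, and the registry secret are all hypothetical. The idea is only that the registry hands the site owner a per-domain token and later checks that the page actually publishes it.

```python
import hashlib
import hmac
from html.parser import HTMLParser

# Hypothetical name: no such registry or meta tag exists today.
META_NAME = "do-not-train-verification"


def make_token(domain: str, registry_secret: bytes) -> str:
    """Derive the per-domain token the (imagined) registry gives the site owner."""
    digest = hmac.new(registry_secret, domain.lower().encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:32]


class _MetaFinder(HTMLParser):
    """Collects the content of every <meta name=META_NAME ...> tag on a page."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == META_NAME and attr.get("content"):
            self.tokens.append(attr["content"])


def page_proves_ownership(html: str, domain: str, registry_secret: bytes) -> bool:
    """Return True if the page publishes the token expected for this domain."""
    finder = _MetaFinder()
    finder.feed(html)
    expected = make_token(domain, registry_secret).encode("utf-8")
    # Constant-time comparison, done on bytes so odd page content cannot raise.
    return any(hmac.compare_digest(t.encode("utf-8"), expected) for t in finder.tokens)
```

A DNS TXT record check would follow the same shape, with the token looked up in DNS instead of parsed out of the page.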

@cabel (a) why would this be any different from them saying they’d honor robots.txt?

(b) this sounds like a do not track thing, but without even the attempts at legislation around it (which themselves failed, and using the header at all is pointless)

@jason Somewhat cynically, with my Apple hat on: much stronger PR. “Apple follows robots.txt” is a weak headline, and robots.txt files are notoriously loosely and only optionally followed, so trust is very low. Apple is all about building trust, but that can’t be built on a shaky platform. New initiatives get attention.
@cabel @jason I like the sentiment, but functionally it seems no different? The largest “AI” companies already have ways of blocking their crawlers via robots.txt and as you say this would also be “opt in” and unenforceable. If any company were to do this, I would critique it as an incredibly empty gesture.
@hank @jason I think “robots.txt” has a very bad reputation for being voluntary and often outright ignored. It’s almost not even worth doing one because there’s zero accountability. It’s time to start over. The idea here is to increase trust and accountability. But it would also be quintessentially Apple to reinvent something that has long existed by making it a little bit better and nicer. Look, I’m just wearing my pretend Apple executive hat here! 😛
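
For reference, the existing robots.txt opt-out mentioned above can be sketched with Python's standard `urllib.robotparser`. The user-agent tokens `GPTBot` and `Applebot-Extended` are used as examples of vendor-published AI crawler tokens; check each vendor's current documentation before relying on them. The `allowed` helper and the example.com URLs are made up for illustration. Note that this only shows what a well-behaved crawler would conclude; nothing in it forces a crawler to ask.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: block two AI-training crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling `allowed("GPTBot", "https://example.com/post")` reports the page as off limits, while an ordinary browser user agent is let through.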

@cabel As somebody who runs a service that does a lot of user-driven web crawling for archival purposes, I'm familiar with exactly how loose it all is...

I just don't think an opt-in list is the way to create this accountability. I would prefer it be settled with case-law that says "if you train on copyright protected work you'll get mega-sued" but large companies are trying to get ahead of that by making cash deals with anyone who can afford lawyers right now. :(

@cabel For the same reason, I'm mad that the EU didn't legislate "do not track" requests (a similarly honour-based system) into law. Instead we have a million consent popups to individually agree to, or a button that says "do whatever".

@cabel @hank right, this would entirely benefit Apple’s brand and require extra work for one specific vendor.

Frankly if they did this and opened it up to third party AI providers, I would be extremely tempted to sign up just to wantonly violate it and drag down the reputation it might carry.

Yes, I’m that petty, but maybe the only computer company I’ve ever used (save one gaming PC) in nearly four decades should stop acting in ways that have me looking for the exits.

@cabel @hank and yes I think archive.org should disregard robots whenever they want no I will not be taking questions 😬
@jason We also make web archiving software and have a similar stance 🙃
@cabel @jason I hate to be a jerk here, but it sounds like your primary concern is Apple's PR look, and that respecting people's rights is a side effect.
@tankgrrl @jason why not both dot jpeg
@cabel @tankgrrl @jason FWIW I took it as “Apple should do this for good reasons and in case anybody needs convincing: good PR.”
@danielpunkass @cabel @jason I'd still prefer a PR look like "AI company refuses to steal people's content without their permission" over "we won't steal your stuff if you tell us not to and sign up for our registry"
@danielpunkass @cabel @jason On a related note, though not part of this question: none of what these companies are doing qualifies as fair use, yet no one seems to want to argue for the content creators' rights. Everyone just throws up their hands like "Well, what are you going to do? You shouldn't have put it on the internet."