Bluesky To Sell Your Content To AI Data Miners

So it begins. Hidden in Jay Graber's recent charm offensive is this innocuously framed initiative: "Bluesky is weighing a proposal that gives users consent over how their data is used for AI" (https://techcrunch.com/2025/03/10/bluesky-is-weighing-a-proposal-that-gives-users-consent-over-how-their-data-is-used-for-ai/).

Not so fast.

1) It shows they are planning content deals with AI companies.
2) It looks like it is opt-out rather than opt-in (see below).
3) It is just a voluntary robots.txt file.

h/t @Lydie https://tech.lgbt/@Lydie/114149023344861046


#Bluesky

Bluesky is weighing a proposal that gives users consent over how their data is used for AI | TechCrunch

Speaking at the SXSW conference in Austin on Monday, Bluesky CEO Jay Graber said the social network has been working on a framework for user consent over

TechCrunch

@mastodonmigration @Lydie It should be opt-in, not opt-out. But "It is just a voluntary robots.txt file" is all the Fediverse has to defend against bots that sweep up the content of public posts (on sites that don't use authorized fetch, which is most of them).

See https://lwn.net/Articles/1008897/ for what sites that are trying to do the right thing are up against. Sites with extensive archives are being hammered by AI scrapers that ignore robots.txt and disguise themselves to defeat blocking.
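To make the "voluntary" part concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleAIBot` user agent, the `example.social` URL, and the `is_allowed` helper are all made up for illustration. The point is only that robots.txt answers a question the crawler has to choose to ask, and that the answer depends on whatever name the crawler decides to give.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site might publish.
# "ExampleAIBot" is an invented user agent, not a real crawler.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return what robots.txt says about this user agent and URL.

    This is purely advisory: nothing here blocks a request.
    A crawler only sees the answer if it bothers to check.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


post_url = "https://example.social/@user/1"  # hypothetical public post

# A crawler that identifies itself honestly is told to stay out.
honest = is_allowed("ExampleAIBot", post_url)

# The same crawler claiming to be a browser matches the "*" rule instead.
disguised = is_allowed("Mozilla/5.0", post_url)

print(f"honest bot allowed: {honest}")
print(f"disguised bot allowed: {disguised}")
```

A scraper that skips the check entirely, or lies about its user agent, is not stopped by any of this, which is why actual enforcement has to happen server-side (authorized fetch, rate limiting, blocking).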

Fighting the AI scraperbot scourge

There are many challenges involved with running a web site like LWN. Some of them, such as fin [...]

LWN.net

@not2b @Lydie

Point is that she is inviting them in the door. Yes, they can always ignore the robots.txt, but it is better if they are not 'allowed' on the platform at all. And a really good question: will Bluesky still be paid for the content that data scrapers who ignore robots.txt hoover up? This opens a Pandora's box.

Another question... Will Bluesky be paid for Fedi content that the data scrapers hoover up from Bluesky?

@mastodonmigration @not2b @Lydie feels like Bluesky is in a no-win situation with you

if they give users the ability to flag how their data should be treated by AI scrapers -> "not good enough, those companies shouldn't be allowed on the platform at all"

but if Bluesky somehow built a mechanism to prevent AI companies from accessing user data -> "see! I told you it was centralized this whole time! they control everything!"

atproto uses the model of the web which is permission-less data transfer

the User Intents proposal is trying to come up with a general framework for letting users affirmatively declare how they want their data handled: by AI scrapers, yes, but also by orgs like archive.org or Bridgy Fed, since people may have different preferences for different organizations

@edavis @not2b @Lydie

Not true at all. Simply state clearly that AI data mining of Bluesky user content is not authorized. No need for anything more. That would be a win-win.