Bluesky To Sell Your Content To AI Data Miners

So it begins. Hidden in Jay Graber's recent charm offensive is this innocuously framed initiative: Bluesky is weighing a proposal that gives users consent over how their data is used for AI (https://techcrunch.com/2025/03/10/bluesky-is-weighing-a-proposal-that-gives-users-consent-over-how-their-data-is-used-for-ai/)

Not so fast.

1) Shows they are planning on doing content deals with AI companies.
2) Seems like it is Opt-out vs. Opt-in (see below).
3) It is just a voluntary robots.txt file
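
To illustrate why "voluntary" matters in point 3, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the agent name `ExampleAIBot` are made up for illustration. The file only expresses a preference; nothing stops a crawler that never asks.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "ExampleAIBot" is an invented agent name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return what robots.txt says about this agent and URL.

    This is only a lookup. A crawler has to choose to call it and
    choose to obey the answer; the file itself enforces nothing.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


ai_bot_allowed = is_allowed("ExampleAIBot", "https://example.com/post/1")
other_allowed = is_allowed("SomeOtherBot", "https://example.com/post/1")
```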

h/t @Lydie https://tech.lgbt/@Lydie/114149023344861046

#Bluesky

Bluesky is weighing a proposal that gives users consent over how their data is used for AI | TechCrunch

Speaking at the SXSW conference in Austin on Monday, Bluesky CEO Jay Graber said the social network has been working on a framework for user consent over

@mastodonmigration @Lydie Nah, sorry, you got that completely backwards here… This proposal is likely at least partially a response to occasional dramas when someone takes the public data from the network (which anyone can technically do) and does something with it that some users don't like, since it's unclear exactly what you're allowed to do with it and what you aren't. So this would be a way to let users specify their intention of how this openly accessible data is allowed to be used.

@mastodonmigration @Lydie By default it would be unspecified, so "we don't know", so exactly like now.

"Intent preferences would be tri-state: explicitly allow, explicitly disallow, or undefined"

"… three states makes it clearer when a user has made an explicit decision or not. Realistically, a large majority of users may stick with the default "undeclared" state. In that situation, downstream projects will need to make their own policy decisions around whether content re-use is acceptable"
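
A rough sketch of how a downstream project might model the tri-state preference quoted above. All names here (`Intent`, `parse_intent`, `may_reuse`, `default_when_undefined`) are hypothetical and are not taken from the proposal or from any ATProto schema. The point is only that "undefined" is a distinct state, and that the downstream project has to supply its own policy for it.

```python
from enum import Enum
from typing import Optional


class Intent(Enum):
    """The three states described in the proposal (names are assumptions)."""
    ALLOW = "allow"
    DISALLOW = "disallow"
    UNDEFINED = "undefined"


def parse_intent(declared: Optional[bool]) -> Intent:
    """Map a user's declaration to a state; None means nothing was declared."""
    if declared is None:
        return Intent.UNDEFINED
    return Intent.ALLOW if declared else Intent.DISALLOW


def may_reuse(intent: Intent, default_when_undefined: bool = False) -> bool:
    """Decide whether content may be reused.

    An explicit choice always wins. For the undeclared state the answer
    comes from the downstream project's own policy, passed in here as
    default_when_undefined (an assumed parameter, conservative by default).
    """
    if intent is Intent.ALLOW:
        return True
    if intent is Intent.DISALLOW:
        return False
    return default_when_undefined
```

With this shape, `may_reuse(parse_intent(None))` is the case the thread is arguing about: the user said nothing, so the outcome depends entirely on which default the downstream project picks.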

@mackuba @Lydie

So Kuba, you are saying Bluesky will not sell your content to data scrapers? Can you show us where it says that?

@mastodonmigration @Lydie https://bsky.app/profile/bsky.app/post/3layuzbto2c2x

None of this is about what *Bluesky* will be able to do, it's about what *anyone* is allowed to do with your public data. So if you check "AI = enabled" for example, this means you allow *anyone* to read your posts from the firehose and use them to train models. Which means Bluesky can't sell it, because it's already available for free then, so nobody would pay for it extra.

Bluesky (@bsky.app)

A number of artists and creators have made their home on Bluesky, and we hear their concerns with other platforms training on their data. We do not use any of your content to train generative AI, and have no intention of doing so.

@mastodonmigration @Lydie They generally don't have many options here around selling data that's all already public from the start, even if they wanted to.

@mackuba @Lydie

Simply not true. The data is not public. It is published under the explicit privacy policy terms. The distinction between what people can do and what they are allowed to do legally matters.

Again, are you saying that Bluesky will not sell user content to data scrapers, and where do they assure their users of this? Simple question.

@mastodonmigration @mackuba @Lydie the literal first point of the Bluesky privacy policy is, and I quote, "Profiles and posts are public". I agree that we should be watching them but this whole thread feels like a hit piece interpreting work under the worst possible intentions.

https://bsky.social/about/support/privacy-policy#profile-posts-public

@McNeely @mackuba @Lydie

Would be curious what other intentions you would ascribe to a change in the software to give specific authority to have your content scraped by AI data miners?

When Twitter/X did this last year it triggered outrage.

@mastodonmigration @mackuba @Lydie right now there's no authority granted one way or the other. What is being proposed is a method to grant or deny that authority.

I think a useful analogy is a code base with no declared license. The code is technically copyright of the publisher, but being publicly available on the internet it's dependent on others respecting that implied right (& we know it won't be). I think this is a proposal to create a very simplified licensing scheme.

@McNeely @mackuba @Lydie

Hmmm... Not exactly following you. Are you saying that there is clamor among Bluesky users to explicitly grant consent for their content to be scraped by AI data miners and this change in the software is to respond to this desire? Does that actually make any sense to you?

@mastodonmigration @McNeely @Lydie More like, there is a desire to make it easier to express explicit denial of consent, which this proposal would help with. And on the other hand, people building or using e.g. bridges like Bridgy, wish that it would be easier to opt in to things like that for users who want to. And having some subset of users who explicitly opted in to e.g. using content for AI models would kind of make it more clear that the rest didn't.

@mackuba @McNeely @Lydie

This is a canard. There would be no need to express explicit denial of consent individually if consent were generally explicitly denied.

And you are postulating that with all the other things Bluesky has on its development road map, providing some small bunch of users who want their data used for AI training is what is being prioritized. Does that really make sense to you?

@mastodonmigration @McNeely @Lydie They're trying to build a more general protocol, and they're adding various things to the protocol that they feel are needed for it to be more useful.

@mastodonmigration @mackuba @Lydie I'm saying that the clamor to have control and options is being interpreted repeatedly as proof that something nefarious is afoot. No one wants to attach the GPL to their skeets; they just want control.

@McNeely @mackuba @Lydie

What people want is for their content to not be used for AI training. Full stop. It would be easy for Bluesky to clearly assert this general prohibition. No one is clamoring to let AI data miners mine their content.

@mastodonmigration @McNeely @Lydie Some folks on Bluesky are pointing out that there have been various conversations before on the Fediverse too about having some kind of way to express consent on how you allow your content to be used, and that various misunderstandings come from the fact that there is no such way to express consent formally yet. So this is what Bluesky is trying to add in ATProto.

@mackuba @McNeely @Lydie

Hmmmm... not buying it. Why not just explain that then? Why lead at a high-profile symposium with AI scraping? Where are all the Bluesky users clamoring for AI data scraping?

@mastodonmigration @McNeely @Lydie Twitter just unilaterally changed the rules so they can use the content for various things. Bluesky is thinking about adding a way to express if you allow or don't allow your content to be used for A, B, C, D. These are not the same thing.

@mackuba @Lydie

Thank you for this clarification. But actually it just muddies the water more.

"In that situation, downstream projects will need to make their own policy decisions around whether content re-use is acceptable"

Who are these "downstream projects" that she is teeing up?

@mastodonmigration @Lydie Meaning anyone who builds something using the API, which is permissionless so anyone can connect to it at any moment and start saving some data. People building various apps, tools, services that somehow make use of the data (not by making some kind of deal with Bluesky PBC, but by just opening a code editor and writing some code that makes requests to api.bsky.app or bsky.network or PDS servers and downloading some JSON and doing stuff with it).

@mastodonmigration @Lydie So for example my website https://blue.mackuba.eu/stats/, where I download all posts and then once a day run a query "select count(*) from posts where ..." and save the result as another row and then draw a chart from that, is an example of a "downstream project".
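
For readers who want to picture the daily-count job described above, here is a minimal sketch using Python's built-in `sqlite3`. It is not the actual code behind the stats site. The table names, the column names, the sample rows, and the date filter in the `WHERE` clause (which the post leaves elided) are all assumptions made for illustration.

```python
import sqlite3


def record_daily_count(conn: sqlite3.Connection, day: str) -> int:
    """Count posts created on `day` and save the result as one stats row."""
    # The WHERE condition is an assumed example; the original post elides it.
    count = conn.execute(
        "SELECT COUNT(*) FROM posts WHERE date(created_at) = ?", (day,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO daily_stats (day, post_count) VALUES (?, ?)", (day, count)
    )
    return count


# In-memory database with hypothetical tables and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (uri TEXT PRIMARY KEY, created_at TEXT)")
conn.execute("CREATE TABLE daily_stats (day TEXT PRIMARY KEY, post_count INTEGER)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [
        ("at://example/1", "2025-03-10 09:00:00"),
        ("at://example/2", "2025-03-10 17:30:00"),
        ("at://example/3", "2025-03-11 08:15:00"),
    ],
)

count = record_daily_count(conn, "2025-03-10")
saved = conn.execute("SELECT day, post_count FROM daily_stats").fetchall()
```

Each run adds one row to `daily_stats`, and a chart can then be drawn from that table alone.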

@mackuba @Lydie

Understand that there are desirable uses, and the way these documents are written they love to give innocuous examples. The problem is that this type of presentation is misleading in that the policy change also permits the undesirable uses.

Again, simple question, are you saying that these changes do not presage a plan by Bluesky to sell or otherwise profit off sharing user content with AI data scrapers? And, can you show us where they assure their users of this?

@mastodonmigration @Lydie Yes, I'm sure this proposal has nothing to do with what Bluesky is able to do, just with what anyone using the API is allowed to do. And the proposed default in this doc is what we have right now.

I can't find any place where they explicitly say they will not sell data to other companies, but this has been their general stance that they don't intend to do stuff like that, and like I said, all data being publicly accessible kinda makes that not a very valuable resource.

@mackuba @Lydie

Well it would certainly be nice if they would come out in no uncertain language and affirm this. And it would seem that the time when you are announcing a fairly big prospective change to user permissions would be the time to do so. Let's stay tuned.

@mastodonmigration @Lydie Btw, here's how a Bluesky protocol dev announced this proposal:

@mackuba @Lydie

Thanks. Good to know. Don't see how it changes anything, but helpful to see how it is being presented.