Bluesky To Sell Your Content To AI Data Miners

So it begins. Hidden in Jay Graber's recent charm offensive is this innocuously framed initiative: Bluesky is weighing a proposal that gives users consent over how their data is used for AI (https://techcrunch.com/2025/03/10/bluesky-is-weighing-a-proposal-that-gives-users-consent-over-how-their-data-is-used-for-ai/)

Not so fast.

1) Shows they are planning on doing content deals with AI companies.
2) Seems like it is Opt-out vs. Opt-in (see below).
3) It is just a voluntary robots.txt file.
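On that third point: a robots.txt opt-out is exactly this kind of honor-system signal. A compliant crawler (OpenAI's GPTBot, for example) will skip the site, but nothing enforces it. A minimal example:

```
# Ask OpenAI's crawler not to fetch anything on this site.
# This is purely voluntary -- a scraper can simply ignore it.
User-agent: GPTBot
Disallow: /
```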

h/t @Lydie https://tech.lgbt/@Lydie/114149023344861046

more...

#Bluesky

Bluesky is weighing a proposal that gives users consent over how their data is used for AI | TechCrunch

Speaking at the SXSW conference in Austin on Monday, Bluesky CEO Jay Graber said the social network has been working on a framework for user consent over

TechCrunch

@Lydie

Let's get into this. #Bluesky is bleeding money and selling your data is the best way they have of "monetizing" you. So why not frame it as a "voluntary" initiative?

Thing is, seems like it will be opt-out. See this github 'proposal': https://github.com/bluesky-social/proposals/tree/main/0008-user-intents

"Suppose a Bluesky user does not want any of their public data to be used for generative AI training. They would go in to app settings, find the data reuse preferences section, and configure “Generative AI” to “disallow”."

more...

proposals/0008-user-intents at main · bluesky-social/proposals

Bluesky proposal discussions. Contribute to bluesky-social/proposals development by creating an account on GitHub.

GitHub

@Lydie

Opt-out vs. Opt-in is a crucial thing. 90% of users never change the default setting.

Oh, and when we dig into the github doc, it is not just AI.

"The initial categories described here include:
generative AI
protocol bridging
bulk datasets
public archiving and preservation"

What are these "bulk datasets" that Bluesky would be selling?

Just more 'distributed' Bluesky trickery. And if you don't like it, Jay Graber says you can "fork off": https://mastodon.online/@mastodonmigration/114140940924285320

Mastodon Migration (@[email protected])

Now Bluesky's Jay Graber is simply lying. “If a billionaire came in and bought Bluesky and took it over, or I decided tomorrow to change things in a way that people didn’t really like, then they could fork off and go on to other applications.” https://observer.com/2025/03/bluesky-ceo-jay-graber-wants-world-without-caesars/ Of course, this makes absolutely no sense, but tech media dutifully just prints it. What does "fork off and go on to other applications" mean? What is it she is even saying? Getting really tired of this gaslighting BS.

Mastodon

@Lydie

Let's just try one more thought exercise. Let's say Eugen Rochko told tech media that all content on mastodon.social was going to be sold to AI scrapers and "bulk dataset" brokers, but you had the ability to "opt-out" by checking a box that would insert the robots.txt header.

Can you imagine?

And yet, Jay Graber's announcement flies under the radar. This is what happens when you've constructed a cult of personality around your enshittification. Enough of the gaslighting.

#Bluesky

@Lydie

In the comments below some defenders of this #Bluesky opt-out AI scraping change say something like, "It just gives users more control over their content." Like it is a good thing. This is baloney.

You don't need to put a sign on your car saying it is not okay to break into this car. It's your car.

This "control of your own data" argument is nonsense. You have control of your own data, it's yours. All you can be tricked into doing is giving it away.

@mastodonmigration @Lydie One of the best ways to fuck with AI is to set up a bot that will be obvious to and only to humans, and feed the AI output back to the input.

Enough of this will give the AI the electronic equivalent of an LSD trip. Add war footage for an electronic K-hole (NOT the same as a pi-hole!) or other bad trip.

@mastodonmigration @Lydie As effective as taking a shower in public and placing "please respect my privacy - do not look" sign.

But frankly, using services that inherently require resources and labour to be provided for free, without having any idea how they are being funded, requires either a considerable level of naivety or utter indifference.

@mastodonmigration @Lydie

If Eugen Rochko did the same as #Bluesky and put the "Social." data up for sale, the users would move to another instance and the value of social would be 0 euros... Thanks for that #fediverse.

@mastodonmigration @Lydie It's all explained here below - so bulk datasets is about whether you allow sites like archive.org etc. to store your public posts forever (which they technically can now, but it's not clear whether they're allowed to):

@mackuba @Lydie

Sorry, but this doesn't seem to clarify anything.

@mastodonmigration @mackuba @Lydie I think you could see this coming... there are many very, very suspicious users there waiting for something like this to happen... how strange... seeing some users very interested in the Fediverse who apparently, and I say only apparently, had more information about where the platform is heading... I've only been there a short while and it already has a whiff of "same old, same old".
@mastodonmigration @Lydie Also, opt-out means they have a window where they can sell all your data before you even get to the screen to opt out

@semiotic_pirate @Lydie

Interesting thought. Though if you look at the specifics of the proposal, seems possible that they plan on selling your data anyway and trusting the "downstream" recipient to honor the intent of the user.

https://github.com/bluesky-social/proposals/tree/main/0008-user-intents

@mastodonmigration @Lydie Like being on the Do Not Call list.

@Lydie

To be more specific... (Thanks @mackuba)

"Intent preferences would be tri-state: explicitly allow, explicitly disallow, or undefined"

"Realistically, a large majority of users may stick with the default "undeclared" state. In that situation, downstream projects will need to make their own policy decisions around whether content re-use is acceptable"

Which seems even worse, as it signals intent while absolving them of any responsibility.
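The tri-state scheme those quotes describe can be sketched as a toy model. This is a hypothetical illustration, not code from the proposal; the names `Intent` and `may_reuse` are made up:

```python
from enum import Enum

class Intent(Enum):
    """A user's declared preference for one reuse category, e.g. generative AI."""
    ALLOW = "allow"
    DISALLOW = "disallow"
    UNDEFINED = "undefined"  # the default state most users will never change

def may_reuse(user_intent: Intent, downstream_default: bool) -> bool:
    """Decide whether a downstream project may reuse a user's content.

    When the user has made no explicit choice, the decision falls back to
    whatever policy the downstream scraper picks for itself -- which is
    exactly the loophole the thread is pointing at.
    """
    if user_intent is Intent.ALLOW:
        return True
    if user_intent is Intent.DISALLOW:
        return False
    return downstream_default  # "undefined": the scraper decides
```

Under this reading, only an explicit "disallow" constrains anything, and even that is advisory; for the "undeclared" majority, the outcome is whatever the downstream project chooses.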

@mastodonmigration

hm, I have a bridge to bluesky, so I guess I have to do the same setting on my bridged account too?

@Lydie

@di0v0n @Lydie

Great point. If Bluesky does a deal with AI data scrapers they are going to hoover up all the fedi bridged content too no doubt.

@mastodonmigration @di0v0n @Lydie came here to say similar. As enabling the bridge doesn't give us an actual bsky account, I don't see where there could even be an opt-out option that we would be able to access.
@mastodonmigration @Lydie How would I opt out of having my Mastodon posts used for training AI? Is there even any way to know if someone has set up an instance, followed thousands of people, and is feeding all the posts into an AI?

Mastodon doesn’t allow you to opt out of having your data used to train AI (although some instances have clauses in their terms to prohibit it). In fact if anybody from Threads is following you - or following somebody who boosts one of your posts - Threads privacy policy says your data can be used to train Meta’s AI and target ads.

So if Bluesky implements this, they’ll be providing more control than Mastodon does today. How people are getting from there to “they’re going to sell your data!!!!” is mysterious to me. Sure, opt-in would be better, but Mastodon doesn’t even have opt-out!

@dogzilla @mastodonmigration @Lydie

@thenexusofprivacy @dogzilla @Lydie

Mastodon is de facto opt-out of your data being used to train AI, because no such rights are explicitly granted. The authorized uses are enumerated in the instance privacy policy, and they do not include AI scraping.

Agree with you about the Threads problem and wrote about it extensively at the time.

Yes, selling content is an inference. What do you think the plan is, to simply give it away to AI scrapers? Not sure this would be better, and it makes no sense.

Yes I think Bluesky’s plan is very much to make it easy for AI scrapers and everybody else to access public data for free. They’ve said so repeatedly, their architecture is optimized for it, it fits in with their belief system, and they have plenty of other ways of making money. Of course they could change their minds, but adding robots.txt-like consent signals doesn’t make that any easier or more likely.

As for the situation on Mastodon, I’m not sure what privacy lawyer told you that and how much time they had spent looking at your instance’s privacy policy, but you might want to get some other expert opinions before giving that advice to others.

@mastodonmigration @dogzilla @Lydie

@dogzilla @mastodonmigration @Lydie AI companies already do this illegally.

People have already posted screenshots of ChatGPT summarizing the content of other people's fediverse accounts.

@hisham_hm @dogzilla @Lydie

Of course they do. The issue under discussion is whether the platform gives them the authority to do so or not.

@mastodonmigration @hisham_hm @dogzilla @Lydie If we know it is done regardless of the permission to do so, then discussing which platforms allow it or not becomes a bit useless.

Taking a great stance against AI scraping to no effect and having a deal with an AI scraper lead to the same consequences. We should look into that.

@mastodonmigration @dogzilla @Lydie Yes, of course. I'm definitely not saying "well, all is lost because they'll scrape your data anyway". Legal accountability is no joke.

@hisham_hm @mastodonmigration @Lydie Well, how many admins or users have a legal department to draw on? In theory that would restrain corporations or hackers, in practice it really doesn’t. It’s probably a line item in the business plan.

So I’m not sure that in practice BlueSky is that different from Masto, at least for this issue. I’m sure there are plenty of other metrics

For me, I’ll never rely on a centrally-controlled presence again, but I’ll visit

@mastodonmigration
Respect your choice. Heheh, sigh 😞

@mastodonmigration @Lydie Under European data protection laws it must be opt-in, at least for European users - otherwise they open themselves up to potentially catastrophic fines (€20M, or up to 4% of global gross revenue).

So let’s see if they want to spend the time to implement a per country default.

LinkedIn did the same half a year ago, but it was opt-out in the USA, whereas it was an incentivized opt-in for Europeans.

@mastodonmigration @Lydie
These days it does not feel good to be in the position of giving a lot of people an "I told you so"
@mastodonmigration @Lydie I'm personally not an AI hater. I'm not even against AI being trained with stuff online. What I'm against is the shitty way these companies sneak around and dodge questions about what they're doing, and the obscene amounts of money they waste and/or make. Or using confidential info to train LLMs, because we all know they are. AI doesn't need to go away; capitalism does.

@mastodonmigration @Lydie It should be opt-in, not opt-out. But "It is just a voluntary robots.txt file" is all the Fediverse has to defend against bots that sweep up the content of public posts (on sites that don't use authorized fetch, which is most of them).

See https://lwn.net/Articles/1008897/ to see what sites that are trying to do the right thing are up against. Sites that have extensive archives are being hammered by AI scrapers that ignore robots.txt and disguise themselves to defeat blocking.

Fighting the AI scraperbot scourge

There are many challenges involved with running a web site like LWN. Some of them, such as fin [...]

LWN.net

@not2b @Lydie

Point is that she is inviting them in the door. Yes, they can always ignore the robots.txt, but it is better if they are not 'allowed' on the platform at all. And a really good question: will Bluesky still be paid for the content that data scrapers who ignore robots.txt hoover up? This opens a Pandora's box.

Another question... Will Bluesky be paid for Fedi content the data scrapers hoover up from Bluesky?

@mastodonmigration @Lydie The difficulty here is that most of the people running AI scrapers are downright crooks, people who make OpenAI look ethical. They use botnets (stolen devices) to do the scraping. Since most of the prominent people on BlueSky post publicly, meaning you can get their postings without logging in or having an account on BlueSky, they are already hoovering it up. They disguise themselves as ordinary users. You can't block them. They are doing the same for any Fediverse site they can gain access to. Again, read the LWN article.

@not2b @Lydie

Understood. If you post on social media you should expect your content to be scraped. Again, the difference is granting people (and perhaps profiting from) the specific legal authority to do so.

@mastodonmigration @not2b @Lydie There is no implication or suggestion anywhere in the proposal that Bluesky would be paid for anything by anyone, this is something you've added yourself.

@mackuba @not2b @Lydie

Yes, that is the inference. What are you suggesting? That the plan is to simply give user content to AI data scrapers for free? Don't see how that would be better, and in any case it makes no sense.

But, this can all be resolved very simply by Bluesky clarifying the matter in specific terms. Jay Graber announcing this change to facilitate AI scraping does the opposite.

@mastodonmigration @not2b @Lydie If users decide so, yes. You can't really sell something that isn't secret at all.

@mackuba @not2b @Lydie

Sure you can. People sell books all the time.

@mastodonmigration @mackuba @not2b @Lydie wait what are these "books" it sounds familiar

@mastodonmigration @not2b @Lydie feels like Bluesky is in a no-win situation with you

if they give users the ability to flag how their data should be treated by AI scrapers -> "not good enough, those companies shouldn't be allowed on the platform at all"

but if Bluesky somehow built a mechanism to prevent AI companies from accessing user data -> "see! I told you it was centralized this whole time! they control everything!"

atproto uses the model of the web which is permission-less data transfer

the User Intents proposal is trying to come up with a general framework for letting users affirmatively declare how they want their data handled by AI scrapers, yes, but also orgs like archive.org or Bridgy Fed since people may have different preferences for different organizations

@edavis @not2b @Lydie

Not true at all. Simply state clearly that AI data mining of Bluesky user content is not authorized. No need for anything more. That would be a win-win.

@mastodonmigration @Lydie Nah, sorry, you got that completely backwards here… This proposal is likely at least partially a response to occasional dramas when someone takes the public data from network (which anyone can technically do) and does something with it that some users don't like, since it's unclear exactly what you're allowed to do with it and what you aren't. So this would be a way to let users specify their intention of how this openly accessible data is allowed to be used.

@mastodonmigration @Lydie By default it would be unspecified, so "we don't know", so exactly like now.

"Intent preferences would be tri-state: explicitly allow, explicitly disallow, or undefined"

"… three states makes it clearer when a user has made an explicit decision or not. Realistically, a large majority of users may stick with the default "undeclared" state. In that situation, downstream projects will need to make their own policy decisions around whether content re-use is acceptable"

@mackuba @Lydie

So Kuba, you are saying Bluesky will not sell your content to data scrapers? Can you show us where it says that?

@mastodonmigration @Lydie https://bsky.app/profile/bsky.app/post/3layuzbto2c2x

None of this is about what *Bluesky* will be able to do, it's about what *anyone* is allowed to do with your public data. So if you check "AI = enabled" for example, this means you allow *anyone* to read your posts from the firehose and use them to train models. Which means Bluesky can't sell it, because it's already available for free then, so nobody would pay for it extra.

Bluesky (@bsky.app)

A number of artists and creators have made their home on Bluesky, and we hear their concerns with other platforms training on their data. We do not use any of your content to train generative AI, and have no intention of doing so.

Bluesky Social
@mastodonmigration @Lydie They generally don't have many options here around selling data that's all already public from the start, even if they wanted to.

@mackuba @Lydie

Simply not true. The data is not public. It is published under the explicit privacy policy terms. The distinction between what people can do and what they are allowed to do legally matters.

Again, are you saying that Bluesky will not sell user content to data scrapers, and where do they assure their users of this? Simple question.

@mastodonmigration @mackuba @Lydie the literal first point of the BlueSky privacy policy is, and I quote, "Profiles and posts are public". I agree that we should be watching them but this whole thread feels like a hit piece interpreting work under the worst possible intentions.

https://bsky.social/about/support/privacy-policy#profile-posts-public

Privacy Policy - Bluesky

Bluesky

@McNeely @mackuba @Lydie

Would be curious what other intentions you would ascribe to a change in the software to give specific authority to have your content scraped by AI data miners?

When Twitter/X did this last year it triggered outrage.

@mastodonmigration @mackuba @Lydie right now there's no authority granted one way or the other. What is being proposed is a method to grant or deny that authority.

I think a useful analogy is a code base with no declared license. The code is technically copyright of the publisher, but being publicly available on the internet it's dependent on others respecting that implied right (& we know it won't be). I think this is a proposal to create a very simplified licensing scheme.

@McNeely @mackuba @Lydie

Hmmm... Not exactly following you. Are you saying that there is clamor among Bluesky users to explicitly grant consent for their content to be scraped by AI data miners, and this change in the software is to respond to this desire? Does that actually make any sense to you?

@mastodonmigration @McNeely @Lydie More like, there is a desire to make it easier to express explicit denial of consent, which this proposal would help with. And on the other hand, people building or using e.g. bridges like Bridgy, wish that it would be easier to opt in to things like that for users who want to. And having some subset of users who explicitly opted in to e.g. using content for AI models would kind of make it more clear that the rest didn't.

@mackuba @McNeely @Lydie

This is a canard. There would be no need to express explicit denial of consent individually if consent were generally explicitly denied.

And you are postulating that, with everything else Bluesky has on its development roadmap, serving the small group of users who want their data used for AI training is what is being prioritized. Does that really make sense to you?

@mastodonmigration @McNeely @Lydie They're trying to build a more general protocol, and they're adding various things to the protocol that they feel are needed for it to be more useful.
@mastodonmigration @mackuba @Lydie I'm saying that the clamor to have control and options is being interpreted repeatedly as proof that something nefarious is afoot. No one wants to attach the GPL to their skeets; they just want control.

@McNeely @mackuba @Lydie

What people want is for their content to not be used for AI training. Full stop. It would be easy for Bluesky to clearly assert this general prohibition. No one is clamoring to let AI data miners mine their content.

@mastodonmigration @McNeely @Lydie Some folks on Bluesky are pointing out that there have been various conversations before on the Fediverse too about having some kind of way to express consent on how you allow your content to be used, and that various misunderstandings come from the fact that there is no such way to express consent formally yet. So this is what Bluesky is trying to add in ATProto.