In which I put my head in the lion’s jaws and write 2700 words about privacy and full-text search on #mastodon: https://www.tbray.org/ongoing/When/202x/2022/12/30/Mastodon-Privacy-and-Search
Private and Public Mastodon

ongoing by Tim Bray
@timbray For plain text search of Mastodon content, won't searching using Google work?

@codeyarns

Not if the admins block it with robots.txt

@timbray @codeyarns not so fast! If my post federates to another instance that doesn't block search with robots.txt, then it's still likely to wind up in a search engine (even if I've selected the "opt out of search engines" option). I've even found federated versions of deleted posts via search engines.

As you say, Mastodon's privacy story has problems.

The frustrating thing is, local-only toots help a lot here ... but they've been blocked from the main branch since 2017
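
To make the robots.txt point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The two instances, their robots.txt contents, and the post URLs are all hypothetical; the only thing it illustrates is that a federated copy on an instance that doesn't block crawlers remains fetchable even when the home instance blocks everything.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical home instance: the admins block all crawlers.
HOME_ROBOTS = "User-agent: *\nDisallow: /\n"
# Hypothetical remote instance that received the post via federation
# and does not block crawlers (an empty Disallow allows everything).
REMOTE_ROBOTS = "User-agent: *\nDisallow:\n"

home_ok = is_crawlable(HOME_ROBOTS, "https://home.example/@alice/1")
remote_ok = is_crawlable(REMOTE_ROBOTS, "https://remote.example/@alice@home.example/1")

print("home copy crawlable:", home_ok)
print("federated copy crawlable:", remote_ok)
```

Note that robots.txt is only a request to well-behaved crawlers, so even the home instance's block is advisory.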

@jdp23 @codeyarns

If your post carries clear licensing restrictions, there's then a legal club available to beat anyone who misuses it. That's my piece's central point.

@timbray @codeyarns that seems orthogonal to blocking with robots.txt?
@jdp23 @timbray @codeyarns - if a crawler only indexes local timelines then robots isn't orthogonal - just another layer of choice (which server vs what to set posts to) - I had assumed that "opt out of search engines" was indeed that licencing. The point was made that it's disabled by default, and opt in vs opt out was an interesting part of that conversation as it revolves around active vs passive consent. Not just an ethical question, but also a legal one, at least here in the EU.
@jdp23 @timbray @codeyarns - I am the person behind the controversy mentioned in Tim's original blog post, my crawler prototype just used the API to read the local timeline - so it only got root posts, and boosts (afaik), others may do differently
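
For readers unfamiliar with what "reading the local timeline" looks like, here is a rough sketch, not the actual prototype. It assumes Mastodon's public timeline endpoint (`/api/v1/timelines/public` with `local=true`) and the `in_reply_to_id` and `reblog` fields of a status; the instance name, helper names, and sample data are made up, and no network request is made. The explicit filter to root posts and boosts is an assumption added for illustration.

```python
from urllib.parse import urlencode


def local_timeline_url(instance: str, limit: int = 20) -> str:
    """Build the URL for an instance's local public timeline."""
    query = urlencode({"local": "true", "limit": limit})
    return f"{instance}/api/v1/timelines/public?{query}"


def roots_and_boosts(statuses: list[dict]) -> list[dict]:
    """Keep boosts and root posts; drop replies."""
    return [
        s for s in statuses
        if s.get("reblog") is not None or s.get("in_reply_to_id") is None
    ]


# Made-up sample shaped like a (heavily trimmed) timeline response.
sample = [
    {"id": "1", "in_reply_to_id": None, "reblog": None},          # root post
    {"id": "2", "in_reply_to_id": "1", "reblog": None},           # reply
    {"id": "3", "in_reply_to_id": None, "reblog": {"id": "99"}},  # boost
]

url = local_timeline_url("https://example.social")
kept_ids = [s["id"] for s in roots_and_boosts(sample)]
print(url)
print(kept_ids)
```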
@mc @timbray @codeyarns Ah okay. Well then your crawler behaved differently than Google's search crawler.
@jdp23 @timbray @codeyarns - interestingly local feed here *does* include some replies, but not *all* replies. odd. could be just replies from same instance? But anyway - I also feel the discussions needed are less detail oriented - we can make the details whatever we like, just as long as we are discussing it.
@mc @timbray @codeyarns Yeah. Tim's point about how Mastodon's privacy story is terrible is quite true. Or looking at it differently, the *story* is okay but the *reality* doesn't match the story.
@jdp23 @timbray @codeyarns if the default privacy model was more 'ethically based with active consent', then the issue there is that the platform wouldn't take off. I feel Mastodon has some unique moderation issues to overcome alongside privacy ones. My original host at techhub literally couldn't see root posts I had replied to when someone reported my comments, as the reporter had blocked him too.
@jdp23 @timbray @codeyarns - that would leave only two options, take no action and risk me being a bad actor, or do and risk the reporter being one attempting to harass me. Having investigated the kiwifarms incident I understand why, hence why I proposed limiting it to server mods and their federated instance mods - i.e. a server could always defed a server if a mod was a bad actor. Didn't get far enough on that topic before my face was chewed off and character defamed in the thread however.
@mc did they ask you to leave or did you just decide to move on your own?
@jdp23 I was kicked, there's a post explaining the reasoning from the techhub admin, apparently running a datacentre with multiple IPs is 'illegal in canada' - (I suspect he was just under a lot of pressure at the time to find a good reason to deplatform me) - but would rather leave that topic, and just try and constructively move on :)
@jdp23 I'm certainly not building it after that, but I AM still interested in the moderation issues we face.
@jdp23 - I will also leave this account active (assuming am allowed), but doubt I'll be around much on social for the foreseeable. The experience was... not great, although I only really have myself to blame for that :)
@mc well, you did say a few times that you'd see it as a good outcome if it sparked discussion and some progress on the issue ... discussion has happened for sure, and it's probably increased the likelihood of timely progress (which doesn't mean it'll happen). We shall see. That said yeah I can see how it wasn't a great experience.

@mc @timbray @codeyarns yeah. i think of it as basically a prototype at scale of a decentralized system, so there are a lot of issues related to decentralized moderation that aren't yet addressed.

It's hard to know about the privacy model in general. For search in particular, opt-in could have worked if carried through the whole code, and I think people would have been okay with it. But that's not how things played out and it seems hard to retrofit.