In which I put my head in the lion’s jaws and write 2700 words about privacy and full-text search on #mastodon: https://www.tbray.org/ongoing/When/202x/2022/12/30/Mastodon-Privacy-and-Search
Private and Public Mastodon

ongoing by Tim Bray
@timbray Excellent—Putting this on this evening’s reading stack.

@timbray I think you're on the right track.

Reminds me of early flickr days where I was creating content and wanted it to be publicly accessible but constrained in how it could be used and really loved how creative commons licensing was part of their publishing workflow. If I could do something similar for my posts here that would be very interesting.

@timbray This lays out the issues really well. I like the idea of instances having their own defaults; with the tightest ones literally not allowing access to anyone outside that server.

@timbray excellent - these are the conversations we should be having about Mastodon. Because frankly, here we can. It's not ultimately decided behind a corporate facade based on A/B testing and engagement metrics.

(I wanted to write something similar about QTs)

@szbalint @timbray I think there’s a big picture set of conversations that will need to happen this year. Search, QTs, algorithms, scale (multiple axes), moderation, and long term sustainability are on the list.

The big challenge is that we have none of the structure it took forever to establish for the open Web, and it’s not clear where those conversations will happen.

@timbray Indeed, an excellent article! Very good job sketching the various positions and Mastodon's current privacy issues. I'll have to think about content licensing as the model but it's certainly an interesting approach.

@szbalint I touch on somewhat similar points related to QTs in https://privacy.thenexus.today/black-twitter-quoting-and-white-toxicity-on-mastodon

Black Twitter, quoting, and white views of toxicity on Mastodon

Does quoting really cause toxicity?

The Nexus Of Privacy

@timbray thoughts on authorized fetch? ( https://docs.joinmastodon.org/admin/config/#authorized_fetch ) It seems like it shares some commonalities with your proposed step one particularly when used in combination with disallow unauthenticated access ( https://docs.joinmastodon.org/admin/config/#disallow_unauthenticated_api_access )

Of course we still have a long way to go with building on top of that for your other suggestions re: federation & data handling contracts but

Configuring your environment - Mastodon documentation

Setting environment variables for your Mastodon installation.
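
Editorial sketch of the two settings mentioned above. The variable names `AUTHORIZED_FETCH` and `DISALLOW_UNAUTHENTICATED_API_ACCESS` are taken from the anchors of the linked documentation. The sample file contents and the helpers `parse_env` and `locked_down` are hypothetical, meant only to show what "both switches on" might look like in an environment file.

```python
# Minimal sketch: check whether an env-style config enables both settings
# discussed in the thread. SAMPLE_ENV is made-up illustrative content.

REQUIRED_FLAGS = ("AUTHORIZED_FETCH", "DISALLOW_UNAUTHENTICATED_API_ACCESS")

SAMPLE_ENV = """
# hypothetical excerpt of an instance's environment file
AUTHORIZED_FETCH=true
DISALLOW_UNAUTHENTICATED_API_ACCESS=true
"""


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def locked_down(settings: dict) -> bool:
    """True only if every required flag is present and set to 'true'."""
    return all(settings.get(flag, "").lower() == "true" for flag in REQUIRED_FLAGS)


if __name__ == "__main__":
    print(locked_down(parse_env(SAMPLE_ENV)))
```

This only inspects a config file. It says nothing about how other servers treat content once it has federated to them.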

@Satsuma

Hadn't seen that… interesting. But like I said, it's the policy consensus that matters, technology we can fix up.

@timbray oh definitely, people are always the messiest part of the equation! but seems like a decent place to start building off, as it already has moderate adoption on fedi

@timbray Great thoughts. I do not know what the correct answer is, but I always appreciate your writing.

The biggest issue I see here is that because the network is federated, because there's hundreds of admins all over, this decision will have to keep being made by the community over and over.

@timbray I don’t think instance owners will be able to comply with the bureaucracy. It strikes me that privacy focused people should actually be on a different platform.

@BudGibson @timbray

I have questions about the practicalities of various search strategies.

1. How big will the search repository become? Twitter uses two big data centers and still outsources some of its activity to AWS and GCP, if various reports are correct. That is well beyond the capability of instance admins or even instance organizations to manage.

2. What is the time window of search? Right now, AFAIK, the hashtag search basically covers whatever the home instance has seen lately. That is a sort of ephemeral search that answers the question: "what's been said recently in my extended network?"
Twitter keeps everything from inception. That answers the question of "What has ever been said on this topic by anyone anywhere?" Those are two different search motivations and strategies, with huge cost differences.

3. If 3rd parties set up an expansive search engine, how will they pay for it?

If I have misunderstood the mechanics, please put me on the right path.

@vicuzumeri @BudGibson

Sensible questions. But we have to get the policy issues right before we can even start worrying about them.

@timbray @vicuzumeri @BudGibson Tim, have you played around with Pixelfed? It’s not obvious on the app, but posting allows you to set the license and on the browser it shows you exactly what license is associated with a post.

https://gram.social/i/web/post/514906813444551445

jonpainterphoto shared a post

A story of grappa: I visited Venice with some friends right at the end of the season. One of my friends was a regular, and her favorite restaurant set us up for their last night of the year. The food was spectacular, as was the wine. Then the house started pouring grappa. I wandered around empty Venice that night with friends, a camera, a tripod, and a brain full of grappa. This was one of the photos from that night. #travel #Venice #Italy #Night #canal #world

Gram
@jonpainterphoto I was just in Venice a month ago and this takes me right back—that particular color of water at night. It was a full moon; maybe the same when you took that shot.

@timbray @BudGibson

I am confused about where one can look for any policy decision in the Fediverse. If there's no centre of gravity for policy, the next best decision criterion is usually somewhere in the economics.

@vicuzumeri @timbray @BudGibson - I would have thought that the next best decision criterion is probably legal rather than economic, but given that would only happen when someone takes advantage of a system, it might be hard to prevent harm using that.
@vicuzumeri @timbray @BudGibson - and if all the legal protections currently in place were applied, I'm not convinced anyone would run one. (e.g. the requirement for an impressum containing real name and address when publishing a site that takes money even as donations and can be reached in DACH) - source: https://www.iubenda.com/en/help/7816-impressum-what-is-it-and-when-is-it-needed
Impressum: What is it and when do you need one (example)

What is an Impressum? Is it legally required and what should it contain? Do you need one for Facebook? In this post we answer these questions and more.

iubenda

@mc @timbray @BudGibson

We will have to wait and see how this movie turns out.

FWIW, my expectation is that the Fediverse will evolve differently from previous Internet technology cycles.

To me, the big new factor is the growing maturity of open source technologists and developers around the world.

IMO, the hegemony of the Silicon Valley VC billionaire bros is up for grabs.

Mastodon is the work product of a German developer. The next killer app design iteration may come from Kazakhstan, Mexico or Nigeria.

If this occurs, the legal issues will be more complex ... maybe impossible to enforce.

A kick ass Kazakh server technology could become a huge factor if it were quickly embraced by India, Philippines and South America. Or any combo like that.

Then who sets legal policy?

Right now, the monolithic SV companies are driven by US law and, increasingly, EU law.

But that could change.

@BudGibson @timbray I feel the same way, and feel somewhat similarly even on a technological front. implementing full text search for an instance is trivial, and I feel it’s inevitable that eventually a large instance will run software that supports full text search, and thus it will naturally have indexed a large amount of federated content.

a policy would have to be overlaid on top of the recognition that once the data is federated to another server, control has been lost.

Tim’s essay provokes a lot of thought, but is limited in scope to current (vanilla) Mastodon; I think that is this discussion’s undoing. much like the retoot argument, Mastodon may be the de facto featureset, but it’s hardly the only implementation.

@BudGibson

It's not clear to me why the desires of people who have been here for years should be dismissed by someone who joined last week.

Perhaps, it's the people who want full search that ought to be on another platform.

@zx I just don’t think you have the protections you think you do. The open web, where we are now, is not a private platform. If you want privacy, you need another platform.

@BudGibson

I know precisely what protections that web can and can not provide. Or do you also think I have access to all your credit cards and passwords?

@timbray coming from the dev world, the licensing idea resonates. We’ve come far enough that BSD / MIT / Apache / CC are a widely understood shorthand.

Then an inner voice pops up; “there’s little chance that the general public could become conversant to the same extent”

And a left-over idealist whispers “but what if they could? I mean, we teach children consent these days…”

@timbray surprising (for me) and interesting read- thanks!
@timbray For plain text search of Mastodon content, won't searching using Google work?

@codeyarns

Not if the admins block it with robots.txt

@timbray @codeyarns not so fast! If my post federates to another instance that doesn't block search with robots.txt, then it's still likely to wind up in a search engine (even if I've selected the "opt out of search engines" option). I've even found federated versions of deleted posts via search engines.

As you say, Mastodon's privacy story has problems.

The frustrating thing is, local-only toots help a lot here ... but they've been blocked from the main branch since 2017
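
Editorial sketch of the point above, using Python's standard `urllib.robotparser`. The two robots.txt bodies, the host names, and the `ExampleBot` user agent are hypothetical. The idea is that the home instance can disallow crawling while a federated copy on a second instance stays crawlable.

```python
# Minimal sketch: the same post, two instances, two different robots.txt files.
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served by the author's home instance: block everything.
HOME_ROBOTS = "User-agent: *\nDisallow: /\n"

# Hypothetical robots.txt on a remote instance that received the post via
# federation: an empty Disallow means nothing is blocked.
REMOTE_ROBOTS = "User-agent: *\nDisallow:\n"


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return whether robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_crawl(HOME_ROBOTS, "ExampleBot", "https://home.example/@alice/1"))
    print(may_crawl(REMOTE_ROBOTS, "ExampleBot", "https://remote.example/@alice@home.example/1"))
```

A well-behaved crawler would skip the first URL and fetch the second, which is how an opted-out post can still surface in a search engine.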

@jdp23 @codeyarns

If your post carries clear licensing restrictions, there's then a legal club available to beat anyone who misuses it. That's my piece's central point.

@timbray @codeyarns that seems orthogonal to blocking with robots.txt ?

@jdp23 @timbray @codeyarns
Robots.txt is a blunt instrument; rel="" attributes & meta-tags are more precise.

That said I'm not sure they currently have what's needed, either; I expect Tim would know, though, if anybody does.

@jdp23 @timbray @codeyarns
The analogy I would draw is that robot exclusion says who's not welcome & where (which may be 'everyone' & 'everywhere'), & meta tags say what content is off limits.

Again, correction welcome. It's 9 yrs since this has been something I needed to understand for professional reasons.
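
Editorial sketch of the meta-tag side of that analogy, assuming a page signals its wishes with a `<meta name="robots" content="...">` element. The sample page and the helper names `RobotsMetaParser` and `page_directives` are hypothetical. Only the standard-library `html.parser` is used.

```python
# Minimal sketch: read per-page robots directives from a <meta> tag.
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, noarchive".
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def page_directives(html: str) -> set:
    """Return the set of robots directives declared in the page's meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical post page that asks not to be indexed or archived.
PAGE = '<html><head><meta name="robots" content="noindex, noarchive"></head><body>hi</body></html>'

if __name__ == "__main__":
    print("noindex" in page_directives(PAGE))
```

Unlike robots.txt, this travels with the individual page, but it is still only a request that a crawler may choose to honor.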

@jdp23 @timbray @codeyarns - if a crawler only indexes local timelines then robots isn't orthogonal - just another layer of choice (which server vs what to set posts to) - I had made an assumption that "opt out of search engines" was indeed that licencing; the point made was that it's disabled by default, and opt in vs opt out was an interesting part of that conversation as it revolves around active vs passive consent. Not just an ethical question, but also a legal one, at least here in the EU.
@jdp23 @timbray @codeyarns - to be active consent - it would require posts to, by default, be private - which could somewhat stymie growth of the whole platform, and less useful as a tool for helping moderators when they can only see one half the conversation.
@jdp23 @timbray @codeyarns - I am the person behind the controversy mentioned in Tim's original blog post, my crawler prototype just used the API to read the local timeline - so it only got root posts, and boosts (afaik), others may do differently
@mc @timbray @codeyarns Ah okay. Well then your crawler behaved differently than Google's search crawler.
@jdp23 @timbray @codeyarns - interestingly local feed here *does* include some replies, but not *all* replies. odd. could be just replies from same instance? But anyway - I also feel the discussions needed are less detail oriented - we can make the details whatever we like, just as long as we are discussing it.
@mc @timbray @codeyarns Yeah. Tim's point about how Mastodon's privacy story is terrible is quite true. Or looking at it differently, the *story* is okay but the *reality* doesn't match the story.
@jdp23 @timbray @codeyarns if the default privacy model was more 'ethically based with active consent', then the issue there is that the platform wouldn't take off. I feel mastodon has some unique moderation issues to overcome alongside privacy ones. My original hoster at techhub literally couldn't see root posts I had replied to when someone reported my comments, as the reporter had blocked him too.
@jdp23 @timbray @codeyarns - that would leave only two options, take no action and risk me being a bad actor, or do and risk the reporter being one attempting to harass me. Having investigated the kiwifarms incident I understand why, hence why I proposed limiting it to servers mods and their federated instance mods - i.e. a server could always defed a server if a mod was a bad actor. Didn't get far enough on that topic before my face was chewed off and character defamed in the thread however.
@mc did they ask you to leave or did you just decide to move on your own?
@jdp23 I was kicked, there's a post explaining the reasoning from the techhub admin, apparently running a datacentre with multiple IPs is 'illegal in Canada' - (I suspect he was just under a lot of pressure at the time to find a good reason to deplatform me) - but would rather leave that topic, and just try and constructively move on :)
@jdp23 I'm certainly not building it after that, but I AM still interested in the moderation issues we face.
@jdp23 - I will also leave this account active (assuming am allowed), but doubt I'll be around much on social for the foreseeable. The experience was... not great, although I only really have myself to blame for that :)
@mc well, you did say a few times that you'd see it as a good outcome if it sparked discussion and some progress on the issue ... discussion has happened for sure, and it's probably increased the likelihood of timely progress (which doesn't mean it'll happen). We shall see. That said yeah I can see how it wasn't a great experience.

@mc @timbray @codeyarns yeah. i think of it as basically a prototype at scale of a decentralized system, so there are a lot of issues related to decentralized moderation that aren't yet addressed.

It's hard to know about the privacy model in general. For search in particular, opt-in could have worked if carried through the whole code, and I think people would have been okay with it. But that's not how things played out and it seems hard to retrofit.

Interesting middle-ground there, having posts include an intrinsic license. Although I’m not entirely sure if Creative Commons is a good choice as it currently stands - the non-commercial clause in particular makes entire sets of licenses incompatible with each other, for example. And if your account is on a server whose default license is incompatible with your friend’s, that means being effectively locked out from each other.
@timbray Aside: that outline + tl;drs is brilliant. I'm going to steal that.

@charlesroper

Even better, don't write posts that are so long that you need it.

@timbray - "People should be able to converse without their every word landing on a permanent global un-erasable indexed public record."

I wholeheartedly agree with this, which is why I find it surprising the people who desire that would use a service that packages up those words and fires them off to hundreds of servers beyond their control. There are a multitude of options for folks who want to have that kind of insular community, and Mastodon/ActivityPub is *fundamentally* not that.

@timbray Maybe surprised isn't the word, because I'm not actually surprised most users don't understand what the Fediverse is and react with anger when someone shows them.

The word is probably disappointed. I'm disappointed because I thought the ideas of interoperability, composability, and decentralization were gaining momentum, but they weren't. To most users, Mastodon is just the new "app"

@timbray Good essay. The bit about having lots of content licensing options seems confusing. Maybe if there are just 2 choices, "private" or "public" or "free" vs "commercial"... but any more would be a nightmare.

Also, the weird sentence stands out: "Missing the point...": I have never seen anyone miss that point from any side of this debate.

@timbray Thanks for writing this. I don't know what the right solution is, but it is nice to have folks listen to the privacy worries that marginalised communities have expressed about full text search rather than dismissing them

@timbray Commented directly, awaiting moderation.

As an aside on commenting, I know you rolled your system a while back but there’s a lovely @w3c Recommendation called Webmention that would’ve let me post a reply on my site immediately while awaiting moderation to appear on yours: https://indieweb.org/Webmention

It’s like WordPress-style pingbacks but better spec’d and with room for things like a WoT-style antispam extension to shift the burden of trustworthiness to the sender: https://indieweb.org/Vouch#Shift_Burden_To_Sender

Webmention

Webmention is an open web standard (W3C Recommendation) for conversations and interactions across the web, a powerful building block used for a growing distributed network of peer-to-peer comments, likes, reposts, and other responses across the web.

IndieWeb
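
Editorial sketch of what sending a Webmention looks like, based on the W3C Recommendation's description of the notification as a form-encoded POST carrying `source` and `target`. The endpoint and page URLs are hypothetical, endpoint discovery is skipped, and the helper `build_webmention` is illustrative rather than part of any library.

```python
# Minimal sketch: build (but do not send) a Webmention notification request.
from urllib.parse import urlencode
from urllib.request import Request


def build_webmention(endpoint: str, source: str, target: str) -> Request:
    """Build the POST that tells `endpoint` that `source` links to `target`."""
    body = urlencode({"source": source, "target": target}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical URLs: my reply on my own site, and the post it responds to.
    req = build_webmention(
        "https://blog.example/webmention",
        "https://me.example/replies/1",
        "https://blog.example/posts/privacy-and-search",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```

Passing the request to `urllib.request.urlopen` would actually deliver it. In practice the sender first discovers the endpoint from the target page, and the receiver verifies that the source really links to the target before showing anything.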

@timbray @jnm re: “The Fediverse needs to get its content-licensing shit together.”

This is actually a nice @pixelfed feature. You can set a license for each post. Screenshot attached.

I’m not sure what enforcement there is, if any. It’s fairly buried with no obvious way to set a default. It doesn’t appear in the main post view.

I still use it every time 😃

Here’s a PixelFed post I added a license to - can you find it? https://pixelfed.social/p/peterbronez/514067269392443567

peterbronez shared a post

A very #Midjourney Christmas, the B Side. #AIart

pixelfed

@timbray @jnm @pixelfed related:

Megaface was a facial recognition dataset created from CC licensed Flickr images. It was ultimately decommissioned due to licensing objections: https://exposing.ai/megaface/

Discovered via this HN discussion https://news.ycombinator.com/item?id=34213036

#fedilytics

Exposing.ai: MegaFace

MegaFace is a dataset of over 4 million faces used for benchmarking and developing face recognition technologies

Exposing.ai
@timbray When I worked at Twitter on misinformation, and harassment, and civic integrity, I proposed a middle ground I didn’t see mentioned in the blog post, but I think would be a solution that lives up to Mastodon’s principles of user control, but also makes search much more useful for all. https://macaw.social/@mergesort/109606214767896961
Joe Fabisevich :verified: (@mergesort@macaw.social)

Attached: 2 images While we’re talking about interesting ideas for approaching harassment, rather than forcing users to use hashtags to opt into search I’d love for Mastodon to implement an idea I proposed at Twitter. People should be able to opt in and out of having their public posts be searchable on an as needed basis, no hashtags needed. https://fabisevi.ch/2022/04/01/goodbye-fellow-tweeps/#fnref-1

Macaw-Social
@timbray given the Fediverse momentum, search will be solved for all public feeds because a majority of users will want it, irrespective of any orthodoxy of the current establishment. Content is public or private to a controllable scope. The latter might end up being e2e encrypted yet distributed interested groups using some periodic key rolling/distribution scheme for members.