Thinking about how to enable full-text searching of a useful subset of Fediverse posts in a way that is controlled by post authors and has humane, unsurprising defaults.

The current yes/no binary toggle under Settings/Other (that most people have never seen) probably isn't going to do the job.

The problem needs a long-form write-up (coming) and is controversial, but I’m pretty sure it won’t go away, so the community should get in front of it.

@timbray This sounds to me like the social graph interoperability debate and how classic web ideas clash with modern ideas around privacy.

Regardless of how useful it is to others, people have the right to opt out of their data being used. The GDPR effectively bans social graph interoperability between services given this principle. Mastodon seems to ignore this, which is itself interesting.

The same philosophy should apply to being able to search people’s content.

@carnage4life

>> people have the right to opt out of their data being used.

Agreed. I need to make sure I’m clear on that.

>> The GDPR effectively bans social graph interoperability between services

Huh? More please?

@timbray

“Third condition: the right to data portability shall not adversely affect the rights and freedoms of others

The third condition is intended to avoid the retrieval and transmission of data containing the personal data of other (non-consenting) data subjects to a new data controller in cases where these data are likely to be processed in a way that would adversely affect the rights and freedoms of the other data subjects (Article 20(4) of the GDPR).”

@timbray You can find context on the above excerpt in this Stratechery article at https://stratechery.com/2017/the-gdpr-and-facebook-and-google-intelligent-tracking-prevention-data-portability-and-social-graphs/ which references this PDF on the guidelines to rights to data portability https://iapp.org/media/pdf/resource_center/WP29-2017-04-data-portability-guidance.pdf

I’m also familiar with both the EU & FTC’s perspectives on these data portability principles via my day job at Meta.

I’m not sure the ability to opt out of your content being searched is as codified, but the principle feels very related.

@timbray @carnage4life third condition here https://iapp.org/media/pdf/resource_center/WP29-2017-04-data-portability-guidance.pdf it’s pretty muddy, but the takeaway appears to be that you can bring over attributes of third parties automatically in a social graph, but in order to use them (particularly if building entities into a graph, but I think one could argue that creating a search index is an infringing use of the data) you need the third parties’ permission.
@carnage4life @timbray Do these options (in Mastodon user settings) address the social graph concerns you mentioned, or is there something else about social graph interoperability that I’m not understanding?

@timbray @ramsey The social graph principles are framed in the context of services (e.g. Twitter sharing social graph data with LinkedIn or vice versa via some friend finding API).

I’m not a lawyer so can’t comment on whether my Mastodon instance having an API that by default lets anyone query my friend list without mentioning it when I signed up violates or is in compliance with the principle. If a social media app/service did this then it definitely wouldn’t be.

@carnage4life @timbray
If we take "authors' right to control" to the logical extreme, we'll find a need to include an Access Constraint or Rights Grant in each ActivityPub message.

For web pages, we've long used robots.txt to control *who* may crawl a site, and we've used page-specific "noindex" meta tags to control *what* may be done with crawled content. Does the SocialWeb need more granular control?

robots.txt: https://developers.google.com/search/docs/crawling-indexing/robots/intro
noindex: https://developers.google.com/search/docs/crawling-indexing/block-indexing
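
To make the two web mechanisms mentioned above concrete, here is a minimal sketch using only the Python standard library. The robots.txt contents, the crawler name `ExampleSearchBot`, the host `example.social`, and the helper names `may_crawl` and `may_index` are all hypothetical, chosen for illustration; nothing here describes how any real Fediverse server or crawler behaves.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is shut out entirely,
# everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Disallow: /

User-agent: *
Disallow: /private/
"""


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            for part in (attr.get("content") or "").split(","):
                self.directives.add(part.strip().lower())


def may_crawl(robots_txt, user_agent, url):
    """robots.txt answers *who* may fetch a given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def may_index(html):
    """A page-level noindex directive answers *what* may be done after fetching."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives
```

Both checks are purely advisory: they only work if the crawler chooses to run them, which is part of why the question of more granular control comes up.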

@bobwyman @timbray It’s not really an extreme. I can choose permissions ranging from Only Me to Friends to Public on Facebook. So this is a fairly mainstream and normalized privacy construct.
@carnage4life @timbray
If I send a message to "Friends," some of whom use instances that implement full text search, I'd like my Friends, but only my Friends, to find my messages using their instance's search system. This seems to imply that the search engine must filter results based on a combination of my constraints plus knowledge of who is viewing the results. To make this possible, my distribution constraints must be kept with each message.
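
A rough sketch of what that filtering could look like, assuming each stored post carries its own audience constraint and the search system knows who is asking. The `Post` type, the `audience` values, and the `friends` mapping are hypothetical illustrations, not ActivityPub or Mastodon structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author: str
    text: str
    # Hypothetical constraint kept with the message: "public" or "friends".
    audience: str = "public"


def visible_to(post, viewer, friends):
    """Decide whether `viewer` may see `post` in search results.

    `friends` maps an author to the set of accounts that author
    has designated as friends (an assumed data source).
    """
    if post.audience == "public":
        return True
    if viewer == post.author:
        return True
    if post.audience == "friends":
        return viewer in friends.get(post.author, set())
    return False  # unknown constraint: fail closed


def search(posts, query, viewer, friends):
    """Substring match, then filter by the author's constraint and the viewer."""
    needle = query.lower()
    return [
        p for p in posts
        if needle in p.text.lower() and visible_to(p, viewer, friends)
    ]
```

The point of the sketch is that the constraint has to travel with the post: if the index kept only the text, `visible_to` would have nothing to check at query time.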

@carnage4life @timbray
robots.txt and noindex only control one link in a potentially long chain of events. Should authors have control throughout the entire chain?

Example: For a dating app, a heterosexual woman may wish her profile to be indexed, but only viewable by men between given ages. A bisexual woman may wish to control viewers' age but not their sex. In these cases, a "noindex" tag is too coarse.

Is there a boundary after which rights are exhausted?

@bobwyman @carnage4life

Let’s not go to extremes. What's the simplest thing that could possibly meet the needs I've heard expressed by Mastodon participants?

Agreed that, eventually, it makes perfect sense for these settings to accompany ActivityPub messages.

@[email protected] Looking forward to your longer thoughts on this. Personally I find that Mastodon posts are often overloaded with hashtags, hurting readability. I assume at least some of that user behavior comes from how search works.
@manton @timbray Yes, among the first bits of advice I got when I dove back into Mastodon recently was "use hashtags liberally so people can find your posts." I have always hated hashtags (on all platforms) but I've also been trying to conform to the cultural norms rather than bringing my own baggage.
@sharding @manton Bear in mind that it's perfectly OK not to want people to be able to find your posts.
@timbray @manton Yes, of course. I've primarily done it when it's something that I hope people outside of my normal group will happen across (e.g. technical questions). BTW, I'm not saying I think this is the _right_ practice — just affirming Manton's hypothesis.
@manton @timbray I like tags and their respective streams, but find them very unnatural to use or read inline. Tags are metadata and should have their own field.
@timbray honestly, between the global search engine opt-out and the per-post visibility settings, all the pieces for user controlled search are already in place.
@fraying I don't think so? I'd like options to include no-search/hashtag-only/full-text, and also time-limits.

@timbray Fair. No-search exists (the global opt-out), hashtag-only is what we have now (so would be a good default), and time limits can be had by setting the posts to auto-delete.

Anyway, I agree with you that search should be more useful, and if masto doesn’t include it the outside crawlers will.
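
The options discussed in this exchange (no-search, hashtag-only, full-text, plus a time limit) could be modeled roughly as below. This is a sketch under stated assumptions: `SearchMode`, `SearchPolicy`, `searchable_for`, and `indexable_terms` are hypothetical names, the hashtag-only default follows the suggestion made in the thread, and none of it reflects an existing Mastodon setting.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class SearchMode(Enum):
    NO_SEARCH = "no-search"
    HASHTAG_ONLY = "hashtag-only"
    FULL_TEXT = "full-text"


@dataclass(frozen=True)
class SearchPolicy:
    # Assumed default: hashtag-only, as suggested in the thread.
    mode: SearchMode = SearchMode.HASHTAG_ONLY
    # None means no time limit; otherwise the post stops being
    # searchable once it is older than this.
    searchable_for: Optional[timedelta] = None


HASHTAG = re.compile(r"#(\w+)")
WORD = re.compile(r"\w+")


def indexable_terms(text, posted_at, policy, now):
    """Return the set of terms a search index may store for this post."""
    if policy.mode is SearchMode.NO_SEARCH:
        return set()
    if policy.searchable_for is not None and now - posted_at > policy.searchable_for:
        return set()
    tags = {t.lower() for t in HASHTAG.findall(text)}
    if policy.mode is SearchMode.HASHTAG_ONLY:
        return tags
    return tags | {w.lower() for w in WORD.findall(text)}
```

Note that a time limit on searchability is a separate thing from scheduled deletion: here the post still exists after the limit, it just drops out of the index.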

@fraying @timbray FWIW, there have been a few attempts at scrapers over the years. The attempts tend to be rather rude and get blocked by a big chunk of servers. That tends to stymie the usefulness of the index. Not impossible, but I think the old ask-forgiveness-later approach doesn't go as far on the fediverse as on the wider web.
@lmorchard @timbray oh I’m sure there have. And I love that the fediverse is taking these things seriously.
@timbray (I was the designer for Technorati back in the day, if you remember that beautiful moment before Google started indexing blogs, so I’ve seen firsthand how search can help build community.)
@fraying @timbray I loved those early days, and I loved Technorati!
@fraying @timbray Goodness me I'd not thought about Technorati for ages
@timbray @fraying By “time-limits” do you mean the automated post deletion #Mastodon preferences, or something else?
@mjgardner @fraying I mean time limits. Obvs if there is scheduled deletion, that raises interesting questions and corner cases.
@timbray @fraying Distributed cache invalidation inevitably does
@timbray @fraying I think that request caters to a rather small sub-set of users, but I might be wrong.
@david @fraying I agree but I think it's an important subset. Some people really don't want to be found, for excellent reasons.
@timbray @fraying I think time limitation is the key. People often treat these posts as ephemera, but the tech persists them despite that. My 2c would be a 30-day expiry default that can be overridden with an archival function of some sort.
@timbray @fraying that seems sort of complicated

@nelson @fraying

Don't think the implementation would be that hard, but coming up with a humane UX would be.

@timbray
@cwebber, were machine-readable terms ever considered in-scope for #ActivityPub ?
@timbray I bumped into this conundrum myself when I posited that great search would be good. Clearly people are looking for better ways than what we had in the past to control indexing of their posts. Not to speak of the question of ownership and copyright. It feels like there is an opportunity to solve this well in mastodon/fediverse.
@timbray Is there an existing client that makes it easy to just search for posts I've seen before?
@timbray thoughts about resurrecting FOAF? I thought about it as a way to have decoupled social networks