This thoughtful dive into the legal/ethical/technical mess that is Mastodon search is so good.

"Stated in the most general possible way: The Fediverse needs to get its content-licensing shit together."

by @timbray, who knows a thing or two because he's seen a thing or two

https://www.tbray.org/ongoing/When/202x/2022/12/30/Mastodon-Privacy-and-Search

Private and Public Mastodon

ongoing by Tim Bray

@edbott @timbray

Great, now I am inspired to write an app that connects to #fediverse relay servers and indexes everything.

Honestly, I don't even think it would be that difficult.

@taco @edbott @timbray

It wouldn't be but you'd be so hated so fast...

@raynor @edbott @timbray Everyone loves tacos!

I don't think I'd actually do it - not enough hours in the day.

You better believe that someone is going to, though.

It's going to get even worse as large companies start to spin up their own ActivityPub instances, join relays, and then quietly relay everything to an indexing service.

I'm not saying they would, but Mozilla could easily do this and no one would ever even know.

@taco @edbott @timbray

There are already conversations on a #Fediverse #searchengine. The last one on #Mastodon didn't go well: the user is/was suspended so the link doesn't work anymore.

If you want an idea of how things stand, see this conversation on GitHub about searching a single _instance_:

https://github.com/glitch-soc/mastodon/pull/1502

You also need to consider GDPR, CCPA, etc. You are in so many jurisdictions at once that you will violate laws no matter what.

Add SEARCH_ALL_PUBLIC_STATUSES env flag by VyrCossont · Pull Request #1502 · glitch-soc/mastodon

Context: https://docs.joinmastodon.org/user/network/#search Vanilla Mastodon intentionally refuses to search outside a user's own toots, favs, bookmarks, and mentions. This flag makes that restrict...


@raynor @edbott @timbray I was reading a bit of the drama around search and I'm in the second camp - search is useful and it's going to happen eventually. The only choice we have is whether it's going to be implemented by the community or by a company looking for a product.

I think for any large-scale fediverse search engine, you'd have to ensure two things:

1) You are indexing data that is publicly available. Friends-only posts, DMs, or anything of that sort should be off-limits.
2) You are not storing any PII/SPI. Keep your index to the ActivityPub stream content for public posts - see #1.

The goal would be to link back to the source content, not be a service for aggregation.

You'd also need a solid privacy policy and a method for handling things like GDPR/DMCA takedown requests.

It's not trivial, but it's not impossible.
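
The two indexing rules above could be sketched roughly like this. It assumes incoming ActivityStreams `Create` activities that are already parsed into dicts, treats a post as public only when the `https://www.w3.org/ns/activitystreams#Public` collection appears in `to`, and keeps just the post URL, text, and timestamp. The function names and the index-entry shape are made up for illustration.

```python
from typing import Optional

# Special ActivityStreams collection used to address a post to everyone.
AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def _as_list(value) -> list:
    """ActivityStreams allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def is_public(activity: dict) -> bool:
    """True only if the post is addressed to the public collection in `to`.

    Assumption: posts that list Public only in `cc` (Mastodon's "unlisted")
    are skipped too, as a conservative choice.
    """
    obj = activity.get("object")
    if activity.get("type") != "Create" or not isinstance(obj, dict):
        return False
    return AS_PUBLIC in _as_list(obj.get("to"))


def to_index_entry(activity: dict) -> Optional[dict]:
    """Build a minimal, link-back index entry, or None if not indexable."""
    if not is_public(activity):
        return None
    obj = activity["object"]
    # Hypothetical entry shape: no author and no recipient lists are kept.
    return {
        "url": obj.get("url") or obj.get("id"),
        "content": obj.get("content", ""),
        "published": obj.get("published"),
    }
```

The post text itself can still contain names or mentions, so this only narrows what gets stored; it doesn't settle the PII question on its own.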

@taco @raynor @edbott @timbray The ActivityPub architecture seems well-suited to policy-mediated indexing; there is already a way to retrieve an instance’s “policies” via the API. But there is no standardized, machine-readable vocabulary for expressing Fediverse policies — about harvesting, about privacy. We need an ActivityPub Policy Vocabulary/Ontology
Machine Interpretable Privacy Policies -- A fresh take on P3P

@taco @raynor @edbott @timbray …and of TAMI/“Policy-Aware Web” http://dig.csail.mit.edu/TAMI/
Transparent Accountable Datamining Initiative (TAMI)
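
No such vocabulary exists today, so the following is purely hypothetical: a made-up policy document with an invented `indexing` key, and the kind of check a well-behaved crawler might run before indexing posts from an instance. It only illustrates what "policy-mediated indexing" could mean in practice.

```python
import json

# Hypothetical policy an instance might publish; the key and its values
# ("allow", "opt-in", "deny") are invented for this sketch.
EXAMPLE_POLICY = json.loads('{"indexing": "opt-in"}')


def may_index(policy: dict, author_opted_in: bool) -> bool:
    """Decide whether a crawler may index a post under the given policy.

    Unknown or missing values are treated as "deny", so a crawler that
    does not understand a policy errs on the side of not indexing.
    """
    mode = policy.get("indexing", "deny")
    if mode == "allow":
        return True
    if mode == "opt-in":
        return author_opted_in
    return False
```

Defaulting to "deny" is a design choice here, not something any existing spec requires.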

@raynor @taco @edbott @timbray FYI, Google already offers full-text search of all public posts on almost any Mastodon instance -- type, say, "site:mastodon.social biden" into the search bar (without quotes), and you get what you asked for, in abundance.

Which leaves me less than impressed with anyone, particularly self-described community elders, who seems to believe that 1) restrictions in Mastodon itself prevent this, and 2) it's somehow important to keep it that way.
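
For anyone who wants to try the `site:` query mentioned above, here is a small sketch that builds the equivalent Google search URL. The helper name is made up; the only thing it relies on is Google's ordinary `q` query parameter.

```python
from urllib.parse import urlencode


def site_search_url(instance: str, terms: str) -> str:
    """Build a Google search URL restricted to one instance's domain."""
    query = f"site:{instance} {terms}"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("mastodon.social", "biden"))
```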

@edbott @timbray

I get the ideals, but outlawing hammers b/c some people can use them for nefarious purposes isn't going to work.

Searching is absolutely fundamental to using the internet. Building walls to prevent access is sorta the antithesis of an 'open' protocol.

@edbott
@timbray

There is no way to stop indexing or archiving or data-mining on a public, unencrypted system, full stop. Ideally, if there were an auto-PGP option on posts, with the keys handed out to just followers, following, or server members depending on settings, you could actually get some protection.

EDIT: After this, @edbott sent me a message saying "Bye boy.", deleted his replies to my points and blocked me: mature journalistic behaviour 🫠.