Hi everybody, I’m ready to unveil my year-end-holiday-hack project:

Meet Searchtodon: ***Private*** Timeline Search for Mastodon

It fills a gap that I have been missing over on Twitter as well: “I remember seeing this THING, where was that again?”

It is built with privacy and consent in mind (pls see the FAQ), but is also *an experiment* to see if something like this is accepted by the larger Mastodon community.

Here goes: https://searchtodon.social

Ex-Searchtodon: Private Timeline Search for Mastodon

As promised, here’s an update on Searchtodon.

I have shut it down & deleted all data (as of 14:06 CET today).

As implemented, it does not gel *with* the Mastodon community, although the functionality did prove useful to a lot of people.

I’m working on a retrospective that hopefully can inform future experimenters.

Thanks everybody for giving it a try and for all your constructive feedback!

@janl Noooo, I want to use it :<
@mackuba sorry :/
@janl I wish there was a way to tell how many % of the general Mastodon user base are against this - I have a feeling this is a small but very loud minority… 😒
@mackuba @janl I still think it would get a much warmer welcome if it were opt-in instead of opt-out. Like, if you want to use it, just put a #search or #index in your bio instead of adding a #nosearch or #noindex. It would have less data, yes, but if a large number of users want to use it, they will opt in anyway.

I don't really care about it. Sure, I prefer not to be indexed, that's why a post is local and not global in that case, but I know that's my personal preference. Even though I don't care, I still think services like that should be opt-in and not opt-out: if you don't know it exists, you can't opt out, even if you would have done so had you known. On the other hand, if you don't know about it you don't use it, but if you know and want to use it, you can opt in.

And really, that can make a huge difference in how the community receives the message. I agree that if there is user demand (no matter how large), the service has a place in the world, and the host can decide whether to keep running it depending on how many users are using it. With opt-in logic that can be measured easily (with opt-out, "2 million users" has no value, as maybe 99% of them didn't even know they were using it).

@efertone @janl The problem is, if I can only search through 10% of the posts I've seen in my timeline, such a feature is practically useless to me… it only makes sense if I can search all of them or a vast majority.

One solution could be having this standardized, as some kind of opt-out setting on the profile (which is *off* by default) and exposing it as a field in the Toot JSON record in all timelines. Then such tools could easily do e.g. toots.filter(t => t.author.dontIndex == false).
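
A minimal sketch of that filter, assuming the hypothetical `author.dontIndex` flag from the post above (it is not a field in any real Mastodon API, and the sample toots are made up). A missing flag is treated the same as an opt-out here, which is also what the `== false` comparison above would do:

```javascript
// Hypothetical shape: `author.dontIndex` stands in for the standardized
// opt-out flag proposed in the thread; no real API exposes it under this name.
function indexableToots(toots) {
  // Keep only toots whose author has the flag explicitly set to false.
  // A missing flag is treated as "don't index", erring on the side of consent.
  return toots.filter((t) => t.author && t.author.dontIndex === false);
}

// Made-up sample data for illustration.
const timeline = [
  { id: "1", author: { acct: "alice", dontIndex: false }, content: "hello" },
  { id: "2", author: { acct: "bob", dontIndex: true }, content: "please skip" },
  { id: "3", author: { acct: "carol" }, content: "flag missing" },
];

const allowed = indexableToots(timeline);
console.log(allowed.map((t) => t.id)); // [ '1' ]
```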

@mackuba @janl Well, I don't think you can go through and implement such a feature in all the server software (Mastodon, Misskey, Pleroma, etc.). I think at that point it would be much easier to just implement a private timeline search toggle in Mastodon itself, so no posts are indexed elsewhere and there's no extra stress on servers while the crawler is running. And I'm very confident that a lot of members of the community don't like external indexing and searchability of their content outside their control, and if they don't know it exists, they can't opt out.

Again, I would allow it server-wide if I knew my users had stated clear consent for that; otherwise I would have to put a block on it (IP, user-agent). Worth noting: I think if it's not a well-known thing and people later realize it was opt-out, the IP would get a lot of reports and firewall blocks. I mean, did you see how quickly some groups can gather and collectively practice/abuse the power of the masses on the internet?
🤣

Misskey has a pretty good full-text search with Elasticsearch (soon to be replaced), maybe that's why I don't feel the need for an external service. At least I always found what I wanted to find when I remembered only a few keywords and roughly when it was posted (a few days, 1-2 weeks, etc.).
@efertone @janl Wait, but does Misskey let you search through posts in your home timeline (= from people you are following) for some period of time back, all seen posts, not only favorited ones?
@mackuba @janl Search will search in all posts, all non-DM posts basically, as far back as it has them in the index / in the database. Obviously, if an admin removed old posts for some reason (to save resources), or the (remote or local) user deleted the post and the deletion was federated, they will not be visible as they are not in the database.
@efertone @janl So… if I understand it correctly, this is exactly what we were talking about - not a public search engine, just being able to locally search what your timeline has already downloaded previously. And I'm being told I'd be an asshole if I wanted to implement something like this… 😕 Or did I misunderstand it, are we talking about different things?
@mackuba @janl The big difference is the method. When it's integrated within Misskey (or any other server software), it follows the full ActivityPub protocol with all its actions, for example when someone blocks someone else or deletes a post. The content is not pulled from other servers; their server pushes the content to that server. If someone follows a user, the user is notified about it and can act (block), and as it's part of the protocol, a user can set their profile to "follow only if I allow", which means that even if I try to follow the user, I won't get their posts unless they approve it.

So yes, it does a similar thing, but that's why I said I don't mind, as long as it's not a "pull everything unless they said no". It's not that the idea is "bad"; the logic has flaws. When a user sees and even approves my follow request, they see my user, my profile, and my domain, and they can say "nope" and refuse to push any content to this server. If someone deletes a post, it will be deleted from this server too; a crawler can do that as well, but it's not trivial and pretty resource-heavy (checking whether all existing posts are still there). If a user blocks a whole instance, their content can still be seen on other instances, but they intentionally blocked that one instance, or a server may even have a full block. Yes, that content is still accessible from other instances, but it's not "collected/exposed" in a central location.

I'm not sure I could describe it well enough
😪 Sorry I'm really tired after spending 4 hours in IKEA 😞
@efertone @janl Yeah, so I guess there are some nuances, but what I was thinking about wouldn't be very different - it's just that instead of the search database running on your local server and indexing what's pushed to it, it would be running on your computer and indexing the exact same data, pulled through the home timeline API of your server. It would also still only index things from people you can follow, because that's where the content of your home timeline comes from.
@efertone @janl One difference would be the deleted posts since the deletion would probably not propagate to the local cache on the local computer… but I guess this could probably be fixed somehow.
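
One possible way to handle that, as a rough sketch: periodically re-check each cached post and drop the ones that are gone. The `stillExists` callback is hypothetical and supplied by the caller; in a real client it would likely be an asynchronous lookup against the user's own server, and it is synchronous here only to keep the example short. Note that the cost grows with the size of the cache, which is the resource concern raised earlier in the thread.

```javascript
// Sketch only: `stillExists` is a hypothetical, caller-supplied check.
// `cache` is a Map from post id to the locally stored post.
function pruneDeleted(cache, stillExists) {
  const removed = [];
  // Copy the keys first so deleting during the loop is safe and obvious.
  for (const id of [...cache.keys()]) {
    if (!stillExists(id)) {
      cache.delete(id);
      removed.push(id);
    }
  }
  return removed; // ids that were dropped from the local cache
}

// Made-up data: post 102 has been deleted upstream.
const cache = new Map([
  ["101", { content: "still here" }],
  ["102", { content: "deleted upstream" }],
]);
const upstream = new Set(["101"]);

const removedIds = pruneDeleted(cache, (id) => upstream.has(id));
console.log(removedIds); // [ '102' ]
```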
@mackuba @janl I might suggest approaching it from a different angle. Build a desktop client and add a cache layer and an option to search in your cache too. That way it's not something that "pulls and indexes" content, it's a desktop client that has a cache and a "timeline-specific search" feature. I'm not saying that wouldn't have issues, but it sounds much friendlier 😅
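
To make the cache-plus-search idea concrete, here is a small in-memory sketch. All names (`TimelineCache`, `add`, `search`, the `text` field) are illustrative and not taken from any existing client. It assumes posts arrive as plain text; Mastodon delivers post content as HTML, so a real client would need to strip markup first.

```javascript
// Minimal in-memory cache with keyword search. Names are illustrative only.
class TimelineCache {
  constructor() {
    this.posts = new Map(); // post id -> post
    this.index = new Map(); // lowercase word -> Set of post ids
  }

  // Split plain text into lowercase words (letters and digits, any script).
  static tokenize(text) {
    return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) || [];
  }

  // Store a post the client has already shown, and index its words.
  add(post) {
    this.posts.set(post.id, post);
    for (const word of TimelineCache.tokenize(post.text)) {
      if (!this.index.has(word)) this.index.set(word, new Set());
      this.index.get(word).add(post.id);
    }
  }

  // Return cached posts that contain every word of the query.
  search(query) {
    const words = TimelineCache.tokenize(query);
    if (words.length === 0) return [];
    let ids = null;
    for (const word of words) {
      const hits = this.index.get(word) || new Set();
      ids = ids === null
        ? new Set(hits)
        : new Set([...ids].filter((id) => hits.has(id)));
    }
    return [...ids].map((id) => this.posts.get(id));
  }
}

// Made-up posts for illustration.
const localCache = new TimelineCache();
localCache.add({ id: "1", text: "Spent four hours in IKEA today" });
localCache.add({ id: "2", text: "Misskey has full-text search" });

console.log(localCache.search("ikea hours").map((p) => p.id)); // [ '1' ]
```

Because the index only ever contains posts the client has already displayed, nothing is fetched beyond what the user's timeline delivered anyway.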

I wonder if Whalebird, Hyper-space, or any other clients have this feature already.
@efertone @janl Yes, that's precisely what I was thinking about :)