Hi everybody, I’m ready to unveil my year-end-holiday-hack project:

Meet Searchtodon: ***Private*** Timeline Search for Mastodon

It fills a gap that I have been missing over on Twitter as well: “I remember seeing this THING, where was that again?”

It is built with privacy and consent in mind (pls see the FAQ), but is also *an experiment* to see if something like this is accepted by the larger Mastodon community.

Here goes: https://searchtodon.social

Ex-Searchtodon: Private Timeline Search for Mastodon

As promised, here’s an update on Searchtodon.

I have shut it down & deleted all data (as of 14:06 CET today).

As implemented, it does not gel *with* the Mastodon community, although the functionality did prove useful to a lot of people.

I’m working on a retrospective that hopefully can inform future experimenters.

Thanks everybody for giving it a try and for all your constructive feedback!

@janl wait, you weren't selling the collected lewd shitposts to facegoog for big €€€?
@janl Noooo, I want to use it :<
@mackuba sorry :/
@janl I wish there was a way to tell how many % of the general Mastodon user base are against this - I have a feeling this is a small but very loud minority… 😒
@mackuba @janl Exactly my feeling as well.
@mackuba @janl I still think it would get a much warmer welcome if it were opt-in rather than opt-out. Like, if you want to use it, just put a #search or #index in your bio instead of adding a #nosearch or #noindex. It would have less data, yes, but if a large number of users want to use it, they will opt in anyway.

I don't really care much myself. Sure, I prefer not to be indexed, and that's why I make a post local rather than global in that case, but I know that's my personal preference. Even so, I still think services like this should be opt-in and not opt-out: if you don't know the service exists, you can't opt out, even if you would if you knew. On the other hand, if you don't know about it you don't use it, and if you know and want to use it, you can opt in.

And really, that can make a huge difference in how the community receives the message. I agree that if there is user demand (no matter how large), the service has a place in the world, and the host can decide whether to keep running it depending on how many users are using it. With opt-in logic that can be measured easily (with opt-out, "2 million users" has no value, as maybe 99% of them didn't even know they were using it).
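
The two consent models discussed here differ only in what a missing hashtag means. A minimal sketch, assuming the bio is available as plain text and using the hashtag names from the post above; the `Policy` type and the `mayIndex` function are made up for illustration and are not part of any real tool:

```typescript
type Policy = "opt-in" | "opt-out";

// Extract lowercase hashtags from a plain-text bio.
function bioTags(bio: string): string[] {
  return bio.toLowerCase().match(/#\w+/g) || [];
}

// Decide whether an author's posts may be indexed under the given policy.
function mayIndex(bio: string, policy: Policy): boolean {
  const tags = bioTags(bio);
  const has = (tag: string): boolean => tags.indexOf(tag) !== -1;
  if (policy === "opt-in") {
    // Silence means "no": only explicit #search / #index counts.
    return has("#search") || has("#index");
  }
  // Silence means "yes": only explicit #nosearch / #noindex blocks indexing.
  return !(has("#nosearch") || has("#noindex"));
}
```

An author who has never heard of the tool has a bio with neither tag, so `mayIndex(bio, "opt-in")` is false while `mayIndex(bio, "opt-out")` is true. That difference is the whole disagreement.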

@efertone @janl The problem is, if I can only search through 10% of the posts I've seen in my timeline, such a feature is practically useless to me… it only makes sense if I can search all of them or a vast majority.

One solution could be having this standardized, as some kind of opt-out setting on the profile (which is *off* by default) and exposing it as a field in the Toot JSON record in all timelines. Then such tools could easily do e.g. toots.filter(t => t.author.dontIndex == false).
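
A slightly fuller sketch of that filter, for illustration only. The `dontIndex` field and the `author` key are taken from the post above and are hypothetical: no such opt-out field exists in the Mastodon API today. One detail is worth spelling out: with `== false`, a toot whose author record lacks the field entirely would be dropped, so the sketch treats a missing flag as "not opted out", matching the off-by-default proposal.

```typescript
// Hypothetical record shapes; `dontIndex` is an assumed, non-existent field.
interface Author {
  acct: string;
  dontIndex?: boolean;
}

interface Toot {
  id: string;
  content: string;
  author: Author;
}

// Keep every toot whose author has not explicitly opted out.
function indexable(toots: Toot[]): Toot[] {
  return toots.filter(t => t.author.dontIndex !== true);
}

const sample: Toot[] = [
  { id: "1", content: "hello", author: { acct: "a@example.social" } },
  { id: "2", content: "private-ish", author: { acct: "b@example.social", dontIndex: true } },
  { id: "3", content: "fine by me", author: { acct: "c@example.social", dontIndex: false } },
];

const kept = indexable(sample).map(t => t.id);
console.log(kept.join(","));
```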

@mackuba @janl Well, I don't think you can go and implement such a feature in all the server software (Mastodon, Misskey, Pleroma, etc.). At that point it would be much easier to just implement a private timeline search toggle in Mastodon itself: no posts are indexed elsewhere, and there's no extra stress on servers while a crawler is running. And I'm very confident that a lot of community members don't like external indexing and searchability of their content without their control, and if they don't know it exists, they can't opt out.

Again, I would allow it server-wide if I knew my users had stated clear consent for it; otherwise I would have to put a block on it (IP, user-agent). Worth noting: I think if it's not a well-known thing and people later realize it was opt-out, the IP would get a lot of reports and firewall blocks. I mean, did you see how quickly some groups can gather and collectively practice/abuse the power of the masses on the internet?
🤣

Misskey has a pretty good full-text search with Elasticsearch (soon to be replaced), maybe that's why I don't feel the need for an external service. At least I always found what I wanted to find when I remembered only a few keywords and roughly when it was posted (a few days, 1-2 weeks, etc.).
@efertone @janl Wait, but does Misskey let you search through posts in your home timeline (= from people you are following) for some period of time back, all seen posts, not only favorited ones?
@mackuba @janl Search covers all posts, basically all non-DM posts, as far back as they are in the index / database. Obviously, if an admin removed old posts for some reason (e.g. to save resources), or the (remote or local) user deleted the post and the deletion was federated, they will not be visible, as they are no longer in the database.
@efertone @janl So… if I understand it correctly, this is exactly what we were talking about - not a public search engine, just being able to locally search what your timeline has already downloaded previously. And I'm being told I'd be an asshole if I wanted to implement something like this… 😕 Or did I misunderstand it, are we talking about different things?
@mackuba @janl The big difference is the method. When it's integrated within Misskey (or any other server software), it follows the full ActivityPub protocol with all its actions, for example when someone blocks someone else or deletes a post. The content is not pulled from other servers; their server pushes the content to that server. If someone follows a user, the user is notified about it and can act (block), and as it's part of the protocol, users can set their profile to "follow only if I allow", which means that even if I try to follow the user, I won't get their posts unless they approve it.

So yes, it does a similar thing, but that's why I said I don't mind as long as it's not "pull everything unless they said no". It's not that the idea is "bad"; the logic has flaws. When a user sees and approves my follow request, they see my user, my profile, and my domain, and they can say "nope" and refuse to push any content to this server. If someone deletes a post, it will be deleted from this server too; a crawler can do that as well, but it's not trivial and is pretty resource-heavy (checking whether all existing posts are still there). If a user blocks a whole instance, or a server has a full block, their content can still be seen on other instances, but they intentionally blocked that one instance. Yes, that content is still accessible from other instances, but it's not "collected/exposed" in a central location.

I'm not sure I could describe it well enough
😪 Sorry I'm really tired after spending 4 hours in IKEA 😞
@efertone @janl Yeah, so I guess there are some nuances, but what I was thinking about wouldn't be very different - it's just that instead of the search database running on your local server and indexing what's pushed to it, it would be running on your computer and indexing the exact same data, pulled through the home timeline API of your server. It would also still only index things from people you can follow, because that's where the contents of your home timeline comes from.
@efertone @janl One difference would be the deleted posts since the deletion would probably not propagate to the local cache on the local computer… but I guess this could probably be fixed somehow.
@mackuba @janl I may or may not suggest to approach it from a different angle. Build a desktop client and add a cache layer and an option to search in your cache too. That way it's not something that "pulls and indexes" content, it's a desktop client that has cache and "timeline specific search" feature. I don't say that would not have issues, but it sounds much friendlier 😅

I wonder if Whalebird, Hyper-space, or any other clients have this feature already.
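
To make the "client with a cache and a timeline-specific search" idea concrete, here is a rough sketch. Everything in it is hypothetical (the `TimelineCache` class, the `CachedPost` shape, the method names); it only illustrates that the search runs over posts the client has already loaded for display, and that a deletion has to be applied to the cache explicitly. How the client learns about a deletion is left open here. One possibility, if the server offers it, is a delete event on a streaming connection.

```typescript
// Hypothetical shape of a post as kept by a desktop client.
interface CachedPost {
  id: string;
  author: string;
  text: string;
}

class TimelineCache {
  private posts: { [id: string]: CachedPost } = {};

  // Store posts the client has already fetched to render the home timeline.
  add(batch: CachedPost[]): void {
    for (const p of batch) {
      this.posts[p.id] = p;
    }
  }

  // Forget a post once the client learns it was deleted upstream.
  remove(id: string): void {
    delete this.posts[id];
  }

  // Case-insensitive substring search over cached posts only.
  search(query: string): CachedPost[] {
    const q = query.toLowerCase();
    return Object.keys(this.posts)
      .map(id => this.posts[id])
      .filter(p => p.text.toLowerCase().indexOf(q) !== -1);
  }
}

const cache = new TimelineCache();
cache.add([
  { id: "101", author: "a@example.social", text: "Spent four hours in IKEA" },
  { id: "102", author: "b@example.social", text: "Full-text search is hard" },
]);
const beforeDelete = cache.search("ikea").length;
cache.remove("101");
const afterDelete = cache.search("ikea").length;
```

Nothing here contacts other servers: the cache only ever contains what the user's own instance already delivered to the client.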
@efertone @janl Yes, that's precisely what I was thinking about :)

@janl I really hope a feature like this ends up in Mastodon itself or in the clients though.

Without excessive bookmarking of every post that seems even remotely relevant (which ultimately defeats the purpose) I never find anything on here ever again, as soon as it‘s more than a day or two in the past. :(

@fluffel @janl I have a vague plan/idea to build something for Mastodon for the Mac and would love to add such a feature there at some point, so I'm watching this discussion carefully for this reason too…

Native apps kind of by definition keep some kind of cache/database of loaded posts in order to display them, so I don't think adding a search for that local db would make it a very different thing?

@mackuba @fluffel that would still violate the TOS of instances that forbid archiving data e.g. https://meta.chaos.social/terms — and there is no automated way of discovering that for all your followees.

@janl @fluffel What is "archiving" really? Is it about search, or about storing on disk and not in memory, or about not pruning the stored records? Or keeping a too large cache? Is it possible to define this?

Because if you disallow *any* saving to disk, then an iOS app restarted after being evicted from memory (basically after every app switch on my low-RAM iPhone 8) has to show a blank page before it reloads everything. That's bad UX, every app like this keeps some data between restarts.

@mackuba @fluffel that will have to be discussed. The limit will be somewhere between “cache the current state” and “store everything and go back forever”.

@janl @fluffel Also, I realize this is going to be a risky question, but does chaos.social have a right to say what I can do with the content I'm fetching for my timeline from *my* instance? I don't interact with chaos.social directly in any way, only with my instance's server. This is no longer "content on that instance"...

It would be kind of like sending emails that say in the footer that you don't have a right to keep the received emails in your GMail archive. It's my GMail archive 🤷🏻‍♂️

@mackuba I would encourage you to not find loopholes, but ways to make this work with the community that you are aiming to serve.
@janl Providing a feature that would be useful to a lot of people who are missing it now is also serving the community, just a different part of it. The problem is that the needs of one part of the community are in opposition to the needs of another part of the community… You can't satisfy both completely at the same time.
@mackuba you also can’t be an asshole about it.
@mackuba not saying you are, I just don’t recommend going there
@mackuba @janl first: I have no idea & usually hate thinking about "legal" stuff like that. But I think a more fitting metaphor would be: I send you an email with some copyrighted text or image in it. I don't think the location of where it's stored would be _the_ problem, but the level of publicity. If it's in _your_ Gmail archive it's probably fine, but when you put it on a public screen without telling the author it probably isn't anymore. But again, no lawyer and no idea what's "right" here.
@fluffel @janl That's more or less how I see it - you should have a right to keep a private archive of things you've seen, whether that's screenshotting an SMS, keeping an email in a GMail archive, or saving a website to HTML with "Save as". No one can really stop you, you always can do this technically. Pretending you can stop this is like disabling right-click on websites so that people can't save a picture to use as their desktop wallpaper.
@mackuba @fluffel “no one can stop you” is one hell of an assertion. You’ll be surprised.
@janl @fluffel can anyone stop me from silently keeping all JSON responses I get from my.instance.social /api/v1/timelines/home?
@mackuba @fluffel you are starting to advertise that you are planning to behave like an asshole. I do not recommend it.
@mackuba @janl probably not, but that's also a little like saying "can anyone stop me from photographing my neighbors every day?!!" No, they can't, at least until they realize it, and then you have a problem in both cases, I guess.

@fluffel @janl Fair enough, thanks for the warning - I just see these two things as nowhere near each other. More like the GMail archive mentioned before - I'm not coming to someone to record them in a private context, I'm talking about keeping something that was sent to me.

I don't think I'm going to change my mind, but I acknowledge that a lot of people see this differently and can react very badly to such a feature. Which honestly makes me less interested in building tools for Mastodon at all…

@mackuba I recommend you read through all the comments my original post has received. I think building tools despite people discouraging you is not gonna work for anybody. You will have to find a way to work with them.
@mackuba @janl ok maybe my metaphor wasn‘t great, I give you that.
Maybe more like: Let‘s say you‘re at a public event and have a little, hidden recording-device that collects all the conversations someone is having with and around you.

@fluffel @janl That would be creepy, yeah. But I think there are very different expectations between recording real life vs. online and especially text…

Recording people in the street = creepy.
Recording all Facetime calls = creepy.
But: keeping an archive of iMessage, Messenger conversations = not creepy (these apps do this automatically!), as long as you don't share those private conversations publicly.
Saving a page with a thread on e.g. MacRumors forum that you've posted in = not creepy.

@mackuba @fluffel @janl I'm not sure that a notice on a website saying 'you can't archive us' is particularly enforceable.

The nature of the protocol is that messages are pushed, not pulled. A contract has 'offer, acceptance, consideration'. If they push messages to my activitypub inbox, there's no contract, and unless each message has its own clear terms, I doubt there's a contract. As opposed to scraping a website for content where terms are actually visible.

But hey, I'm not a lawyer.

n.b. To be clear, I'm not advocating violating clearly expressed wishes.

Just making the point that it seems weak to rely on from a legal perspective.

@mackuba @janl I don't think it's that easy.

Would a device that transcribes those conversations and only saves the text be ok, just because it isn't audio anymore?

Many people don't make the distinction between real life and online, because what's not real about us talking here right now?

Messenger archives are probably "ok" for most people, because those are explicitly started conversations that are either 1 on 1 or small-ish group chats so there's more trust involved.

It's complicated.

@fluffel @janl It is complicated… But I think my point is: when posting in a public forum on the Internet, people have a lower expectation of privacy than in a private conversation. They are aware (though not always thinking about it) that everything they post can potentially be seen by the whole world.

Us talking here is not like talking in a street in a crowd that ignores us, more like talking in the hall at a conference, where we can be overheard by anyone, maybe even randomly recorded.

@mackuba @janl But the fact that it's prohibited by default to be recorded or even photographed at for example CCC events is one of the things I really like about them.

Not everyone thinks about those places we all inhabit online in the same way.

In the end it's a social problem, that we can only solve together by negotiating what's acceptable as a community and that will take time, failures and a lot of nerves. But if we get this right I'm sure it can be glorious in the end! :)

@mackuba @janl "you should have a right to keep a private archive of things you've seen"

Easy Counter Example: Google Glass

@fluffel @janl yeah, mainstream AR is going to change *a lot* of things in how the world works, for better or worse, but we're not there yet…
@janl Thank you for demonstrating the level of interest in the feature, and helping to expose the ... interesting... expectations of perpetual control over their words by a small minority of vocal fediverse users.
@janl For what it's worth, I thought your project was a fantastic idea and I was looking forward to seeing it evolve. I wish the baseline Mastodon web interface could offer that.
@janl I think a relevant next thing to help clarify author expectations (not that you can or should do it necessarily) is a fully-client-side single-user tool for storing and locally searching all toots seen (maybe in browser local-storage?).
@JavierKolstad based on the feedback I got, even that might not go down well.
@janl See my follow-up thread https://mindly.social/@JavierKolstad/109688545390983787 regarding options for confirming the extent of the already provided (by accepting a follow-request) author opt-in. 🙂
@janl Such a client could be reasonably described as unusual, specifically in that it provides extra assistance to its user in remembering toots they have previously seen. With regards to confirming whether the existing opt-in of accepting a follow request (or posting a non-followers-only toot) is intended to include the use of such an "unusual client", I can see a range of possibilities: 🧵
@janl On one end, the tool could keep a list of accounts it has seen, and when a new account is seen for the first time, it could automatically send a DM, explaining that the sender is using an unusual client, and asking for explicit permission for this memory-assistance to be provided covering this author's toots. And it would only include them once such confirmation was received (including blanket confirmation via a profile hashtag). (Expanded on: https://mindly.social/@JavierKolstad/109688872280348872 )
@janl On the opposite end, I think it's justifiable to consider that the use of accessibility tools (ranging from text-to-speech, magnification, and including memory-assistance ones like this) isn't the legitimate business of authors, so no confirmation or notification is appropriate.
@janl Between these two poles, there are various middle options, too: Closer to the author-control end would be to send the same DMs, but not permit the author the option of continuing to have the receiver as a follower but forbid the receiver from using the accessibility tool. i.e. just inform the author: "I'm now using this tool; block me if you wish".

@janl Also in the middle, but closer to the not-the-authors-business end, would be to post a notice on the profile, but not send specific DMs to each author.

Those are the options I've thought of; there are likely others, but I thought this was useful to write up. Thanks again for your efforts around this!
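
The most author-controlled end of that range (ask each newly seen account once, include their toots only after a yes) could be sketched roughly as below. This is an illustration under assumptions, not a description of any existing client: the `ConsentRegistry` class, its method names, and the `#searchok` blanket hashtag are all invented for the example, and actually sending the DM is left to the caller.

```typescript
type Consent = "pending" | "granted" | "denied";

class ConsentRegistry {
  private state: { [acct: string]: Consent } = {};

  private known(acct: string): boolean {
    return Object.prototype.hasOwnProperty.call(this.state, acct);
  }

  // Call whenever a toot from `acct` is seen.
  // Returns true only when a permission request (e.g. a DM) should be sent.
  observe(acct: string, bio: string): boolean {
    if (this.known(acct)) {
      return false; // already asked or already decided
    }
    // Hypothetical blanket opt-in hashtag in the profile.
    if (/#searchok\b/i.test(bio)) {
      this.state[acct] = "granted";
      return false;
    }
    this.state[acct] = "pending";
    return true;
  }

  // Record the author's answer to the request.
  record(acct: string, granted: boolean): void {
    this.state[acct] = granted ? "granted" : "denied";
  }

  // Toots are included only after explicit confirmation.
  allows(acct: string): boolean {
    return this.known(acct) && this.state[acct] === "granted";
  }
}
```

The middle options change only small parts of this: "inform, don't ask" would mark new accounts as granted while still returning true from `observe`, and the notice-on-profile variant would never return true at all.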

@janl I'll do my best to withhold judgment until I can read that retrospective.
@janl I was really enjoying it and appreciated the thoughtful approach.