I'm sure I made plenty of mistakes, but I have to admit I find it surprisingly satisfying to be able to operate on a data type that I can overlay on top of the existing #FedBOX storage engines and get native and *fast* querying for them.
The indexes are quite chunky despite being built on top of roaring bitmaps, because there are so many "indexable" elements in an #ActivityPub object. (Currently I'm indexing the type, the content, summary, name, preferredUsername, the recipients, the actor, and the object.)
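To give a flavour of the idea: one roaring bitmap per token, per property, with queries done as bitmap intersections. This is just a toy sketch of the general shape, not FedBOX code; the `tokenIndex` type and the sample objects are made up.

```go
package main

import (
	"fmt"
	"strings"

	"github.com/RoaringBitmap/roaring"
)

// tokenIndex maps a token to the set of object refs (uint32) that contain it.
type tokenIndex map[string]*roaring.Bitmap

func (t tokenIndex) add(token string, ref uint32) {
	bm, ok := t[token]
	if !ok {
		bm = roaring.New()
		t[token] = bm
	}
	bm.Add(ref)
}

func main() {
	// One bitmap index per indexable property.
	byType := tokenIndex{}
	byContent := tokenIndex{}

	// Index two made-up objects, refs 1 and 2.
	byType.add("Note", 1)
	byType.add("Article", 2)
	for _, w := range strings.Fields("hello fediverse") {
		byContent.add(w, 1)
	}
	for _, w := range strings.Fields("hello indexes") {
		byContent.add(w, 2)
	}

	// Querying is a bitmap intersection: Notes whose content contains "hello".
	hits := roaring.And(byType["Note"], byContent["hello"])
	fmt.Println(hits.ToArray()) // [1]
}
```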
As I explore some more, I hope to streamline some of these issues and make the whole thing more robust.
Frantic day today: around 10h of productive work on improving the Index and moving it into the go-ap/filters module.
+1510/-11 lines, of which 987 belong to tests.
Coverage is not entirely sufficient yet, because it's missing checks for the top-level Index.Add() and Index.Search() methods.
Another thing left to do is persistence to disk.
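Roaring bitmaps already know how to serialize themselves, so persistence could be little more than WriteTo/ReadFrom on each bitmap. A sketch under that assumption; the `save`/`load` helpers are hypothetical, not what the module actually ships:

```go
package index

import (
	"os"

	"github.com/RoaringBitmap/roaring"
)

// save writes a bitmap using roaring's portable serialization format.
func save(bm *roaring.Bitmap, path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = bm.WriteTo(f)
	return err
}

// load reads a bitmap back from disk.
func load(path string) (*roaring.Bitmap, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	bm := roaring.New()
	_, err = bm.ReadFrom(f)
	return bm, err
}
```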
The **reason** I wanted to move yesterday's work into this module is that, instead of the custom client.SearchByX() functions, I wanted to retrofit the existing functionality already present in the filters module. Ah, also moving the bitmaps themselves to a semblance of generic types...
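Roughly what I imagine by "a semblance of generic types": an index parameterized over the item and token types, with Add/Search as bitmap operations. A hypothetical sketch with signatures of my own invention, not the actual go-ap/filters API (the real example is linked below):

```go
package index

import "github.com/RoaringBitmap/roaring"

// ExtractFn pulls the indexable tokens of type T out of an item of type It.
type ExtractFn[It any, T comparable] func(It) []T

// Index keeps one roaring bitmap per extracted token.
type Index[It any, T comparable] struct {
	extract ExtractFn[It, T]
	tokens  map[T]*roaring.Bitmap
}

func New[It any, T comparable](fn ExtractFn[It, T]) *Index[It, T] {
	return &Index[It, T]{extract: fn, tokens: make(map[T]*roaring.Bitmap)}
}

// Add indexes the item under ref, a stable uint32 derived from e.g. its IRI.
func (i *Index[It, T]) Add(ref uint32, it It) {
	for _, tok := range i.extract(it) {
		bm, ok := i.tokens[tok]
		if !ok {
			bm = roaring.New()
			i.tokens[tok] = bm
		}
		bm.Add(ref)
	}
}

// Search returns the refs that match all the given tokens (bitmap AND).
func (i *Index[It, T]) Search(tokens ...T) *roaring.Bitmap {
	res := roaring.New()
	for n, tok := range tokens {
		bm, ok := i.tokens[tok]
		if !ok {
			return roaring.New() // a missing token matches nothing
		}
		if n == 0 {
			res = bm.Clone()
		} else {
			res.And(bm)
		}
	}
	return res
}
```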
The full (working) example can be found here: https://pkg.go.dev/github.com/go-ap/filters#example-SearchIndex
The experiment of using roaring bitmaps as the foundation for indexing #ActivityPub objects is half successful and half not.
The good news is that soon I'll be able to replace the #brutalinks client access to its #ActivityPub backend with something built on top of local storage that makes use of the indexes, and is therefore much, much faster.
The bad news is that adding indexing to the storage backends themselves didn't yield much of a performance gain, but I suspect I'm doing something wrong.
It's a painful realization to come to: no matter how much effort I put into making my #ActivityPub server fast, it's still going to suck if, in order to build a meaningful page for a user, the client needs to make many requests.
So the #brutalinks link aggregator now makes use of asynchronous collection fetching, with content rendering then done from local storage.
This decreased the loading times to probably less than half of what they were before.
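The shape of it is roughly: fetch all the collections a page needs concurrently, persist them, and only render once everything is local. A hedged sketch using errgroup; fetchCollection, store, and renderPage are hypothetical stand-ins, not brutalinks' actual functions:

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

// Hypothetical stand-ins for the real remote-fetch / storage / render code.
func fetchCollection(ctx context.Context, iri string) ([]string, error) {
	// ... remote ActivityPub GET for the collection page(s) ...
	return []string{iri + "#item-1"}, nil
}

func store(items []string) error {
	// ... persist the fetched items into local storage ...
	return nil
}

func renderPage() {
	// ... build the page purely from local storage, no network calls ...
	fmt.Println("rendered")
}

// loadPage fetches every collection the page needs concurrently,
// waits for all of them, then renders from local storage only.
func loadPage(ctx context.Context, collections []string) error {
	g, gctx := errgroup.WithContext(ctx)
	for _, iri := range collections {
		iri := iri // per-iteration copy, needed before Go 1.22
		g.Go(func() error {
			items, err := fetchCollection(gctx, iri)
			if err != nil {
				return err
			}
			return store(items)
		})
	}
	if err := g.Wait(); err != nil {
		return err
	}
	renderPage()
	return nil
}

func main() {
	_ = loadPage(context.Background(), []string{"https://example.com/inbox"})
}
```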
However, I still need to find a good model for aggregating and balancing all the sequential loads with an eventual asynchronous, sequential sending of activities.