Scaling: ActivityPub over NNTP?
Here’s an interesting thought experiment.
Way back in the 1980s and 90s, Usenet was a very popular, sorta-federated discussion forum (carried over the NNTP protocol from the mid-1980s on). It still exists and distributes 400 million messages each day (mostly spam and trash, as far as I can tell). Hard numbers are difficult to come by, but it seems like Usenet is capable of significantly higher throughput than the Fediverse. Why is that?
The big thing holding ActivityPub back is the fan-out. You know the story: someone with 50,000 followers causes their instance to send up to 50,000 HTTP POSTs every time they click the little spinny star or reply to something.
It’s basically a hub-and-spoke network topology, except that everyone takes turns being the hub (ideally, anyway; not much in practice). And in this topology, the hubs are where the strain and bottlenecks are.
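The "up to" in that figure matters: the ActivityPub spec lets a server advertise a `sharedInbox` endpoint, so the sending instance can deliver one copy per receiving server instead of one per follower. Either way, the origin instance is still the hub making every POST. A minimal sketch of that deduplication, assuming a hypothetical `Follower` record that holds each actor's inbox URL and optional shared inbox URL; the 3,000-server split in the usage is an invented illustration, not a measurement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Follower:
    # Hypothetical minimal view of a remote actor: its personal inbox URL and,
    # if its server advertises one, the server-wide sharedInbox URL.
    inbox: str
    shared_inbox: Optional[str] = None


def delivery_targets(followers: list[Follower]) -> set[str]:
    """Distinct URLs the sending instance must POST one activity to.

    Followers whose server has a sharedInbox collapse into a single
    delivery for that server; the rest each need their own POST.
    """
    return {f.shared_inbox or f.inbox for f in followers}


if __name__ == "__main__":
    # 50,000 followers spread evenly across 3,000 servers (illustrative only).
    followers = [
        Follower(f"https://s{i % 3000}.example/users/u{i}/inbox",
                 f"https://s{i % 3000}.example/inbox")
        for i in range(50_000)
    ]
    print(len(delivery_targets(followers)))  # 3000
    no_shared = [Follower(f.inbox) for f in followers]
    print(len(delivery_targets(no_shared)))  # 50000
```

Shared inboxes shrink the number of requests, but they don't change the shape of the network: one server still does all the sending.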
Back in the 1980s, they had computers literally 1000 times slower than ours, with network links to match. So how did they do it? With a peer-to-peer network topology! When a new post is made, a server doesn’t send it to everyone; it just sends it to a handful of other servers. Those servers in turn forward the post to a handful of their own peers, and so on, until the whole network has received the post. No individual server is a single point of failure, and none has to bear the full brunt of orchestrating it all.
Let’s do a picture. A creates a post and sends it to B and D.
A ─ B ─ C
 \     /
  ─ D ─
B sends it on to C.
Meanwhile, D also sends it on to C, but C already has it, so it does nothing more. IRL this would be a much larger mesh. Who peers with whom can be a mixture of manual selection and random spiciness.
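The forwarding in the picture can be simulated in a few lines. This is a sketch under simplifying assumptions, not NNTP itself: each server remembers the IDs of posts it has seen (Usenet servers keep a history of Message-IDs for this), delivery is instantaneous, and "first" just means breadth-first order. The mesh, server names, and message ID are illustrative.

```python
from collections import deque


def flood(peers: dict[str, list[str]], origin: str, msg_id: str) -> tuple[dict[str, str], int]:
    """Simulate flooding one post through a peer mesh.

    Every server passes a new post to all of its peers. A server that has
    already seen the post's ID drops the copy. Returns (server -> peer it
    first got the post from, number of dropped duplicate copies).
    """
    seen: dict[str, set[str]] = {server: set() for server in peers}
    seen[origin].add(msg_id)
    received_from = {origin: origin}
    duplicates = 0
    queue = deque([origin])
    while queue:
        sender = queue.popleft()
        for peer in peers[sender]:
            if msg_id in seen[peer]:
                duplicates += 1  # peer already has it, so it does nothing more
                continue
            seen[peer].add(msg_id)
            received_from[peer] = sender
            queue.append(peer)  # peer forwards it on in a later step
    return received_from, duplicates


if __name__ == "__main__":
    # The four-server mesh from the picture: A-B, B-C, A-D, D-C.
    mesh = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
    got, dups = flood(mesh, "A", "<post-1@a.example>")
    print(got)   # {'A': 'A', 'B': 'A', 'D': 'A', 'C': 'B'}
    print(dups)  # 5
```

Every server only ever talks to its own two peers, yet all four end up with the post; the five dropped copies are the price of having no hub.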
Posts can arrive out of order, so each server would need to hold a post until the posts it depends on have arrived (a reply that shows up before the post it replies to, for instance) before making it available to clients. That’s a bit tricky.
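One way to handle the ordering problem is a holding area keyed by the missing dependency. A minimal sketch, assuming each activity has at most one dependency (the ID in a reply's `inReplyTo`, say, or the object of a `Like`) and that anything already released counts as available. The class and method names are hypothetical, and a real server would also need a timeout or a fetch for dependencies that never arrive.

```python
from collections import defaultdict
from typing import Optional


class ReorderBuffer:
    """Hold incoming activities until the one each depends on is released."""

    def __init__(self) -> None:
        self.released: set[str] = set()
        # Missing dependency ID -> activities waiting on it, in arrival order.
        self.waiting: dict[str, list[str]] = defaultdict(list)
        # Release order, i.e. the order clients would get to see activities.
        self.visible: list[str] = []

    def receive(self, activity_id: str, depends_on: Optional[str] = None) -> None:
        if depends_on is not None and depends_on not in self.released:
            self.waiting[depends_on].append(activity_id)
        else:
            self._release(activity_id)

    def _release(self, activity_id: str) -> None:
        # Iterative rather than recursive so a long reply chain is safe.
        stack = [activity_id]
        while stack:
            current = stack.pop()
            self.released.add(current)
            self.visible.append(current)
            # Anything waiting on `current` is now unblocked; reversed so the
            # earliest-arriving waiter is released first.
            stack.extend(reversed(self.waiting.pop(current, [])))


if __name__ == "__main__":
    buf = ReorderBuffer()
    buf.receive("like-1", depends_on="reply-1")  # a Like of a reply we lack
    buf.receive("reply-1", depends_on="post-1")  # a reply to a post we lack
    print(buf.visible)  # []
    buf.receive("post-1")                        # the original post arrives
    print(buf.visible)  # ['post-1', 'reply-1', 'like-1']
```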
In the ActivityPub-over-NNTP idea, each NNTP post would be a thin wrapper around a data structure containing the HTTP headers (with signature and digest) and the JSON body that a normal HTTP-POSTed Activity would have. Servers would use NNTP to distribute the activities and, upon receiving one, they’d POST it to their own /inbox to run the usual ActivityPub processing that their AP instance does.
{
  "headers": {
    "Signature": "...",
    "Digest": "...",
    "Date": "..."
  },
  "activity": { ... normal ActivityPub JSON ... }
}
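One wrinkle with the envelope: the `Digest` header is a hash over the exact bytes of the POSTed body, so if `activity` is stored as a parsed object and later re-serialized (different key order or whitespace), the digest no longer matches. The sketch below therefore deviates slightly from the shape above and carries the body as its original string. It assumes the `SHA-256=<base64>` digest form that Mastodon and many other servers send; `wrap` and `unwrap` are hypothetical names. It checks only the digest: the HTTP Signature commonly also covers `(request-target)` and `host`, which name the original recipient's inbox, so verifying it after a hop through NNTP is left out here.

```python
import base64
import hashlib
import json


def sha256_digest(body: bytes) -> str:
    # Digest header value: "SHA-256=" plus the base64 of the body's SHA-256.
    return "SHA-256=" + base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


def wrap(headers: dict[str, str], body: bytes) -> str:
    """Build the NNTP article payload from an already-signed delivery.

    The activity is kept as the exact body string, not parsed JSON, so the
    bytes (and therefore the Digest) survive the trip unchanged.
    """
    return json.dumps({"headers": headers, "activity": body.decode("utf-8")})


def unwrap(payload: str) -> tuple[dict[str, str], bytes]:
    """Recover headers and body bytes, rejecting a body that fails its Digest."""
    envelope = json.loads(payload)
    headers = envelope["headers"]
    body = envelope["activity"].encode("utf-8")
    if headers.get("Digest") != sha256_digest(body):
        raise ValueError("Digest does not match activity body")
    return headers, body
```

On receipt, a server would `unwrap` the article and POST the returned body, with the returned headers, to its own inbox.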
In this way, there is no need to rewrite ActivityPub semantics, as only the transport layer changes. Our existing inbox logic remains intact.
NNTP comes with a lot of historical baggage, so we’d probably need to evolve the protocol a bit. Maybe use HTTP requests (or even HTTP/2 streams?) instead of the original line-oriented text protocol over raw TCP sockets. But you get the idea.
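Whatever the transport, one NNTP idea worth keeping is offering before sending. With `IHAVE` (RFC 3977), or the batched `CHECK`/`TAKETHIS` commands of the streaming extension (RFC 4644), a server first offers a Message-ID and transfers the article only if the peer wants it, so a duplicate like the second copy reaching C costs an ID rather than a whole activity. A minimal sketch of that offer/accept round as it might look in an HTTP redesign; it moves data between in-memory dicts, and the function names and batch shape are hypothetical, not any existing endpoint.

```python
def wanted(offered: list[str], have: set[str]) -> list[str]:
    """Receiver side: reply to a batch of offered IDs with the ones it lacks."""
    return [msg_id for msg_id in offered if msg_id not in have]


def sync(sender: dict[str, bytes], receiver: dict[str, bytes]) -> int:
    """One offer/accept round from sender to receiver.

    Only IDs the receiver asks for have their bodies sent. Returns the
    number of body bytes transferred.
    """
    requested = wanted(list(sender), set(receiver))
    transferred = 0
    for msg_id in requested:
        receiver[msg_id] = sender[msg_id]
        transferred += len(sender[msg_id])
    return transferred


if __name__ == "__main__":
    d = {"<1@a.example>": b"first post", "<2@a.example>": b"a reply"}
    c = {"<1@a.example>": b"first post"}
    print(sync(d, c))  # 7: only the reply's body crosses the wire
    print(sync(d, c))  # 0: C now has everything D offers
```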
Thoughts?
Remember #MultideskOS? A little? Not at all? Discover or rediscover Jayce's #USENET quotes over here <3
Radarr version 6.1.1.10360, a movie collection manager for Usenet and BitTorrent users: https://www.dekazeta.net/foro/files/file/3298-radarr/

Radarr is a movie collection manager for Usenet and BitTorrent users. It can monitor multiple RSS feeds for new movies and will interface with clients and indexers to grab, sort, and rename them. It can also be configured to automatically upgrade...
@markmetz I still miss Usenet.
Plus side: we still have mailing lists. Not quite as freeform as Usenet, but more topically structured than the Fediverse.
There's probably room for Fediverse clients that can slice up posts topically, so I could say "show me rec.arts.int-fiction" and have it show me the appropriate things. But for Heaven's sake, not large language models.
I'd have to understand the math more, but would Bayesian classification be compatible with this? I've never read about it being used for an ever-changing set of queries, mostly just static "spam or not".
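On the compatibility question: the multinomial variant of naive Bayes is naturally multi-class rather than just "spam or not", and because it keeps only per-topic word counts, a new topic can be added by training on a few examples without retraining the others. A toy sketch, assuming bag-of-words tokens and add-one (Laplace) smoothing; the class name and example topics are illustrative, and three training lines are far too little data for real use.

```python
import math
import re
from collections import Counter, defaultdict


class TopicClassifier:
    """Multinomial naive Bayes over newsgroup-style topic labels."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    @staticmethod
    def tokens(text: str) -> list[str]:
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text: str, topic: str) -> None:
        words = self.tokens(text)
        self.word_counts[topic].update(words)
        self.doc_counts[topic] += 1
        self.vocab.update(words)

    def scores(self, text: str) -> dict[str, float]:
        """Log-probability of each topic for text, up to a shared constant."""
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        out = {}
        for topic, counts in self.word_counts.items():
            n = sum(counts.values())
            score = math.log(self.doc_counts[topic] / total_docs)  # topic prior
            for w in self.tokens(text):
                # Add-one smoothing so an unseen word doesn't zero out a topic.
                score += math.log((counts[w] + 1) / (n + v))
            out[topic] = score
        return out

    def classify(self, text: str) -> str:
        s = self.scores(text)
        return max(s, key=s.get)


if __name__ == "__main__":
    clf = TopicClassifier()
    clf.train("zork text adventure parser puzzle", "rec.arts.int-fiction")
    clf.train("inform room puzzle walkthrough", "rec.arts.int-fiction")
    clf.train("sourdough oven crust flour", "rec.food.cooking")
    print(clf.classify("a parser puzzle in the adventure"))  # rec.arts.int-fiction
    # A topic added later needs only its own examples; nothing is retrained.
    clf.train("telescope eyepiece nebula", "sci.astro")
    print(clf.classify("nebula through my eyepiece"))  # sci.astro
```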
What would be missing from this, of course, would be the poster's intent in choosing a newsgroup.
I guess I still just miss Usenet.