this is an excellent summary of the real-life problems - moderation, discoverability, searchability - of a future federated Bluesky AT Protocol network from @jonny

https://neuromatch.social/@jonny/110552684614320107

see also

https://github.com/bluesky-social/proposals/issues/18
https://github.com/bluesky-social/proposals/issues/19

i particularly like the observation that the functions people *want* from social media - moderation, discoverability, search - just straight-up require centralisation.

Decentralisation has its virtues, such as the fediverse ticking along mostly fine while Twitter and Bluesky pooped themselves on Saturday. But for usability for non-nerds, decentralisation is a harsh antifeature - see Mastodon. You can't search your fuckin' friends, I mean wtf, FUNCTION NUMBER ONE on a new network!

Any eventual atproto network will naturally centralise on a big graph server, 'cos otherwise you don't get search or discoverability.

there isn't as yet a central repository of critiques. also the protocol isn't finished yet, there's a lotta vaporware and handwaving.

actual Trust & Safety people of considerable experience, e.g. Yoel Roth from Twitter and Denise from Dreamwidth and formerly of LiveJournal, spent many futile hours posting at length to the company CEO and devs on how Bluesky's plans would make essential moderation functions literally impossible.

even if Bluesky gave a hoot about doing moderation properly, which doesn't seem to occur to them. they seem literally incapable of understanding the question.

some of the devs are getting to understand the problems. because Bluesky pressed them into service to do moderation personally. they understood there was a problem here once they saw some shit.

but basically Bluesky wrote a moderation white paper and fell in love with it, and they are impervious to any idea they didn't think of themselves, or the history of thirty years of internet social media.

like, when you get to "let's make block lists public!" why the fuck are you doing something that obviously stupid? "well the white paper requires it" i mean.

there is no one weird trick to technically scale moderation. you have to do the fucking moderation. with people.

that's *fine* for now - there is no network. bsky.app is a fun single-node server to be on. 200k users, high quality queer shitposters, great userbase!

but it's important to keep in mind that it's *run* by rationalist neoreactionary-leaning blockchain bros who have shown an unfortunate tendency in practice to defend their neo-nazi friends from being kicked off for death threats against minorities.

the technical details are secondary, even if you approach them with an unwarranted assumption of good faith. because atproto was designed with bad assumptions by idiots. it's a historical fact that Jack Dorsey's driving motivation was to make a network nazis couldn't be permanently banned from. that's what he funded these people to do, and the tech is just details at that point.

on Mastodon, Bluesky would have been fediblocked by now just for its nazi coddling.

btw i will definitely be calling Bluesky's wizard white paper idea "compostable moderation" from now on

jonny (good kind) (@jonny@neuromatch.social)

so far, #BlueSky / #ATProtocol seems like a federated system the same way Google Alerts is a federated system.
- you can self host your website or use Google sites.
- Google crawls you
- People subscribe to algos/alerts
- Google Alerts emails you the matches

Neuromatch Social
@davidgerard @jonny i have a feeling that caring about decentralisation and being a good shitposter that I definitely want to follow are mutually exclusive traits

@davidgerard @jonny What I would like, eventually, is a system where mod/discover/search are potentially centralized, but I have my choice of which center to select without having to pick a fundamentally different set of content. (Which opens the door to a "center" which is, like, a tree of centers.)

I doubt BlueSky is that system because I subscribe to the conspiracy theory that BlueSky was specifically designed, before it split off from Twitter, to cement the Twitter servers at the center.

@davidgerard @jonny When I was trying to imagine moderation on my stalled-out decentralized-mastodon prototype, I imagined accounts clumping together into cliques with an agreed-on moderator, and then cliques clumping together into federations that accept each other's mod decisions. Someone wants to talk to you, you see if they've been cleared as Basically Okay by your clique or a member of your clique's super-clique (or maybe super-cliques ally together into super-super cliques, idk.)

@mcc @jonny the natural centralising tendency of the graph server and the DID server will keep it centred on bsky.app, yes

as for moderation, on mastodon we discover that moderators are also frickin' idiots. @amtiskaw receives some *astoundingly* dumb notices. I got a post reported literally because i advocated Signal.

@davidgerard It's not that I disagree, but I would like to float a theory by you: What if *all* moderators are frickin' idiots? I saw plenty of frickin bad decisions made by Twitter moderators and you've posted some bad decisions made by Bluesky moderators.

If this is the case, the goal should be to find a system that allows us to eventually discover, and collectively empower, the *least* frickin idiotic moderators from the set of all possible candidates.

@mcc that would be nice, yes!
@davidgerard And then once we figure out how to do that we can solve the question of how to democratically elect a government. Easy breezy
@davidgerard Both the "pure centralized" model of a Twitter and the instance federation model of Mastodon aren't super good at this, and my hypothesis is it's because both are *brittle*. In the Twitter case if the mods turn out to be frickin bad you can do literally *nothing*. In the Mastodon case if they're frickin bad you can only fire them by switching instances, and that's so much work you might as well switch to bluesky…
@davidgerard Bluesky's "marketplace of algorithms" approach otoh, which I haven't fully investigated, has the potential to work like I want because it could turn into a marketplace of human moderation teams… but since the focus is so heavily on selecting different machine moderation approaches, I have doubts the ecosystem will actually grow in that direction :/

@mcc per the linked bugs, atproto doesn't seem to have had a lot of thought put into helping users defend against bad actors.

(because it exists to enable bad actors)

@mcc @davidgerard the thing that kinda sticks out for me is that “can run an internet-visible server well” and “can moderate a community well” are almost entirely dissimilar skill-sets, and yet we combine the two.

If you squint hard enough, Matrix of all places has something most similar to decoupling these? In the abstract¹, homeserver administrators are mostly responsible for ensuring the technical substrate runs well and to limit spam signups, and community moderation happens (largely) above that, with a moderation bot that can be subscribed to one or more feeds of moderation decisions.

¹: not to claim that this is how it does work, or that it works well now, but that there are the bones of an approach like this. Eventually.

@RAOF @davidgerard I like Matrix now. Element is a really nice program. Between Mastodon, Lemmy and Element I think Element is the best at being the thing it is.

@davidgerard @jonny In other words, the moderation model I want is Mastodon servers! I just want to make the "moderation authority" job Mastodon servers currently hold get delinked from the "post repository/user identity authority" job that Mastodon servers currently *also* hold.

If we could break these jobs apart, we could fix the end-user headaches that come from "your instance IS your username" while retaining the (admittedly ambiguous) benefits of the federated moderation model.

I think…

@mcc @davidgerard @jonny discoverability and search are not necessarily centralized by any means. It just requires a different design to enable them in a consensual way across decentralized instances.
@mcc @davidgerard @jonny (that conspiracy theory isn’t really true, but it’s reasonable to act as if it were anyway)
@anildash
@mcc @davidgerard
would be curious to hear more about what you know of the history/intention since I know very little about it. saying that as a genuine question bc I am skeptical about the intent and ofc involvement by jack Dorsey, but know very little in the way of specifics
@jonny @anildash @mcc jack (a) personally intervened to keep Richard Spencer on Twitter (this even made it to the WSJ) (b) wanted a social network where Trump couldn't be banned permanently (he intervened in Trump's case too)
@anildash @mcc @davidgerard @jonny
Every time I read «discoverability and search require centralization» I have to wonder if people failed to learn from Kademlia or are purposely ignoring that the concept of distributed search & discover has already been solved in the past, on even more distributed networks than the Fediverse.
@oblomov @anildash @mcc @jonny sounds great, so why isn't it standard here

@davidgerard @anildash @mcc @jonny

Because Mastodon is designed to minimize network communications. Take it up with its devs 8-D

@oblomov @anildash @mcc @jonny taking things up with gargron tends to be a painful experience

@davidgerard @anildash @mcc @jonny

And that's why it isn't standard here ;-)

@davidgerard @oblomov @mcc @jonny I’ve found him to engage thoughtfully and had good luck talking through ideas.

@anildash @oblomov @mcc @jonny the existence of the treehouse fork tells a different story for other people

there is excellent reason there's a LOL Fuck Gargron Tendency

@davidgerard @oblomov @anildash I made a toy/prototype distributed mastodon project (and the point where I skidded out *wasn't* Kademlia, that part worked great). I would tend to think of Kademlia as good for *lookup* rather than *search*. Text search is a fuzzy operation rather than one-to-one. I would think of Kademlia as a dubious choice for a full-text index because publishing a hash in Kademlia requires you to join the network multiple times, at multiple locations, once per published object
@davidgerard @oblomov @anildash So say we're imagining a search layer on Mastodon rather than on a hypothetical pure-Kademlia social network. And leave aside that the thing holding back search on Mastodon is *a consensus it is unwanted*. Imagine searching for "Mastodon". You go to the point in search space corresponding to "Mastodon". You look for peers. And you find… every single server in the Fediverse. Because they *all* host posts about Mastodon. Now what do you do?

@davidgerard @oblomov @anildash Do you spam every single ActivityPub server in existence & ask for a list of its posts containing "Mastodon"? How do you sort the results?

The most successful Kademlia deployment I know of is the BitTorrent magnet link network. That *didn't*, to my knowledge, choose to use K for *text* search. Text search was farmed to… centralized indexing services, like TPB.

If anyone EVER successfully built a full-text search prototype atop K I'd love to see that paper/blog.

@davidgerard @oblomov @anildash (Note: I'm not making any attempt to disagree with Anil's original point https://mastodon.social/@[email protected]m/110660077658528760 just, the item in the toolbox I would reach for trying to make that real is most probably not Kademlia. Especially not in the ActivityPub/Fediverse instance context where the peer trust relationship is somewhat stronger than in Kademlia's original use case.)

@mcc @davidgerard @anildash

Allow me to shy away from full-text search, because aside from the technical issue there's also, as you point out, a sizeable social aspect that is a whole different discussion, and let's focus for a moment on user and hashtag discoverability, and let's agree on the idea that rather than a pure Kademlia solution we'd want to aim at something that can work on top of the existing network without bogging it down.

1/

@oblomov @mcc @davidgerard @anildash
I don't know enough to say much about the technical implications, but I think there'd definitely be support for some sort of distributed user directory so long as it were opt-in, and easy enough to opt back out of
@ajswritesthings @mcc @oblomov @anildash yep. there's various kludges to find your twitter network or whatever, but an opt-in phone book would definitely be a feature

@ajswritesthings @mcc @oblomov @anildash may i note also: the style of thread that goes:
"lol you fools i have the obvious technical answer that nobody's implemented yet, you rubes, you simpletons"
* others detailing why this is wrong *
"well obviously I didn't mean *that*,"

is fatuous reply guy nonsense and please don't do it here

@davidgerard

I'll take the blame for that, my phrasing was obviously extremely poor (to be generous): it was intended to remark just that centralization is not a hard requirement, but it obviously came out as suggesting we should be using Kademlia specifically, which wasn't the intention.

@ajswritesthings @mcc @anildash

@mcc @davidgerard @anildash

Currently, my understanding is that if I look for a username or hashtag, I only get matches for users and posts already known to my instance. This is what makes in-network discoverability poor “periphery-to-periphery”. But if my server could ask its federated neighbors for username and hashtag matches, it could already provide information from a much wider view of the network, for a reasonably limited “spamming” cost.

2/

@mcc @davidgerard @anildash

Of course you could even expand this idea to more hops, look for some balance between number of hops and rate limit to avoid DDoS.

And it could even be manual: on search, only provide local results, after which ask the user if they want to go over the network to find more.

I don't see this as worse than using tools like mastovue, and more user-friendly.

3/3
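
A minimal sketch of the "ask your federated neighbors, up to a few hops" idea from this three-part reply. Everything here is hypothetical: `Instance`, `local_search` and `federated_search` are illustrative names, not part of Mastodon or ActivityPub, and rate limiting and opt-in are left out entirely. `max_hops=0` corresponds to the "local results only" mode, and a larger value to the user choosing to go over the network.

```python
from __future__ import annotations

from collections import deque


class Instance:
    """Hypothetical server: knows some accounts locally and federates with neighbors."""

    def __init__(self, name: str, accounts: set[str]):
        self.name = name
        self.accounts = accounts
        self.neighbors: list[Instance] = []

    def local_search(self, query: str) -> set[str]:
        # Case-insensitive substring match over locally known accounts only.
        return {a for a in self.accounts if query.lower() in a.lower()}


def federated_search(origin: Instance, query: str, max_hops: int = 1) -> set[str]:
    """Breadth-first search over federation links, never going past max_hops."""
    seen = {origin.name}
    results = set(origin.local_search(query))
    queue = deque([(origin, 0)])
    while queue:
        inst, hops = queue.popleft()
        if hops == max_hops:
            continue  # hop budget used up on this path
        for nb in inst.neighbors:
            if nb.name not in seen:  # ask each server at most once
                seen.add(nb.name)
                results |= nb.local_search(query)
                queue.append((nb, hops + 1))
    return results


# Three servers in a line: a <-> b <-> c
a = Instance("a.example", {"alice@a.example"})
b = Instance("b.example", {"alice_b@b.example"})
c = Instance("c.example", {"alicia@c.example", "bob@c.example"})
a.neighbors = [b]
b.neighbors = [a, c]
c.neighbors = [b]

print(federated_search(a, "ali", max_hops=0))
print(federated_search(a, "ali", max_hops=2))
```

The `seen` set is what keeps the cost bounded: each server is queried at most once per search, so the hop cap limits how far the query spreads rather than how many times it is repeated.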

@mcc @davidgerard @oblomov @anildash Total tangent, but now I am reminded of the way BitTorrent made the decision to completely leave out discovery and search, and focus just on transfer, and then completely beat every other P2P file transfer system, all of which had search and discovery.

Then for a good while people kept trying to "improve" on BitTorrent by trying to add search and discovery to it.

@oblomov
I have no idea what Kademlia is or the story behind it. Can you explain or link me to a reasonably-sized digestible version, so I can educate myself?

@anildash @mcc @davidgerard @jonny

@chargrille

a very tl;dr oversimplification is that every person or thing gets a big long binary sequence as an address. the bitwise XOR between two binary strings, read as a number, serves as a distance function, so distance between

011001
and
010011
is 001010, i.e. 10, etc.

to find where an ID is (so eg. finding how to connect to a peer, but it's a general location finding algorithm), you keep asking different peers where some ID that is closer (has a smaller XOR value) to the one you want is. peers keep some list of peers at various distances from themselves so they can say "I don't know exactly where that address is, but I do know one that is definitely closer than I am"

that's a very simple explanation, but you can also store additional information at locations in kademlia space to make more interesting things happen like search, etc. as alluded above
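
A minimal sketch of the XOR metric and the "I know someone closer than I am" loop described above. It is a toy under stated assumptions: `Peer`, `closest_known` and `lookup` are hypothetical names, each peer keeps a flat list instead of Kademlia's distance-bucketed routing table, and there is no networking, parallelism or failure handling.

```python
from __future__ import annotations


def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance: XOR the two IDs and read the result as an integer."""
    return a ^ b


class Peer:
    """Hypothetical peer that only knows a handful of other peers."""

    def __init__(self, peer_id: int):
        self.peer_id = peer_id
        self.known: list[Peer] = []

    def closest_known(self, target: int) -> Peer:
        # Among itself and the peers it knows, pick the one nearest the target.
        return min([self, *self.known], key=lambda p: xor_distance(p.peer_id, target))


def lookup(start: Peer, target: int) -> tuple[Peer, int]:
    """Hop from peer to peer until the current peer knows nobody closer."""
    current, hops = start, 0
    while True:
        nxt = current.closest_known(target)
        if nxt is current:
            return current, hops
        current, hops = nxt, hops + 1


# The two example strings: XOR is 001010, which is 10 as a number
# (the strings differ in 2 bit positions).
print(xor_distance(0b011001, 0b010011))

# A chain of peers, each knowing one peer that is closer to the target.
p1, p2, p3, p4 = Peer(0b000001), Peer(0b100000), Peer(0b110000), Peer(0b110110)
p1.known, p2.known, p3.known = [p2], [p3], [p4]
found, hops = lookup(p1, 0b110111)
print(bin(found.peer_id), hops)
```

Because two different IDs never have the same XOR distance to a given target, every hop strictly reduces the distance, so the loop always stops. It stops at the closest peer anyone along the path knew about, which is not necessarily the closest peer that exists.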

@jonny Thank you. Oh dear. I'm out of my depth. I will ask my other half to explain to me. But I guess I am wondering: don't you need a central authority to dispense the binary sequences/address to be sure that the addresses actually correspond to distance/location? If I'm just completely misunderstanding, that's fine, I'll ask him to try to bring me up to speed.

@jonny

If the addresses are just generated according to some agreed-upon convention, how do you prevent creation of duplicate addresses for different peers?
And what happens if you contact a peer ID, are given a "closer" peer ID from their list, you contact that peer next, but their list doesn't return any closer peer IDs (even though one does exist, it's just not kept on their list). Does your search end there?

@chargrille
a search can certainly fail - the search is only guaranteed to terminate in ideal circumstances where all peers are online, reachable, and properly behaving. otherwise, the odds are still pretty high because of a lot of redundancy built in at different scales. if it fails, you can still get pretty close and maybe resume the search at another time when the peer might be back online. again lots of active research on how to make this robust. DHTs are typically just used for peer discovery/addressing as part of a larger system that might have different incentives to maintain uptime/discoverability.

@chargrille
it depends on what you are addressing! if you are addressing some file, then you could use the hash of that file as the address, and then you would assume there is only some very small chance of an overlapping address from a different file (a hash collision)

otherwise, generating very large random numbers works reasonably well, and you can handle duplicates at a higher "layer" - see if this address is already taken, if not we're good. there are lots of other techniques for handling malicious behavior like purposefully impersonating an address or sending garbage routing data, some based on distributed trust (auto-banning peers that give bad data), others based on not trusting the DHT (it's just an address, but if the thing at the address doesn't match what's expected, we treat it like a fake).

the thing with a lot of distributed technologies is that making good enough promises works in a lot of cases, often sort of akin to the "trust but verify" idea. the addressing layer doesn't need to guarantee uniqueness because uniqueness can be handled at a different level, etc. rough consensus and quorum sensing, rather than strict guarantees of functionality, is sort of a hallmark of distributed tech.

they are not perfect! DHTs have a lot of problems, weak spots, misaligned incentives, etc. it's an active field of research :)
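
A minimal sketch of the two addressing options mentioned in this reply, plus the "don't trust the DHT, check what you got" step. The assumptions are mine: SHA-256 as the hash, 160-bit random IDs, and a plain dict standing in for the untrusted network. `content_address`, `random_address` and `fetch_verified` are hypothetical helper names.

```python
from __future__ import annotations

import hashlib
import secrets
from typing import Optional


def content_address(data: bytes) -> str:
    """Address a file by the hash of its bytes (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


def random_address(bits: int = 160) -> str:
    """Address a peer with a large random number; collisions are very unlikely."""
    return secrets.token_bytes(bits // 8).hex()


def fetch_verified(store: dict[str, bytes], address: str) -> Optional[bytes]:
    """Fetch from an untrusted store and reject anything that does not hash to its address."""
    data = store.get(address)
    if data is None or content_address(data) != address:
        return None  # missing, or a fake sitting at that address
    return data


post = b"hello fediverse"
addr = content_address(post)

honest_store = {addr: post}
lying_store = {addr: b"garbage"}

print(fetch_verified(honest_store, addr))
print(fetch_verified(lying_store, addr))
print(len(random_address()))
```

The verification step only works for content-derived addresses. A random peer ID has nothing to hash against, which is where the higher-layer duplicate checks and trust mechanisms mentioned above come in.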

@anildash @mcc @davidgerard @jonny one thing I like about your follow-to-opt-in model is that it lets you pick multiple search engines to be part of.

@davidgerard @jonny I like this sentence a lot:

"the technical details are secondary, even if you approach them with an unwarranted assumption of good faith. because atproto was designed with bad assumptions by idiots."

@potatocubed
@davidgerard
ya to be clear my assumption of good faith comes from ignorance of the history of the project, not like giving the benefit of a doubt to Nazi sympathizers/enablers.
@jonny @potatocubed everyone assumed good faith and then it was a continuing cascade of the "this is totally my bag baby" scene from Austin Powers
Hi @davidgerard @jonny is there a good/technical reason user search is lacking? This seems more like a political position taken by Mastodon than a technical hurdle?
@winstonsmith
@davidgerard
they're interrelated. my perspective/understanding of the history is limited, but from what I understand: search can/has been used to target abuse, so disabling search by default for safety. the technical part comes in where it's difficult to implement the additional steps that would be needed to make a safe, partial search where ppl who do not want to be indexed can reliably opt out.
@jonny @davidgerard thanks! I support safety concerns being taken seriously. Just unsure how realistic this opt-out of public indexation thing is when in a public context: that's what e-mail is for 🤷
@winstonsmith
of course not a purely technological problem, but yes this requires encryption and basically a different protocol ❤️

@davidgerard @jonny "there is no one weird trick to technically scale moderation. you have to do the fucking moderation. with people."

Yes 100. Verification, same.

Unpopular comment: Post.news is doing okay, despite loathsome investors & virtually no great posters, bc it's moderated, searchable, and is paying people to do that work w ordinary revenue (no idea how much float) that doesn't come from ads/surveillance but from users paying for content. Go figure.

@davidgerard @jonny The lack of search on Mastodon is not a technical limitation, it is a deliberate decision. It is relatively trivial to build a network-wide account search, but culturally not wanted.
@tomw @jonny yes, that's tautological. this doesn't make it any less a serious flaw in practice, or any less an attempt to make a virtue of a defect