Mastodawn

David Gerard Jul 4, 2023

this is an excellent summary of the real-life problems - moderation, discoverability, searchability - of a future federated Bluesky AT Protocol network from @jonny

https://neuromatch.social/@jonny/110552684614320107

jonny (good kind) (@jonny@neuromatch.social)

so far, #BlueSky / #ATProtocol seems like a federated system the same way Google Alerts is a federated system.
- you can self host your website or use Google Sites
- Google crawls you
- People subscribe to algos/alerts
- Google Alerts emails you the matches

mcc Jul 4, 2023

@davidgerard @jonny What I would like, eventually, is a system where mod/discover/search are potentially centralized, but I have my choice of which center to select without having to pick a fundamentally different set of content. (Which opens the door to a "center" which is, like, a tree of centers.)

I doubt BlueSky is that system because I subscribe to the conspiracy theory that BlueSky was specifically designed, before it split off from Twitter, to cement the Twitter servers at the center.

Anil Dash Jul 5, 2023

@mcc @davidgerard @jonny discoverability and search are not necessarily centralized by any means. It just requires a different design to enable them in a consensual way across decentralized instances.

Oblomov Jul 5, 2023

@anildash @mcc @davidgerard @jonny
Every time I read «discoverability and search requires centralization» I have to wonder if people failed to learn from Kademlia or are purposely ignoring that the concept of distributed search & discover has already been solved in the past, on even more distributed networks than the Fediverse.

Erin Conroy Jul 5, 2023

@oblomov
I have no idea what Kademlia is or the story behind it. Can you explain or link me to a reasonably-sized digestible version, so I can educate myself?

@anildash @mcc @davidgerard @jonny

jonny (good kind)

@chargrille

a very tl;dr oversimplification is that every person or thing gets a big long binary sequence as an address. the bitwise XOR between two binary strings serves as a distance function, so distance between

011001
and
010011
is 001010 (10 as an integer; the two addresses differ in 2 bits), etc.
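
A minimal Python sketch of the XOR metric on the two example addresses above. The function names are illustrative, not taken from any Kademlia library. Kademlia reads the XOR result as an unsigned integer, which is a different number from the count of differing bits, so the sketch prints both.

```python
def xor_distance(a: str, b: str) -> int:
    """XOR two equal-length bit strings and read the result as an integer."""
    if len(a) != len(b):
        raise ValueError("addresses must have the same length")
    return int(a, 2) ^ int(b, 2)


def differing_bits(a: str, b: str) -> int:
    """Count the positions where the two addresses differ (Hamming distance)."""
    return bin(xor_distance(a, b)).count("1")


d = xor_distance("011001", "010011")
print(format(d, "06b"), d)                  # XOR as bits, then as an integer
print(differing_bits("011001", "010011"))   # number of differing bits
```

Higher-order bits dominate the integer value, so two addresses that share a long prefix are "close" under this metric.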

to find where an ID is (so eg. finding how to connect to a peer, but it's a general location finding algorithm), you keep asking different peers where some ID that is closer (has a smaller XOR value) to the one you want is. peers keep some list of peers at various distances from themselves so they can say "I don't know exactly where that address is, but I do know one that is definitely closer than I am"
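
The "keep asking for someone closer" loop can be sketched as a greedy walk over a toy network. This is a simplification under stated assumptions: `network` is a hypothetical in-memory dict mapping each peer to the peers it knows, whereas real Kademlia nodes keep k-buckets and query several peers in parallel.

```python
from typing import Dict, List


def lookup(network: Dict[int, List[int]], start: int, target: int) -> List[int]:
    """Walk from `start` toward `target`, returning the peers visited."""
    path = [start]
    current = start
    while current != target:
        known = network.get(current, [])
        if not known:
            break  # this peer knows nobody; the search stalls here
        # Ask the current peer for the peer it knows that is closest to the target.
        candidate = min(known, key=lambda peer: peer ^ target)
        if candidate ^ target >= current ^ target:
            break  # nobody this peer knows is closer; the search stalls here
        current = candidate
        path.append(current)
    return path


# Hypothetical 6-bit network: each peer lists the peers it knows about.
network = {
    0b000001: [0b000010, 0b100000, 0b010000],
    0b010000: [0b010010, 0b000001, 0b011000],
    0b010010: [0b010011, 0b010000],
    0b010011: [0b010010],
}

path = lookup(network, start=0b000001, target=0b010011)
print([format(p, "06b") for p in path])
```

Each hop strictly reduces the XOR distance to the target, so the walk always stops: either at the target or at a peer that knows nobody closer.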

that's a very simple explanation, but you can also store additional information at locations in kademlia space to make more interesting things happen like search, etc. as alluded above

Erin Conroy Jul 6, 2023

@jonny Thank you. Oh dear. I'm out of my depth. I will ask my other half to explain to me. But I guess I am wondering: don't you need a central authority to dispense the binary sequences/address to be sure that the addresses actually correspond to distance/location? If I'm just completely misunderstanding, that's fine, I'll ask him to try to bring me up to speed.

Erin Conroy Jul 6, 2023

@jonny

If the addresses are just generated according to some agreed-upon convention, how do you prevent creation of duplicate addresses for different peers?
And what happens if you contact a peer ID, are given a "closer" peer ID from their list, you contact that peer next, but their list doesn't return any closer peer IDs (even though one does exist, it's just not kept on their list). Does your search end there?

jonny (good kind) Jul 6, 2023

@chargrille
a search can certainly fail - the search is only guaranteed to terminate in ideal circumstances where all peers are online, reachable, and properly behaving. otherwise, the odds are still pretty high because of a lot of redundancy built in at different scales. if it fails, you can still get pretty close and maybe resume the search at another time when the peer might be back online. again lots of active research on how to make this robust. DHTs are typically just used for peer discovery/addressing as part of a larger system that might have different incentives to maintain uptime/discoverability.

jonny (good kind) Jul 6, 2023

@chargrille
it depends on what you are addressing! if you are addressing some file, then you could use the hash of that file as the address, and then you would assume there is only some very small chance of an overlapping address from a different file (a hash collision)
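
As a rough sketch of hash-based addressing: the address below is the SHA-256 digest of the file's bytes, read as an integer. SHA-256 is an assumption chosen for illustration (the original Kademlia design uses 160-bit IDs), and `content_address` is a hypothetical helper name.

```python
import hashlib


def content_address(data: bytes) -> int:
    """Derive a 256-bit address from the content itself."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


addr_a = content_address(b"hello fediverse")
addr_b = content_address(b"hello fediverse")
addr_c = content_address(b"hello fediverse!")

print(addr_a == addr_b)  # identical content always maps to the same address
print(addr_a == addr_c)  # different content collides only with negligible probability
```

No central authority hands out these addresses: anyone holding the same bytes computes the same address independently.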

otherwise, generating very large random numbers works reasonably well, and you can handle duplicates at a higher "layer" - see if this address is already taken, if not we're good. there are lots of other techniques for handling malicious behavior like purposefully impersonating an address or sending garbage routing data, some based on distributed trust (auto-banning peers that give bad data), others based on not trusting the DHT (it's just an address, but if the thing at the address doesn't match what's expected, we treat it like a fake).
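
A small sketch of both ideas, under loud assumptions: `taken` is a hypothetical stand-in for "ask the network whether this address is already in use" (a real DHT has no global set to consult), and the verification step assumes the address was derived with SHA-256 from the content.

```python
import hashlib
import secrets


def claim_random_id(taken: set, bits: int = 160) -> int:
    """Pick a large random ID, retrying in the unlikely case it is already taken."""
    while True:
        candidate = secrets.randbits(bits)
        if candidate not in taken:
            taken.add(candidate)
            return candidate


def matches_address(address_hex: str, data: bytes) -> bool:
    """Don't trust the DHT: accept data only if it hashes to the address we asked for."""
    return hashlib.sha256(data).hexdigest() == address_hex


taken_ids: set = set()
first = claim_random_id(taken_ids)
second = claim_random_id(taken_ids)

wanted = hashlib.sha256(b"the real file").hexdigest()
print(matches_address(wanted, b"the real file"))   # genuine content passes
print(matches_address(wanted, b"an impostor"))     # anything else is treated as fake
```

With 160 random bits, an accidental duplicate is astronomically unlikely, which is why the check can live at a higher layer instead of in the addressing scheme.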

the thing with a lot of distributed technologies is that making good enough promises works in a lot of cases, often sort of akin to the "trust but verify" idea. the addressing layer doesn't need to guarantee uniqueness because uniqueness can be handled at a different level, etc. rough consensus and quorum sensing, rather than strict guarantees of functionality, is sort of a hallmark of distributed tech.

they are not perfect! DHTs have a lot of problems, weak spots, misaligned incentives, etc. it's an active field of research :)