@mcc Do they do local caching? Or do so many people who see links there just never bother to click on them?
(Only slight sarcasm - I avoid the place like I avoid tick bites.)
@mcc Entirely fair. That makes sense.
https://news.ycombinator.com/item?id=40222067 (bluh...)
Maybe it's back up now?
@mcc Oof, that's an interesting challenge.
Also feels like a hole in either Mastodon's use of Fediverse or Fediverse itself. If node A is cloning posts to node B, it's already generated a preview and should clone that too!
In the sense that someone other than your client, your own instance (both of which you kind of need to trust anyway), and the actual site that's linked to (who's the source of the content, so the preview must trust it) can manipulate it.
The site showing different contents to different users is another issue that I agree exists and can cause similar problems _for malicious linked-to sites_. For nonmalicious ones consider e.g. a post expressing outrage at something bbc published with a link to the "article" on bbc with a helpful "preview".
Huh, I'm very surprised that you find this line odd (I don't think I've seen this opinion in the past). I would appreciate it if you answered a question or two so that I can understand it better (but I do understand if you don't wish to).
The reason I find this line very natural is that I think in terms of which node is intended to be able to speak for which entities, especially since those entities are named in a way that reminds us of that relation (domain in URLs, domain/instance part of a fedi ID). Do you think that it makes more sense to keep track of a more vague trust (as in, "that node is rather trustworthy") in general, that the mapping between nodes and entities is insufficiently natural, or something else I can't easily see?
@robryk Not in general, no. I think there's a very practical special-case reason to bend the simple model of trust in this case: too many nodes hammering a site can result in that site deciding that Mastodon is a threat to quality of service and doing their best to block every node.
That's bad for Mastodon as a Fediverse project (and, indirectly, good for the Twitters of the world... "Hey, we may have lax moderation, but we'll only tap your server once to build a preview link").
In terms of cleanest-model, I agree with your assessment of what should be authoritative. In terms of a cost-benefit tradeoff of most-damage-a-modified-link-preview-could-do vs. most-damage-distributing-the-build-of-the-preview-could-do however...
(I'm reminded of DNS, and the fact that while people don't like caching and what it does to the cleanliness of the domain-ip mapping, we put up with it because the alternative would be an untenable noise-mess of popular services' DNS authorities getting hammered. No caching would be cleaner, but there's a reason DNS entries are cached.)
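For illustration, the DNS-style tradeoff above can be sketched as a small time-to-live cache in front of whatever builds a preview. This is only a sketch under assumptions: `PreviewCache`, the `fetch` callback, and the 300-second default are hypothetical names and values, not anything Mastodon or DNS resolvers actually use.

```python
import time


class PreviewCache:
    """Hypothetical TTL cache: serve a stored preview until it expires."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch          # callable that hits the origin site
        self._ttl = ttl_seconds      # how long a cached preview stays valid
        self._clock = clock          # injectable clock, handy for testing
        self._entries = {}           # url -> (expiry_time, preview)

    def get(self, url):
        now = self._clock()
        hit = self._entries.get(url)
        if hit is not None and hit[0] > now:
            return hit[1]            # still fresh: origin is not contacted
        value = self._fetch(url)     # expired or missing: one origin request
        self._entries[url] = (now + self._ttl, value)
        return value
```

Within the TTL window, any number of `get` calls for the same URL cost the origin a single request, at the price of possibly serving a slightly stale preview.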
@mark @mcc you cannot (by default) trust the link preview provided by your peer, as they may alter it without your knowledge. yes, the destination site itself may alter output based on requester, but that's a different problem than the "malicious relay" one.
there are some solutions - a trust system where you take some servers' previews as gospel, or maybe the preview comes with a hash that HTTP HEAD can be used to verify (much cheaper than getting the whole page and preview), or pooling a cache for mastodon users e.g. what https://jort.link/ does
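One way the "hash plus HTTP HEAD" idea might look, as a rough sketch: the peer ships the preview together with the `ETag` it saw when it fetched the page, and the receiving server compares that against the `ETag` from its own HEAD request. Using `ETag` as the hash is an assumption made here for illustration; the post only says "a hash", sites are not required to send an `ETag`, and a match only shows the page is the same version the peer saw. The function names are hypothetical.

```python
import urllib.request


def normalize_etag(value):
    """Strip the weak-validator prefix and surrounding quotes."""
    value = value.strip()
    if value.startswith("W/"):
        value = value[2:]
    return value.strip('"')


def preview_is_current(claimed_etag, head_headers):
    """Compare the peer's claimed ETag with the one from our own HEAD."""
    headers = {k.lower(): v for k, v in head_headers.items()}
    current = headers.get("etag")
    if current is None:
        return False                 # nothing to verify against
    return normalize_etag(current) == normalize_etag(claimed_etag)


def fetch_head_headers(url, timeout=5.0):
    """Issue a HEAD request (no body transferred) and return its headers."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return dict(resp.headers.items())
```

A receiving server would call `preview_is_current(peer_etag, fetch_head_headers(url))` and fall back to building its own preview when the check fails.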
@greg @mcc If a peer starts effing with the datastream, I defederate them.
That's the tool for the job. "Mucking the previews" ought to be considered modulo-equivalent-malicious to "hosting Nazis" (assuming we had the feature).
I mean, I'm already trusting them not to muck other people's posts, right? To not slip ads in? To not do all manner of nasty things when they forward data to my node?
@mark @mcc I guess that requires you to know that the malicious peer is doing it - and how do you know that, without visiting the original site to check...
EDIT: a peer can't alter someone else's post in transit due to HTTP signatures incl. message digest, so you have a reasonable expectation that the message you got is as the originating server wrote it - whether THAT server is playing games or not is, again, beside the point and solvable easily with blocklisting.
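The message-digest half of that can be sketched as follows: the sender puts a SHA-256 digest of the body in a `Digest` header, and the receiver recomputes it. This is a minimal sketch only; it leaves out the HTTP signature that covers the header itself (without which a relay could simply rewrite both body and digest), and the function names are hypothetical.

```python
import base64
import hashlib
import hmac


def make_digest(body: bytes) -> str:
    """Build a Digest header value of the form 'SHA-256=<base64 hash>'."""
    encoded = base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")
    return "SHA-256=" + encoded


def verify_digest(body: bytes, digest_header: str) -> bool:
    """Recompute the digest over the received body and compare."""
    expected = make_digest(body)
    # compare_digest avoids leaking where the two strings first differ
    return hmac.compare_digest(expected.encode("ascii"),
                               digest_header.strip().encode("utf-8"))
```

Any change to the body in transit makes `verify_digest` return `False`, provided the header itself is protected by the signature.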
I guess link previews could be considered part of the original message and covered by the first Mastodon server to put a link up, which basically shifts the burden onto the mastodon operator instead of the website owner. This would require some extra changes to ActivityStreams or at least the fields most Fedi systems use in it. (iirc mastodon has only attachments, urls, bold and paragraph support)
@djsundog I just don't think that argument holds water, especially when the alternative is "every node independently queries the source machine."
I'm trusting peers to give an authoritative accounting of posts from users. Is trusting their preview computation a bridge too far?
I hope not, because consolidated social media doesn't have this problem from a technical standpoint, and that makes it a lot friendlier to web hosts than the Fediverse.
Content warning: sundog's hot take on fedihugging
@mcc Also, now that I've found the link, Source Who Shall Not Be Named is absolutely right, and this is a major problem Mastodon needs to fix. I'm a little sad the steering committee doesn't appear to be acting like it's a priority.
This strikes at the heart of the difference between consolidated and distributed social networks. Have they considered the impact it'll have on Mastodon as an ecosystem if web admins decide the "fediverse effect" is too much unnecessary load and start black-holing those requests?
"You can see that site through Facebook but not through Mastodon" is a bad look.
@mcc I keep flashing back to Harry Chesley's Rumor Monger app and the paper he based it on: Xerox PARC's “Epidemic Algorithms for Replicated Database Maintenance”
Some initial set of servers could fetch the resources and then based on “federation reputation values" share the hashes epidemically.
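A toy simulation of that epidemic idea, as a sketch under assumptions: a few seed servers fetch the resource, then every informed server pushes the hash to a few random peers each round. The reputation weighting mentioned above is left out, and `spread_hash`, `fanout`, and the node numbering are hypothetical choices for illustration.

```python
import random


def spread_hash(n_nodes, seeds, fanout=3, max_rounds=1000, rng=None):
    """Push-gossip a hash from the seed nodes; return (informed, rounds)."""
    rng = rng or random.Random()
    informed = set(seeds)            # only these nodes contacted the origin
    rounds = 0
    while len(informed) < n_nodes and rounds < max_rounds:
        newly = set()
        for _node in informed:
            # each informed node tells `fanout` randomly chosen peers
            newly.update(rng.sample(range(n_nodes), fanout))
        informed |= newly
        rounds += 1
    return informed, rounds
```

In this model the origin site sees only `len(seeds)` requests no matter how many servers end up with the hash, which is the load reduction the idea is after.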
"Federation" now apparently means "DDoS yourself." Every time I do a new blog post, within a second I have over a thousand simultaneous hits of that URL on my web server from unique IPs. Load goes over 100, and mariadb stops responding. The server is basically unusable for 30 to 60 seconds until the stampede of Mastodons slows down. Presumably each of those IPs is an instance, none of which ...