There is an interesting article titled "Please Don’t Share Our Links on Mastodon: Here’s Why!" about the startling load that Mastodon's mass-distributed link preview generation has on small independent webservers. But I cannot link it to you, because of a reason
@mcc And yet, it hit the frontpage of orangesite. :/
@drwho Orange site doesn't have the same effect

@mcc Do they do local caching? Or do just so many people who see links there never bother to click on them?

(Only slight sarcasm - I avoid the place like I avoid tick bites.)

@drwho Orange Site is one website and Mastodon is many websites

@mcc Entirely fair. That makes sense.

https://news.ycombinator.com/item?id=40222067 (bluh...)

Maybe it's back up now?

Please don't share our links on Mastodon | Hacker News

@mcc Wait a minute... did ItsFoss just rediscover slashdotting? O.o
@drwho In this case what we are describing is a specific type of automated slashdotting caused specifically by Mastodon's distributed nature and link preview generation, which fetches pages with a different pattern from end-user traffic
@mcc @drwho slashdotting + blipverts = fedihugging
@mcc I did some digging - I did not know that was how it worked.
@mcc Is it worse than getting Slashdotted?
@mark The problem is it's automated, because the servers all contact the site to ask for the link preview at the same time

@mcc Oof, that's an interesting challenge.

Also feels like a hole in either Mastodon's use of Fediverse or Fediverse itself. If node A is cloning posts to node B, it's already generated a preview and should clone that too!

@mark @mcc

It's a terrible idea to trust that preview though.

@robryk @mcc In what sense? The preview my personal node generates can also be a lie because the server can inspect the source requester and change the output depending on who's asking.

@mark @mcc

In the sense that someone other than your client, your own instance (both of which you kind of need to trust anyway), and the actual site that's linked to (who's the source of the content, so the preview must trust it) can manipulate it.

The site showing different contents to different users is another issue that I agree exists and can cause similar problems _for malicious linked-to sites_. For nonmalicious ones consider e.g. a post expressing outrage at something bbc published with a link to the "article" on bbc with a helpful "preview".

@robryk It may be just personal preference, but it seems an odd place to draw the line of trust at "I trust this other node to tell me what posts its users made and the images they uploaded but not the link previews it generated and cached."

@mark

Huh, I'm very surprised that you find this line odd (I don't think I've seen this opinion in the past). I would appreciate it if you answered a question or two so that I can understand it better (but I do understand if you don't wish to).

The reason I find this line very natural is that I think in terms of which node is intended to be able to speak for which entities, especially since those entities are named in a way that reminds us of that relation (domain in URLs, domain/instance part of a fedi ID). Do you think that it makes more sense to keep track of a vaguer kind of trust (as in, "that node is rather trustworthy") in general, that the mapping between nodes and entities is insufficiently natural, or something else I can't easily see?

@robryk Not in general, no. I think there's a very practical special-case reason to bend the simple model of trust in this case: too many nodes hammering a site can result in that site deciding that Mastodon is a threat to quality of service and doing their best to block every node.

That's bad for Mastodon as a Fediverse project (and, indirectly, good for the Twitters of the world... "Hey, we may have lax moderation, but we'll only tap your server once to build a preview link").

In terms of cleanest-model, I agree with your assessment of what should be authoritative. In terms of a cost-benefit tradeoff of most-damage-a-modified-link-preview-could-do vs. most-damage-distributing-the-build-of-the-preview-could-do however...

(I'm reminded of DNS, and the fact that while people don't like caching and what it does to the cleanliness of the domain-ip mapping, we put up with it because the alternative would be an untenable noise-mess of popular services' DNS authorities getting hammered. No caching would be cleaner, but there's a reason DNS entries are cached.)

@robryk @mark You could imagine manually configured chains of trust, or for example creating three independently administered preview servers and only accepting previews if they are identical between all three. It is a solvable problem
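
A minimal sketch of the "three independently administered preview servers" idea above, assuming each server returns its preview as a plain dict of card fields. The names `preview_digest` and `accept_if_unanimous` and the field names are hypothetical, not part of Mastodon or any existing protocol; the previews are compared by a SHA-256 digest of a canonical JSON encoding so key order does not matter.

```python
import hashlib
import json
from typing import Dict, List, Optional


def preview_digest(preview: Dict[str, str]) -> str:
    """Hash a preview in a canonical form so key order does not affect the result."""
    canonical = json.dumps(preview, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def accept_if_unanimous(previews: List[Dict[str, str]], required: int = 3) -> Optional[Dict[str, str]]:
    """Return the preview only if at least `required` servers answered and all agree."""
    if len(previews) < required:
        return None  # not enough independent answers to compare
    digests = {preview_digest(p) for p in previews}
    return previews[0] if len(digests) == 1 else None


if __name__ == "__main__":
    honest = {"title": "Example article", "description": "A summary"}
    forged = {"title": "Example article", "description": "Something else"}
    print(accept_if_unanimous([honest, dict(honest), dict(honest)]))  # the agreed preview
    print(accept_if_unanimous([honest, dict(honest), forged]))        # None: servers disagree
```

Requiring exact agreement is the strictest possible rule; a site that legitimately serves slightly different metadata per request would never pass, so a real design would need to decide how to handle that.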

@mcc @robryk I think I'm also going to look into my server config and see if I can just kill the feature.

I've never actually needed or wanted link previews, in any social network. I have a browser and middle-click for that.

@mark @mcc you cannot (by default) trust the link preview provided by your peer, as they may alter it without your knowledge. yes, the destination site itself may alter output based on requester, but that's a different problem than the "malicious relay" one.

there are some solutions - a trust system where you take some servers' previews as gospel, or maybe the preview comes with a hash that HTTP HEAD can be used to verify (much cheaper than getting the whole page and preview), or pooling a cache for mastodon users e.g. what https://jort.link/ does
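
To illustrate the "pooling a cache" option, here is a rough sketch of a shared preview cache with a time-to-live, assuming a single process and a caller-supplied fetch function. `PreviewCache`, `ttl_seconds`, and the fake fetcher are hypothetical names for illustration; this is not how jort.link or Mastodon is implemented, only the general caching idea.

```python
import time
from typing import Callable, Dict, Tuple


class PreviewCache:
    """Serve repeated requests for the same URL from memory until the entry expires."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # function that actually contacts the origin site
        self._ttl = ttl_seconds      # how long a cached preview stays valid
        self._clock = clock          # injectable clock, mainly for testing
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, url: str) -> dict:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: no request reaches the origin
        preview = self._fetch(url)   # missing or expired: one origin request
        self._entries[url] = (now + self._ttl, preview)
        return preview


if __name__ == "__main__":
    origin_hits = []

    def fake_fetch(url: str) -> dict:
        origin_hits.append(url)      # stands in for a real HTTP request
        return {"title": "Example"}

    cache = PreviewCache(fake_fetch, ttl_seconds=60.0)
    for _ in range(1000):            # a thousand instances asking for the same link
        cache.get("https://example.com/post")
    print(len(origin_hits))
```

The sketch is not thread-safe and does not coalesce concurrent misses, so a production cache would need locking or request de-duplication on top of this.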

jort.link - a solution to fediverse request floods

A URL redirector and shield to solve fediverse request floods.

@greg @mcc If a peer starts effing with the datastream, I defederate them.

That's the tool for the job. "Mucking the previews" ought to be considered modulo-equivalent-malicious to "hosting Nazis" (assuming we had the feature).

I mean, I'm already trusting them not to muck other people's posts, right? To not slip ads in? To not do all manner of nasty things when they forward data to my node?

@mark @mcc I guess that requires you to know that the malicious peer is doing it - and how do you know that, without visiting the original site to check...

EDIT: a peer can't alter someone else's post in transit due to HTTP signatures incl. message digest, so you have a reasonable expectation that the message you got is as the originating server wrote it - whether THAT server is playing games or not is, again, beside the point and solvable easily with blocklisting.

I guess link previews could be considered part of the original message and covered by the first Mastodon server to put a link up, which basically shifts the burden onto the mastodon operator instead of the website owner. This would require some extra changes to ActivityStreams or at least the fields most Fedi systems use in it. (iirc mastodon has only attachments, urls, bold and paragraph support)

@greg @mark I would simply introduce social and technical systems to prevent this
@mcc @mark hey now, let me armchair dev a bit cmon
@greg @mark @mcc if the peer is mucking with the preview couldn’t it just as easily muck with the link itself?
@mark @mcc the argument against forwarding a pre-fetched and rendered preview card is trust - can you trust every server in the fediverse to fetch and render a true and accurate representation of the preview card for every link? if you cannot, and you are not willing to accept the risk of forgery or misrepresentation, then you have to fetch and render at each node (so the argument goes - I am not a proponent of this argument).

@djsundog I just don't think that argument holds water, especially when the alternative is "every node independently queries the source machine."

I'm trusting peers to give an authoritative accounting of posts from users. Is trusting their preview computation a bridge too far?

I hope not, because consolidated social media doesn't have this problem from a technical standpoint, and that makes it a lot friendlier to web hosts than the Fediverse.

@mark I concur, and just made that argument in another post ha - https://toot-lab.reclaim.technology/@djsundog/112367639796872157 - I rambled a lot more than you did haha
DJ Sundog - from the toot-lab (@djsundog@toot-lab.reclaim.technology)

Content warning: sundog's hot take on fedihugging

reclaim.technology
@mcc The federated live feed downloads around 1MB/s with previews enabled.

@mcc Also, now that I've found the link, Source Who Shall Not Be Named is absolutely right, and this is a major problem Mastodon needs to fix. I'm a little sad the steering committee doesn't appear to be acting like it's a priority.

This strikes at the heart of the difference between consolidated and distributed social networks. Have they considered the impact it'll have on Mastodon as an ecosystem if web admins decide the "fediverse effect" is too much unnecessary load and start black-holing those requests?

"You can see that site through Facebook but not through Mastodon" is a bad look.

Please Don’t Share Our Links on Mastodon: Here’s Why!

We need to talk about this problem. Should Mastodon step up?

It's FOSS News
@mcc Said website makes my browser continuously download 2MB/minute while it's open without an adblocker.
@mcc i wish i could remember what it’s called but a few years ago someone on fedi built a redirecting preview cache you could link to instead? no idea if it’s still up though or who made it 😅
jort.link - a solution to fediverse request floods

A URL redirector and shield to solve fediverse request floods.

@mcc I sort of understand where they're coming from, but on the other hand I'm running my web server on 11 years old hardware, and I both linked to stuff on it myself, and have had other people link to it, and the only reason I noticed was because I was looking at the logs.
@mcc it kind of feels like link previews were a mistake and websites should be simpler
@mcc
Perhaps they should ask someone who is familiar with the hosting of news sites. 🤷
@mcc Isn't this pretty much an ideal use of the Torrent protocol? Map all the link preview resources into a CAS and then pull the bits from random other federation sites instead of the source site. At most, upgrade the protocol with a new hash since SHA-1 has been collided.
@alexr that's interesting, but you'd still have to pick an authoritative SHA.
@mcc The initial hash list would have to come with the first referencing post. Unless somebody better at math could make a proof of how many sites would have to agree on hashes for something to be considered extremely likely to be authentic, without resorting to any sort of overly complex computation like blockchain.
@alexr hmm, come to think of it, this is something the posting server could do automatically. There might be value in this…
@mcc @alexr Perhaps posts with URLs could include a content hash of the opengraph/card/etc metadata for those URLs, which could be inexpensively verified by querying the website referenced (by returning the content hash rather than all of the link card content).
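
A hedged sketch of the content-hash idea above: compute a digest over a page's `og:` metadata so a receiving node could compare it with the hash carried in a post. The function names `card_hash` and `card_matches` and the canonical `key=value` encoding are assumptions made for illustration; no such field exists in ActivityPub or Mastodon today, and a cheap "return only the hash" endpoint on the website would also have to be invented.

```python
import hashlib
import hmac
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: Dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if prop.startswith("og:") and content is not None:
            self.properties.setdefault(prop, content)  # keep the first value seen


def card_hash(html: str) -> str:
    """SHA-256 over the sorted og: properties, independent of tag order."""
    parser = OpenGraphParser()
    parser.feed(html)
    lines = [f"{key}={value}" for key, value in sorted(parser.properties.items())]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def card_matches(claimed_hash: str, html: str) -> bool:
    """Check a hash carried with a post against the page's current metadata."""
    return hmac.compare_digest(claimed_hash, card_hash(html))


if __name__ == "__main__":
    page = '<meta property="og:title" content="Hello"><meta property="og:description" content="World">'
    posted_hash = card_hash(page)                                     # computed by the posting server
    print(card_matches(posted_hash, page))                            # True
    print(card_matches(posted_hash, page.replace("Hello", "Edited"))) # False
```

Note that in this sketch the verifier still has to download the page to recompute the hash; the saving described in the post only appears if the site itself can answer with the hash alone.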

@mcc I keep flashing back to Harry Chesley's Rumor Monger app and the paper he based it on: Xerox PARC's “Epidemic Algorithms for Replicated Database Maintenance”

Some initial set of servers could fetch the resources and then based on “federation reputation values" share the hashes epidemically.

@mcc the confusing thing about that post is that they are using CloudFlare as a cdn but it apparently doesn't help? Either they or Mastodon are doing something really wrong for that to be the case.
@megmac @mcc Cloudflare as a CDN can't help you if your content isn't served with caching headers that allow Cloudflare to actually cache requested pages, instead of just forwarding all the requests back to the source server.
@chris @mcc yes, hence either they or Mastodon are doing something wrong because opengraph previews should be served with caching headers.
@mcc i've heard about this and im... assuming its high priority to fix if its possible bc it seems like a REALLY bad problem for a Social Network
@emaytch there appears to be disagreement about how substantial the hit is
@mcc i saw a lot of stuff in the comments to that article you didn't share where ppl said the person had misconfigured their site, but... if mastodon is the ONLY place where that is a problem it is probably still very much worth disabling i feel like?
@emaytch @mcc Whenever there is a post about the Mastodon stampede problem, there is a secondary stampede caused by people posting this link to jwz's blog post about it: https://www.jwz.org/blog/2022/11/mastodon-stampede/
Mastodon stampede

"Federation" now apparently means "DDoS yourself." Every time I do a new blog post, within a second I have over a thousand simultaneous hits of that URL on my web server from unique IPs. Load goes over 100, and mariadb stops responding. The server is basically unusable for 30 to 60 seconds until the stampede of Mastodons slows down. Presumably each of those IPs is an instance, none of which ...