It looks like media caching is a problem: it consumes a huge amount of space 😅 Just running the instance for two days produced 1.4 GB of data 😭
@arda how much space have you used so far, since you've been running your instance longer than me?

@zaherg You see now why I have been trying to find a decent #storage and #cdn solution? 😆

Usage shows 16.25 GB on my bucket solution.

You can find the details attached. Headers take a lot of space, but this #pr on #github will bring a #tootctl command to remove them in the next release:

https://github.com/mastodon/mastodon/pull/22149

The story: https://github.com/mastodon/mastodon/issues/9567

I didn't want to kill my #ssd in the long run, so I settled on #idrive #idrivee2 for #s3 storage, and #cloudflare as the CDN.

#mastoadmin

Add command to remove avatar and header images of inactive remote accounts from the local database by evanphilip · Pull Request #22149 · mastodon/mastodon

This implements a new sub-commands for tootctl media called remove-profile-media to remove avatar and header images of remote accounts that appear inactive from the local database. Fixes #9567 : ab...

@zaherg It's been a week since I migrated to idrive, so you can start counting from 11 GB. Only the "(size local)" entries are your own assets.
@arda I've been on S3 since day one, with cloudflare's full cache to help, so there's no need for another service in between 😅 …
The cache is sooooo much bigger than I ever expected, can’t imagine how much twitter was using for all the media 🤷‍♂️

@zaherg s3 is too expensive man, idrive is like 1/5th of the price per GB, and ingress and egress are free (unlimited ingress, and egress up to 3x the size of the bucket is free). backblaze is banned in Turkey, so the auth and bucket APIs do not work for me without a VPN. Also, the wasabi pay-as-you-go price per GB is too expensive.

(#backblaze b2 is $0.005 / gb, free ingress, and $0.01 / gb for egress)
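
To put those per-GB prices into numbers, here is a rough sketch in Python. The two B2 prices are the ones quoted above; treating the storage price as a monthly charge and the 50 GB egress figure are assumptions made only for illustration.

```python
# Prices quoted in the thread for Backblaze B2 (USD per GB).
B2_STORAGE_PER_GB = 0.005  # assumption: billed per GB per month
B2_EGRESS_PER_GB = 0.01


def monthly_cost(stored_gb: float, egress_gb: float,
                 storage_price: float = B2_STORAGE_PER_GB,
                 egress_price: float = B2_EGRESS_PER_GB) -> float:
    """Return an estimated monthly bill in USD (ingress is free)."""
    return stored_gb * storage_price + egress_gb * egress_price


# 16.25 GB is the bucket usage mentioned earlier in the thread;
# 50 GB of egress is a made-up figure, not a measurement.
estimate = monthly_cost(stored_gb=16.25, egress_gb=50.0)
print(f"${estimate:.2f} per month")
```

Swapping in another provider's prices through the `storage_price` and `egress_price` parameters gives a quick side-by-side comparison.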

🧵

@arda I use Cloudflare R2 which is S3 compatible https://www.cloudflare.com/products/r2/

I might switch to idrive if the Cloudflare cost turns out to be more than I can afford :D

@zaherg
There are also some alternatives:
https://free-for.dev/#/?id=iaas

I've used #scaleway before, and their current plan for object storage is confusing. #tebi has the same price as aws. #Storj still feels too janky; a friend of mine ran a node for them for crypto. #Synology #synologyc2 bandwidth is also quite expensive.

Cloudflare also has r2 object storage, but it's still expensive if you ask me: https://www.cloudflare.com/en-gb/products/r2/

So I rely on idrive and #cloudflare for the current solution 🧵

@zaherg Additionally, I rely on https://hub.docker.com/r/nginxinc/nginx-s3-gateway for the time being as a public proxy, but I'm too lazy to apply the ACL rule @vito shared on LowEndTalk back then. Also, idrive mentioned they'd enable it from their UI by this week, so if they don't, I'll try to do it this weekend.

@arda I rely on cloudflare itself to cache everything 🫣 .. you see, I am lazier than you ..

@zaherg I modify headers on that docker container to make cloudflare understand it's cacheable 😆

Is that a page rule? I believe it'd be overkill, since you only get 3 page rules per free account.

I wrote a cache rule for #cloudflare (you get 10 of those per free account, instead of 3), but I did not enable it. Maybe you can confirm it looks good from your end 😊

@arda I only need one for the assets that are uploaded to R2.

True, page rules are 3 and cache rules are 10.

But let's be honest, we only need 2:
1. for the assets
2. for the main domain 🤷‍♂️

@arda at least in my case two is more than enough, as I will only cache the public services, not the private ones.

@zaherg Are there any differences in practice between cache rules and page rules, if you want to cache?

Also, is it okay to cache the api? I deliberately did not cache my main instance domain.

@arda with cache rules you have more control.

For the api it depends on your goal, but in general, since the api does not change much, I'd say it’s okay 🤷‍♂️.

With a good caching time you can get magical response times 😅

@zaherg What I was afraid of was #api responses being #cached as well. Hmm. I'll just enable what I mentioned for the assets then. Most people browsing #mastodon use their own #webUI anyway. Let's see. #mastoadmin
@arda let me know how it goes…
For now I am caching everything (I guess 🤷‍♂️)

@zaherg I enabled cloudflare cache, and I'm removing cache-control and expires headers from my proxy. Let's see how it'll work.

A newly uploaded dummy image for testing the headers is attached.

@zaherg Well, I removed my "expires max" and 1-year cache-control headers, so my #nginx #s3 #gateway container no longer sends any cache headers. When you fetch an asset publicly, it respects cloudflare's cache rules so far, and it also hits their cache (and not my server). Nice, so far so good.

You can also try it yourself by checking the headers: the cf-cache-status header tells you whether the request hit the #cloudflare cache or not.
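
A minimal Python sketch of that check, using only the standard library. The function names are made up for this example, and it assumes the origin answers HEAD requests the same way as GET, which may not hold for every setup.

```python
import urllib.request
from typing import Mapping, Optional


def cache_status(headers: Mapping[str, str]) -> Optional[str]:
    """Return the cf-cache-status value in upper case, or None if absent."""
    for name, value in headers.items():
        if name.lower() == "cf-cache-status":
            return value.strip().upper()
    return None


def fetch_headers(url: str) -> dict:
    """Send a HEAD request and return the response headers as a dict."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())


def report(url: str) -> None:
    """Print the cache status for one asset URL (performs a network request)."""
    print(url, "->", cache_status(fetch_headers(url)))
```

Calling `report(...)` twice with the URL of one of your own assets should show whether the second request is served from the cloudflare cache.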

@arda I did, most of the images are "miss" 🤷‍♂️ no idea why, I'll have to check.

unless I get the exact image url and request it I won't get "Hit"

@zaherg It's normal for the first request to be a miss. It was a miss the first time for me as well. What matters is the requests after that 😊

The first time an asset is fetched it's a "miss", because cloudflare does not know about it yet, so it's served from the origin. When you fetch it again, cloudflare already knows it, so (if the configuration is correct) it should be a "hit".

Caching across all regions may take them a little time though, but it should not take more than a minute within the same region.
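
The miss-then-hit behaviour can be modelled with a toy cache in Python. This is only a sketch of the idea, not how cloudflare is implemented; the class, the URL and the fake origin are all made up.

```python
from typing import Dict, Tuple


class EdgeCache:
    """Toy model of a single CDN edge location in front of an origin."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def fetch(self, url: str) -> Tuple[str, bytes]:
        """Return (cache status, body) for a URL."""
        if url in self._store:
            return "HIT", self._store[url]
        body = self._origin(url)   # first request goes to the origin
        self._store[url] = body    # the edge keeps a copy for next time
        return "MISS", body

    @staticmethod
    def _origin(url: str) -> bytes:
        # Stand-in for the real origin server (hypothetical).
        return f"content of {url}".encode()


edge = EdgeCache()
url = "https://cdn.example/media/1.png"  # hypothetical asset URL
first, _ = edge.fetch(url)
second, _ = edge.fetch(url)
print(first, second)
```

A second `EdgeCache()` instance starts out empty, which mirrors why another region can still report a miss for an asset that is already a hit nearby.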

@arda nope, I am talking about old images, like the one you sent a few hours ago. They are all cached on my server already, yet I am getting a lot of misses 🤷‍♂️
@arda I am now going fully armed and have enabled all the cache options .. let's see which one breaks 😭

@zaherg Well, this is sourced from your own cdn url, and cache works nicely from what I can tell 🎉

However, I don't see an expires header, but I wonder if it's necessary (Edit: it's not, #stackoverflow confirms it).

I only have 1 cache rule on #cloudflare, like I shared in my earlier toot, and nothing else. I don't have a page rule.

@arda exactly what I am saying: if you request the URL directly you will get a HIT, but while browsing mastodon it won't be a hit. Check https://d.pr/i/CibKX3

@zaherg while browsing and listing, you're not getting the original size but a resized version. Still, I'll re-browse this feed and check again.

@zaherg Aha, I believe I got it: the "miss" status is also cached by the browser 😆 Check the network tab with normal navigation, without a hard refresh. There are two sources for the asset, one is "network" and the other is "loaded from cache", and the one loaded from the cache shows a MISS for cf-cache-status. The response is also stored in the browser (browser caching follows cache-control).

I believe the best way is to check the headers with curl etc., so no browser caching is involved.
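
Why the browser keeps showing MISS can be sketched with a small Python simulation. Everything here is a toy model: the function names, the URL and the `max-age` value are assumptions for illustration, not values taken from either instance.

```python
from typing import Dict, Set

edge_seen: Set[str] = set()                     # URLs the simulated edge has cached
browser_cache: Dict[str, Dict[str, str]] = {}   # URL -> stored response headers


def edge_fetch(url: str) -> Dict[str, str]:
    """Simulated CDN: MISS on first sight of a URL, HIT afterwards."""
    status = "HIT" if url in edge_seen else "MISS"
    edge_seen.add(url)
    # max-age below is a hypothetical browser TTL.
    return {"cf-cache-status": status, "cache-control": "max-age=14400"}


def browser_fetch(url: str, hard_refresh: bool = False) -> Dict[str, str]:
    """Simulated browser: replay stored headers unless a hard refresh is forced."""
    if not hard_refresh and url in browser_cache:
        return browser_cache[url]   # no network request at all
    headers = edge_fetch(url)
    browser_cache[url] = headers
    return headers


url = "https://cdn.example/media/1.png"  # hypothetical asset URL
via_network = browser_fetch(url)["cf-cache-status"]         # first real request
from_browser_cache = browser_fetch(url)["cf-cache-status"]  # replayed old headers
direct = edge_fetch(url)["cf-cache-status"]                 # direct request, like curl
print(via_network, from_browser_cache, direct)
```

The second value is the stale status stored with the first response, while the direct request reflects what the edge would answer right now.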