0 Followers
0 Following
4 Posts

lemmy performance connoisseur.

check my github: github.com/phiresky

this is what i like to see:

Like I said, brotli ships with a large built-in dictionary for web content / HTTP, which means you can't compare it directly to other compressors when looking at web content

compressed size is more important than speed of compression

Yes, but decompression speed is even more important, no? My internet connection gets 40 MByte/s and my SSD 500+ MB/s, so if my decompressor runs at <40 MB/s it's slowing down my updates / boot time, and it would be better to use a compressor with a worse ratio but faster decompression.
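A back-of-the-envelope check of that bottleneck argument (the 40 MB/s link matches the comment above; the decompression throughputs and ratios are illustrative assumptions, not benchmarks):

```python
# Effective install/update speed is bounded by the slowest pipeline stage:
# you can't apply data faster than min(what the network delivers after
# expansion, what the decompressor can emit).
def effective_mb_per_s(link_mb_s: float, decompress_mb_s: float, ratio: float) -> float:
    """ratio = uncompressed_size / compressed_size.
    Downloading compressed data delivers link_mb_s * ratio MB/s of
    uncompressed output; the decompressor caps that at decompress_mb_s."""
    return min(link_mb_s * ratio, decompress_mb_s)

# Hypothetical numbers: a slow, high-ratio compressor vs a fast, lower-ratio one.
slow_high_ratio = effective_mb_per_s(link_mb_s=40, decompress_mb_s=25, ratio=4.0)
fast_low_ratio  = effective_mb_per_s(link_mb_s=40, decompress_mb_s=1500, ratio=3.0)

print(slow_high_ratio)  # 25.0  -> the decompressor is the bottleneck
print(fast_low_ratio)   # 120.0 -> the network is the bottleneck
```

So the weaker compressor wins on wall-clock time despite the worse ratio, which is exactly the point about <40 MB/s decompressors.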

Arch - since 2021 for kernel images archlinux.org/…/moving-to-zstandard-images-by-def… and since 2019 for packages lists.archlinux.org/pipermail/…/029739.html

brotli is mainly good because it basically has a huge dictionary that includes common http headers and html structures, so those don't need to be part of the compressed file. I would assume without testing that zstd would more clearly win against brotli if you'd train a similar dictionary for it or just include a random WARC file into --patch-from.

Cloudflare started supporting zstd and is using it as the default since 2024 blog.cloudflare.com/new-standards/ citing compression speed as the main reason (since it does this on the fly). It's been in Chrome since 2021 chromestatus.com/feature/6186023867908096

The RFC mentions dictionaries but they are not currently used:

Actually this is already considered in RFC-8878 [0]. The RFC reserves zstd frame dictionary ids in the ranges: <= 32767 and >= (1 << 31) for a public IANA dictionary registry, but there are no such dictionaries published for public use yet. [0]: datatracker.ietf.org/doc/html/rfc8878#iana_dict
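The reserved ranges from the RFC are easy to check programmatically (a sketch; `is_reserved_public_id` is a made-up helper name, not part of any zstd API):

```python
def is_reserved_public_id(dict_id: int) -> bool:
    """RFC 8878 reserves zstd frame dictionary IDs <= 32767 and >= 2**31
    for a future public IANA registry; IDs in between are for private use."""
    return dict_id <= 32767 or dict_id >= (1 << 31)

print(is_reserved_public_id(100))        # True  (low reserved range)
print(is_reserved_public_id(50_000))     # False (private-use range)
print(is_reserved_public_id(1 << 31))    # True  (high reserved range)
```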

And there is a proposed standard for how zstd dictionaries could be served from a domain datatracker.ietf.org/doc/rfc9842/

it’s better in every metric

Let me revise that statement: it's better in every metric (compression speed, compressed size, feature set, and most importantly decompression speed) than all other compressors I'm aware of, apart from xz, bz2, and potentially other non-LZ compressors in the best-compression-ratio aspect. And I'm not sure whether it beats lzo/lz4 at the very fast levels (negative numbers in zstd).

that struck me as weird about what you were saying

What struck me as weird was that you were kind of calling it AI hype crap, when they are developing this for their own use and publishing it (not to make money). I'm assuming this based on how much work they put into open-sourcing the zstd format and how deeply it is now used in FOSS that does not care at all about Facebook. The format they are introducing uses explicitly structured data formats to guide a compressor - a structure which can be generated from a struct or class definition, and yes, potentially much more easily by an LLM, but I don't think that is hooey. So I assumed you had no idea what you were talking about.


I have literally never heard of someone claiming zstd was the best overall general purpose compression. Where are you getting this?

You must be living in a different bubble than me then, because I see zstd used everywhere: from my Linux package manager, my Linux kernel boot image, to my browser getting served zstd content-encoding by default, to large dataset compression (100GB+)… everything, basically. On the other hand, it's been a long time since I've seen bz2 anywhere, I guess because of its terrible decompression speed - it decompresses slower than an average internet connection, making it the bottleneck and a bad idea for anything sent (multiple times) over the internet.

I stand corrected on the compression ratio vs compression speed, I was probably thinking of decompression speed as you said, which zstd optimizes heavily for and which I do think is more important for most use cases. Also, try -22 --ultra as well as --long=31 (for data > 128MB).

Random sources showing zstd performance on different datasets

linuxreviews.org/Comparison_of_Compression_Algori…

redpill-linpro.com/…/compression-tool-test.html

insanity.industries/…/pareto-optimal-compression/

My point is you are comparing the wrong thing: if you make zstd as slow as bz2 by increasing the level, you will get the same or better compression ratio on most content. You're just comparing whose defaults you like more. Zstd is on the Pareto front almost everywhere - you can tune it to be (almost) the fastest, and you can tune it to be almost the highest compression ratio, with a single number, all while having decompression speeds topping the alternatives.

Zstd by default uses a level that's something like 10x faster than the default of bz2. Also, bz2 is unusably slow in decompression if you have files >100MB.

This is from the same people that made zstd, the current state of the art for generic compression by almost any metric. They know what they are doing. Of course this is not better at generic compression, because that's not what it's for.
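That single-number speed/ratio knob is easy to see with any general-purpose compressor; here zlib from the Python stdlib stands in for zstd (whose levels run roughly from negative fast levels up to 19, or 22 with --ultra), and the payload is a made-up repetitive example:

```python
import zlib

# The same compressor at two ends of its level knob: low level trades
# ratio for speed, high level trades speed for ratio.
data = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n" * 2000

fast  = zlib.compress(data, 1)   # prioritize speed
small = zlib.compress(data, 9)   # prioritize ratio

print(len(fast), len(small))     # the level-9 output is no larger
```

Comparing bz2's default against zstd's default therefore compares two arbitrary points on two different curves, not the compressors themselves.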
high quality template:

"Homegrowns are next"

https://lemmy.world/post/28269719

"Homegrowns are next" - Lemmy.World

Lemmy

"Homegrowns are next"

https://lemmy.world/post/28269115

"Homegrowns are next" - Lemmy.World

Lemmy

The ActivityPub protocol lemmy uses is (in my opinion) really bad wrt scalability. For example, if you press one upvote, your instance has to make 3000 HTTP requests (one to every instance that cares).
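A sketch of why that fan-out is expensive and how a per-instance send queue can batch it (the instance counts and the `FederationQueue` shape are illustrative assumptions, not Lemmy's actual implementation):

```python
from collections import defaultdict

# Naive ActivityPub delivery: every activity (e.g. one upvote) becomes
# its own HTTP request to every subscribed remote instance.
def naive_request_count(activities: int, instances: int) -> int:
    return activities * instances

print(naive_request_count(1, 3000))  # 3000 requests for a single upvote

# A per-instance queue instead buffers activities and flushes them in
# batches, so many activities to one instance share far fewer requests.
class FederationQueue:
    def __init__(self, batch_size: int = 50):
        self.batch_size = batch_size
        self.pending: dict[str, list[str]] = defaultdict(list)

    def enqueue(self, instance: str, activity: str) -> None:
        self.pending[instance].append(activity)

    def flush(self) -> int:
        """Deliver all pending activities; returns how many HTTP requests
        would be made (ceil(len(batch) / batch_size) per instance)."""
        requests = 0
        for instance, acts in self.pending.items():
            requests += -(-len(acts) // self.batch_size)  # ceiling division
            # (real code would POST each batch to the instance's inbox here)
        self.pending.clear()
        return requests

q = FederationQueue(batch_size=50)
for i in range(100):  # ~1 second of reddit-scale load (100 actions/s)
    for inst in ("a.example", "b.example", "c.example"):
        q.enqueue(inst, f"activity-{i}")

print(q.flush())  # 6: two batched requests per instance instead of 100 each
```

The queue also decouples local writes from slow or dead remote instances, which is the other half of why a dedicated federation queue helps.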

But on the other hand, I recently rewrote the federation queue. Looking at reddit, it has around 100 actions per second. The new queue should be able to handle that amount of requests, and PostgreSQL can handle it (the incoming side) as well.

The problem right now is more that people running instances don't have infinite money, so even if you could in theory host hundreds of millions of users, most instances are limited by having a budget of $10-100 per month.