For text with a lot of repetition, #bzip3 still blows my mind. 😆

rld@Intrepid:Documents$ for x in cat "gzip -9" "zstd --ultra -22" "xz -9e" "bzip2 -9" bzip3; do $x < weatherlog-2024.txt |wc -c |tr "\n" "\t"; echo "$x"; done
1735300	cat
80423	gzip -9
63275	zstd --ultra -22
53516	xz -9e
52374	bzip2 -9
40645	bzip3
rld@Intrepid:Documents$ echo 1735300/40645 |bc -l
42.69405830975519744125
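If you want the ratio for every compressor and not just the winner, here is a rough sketch that wraps the same loop. The function names `ratio` and `ratios` are made up for illustration, and it assumes a POSIX-ish shell (sh or bash) where the unquoted `$x` is split into a command and its flags, just like in the loop above. Compressors that aren't installed are skipped.

```shell
# ratio ORIGINAL COMPRESSED: print ORIGINAL/COMPRESSED rounded to two decimals.
ratio() {
    awk -v o="$1" -v c="$2" 'BEGIN { printf "%.2f\n", o / c }'
}

# ratios FILE: print "size<TAB>ratio<TAB>command" for each available compressor.
# (Hypothetical helper; the compressor list is the one used in the post.)
ratios() {
    f=$1
    orig=$(wc -c < "$f")
    orig=$((orig))                      # strip any padding wc adds
    for x in cat "gzip -9" "zstd --ultra -22" "xz -9e" "bzip2 -9" bzip3; do
        # ${x%% *} is the command name without its flags; skip if missing.
        command -v "${x%% *}" >/dev/null 2>&1 || continue
        size=$($x < "$f" | wc -c)       # $x is deliberately unquoted
        size=$((size))
        printf '%s\t%s\t%s\n' "$size" "$(ratio "$orig" "$size")" "$x"
    done
}
```

For example, `ratio 1735300 40645` prints `42.69`, matching the `bc` result above, and `ratios weatherlog-2024.txt` would print one line per compressor.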

#Lossless #Compression #LosslessCompression

P.S. times:

real 1.49 zstd --ultra -22
real 0.94 xz -9e
real 0.23 bzip2 -9
real 0.07 gzip -9
real 0.06 bzip3
real 0.00 cat

DANG. 😂

@rl_dane I didn't know bzip3 existed, nice
@rl_dane Woah, bzip3 is new to me. That is a pretty incredible difference.

@taylor

Yeah! Of course, this is still a block-sorting compression algorithm*, so you won't get much of an advantage over zstd or xz on data with more inherent entropy, like binary files or whatnot, but it does miracles for text.

* Of course I know what that means. Tell you what, you tell me what you think it means, and I'll tell you if you're right. 🤣
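For the curious, "block-sorting" refers to the Burrows-Wheeler transform, which bzip2 is built on (and, as far as I know, bzip3 too). Below is a toy sketch of the idea, not how either tool actually implements it: write out every rotation of the input, sort them, and keep the last character of each. The function name `bwt` and the `$` end marker are assumptions for the demo, and it only handles a single line of text that doesn't itself contain `$`.

```shell
# bwt STRING: naive Burrows-Wheeler transform of one line of text.
# Toy version: builds every rotation, so it is only sensible for short inputs.
bwt() {
    printf '%s\n' "$1" | awk '{
        s = $0 "$"                      # append an end-of-string marker
        n = length(s)
        # Print each rotation of s on its own line.
        for (i = 1; i <= n; i++) print substr(s, i) substr(s, 1, i - 1)
    }' | LC_ALL=C sort | awk '
        # Keep only the last character of each sorted rotation.
        { printf "%s", substr($0, length($0)) }
        END { print "" }'
}
```

`bwt banana` prints `annb$aa`, and `bwt abcabcabc` prints `ccc$aaabbb`. Repetitive input ends up with identical characters bunched together, which is why the later stages of the compressor do so well on text with a lot of repetition.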

Here's an example with non-text data, where you see that #bzip3 isn't as strong:

Pictures$ for x in cat "gzip -9" "bzip2 -9" "bzip3" "zstd --ultra -22" "xz -9e"; do $x < Hobbes.jpg |wc -c |tr "\n" "\t"; echo "$x"; done |sort -rn
3445659	cat
3444164	xz -9e
3441839	zstd --ultra -22
3439158	gzip -9
3384450	bzip2 -9
3274433	bzip3

WAIT.
WHAT.

Let's try something else...

Videos$ f="Federated Timeline.webm"; for x in cat "gzip -9" "bzip2 -9" "bzip3" "zstd --ultra -22" "xz -9e"; do $x < "$f" |wc -c |tr "\n" "\t"; echo "$x"; done |sort -rn
1231940	bzip2 -9
1231269	bzip3
1227060	xz -9e
1226931	cat
1226421	zstd --ultra -22
1226241	gzip -9

WHAT?!? THE WORLD IS BROKEN!!!

TrYiNg AgAiNnNn...

Documents$ f="Thinkpad x200 hardware maintenance manual.pdf"; for x in cat "gzip -9" "bzip2 -9" "bzip3" "zstd --ultra -22" "xz -9e"; do $x < "$f" |wc -c |tr "\n" "\t"; echo "$x"; done |sort -rn
8942833	cat
8657277	bzip2 -9
8617801	gzip -9
8592319	bzip3
8568484	xz -9e
8535244	zstd --ultra -22

Ok, that makes sense. That's what I was expecting.

YOU SAW NOTHING ELSE. DON'T ASK ME ANY MORE QUESTIONS. 🤣

P.S., here's another interesting one:

138240138	cat (large BMP file)
3768642	gzip -9
3143455	PNG format
1987020	zstd --ultra -22
1592854	bzip2 -9
1512291	bzip3
1501540	xz -9e
@rl_dane I was recently using zstd... but I should move to bzip3

@hyde

zstd is the all-around champ (especially for speed), but bzip3 kicks butt for textual data.