So @rl_dane introduced #bzip3 to me to use instead of #bzip2. Let's turn some bz2 files into bz3 to see the difference.

First example: 90k opus files

The "Hey Snips" wake word dataset. It has ~90k opus files, and the tar file is 3.1GB. bzip2 produces the same 3.1GB, which is expected since Opus audio is already compressed. bzip3 got it down to 3.0GB but used tons of computation power. Not worth the 100MB.

Second example: Windows 7 VirtualBox VM image

Windows7.vdi is a Windows 7 VM image for the "special" days. I think I have to get rid of it, but while it is still there, let's see how each performs. It is 16GB uncompressed. bzip2 -9 gives 7.0GB. bzip3 gives 6.3GB, but at the expense of about 3x the CPU time. Deleting all of them anyway. Down with Windows.

Third example: Pure XML text file

A pure XML file with Persian and English characters. Uncompressed it is 1.7GB. bzip2 -9 gives 276MB, while bzip3 gives 260MB.
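
If you want to repeat this kind of comparison on your own files, here is a rough sketch. It is not the exact set of commands used for the numbers above. The helper names size_with and compare are made up for illustration, and it assumes your bzip3 build accepts -e (encode) and -c (write to stdout). Tools that are not installed are skipped.

```shell
#!/bin/sh
# size_with FILE CMD [ARGS...]
# Pipe FILE through CMD and print the size of the output in bytes.
# (Hypothetical helper, not part of bzip2 or bzip3.)
size_with() {
    input=$1
    shift
    "$@" < "$input" | wc -c | tr -d ' '
}

# compare FILE
# Print the original size and the compressed size for each installed tool.
compare() {
    target=$1
    echo "original: $(size_with "$target" cat) bytes"
    if command -v bzip2 >/dev/null 2>&1; then
        echo "bzip2 -9: $(size_with "$target" bzip2 -9 -c) bytes"
    fi
    # Assumption: this bzip3 build supports -e (encode) and -c (stdout).
    if command -v bzip3 >/dev/null 2>&1; then
        echo "bzip3: $(size_with "$target" bzip3 -e -c) bytes"
    fi
}
```

Call it as compare somefile.tar. Nothing is written to disk because everything goes through pipes, so the original file stays untouched. To see the CPU cost as well, run each compressor under time.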

Final example: Creating a simple bomb

So I did this:

dd if=/dev/zero of=./justzero bs=2G count=6

So now I have a 16GB file with only zero bytes. bzip2 -9 gives 672KB. bzip3 gives 46KB.
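
A smaller version of the same experiment can be run without putting a huge file on disk, by streaming zeros straight into the compressor. This is only a sketch under a few assumptions: zeros_through is a made-up helper name, head -c with a byte count is available on your system, and bzip3 accepts -e and -c. The sizes you get will differ from the ones above because the input is smaller.

```shell
#!/bin/sh
# zeros_through BYTES CMD [ARGS...]
# Feed BYTES zero bytes to CMD on stdin and print the output size in bytes.
# (Hypothetical helper for illustration.)
zeros_through() {
    count=$1
    shift
    head -c "$count" /dev/zero | "$@" | wc -c | tr -d ' '
}

# Example runs with 100 MiB of zeros (104857600 bytes):
#   zeros_through 104857600 bzip2 -9 -c
#   zeros_through 104857600 bzip3 -e -c   # assumes bzip3 is installed
```

Passing cat as the command just echoes the input size back, which is a quick way to check that the byte count is what you expect.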

Conclusion

Thank you @rl_dane

Real nice thing!

#compression #gzip #zip #filecompression #textcompression #datacompression #linux #unix #tech

@farooqkz

It really shines when you throw huge log files and other repetitive text files at it. :)

It does an amazing job with things like GPX and KML location data.

@farooqkz @rl_dane did you check zstd? It's pretty good too.

@iux @farooqkz

zstd is amazing.

Bzip3 is more impressive with textual data.

But zstd is crazy fast. And it still gets better compression than gzip and sometimes bzip2.