Our zlib-rs project implements a memory-safe and performant drop-in replacement for zlib, a widely-used data compression library.

@folkertdev shares the status quo of zlib-rs, including the good news that performance for the highest compression level is on par with the zlib-ng fork of zlib.

Read the blog for all the details:

https://tweedegolf.nl/en/blog/134/current-zlib-rs-performance

@trifectatech

#rustlang #datacompression #opensource


@tweedegolf compiling with "target-cpu=native" is not an option for Linux distributions (builder architecture != target architecture). would it be possible to make it use runtime CPU feature detection instead? or do you only apply this setting to affect optimizations done by LLVM, with no actual CPU-specific intrinsics used in the code?

@decathorpe @tweedegolf

We already use runtime CPU feature detection, e.g. here:

https://github.com/memorysafety/zlib-rs/blob/85bc778044f173bfdc934f2ab731eb3f94cdf70f/zlib-rs/src/adler32.rs#L9-L21

The advantage of `target-cpu=native` is that those branches are compiled away, because it is statically known what features are available and hence which path will be taken.

Runtime CPU feature detection has a performance cost, and we're still looking for the most performant way to do it, but it totally works.
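To make the tradeoff concrete, here is a minimal sketch of this kind of runtime dispatch. The function names are illustrative, not zlib-rs's actual API, and the real SIMD kernel is elided (it just falls back to scalar here):

```rust
// Sketch of runtime CPU feature dispatch, in the spirit of the
// adler32 code linked above. Illustrative only, not zlib-rs's API.

fn adler32_scalar(data: &[u8]) -> u32 {
    // Simplified scalar Adler-32 (modulo applied every iteration for clarity).
    let (mut a, mut b) = (1u32, 0u32);
    for &byte in data {
        a = (a + byte as u32) % 65521;
        b = (b + a) % 65521;
    }
    (b << 16) | a
}

fn adler32(data: &[u8]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        // Checked once at runtime; a real implementation would call an
        // AVX2 kernel in this branch instead of the scalar fallback.
        if std::is_x86_feature_detected!("avx2") {
            return adler32_scalar(data);
        }
    }
    adler32_scalar(data)
}

fn main() {
    // Adler-32 of "Wikipedia" is the well-known test vector 0x11E60398.
    println!("{:#010x}", adler32(b"Wikipedia"));
}
```

When the crate is compiled with `-C target-cpu=native` (or explicit `-C target-feature` flags), `is_x86_feature_detected!` folds to a compile-time constant and the branch disappears, which is exactly the advantage described above.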

@folkertdev @tweedegolf ah, perfect. thank you for the clarification!
@folkertdev @decathorpe
Why not use something like https://github.com/ronnychevalier/cargo-multivers ? There it's just a check at startup that decompresses the correct version for the local CPU and applies some binary patches.
@flamion @folkertdev that sounds like a nightmare (albeit an interesting idea). but no thank you, not something we can do in a distribution context either ;)
@decathorpe @folkertdev I mean, in a distribution context you could just build the binary using cargo multivers though, as it does all of that for you, then produces a single output binary, right? At least for Rust programs.
@flamion I don't think integrating another tool here would be worth it. there's already standardized support for loading libraries optimized for different microarchitecture levels built into the ELF loader (glibc HWCAPS), so if it's really essential for good performance, that's what we would use
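For context, glibc (2.33+) resolves shared libraries through `glibc-hwcaps` subdirectories keyed to the x86-64 microarchitecture levels, so a distribution can ship several builds of the same library and let the dynamic loader pick the most capable one the running CPU supports. An illustrative layout (example paths, not a prescription):

```
/usr/lib64/libz.so.1                           # baseline x86-64 build
/usr/lib64/glibc-hwcaps/x86-64-v2/libz.so.1    # SSE4.2, POPCNT, ...
/usr/lib64/glibc-hwcaps/x86-64-v3/libz.so.1    # adds AVX2, BMI2, FMA, ...
```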

@tweedegolf @folkertdev @trifectatech

Folkert, thanks for a really informative and well-written post. It looks like `zlib-rs` is coming along fantastically well!

The point you make about "bounds checks are just a bunch of correctly predicted branches that don't cost much wall clock time" deserves a follow-up post, maybe? I've heard it before and believed it, but it's worth emphasizing for those less familiar with (and afraid of) the cost of these checks: hard evidence really drives the point home.

@tweedegolf @folkertdev @trifectatech

Further, you might try turning on overflow checks and running the benchmarks again? Everybody is afraid of turning on overflow checks in release mode, but my suspicion is that the situation is really similar: I'd like to see these checks be on by default in release mode someday, although that ship may have already sailed.

Looking forward to watching the ongoing `zlib-rs` work. Thank you and your co-devs and sponsors for this!

@po8 @tweedegolf @trifectatech

yes, I ran some further benchmarks.

First, we can disable bounds checks in the Rust compiler (see below for how) and see whether that gives any speedup. It turns out that there is no significant effect, but the other performance counters are interesting.

Next, overflow checks do have a cost, about 3% in my measurements. That's too much for our library, but totally worth it for most production applications (that do less number crunching).
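(For anyone who wants to reproduce the overflow-check measurement: this is plain stable Cargo, nothing zlib-rs-specific.)

```toml
# Cargo.toml: overflow checks are off by default in release builds;
# this turns them back on to benchmark their cost.
[profile.release]
overflow-checks = true
```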

https://gist.github.com/folkertdev/23a92e853eb78e9a66829b71a276bd4b

@folkertdev @tweedegolf @trifectatech This is great: thanks much for the extra measurements! I know you're in a performance race and so 3% is a lot. I think this is good evidence, though, that overflow checks should be (should have been?) opt-out in the release profile. As you say most Rust code is now adequately performant, and will live or die by the safety benefits Rust brings.
@tweedegolf @folkertdev @trifectatech I keep hoping for more uptake of zstd format for size and efficiency gains. Any thoughts towards supporting it?
@bondolo @tweedegolf @folkertdev @trifectatech
Yes! We are planning for zstd in our data compression initiative: https://trifectatech.org/initiatives/data-compression/
Pending funding…

@tweedegolf @folkertdev
There is a typo you may want to fix in the intro ("Trifecata").

It's funny to note that the typo would change the Latin meaning to something like "three-times shitted", while later in the post you show a benchmarking tool called "poop". 💩😀

@lucab @folkertdev Thanks for the heads-up! Guess the subconscious occasionally wants a poop joke... We fixed it anyway though 👍