Our zlib-rs project implements a memory-safe and performant drop-in replacement for zlib, a widely-used data compression library.

@folkertdev shares the status quo of zlib-rs, including the good news that performance for the highest compression level is on par with the zlib-ng fork of zlib.

Read the blog for all the details:

https://tweedegolf.nl/en/blog/134/current-zlib-rs-performance

@trifectatech

#rustlang #datacompression #opensource


@tweedegolf compiling with "target-cpu=native" is not an option for Linux distributions (builder architecture != target architecture). would it be possible to make it use runtime CPU feature detection instead? or do you only apply this setting to affect optimizations done by LLVM, with no actual CPU-specific intrinsics used in the code?

@decathorpe @tweedegolf

We already use runtime CPU feature detection, e.g. here:

https://github.com/memorysafety/zlib-rs/blob/85bc778044f173bfdc934f2ab731eb3f94cdf70f/zlib-rs/src/adler32.rs#L9-L21

The advantage of `target-cpu=native` is that those branches are compiled away, because it is statically known what features are available and hence which path will be taken.

Runtime CPU feature detection has a performance cost, and we're still looking for the most performant way to do it, but it totally works.
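To make the tradeoff concrete, here is a minimal sketch of this kind of runtime dispatch. The function names are illustrative, not zlib-rs's actual API, and the real SIMD kernel is elided (it just falls back to scalar here):

```rust
// Sketch of runtime CPU feature dispatch, in the spirit of the
// adler32 code linked above. Illustrative only, not zlib-rs's API.

fn adler32_scalar(data: &[u8]) -> u32 {
    // Simplified scalar Adler-32 (modulo applied every iteration for clarity).
    let (mut a, mut b) = (1u32, 0u32);
    for &byte in data {
        a = (a + byte as u32) % 65521;
        b = (b + a) % 65521;
    }
    (b << 16) | a
}

fn adler32(data: &[u8]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        // Checked once at runtime; a real implementation would call an
        // AVX2 kernel in this branch instead of the scalar fallback.
        if std::is_x86_feature_detected!("avx2") {
            return adler32_scalar(data);
        }
    }
    adler32_scalar(data)
}

fn main() {
    // Adler-32 of "Wikipedia" is the well-known test vector 0x11E60398.
    println!("{:#010x}", adler32(b"Wikipedia"));
}
```

When the crate is compiled with `-C target-cpu=native` (or explicit `-C target-feature` flags), `is_x86_feature_detected!` folds to a compile-time constant and the branch disappears, which is exactly the advantage described above.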

@folkertdev @tweedegolf ah, perfect. thank you for the clarification!
@folkertdev @decathorpe
Why not use something like https://github.com/ronnychevalier/cargo-multivers ? There it's just a check at startup that decompresses the correct version for the local CPU and applies some binary patches.
@flamion @folkertdev that sounds like a nightmare (albeit an interesting idea). but no thank you, not something we can do in a distribution context either ;)
@decathorpe @folkertdev I mean, in a distribution context you could just build the binary using cargo multivers though, as it does all of that for you, then produces a single output binary, right? At least for Rust programs.
@flamion I don't think integrating another tool here would be worth it. there's already standardized support for loading libraries optimized for different microarchitecture levels built into the ELF loader (glibc HWCAPS), so if it's really essential for good performance, that's what we would use
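For context, glibc (2.33+) resolves shared libraries through `glibc-hwcaps` subdirectories keyed to the x86-64 microarchitecture levels, so a distribution can ship several builds of the same library and let the dynamic loader pick the most capable one the running CPU supports. An illustrative layout (example paths, not a prescription):

```
/usr/lib64/libz.so.1                           # baseline x86-64 build
/usr/lib64/glibc-hwcaps/x86-64-v2/libz.so.1    # SSE4.2, POPCNT, ...
/usr/lib64/glibc-hwcaps/x86-64-v3/libz.so.1    # adds AVX2, BMI2, FMA, ...
```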

@tweedegolf @folkertdev @trifectatech

Folkert, thanks for a really informative and well-written post. It looks like `zlib-rs` is coming along fantastically well!

The point you make about "bounds checks are just a bunch of correctly predicted branches that don't cost much wall clock time" deserves a follow-up post, maybe? I've heard it before and believed it, but it's worth emphasizing for those less familiar with (and afraid of) the cost of these checks: hard evidence really drives the point home.

@tweedegolf @folkertdev @trifectatech

Further, you might try turning on overflow checks and running the benchmarks again? Everybody is afraid of turning on overflow checks in release mode, but my suspicion is that the situation is really similar: I'd like to see these checks be on by default in release mode someday, although that ship may have already sailed.

Looking forward to watching the ongoing `zlib-rs` work. Thank you and your co-devs and sponsors for this!

@po8 @tweedegolf @trifectatech

yes, I ran some further benchmarks.

First, we can disable bounds checks in the Rust compiler (see below for how) and see whether that gives any speedup. It turns out that there is no significant effect, but the other performance counters are interesting.

Next, overflow checks do have a cost, about 3% in my measurements. That's too much for our library, but totally worth it for most production applications (that do less number crunching).
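(For anyone who wants to reproduce the overflow-check measurement: this is plain stable Cargo, nothing zlib-rs-specific.)

```toml
# Cargo.toml: overflow checks are off by default in release builds;
# this turns them back on to benchmark their cost.
[profile.release]
overflow-checks = true
```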

https://gist.github.com/folkertdev/23a92e853eb78e9a66829b71a276bd4b

@folkertdev @tweedegolf @trifectatech This is great: thanks much for the extra measurements! I know you're in a performance race and so 3% is a lot. I think this is good evidence, though, that overflow checks should be (should have been?) opt-out in the release profile. As you say most Rust code is now adequately performant, and will live or die by the safety benefits Rust brings.
@tweedegolf @folkertdev @trifectatech I keep hoping for more uptake of zstd format for size and efficiency gains. Any thoughts towards supporting it?
@bondolo @tweedegolf @folkertdev @trifectatech
Yes! We are planning for zstd in our data compression initiative: https://trifectatech.org/initiatives/data-compression/
Pending funding…

@tweedegolf @folkertdev
There is a typo you may want to fix in the intro ("Trifecata").

It's funny to note that the typo would change the Latin meaning to something like "three-times shitted", while later in the post you show a benchmarking tool called "poop". 💩😀

@lucab @folkertdev Thanks for the heads-up! Guess the subconscious occasionally wants a poop joke... We fixed it anyway though 👍