I have a story to tell that is relevant to the xz-utils thing that just happened. I'll probably write this up properly later, but I'm in pre-vacation mode so it may take a while. We have a problem with the way we develop and then distribute FOSS software, and both stories show that. A while ago I looked at the testcases of a widely used library implementing a widely used data format. There was one file that was... strange. 🧵
That file was named similarly to the other testcases, but it was not used in any test. And if you fed that file into anything using that library, it would either crash or cause enormous CPU spikes. And most interestingly: this file was nowhere to be found in the project's git repository. It was *only* in the tarball.
I contacted the responsible project, but I never got an answer and never really got to the bottom of this. But here's what I think happened: This was a proof of concept file for a yet unfixed and undisclosed vulnerability. It appears the developer already had a testcase for that bug in his local copy of the source tree. And then created the tarball from that source tree. And by doing that leaked a PoC for a zeroday. FWIW, it was "only" a DoS bug. But still.
I wanted to disclose this eventually, but then a new version of that library came out and fixed the bug. And plenty of others, and well, people crash parsers for data formats from hell all the time. And I had some concerns that it would sound like I wanted to ridicule the dev, which wasn't my intention at all. But I already thought there's a deeper story here than someone accidentally leaking a PoC for an unfixed vuln. Why can this even happen?
Pretty much everyone develops code using Git these days, or some other SCM (some don't, there's this mail server, but I digress). But people distribute code in tarballs. How does a Git repo become a tarball? The answer may disturb you. It's basically "every dev has some process, maybe some script, maybe some commands they remember". Nothing is reproducible, nothing is verifiable.
This creates a situation where even when the "many eyes" principle works, i.e. people are actually looking at the code, and at code changes and commits, you still have a path to a compromised package. Because no one checks how this git repo turns into a tarball. Because no one can, as nothing is standardized or reproducible. I can tell no one does for one of the most important libraries to parse one of the most important data formats, because of the story I just told you.
There were some substantial efforts to create "reproducible builds" in some areas. This is closely related, but not exactly the same thing. Even if we have reproducible builds, we don't have "reproducible source distribution". We should have that. Git already has some cryptographic integrity, and as much as it has some flaws (sha1...), it's a lot better than nothing at all. But we don't connect any of that to the actual source tarballs.
I think the same issue is true for most package managers out there. I don't think there's any mechanism that ties e.g. what's on PyPI to what is in any git repo. (Does anyone know if any package manager does that?)
Anyway, what we should have is that every release of a piece of software is tied to a git commit hash. And there should be a verifiable, automated process that checks it. It's more complicated than it sounds, as particularly in "C land" we have autotools, and what's in the source tarball is not just a snapshot of what's in the source repo, but contains all kinds of generated stuff. Either those need to be reproducible, or we need to just stop doing that. It's solvable, but there are some obstacles. /fin
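To make the idea of an automated check a bit more concrete, here is a minimal sketch in Python. It assumes you have two tarballs as bytes: the published release tarball, and a reference tarball generated from the tagged commit (for example with `git archive`). The function names `tar_digests` and `compare`, and the assumption that both archives have a single top-level directory to strip, are mine and purely illustrative. It does not deal with generated autotools files, which would all show up as "only in release".

```python
import hashlib
import io
import tarfile


def tar_digests(data: bytes, strip_components: int = 1) -> dict[str, str]:
    """Map each regular file in a tarball to the SHA-256 of its contents."""
    digests = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Drop the leading "project-1.2.3/" directory (assumed layout).
            parts = member.name.split("/")[strip_components:]
            if not parts:
                continue
            content = tar.extractfile(member).read()
            digests["/".join(parts)] = hashlib.sha256(content).hexdigest()
    return digests


def compare(release: dict[str, str], reference: dict[str, str]) -> dict[str, list[str]]:
    """Report files that differ between the release and the reference tree."""
    common = release.keys() & reference.keys()
    return {
        "only_in_release": sorted(release.keys() - reference.keys()),
        "only_in_reference": sorted(reference.keys() - release.keys()),
        "modified": sorted(p for p in common if release[p] != reference[p]),
    }
```

A stray testcase that exists only in the tarball, like the one in the story above, would be listed under `only_in_release`.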
@hanno That reminds me of how NixOS generates flakes.
I'm not a huge fan of Nix but that notion does seem to fix this issue at hand as far as I understand it.
@publicvoit @hanno yes, Nix/nixpkgs achieves this to a certain extent. Any package in nixpkgs can be tied to its source, be it a provided tarball (hash will be checked) or a reproducible build from source.
E.g. in the case of xz, the tarball from GitHub was being fetched (https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/tools/compression/xz/default.nix).
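As a rough illustration of what "hash will be checked" means in general, here is a minimal Python sketch of verifying a downloaded tarball against a pinned SHA-256 digest. This is not how Nix implements it; the function name `matches_pinned_hash` and the plain hex digest format are assumptions for the example.

```python
import hashlib
import hmac


def matches_pinned_hash(data: bytes, expected_sha256_hex: str) -> bool:
    """Return True if the data hashes to the pinned SHA-256 hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    # Normalise the pinned value so upper/lower case hex both work.
    return hmac.compare_digest(actual, expected_sha256_hex.strip().lower())
```

Note that a check like this only shows the tarball has not changed since the hash was pinned. It says nothing about whether the tarball matches the git repository.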

@basbebe @hanno Bastian, do you know what it means that the GitHub repo was removed by GitHub (until the Nix project finds an alternative/solution)?
Does that mean that all updates/installs are breaking at the moment?