I have a story to tell that is relevant to the xz-utils thing that just happened. I'll probably write this up properly later, but I'm in pre-vacation mode, so it may take a while. We have a problem with the way we develop and then distribute FOSS software, and both stories show that. A while ago I looked at the testcases of a widely used library implementing a widely used data format. There was one file that was... strange. 🧵
That file was named similarly to the other testcases, but it was not used in any test. And if you fed that file into anything using that library, it would either crash or cause enormous CPU spikes. And most interestingly: this file was nowhere to be found in the project's git repository. It was *only* in the tarball.
I contacted the responsible project, but I never got an answer and never really got to the bottom of this. But here's what I think happened: this was a proof-of-concept file for an as-yet-unfixed and undisclosed vulnerability. It appears the developer already had a testcase for that bug in his local copy of the source tree, and then created the tarball from that source tree. And by doing that, leaked a PoC for a zeroday. FWIW, it was "only" a DoS bug. But still.
I wanted to disclose this eventually, but then a new version of that library came out and fixed the bug. And plenty of others, and well, people crash parsers for data formats from hell all the time. And I had some concerns that it would sound like I wanted to ridicule the dev, which wasn't my intention at all. But I already thought there's a deeper story here than someone accidentally leaking a PoC for an unfixed vuln. Why can this even happen?
Pretty much everyone develops code using Git these days, or some other SCM (some don't, there's this mail server, but I digress). But people distribute code in tarballs. How does a Git repo become a tarball? The answer may disturb you: it's basically "every dev has some process, maybe some script, maybe some commands they remember". Nothing is reproducible, nothing is verifiable.
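To make that concrete, here is a minimal sketch of two common ways a repo becomes a tarball (project name and tag are hypothetical); only the first is pinned to a commit:

```sh
# (a) Plain snapshot of the tagged tree, derived only from git objects:
git archive --format=tar.gz --prefix=libfoo-1.2.3/ \
    -o libfoo-1.2.3.tar.gz v1.2.3

# (b) Autotools dist tarball, built from whatever is in the developer's
#     working tree -- which can include files never committed to git:
autoreconf -i && ./configure && make dist
```

Variant (b) is how an uncommitted testcase, or worse, can end up in a release.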
This creates a situation where even when the "many eyes" principle works, i.e. people are actually looking at the code, at code changes and commits, you still have a path to a compromised package. Because no one checks how this git repo turns into a tarball. Because no one can, as nothing is standardized or reproducible. I can tell that no one does for one of the most important libraries parsing one of the most important data formats, because of the story I just told you.
There have been some substantial efforts to create "reproducible builds" in some areas. This is closely related, but not exactly the same thing: even if we have reproducible builds, we don't have "reproducible source distribution". We should have that. Git already has some cryptographic integrity, and as much as it has flaws (sha1...), it's a lot better than nothing at all. But we don't connect any of that to the actual source tarballs.
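For illustration (tag and file names hypothetical), git already gives us verifiable hashes; the problem is that none of them is connected to the tarball users actually download:

```sh
# A commit hash pins the exact tree; a signed tag pins the commit.
git rev-parse "v1.2.3^{tree}"   # content hash of the full tree at the tag
git tag -v v1.2.3               # verify the tag signature, if the project signs tags

# The tarball's checksum, by contrast, lives outside git entirely:
sha256sum libfoo-1.2.3.tar.gz
```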
I think the same issue is true for most package managers out there. I don't think there's any mechanism that ties e.g. what's on PyPI to what is in any git repo. (Does anyone know if any package manager does that?)
Anyway, what we should have is that every software release is tied to a git commit hash. And there should be a verifiable, automated process that checks it. It's more complicated than it sounds, as particularly in "C land" we have autotools, and what's in the source tarball is not just a snapshot of what's in the source repo, but contains all kinds of generated stuff. Either those need to be reproducible, or we need to just stop doing that. It's solvable, but there are some obstacles. /fin
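A sketch of what such an automated check could look like, assuming an upstream-maintained list of generated files (all names here are hypothetical placeholders):

```sh
#!/bin/sh
# Does the release tarball match the tagged git tree, modulo a declared
# list of generated files? libfoo, v1.2.3 and generated-files.txt are
# placeholders, not a real project's layout.
set -e
TAG=v1.2.3
TARBALL=libfoo-1.2.3.tar.gz

mkdir from-git from-tarball
git archive "$TAG" | tar -x -C from-git
# assumes the usual single libfoo-1.2.3/ top-level directory
tar -xzf "$TARBALL" --strip-components=1 -C from-tarball

# Anything diff reports that the generated-files list doesn't explain
# is an unexplained addition or modification.
if diff -r from-git from-tarball | grep -v -F -f generated-files.txt; then
    echo "UNEXPLAINED differences, see above" >&2
    exit 1
fi
echo "tarball matches $TAG plus declared generated files"
```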
@hanno How could autotools theoretically solve this, when the whole point (for whatever reason; I don't understand the rationale) is that e.g. configure files and friends don't exist until releases are made?
@cr1901 I mean if the process is reproducible, it can be checked. But then you need some machine readable documentation of that process. And why they even do it this way: I think the rationale was that you can run configure scripts without having autotools installed.
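For illustration, the split that rationale creates (standard autotools flow):

```sh
# Maintainer machine: autotools must be installed to generate the scripts.
autoreconf -i        # produces configure, Makefile.in, ...

# User machine: the generated configure script needs only a POSIX shell
# and make, no autotools. That's why the generated files are shipped in
# the tarball instead of being kept in git.
./configure && make
```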

@hanno Right, that rationale actually makes sense from the Unix-centric lens of "why should we bother making a config language when your OS provides one* "sufficient" for the task"?

* Except when it doesn't provide one, like anything not-Unix :P.

I've seen configure scripts from Softlanding Linux System. ./configure scripts weren't horrible in 1992. But it feels like they got too unwieldy too quickly :(.

@cr1901 @hanno with autotools, configure files are required to generate makefiles, which means that they all must exist just to do a build. their existence has no particular correlation to a release, even if configure.ac is used to define a release version.
@cr1901 @hanno autoconf needs to die
@lambdafu @cr1901 and C with it, but well... legacy code is a reality.

@hanno That reminds me of how NixOS generates flakes.

I'm not a huge fan of Nix but that notion does seem to fix this issue at hand as far as I understand it.

@publicvoit @hanno yes, Nix/nixpkgs achieves this to a certain extent. Any package in nixpkgs can be tied to its source, be it a provided tarball (hash will be checked) or a reproducible build from source.
E.g. in the case of xz, the tarball from GitHub was being fetched (https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/tools/compression/xz/default.nix)
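Roughly, the guarantee there is a pinned hash check; a minimal shell equivalent of what the fetcher enforces (URL and hash are placeholders, not the real nixpkgs values):

```sh
# Refuse to proceed if the fetched source doesn't match the pinned hash.
url="https://example.org/xz-5.4.6.tar.gz"  # placeholder URL
expected="0000000000000000000000000000000000000000000000000000000000000000"  # placeholder hash
curl -LO "$url"
echo "$expected  xz-5.4.6.tar.gz" | sha256sum -c -
```

Note the limit of this: it catches the tarball changing after the hash was pinned, but if the backdoor was already in the tarball when the hash was taken, as with xz, the check passes.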

@basbebe @hanno Bastian, do you know what it means that the GitHub repo was removed by GitHub (until the Nix project finds an alternative/solution)?

Does that mean that all updates/installs are breaking at the moment?

@publicvoit @hanno currently the package (and any older, non-customized versions) will be pulled from cache (cache.nixos.org), so that shouldn’t be an immediate problem.

There is a mirror by the original maintainer that could be used in the future:

- https://github.com/NixOS/nixpkgs/pull/300028
- https://discourse.nixos.org/t/cve-2024-3094-malicious-code-in-xz-5-6-0-and-5-6-1-tarballs/42405/18

But yes, this seems to be a currently unsolved issue

Revert "xz: 5.4.6 -> 5.6.1" by mweinelt · Pull Request #300028 · NixOS/nixpkgs

Description of changes The upstream tarball has been tampered with and includes a backdoor for which we cannot completely rule out, whether we are affected. https://www.openwall.com/lists/oss-secur...

GitHub
Sorry for advertising #guix here ;-) Software Heritage archives all the sources used by Guix. And this is done exactly for the case where the original source disappears.
Thus users of Guix will not face such an issue - as long as Software Heritage is alive.
@publicvoit @basbebe @hanno
No, Nix does not solve the issue, and neither does Guix. Packagers can always decide to use the *dist* tarball as source - instead of some git checkout.
@basbebe @publicvoit @hanno
@hanno the whole story also reminds me of the Webmin backdoor we investigated together in 2019.
Same deal: it was only in the release tarball.
We learned nothing from it.

@hanno there is https://slsa.dev/ to solve pretty much exactly this, and we've integrated that in PrivateBin: https://github.com/PrivateBin/PrivateBin/blob/master/doc/Release.md

This only works because the build process is super simple, essentially a "git archive" command. And that is what you also get with GitHub's source links, which is great, as it can only remove files, not add them, and you get a valid source tarball that is also referenced by a third party (which is what SLSA aims at). #supplyChainSecurity /cc @elrido
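A hedged sketch of that property (repo, tag and prefix are hypothetical; GitHub has changed its compression layer in the past, so compare the uncompressed streams rather than the .tar.gz files):

```sh
# Hash the tar stream produced locally from the tag...
git archive --format=tar --prefix=libfoo-1.2.3/ v1.2.3 | sha256sum

# ...and the decompressed stream behind GitHub's source link. If the two
# match, the published "source" is exactly the tagged git tree.
curl -sL https://github.com/example/libfoo/archive/refs/tags/v1.2.3.tar.gz \
    | gunzip | sha256sum
```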

rugk (@rugk@chaos.social): Note on all the #xz drama, there are some technical solutions for such #supplychainattack that can make such an attack way harder, at least to hide the code in tarballs etc. https://slsa.dev/ e.g. is a solution. Combined with reproducible builds, it ensures that a software artifact is built exactly from the source given in a source repository, with the possibility to prove that and no way for any maintainer to tamper with (in the highest level). #slsa #infosec #security #linux #backdoor
@hanno Apart from doing a "git checkout tag" and having a signed manifest of every single controlled file (woe betide you if you have submodules), the other conundrum is that reproducible compressed containers are also needed. That's not easy to do outside tarballs, so Windows folks are usually excluded out of the box. (And even with tarballs, the incantations are really arcane.)
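For reference, the usual incantation from the reproducible-builds world for a deterministic tarball (GNU tar specific, file names hypothetical, and yes, arcane):

```sh
# Fix the ordering, timestamps and ownership that normally make
# two tar runs over the same tree differ.
tar --sort=name \
    --mtime="@${SOURCE_DATE_EPOCH:-0}" \
    --owner=0 --group=0 --numeric-owner \
    -cf libfoo-1.2.3.tar libfoo-1.2.3/
gzip -n libfoo-1.2.3.tar   # -n: omit name and timestamp from the gzip header
```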
@hanno Meson does their deployments with a git checkout + archive, but of course does not take care of the signing step.
@hanno I think you are right for many software stacks like C/autotools, but if you look at more modern languages, for example Go tooling, it's not as bad. Those ecosystems are in much better shape than the ones mentioned.
Also, all the SBOM tools should help make this better.

@hanno when it comes to autotools, the correct answer is always "stop doing that".

a build system that regularly fails at both backwards & forward compatibility? every developer must have specific version(s) of the build system installed (effectively system-wide) on their machines (and sometimes, different versions for different projects).

It's not super-popular, but one of waf's main selling points is that it is a part of your codebase, managed with the same tools as the source itself.

Behdad Esfahbod (@behdadesfahbod) on X: "@drjtwit If a bad actor gains write access to a repo and changes the release artifacts (eg. tarballs), there's no way to know... I want a page showing who and when and what was uploaded for a release."
@hanno This isn't airtight by any means, but one way to begin addressing this concern is to automate release processes to run on public CI. I didn't do this for a long time because I didn't see how to automate the automatable parts without losing human oversight of the parts that needed it — eventually decided I needed to write my own tool to make that possible. Once I figured out the formalism that I wanted (https://pkgw.github.io/cranko/book/latest/jit-versioning/), I've never looked back.