I have a story to tell that is relevant to the xz-utils thing that just happened. I'll probably write this up properly later, but I'm in pre-vacation mode, so it may take a while. We have a problem with the way we develop and then distribute FOSS software, and both the xz incident and this story show it. A while ago I looked at the testcases of a widely used library implementing a widely used data format. There was one file that was... strange. 🧵
That file was named similarly to the other testcases, but it was not used in any test. And if you fed that file into anything using that library, it would either crash or cause enormous CPU spikes. And most interestingly: this file was nowhere to be found in the project's git repository. It was *only* in the tarball.
I contacted the responsible project, but I never got an answer and never really got to the bottom of this. But here's what I think happened: this was a proof-of-concept file for a then-unfixed and undisclosed vulnerability. It appears the developer already had a testcase for that bug in his local copy of the source tree. He then created the tarball from that source tree, and by doing that leaked a PoC for a zero-day. FWIW, it was "only" a DoS bug. But still.
I wanted to disclose this eventually, but then a new version of that library came out and fixed the bug. And plenty of others, and well, people crash parsers for data formats from hell all the time. And I had some concerns that it would sound like I wanted to ridicule the dev, which wasn't my intention at all. But I thought even then that there's a deeper story here than someone accidentally leaking a PoC for an unfixed vuln: why can this even happen?
Pretty much everyone develops code using Git these days, or some other SCM (some don't, there's this mail server, but I digress). But people distribute code in tarballs. How does a Git repo become a tarball? The answer may disturb you. It's basically "every dev has some process, maybe some script, maybe some commands they remember". Nothing is reproducible, nothing is verifiable.
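For what it's worth, git itself can already produce deterministic tarballs from a tag via `git archive`. A minimal sketch in a throwaway repo (all names here are made up for illustration, and note that archive output can differ across git versions):

```shell
# Throwaway repo purely for demonstration; "demo" and v1.0 are hypothetical.
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
echo 'int main(void){return 0;}' > main.c
git add main.c
git -c user.name=dev -c user.email=dev@example.org commit -qm 'initial'
git tag v1.0

# git archive derives the tar purely from the tagged tree (file mtimes come
# from the commit), and gzip -n drops the gzip timestamp, so the same tag
# yields byte-identical output on every run with the same git version:
git archive --format=tar --prefix=demo-1.0/ v1.0 | gzip -n > demo-1.0.tar.gz
sha256sum demo-1.0.tar.gz
```

Contrast that with an ad-hoc `make dist` style run against a possibly dirty working tree, which can pick up anything lying around, including an unpublished PoC.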
This creates a situation where even when the "many eyes" principle works, i.e. people are actually looking at the code, and at code changes and commits, you still have a path to a compromised package. Because no one checks how this git repo turns into a tarball. Because no one can, as nothing is standardized or reproducible. I can tell that no one does it for one of the most important libraries parsing one of the most important data formats, because of the story I just told you.
There have been some substantial efforts to create "reproducible builds" in some areas. This is closely related, but not exactly the same thing. Even if we have reproducible builds, we don't have "reproducible source distribution". We should have that. Git already has some cryptographic integrity, and while it has flaws (SHA-1...), it's a lot better than nothing at all. But we don't connect any of that to the actual source tarballs.
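One could imagine making that connection: a sketch of checking a source tarball against the tagged tree it claims to come from. This builds its own toy repo and tarball (hypothetical names throughout); real upstream tarballs often add autotools output and such, which a real check would have to account for:

```shell
# Build a toy repo, tag it, and make an "upstream" tarball from the tag.
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
echo 'int main(void){return 0;}' > main.c
git add main.c
git -c user.name=dev -c user.email=dev@example.org commit -qm 'release'
git tag v1.0
git archive --format=tar.gz --prefix=demo-1.0/ -o ../demo-1.0.tar.gz v1.0

# Verification: unpack the tarball and the tagged tree, then diff.
# Any file that exists only in the tarball (like a stray PoC) shows up here.
cd "$tmp"
mkdir from-tarball from-git
tar -xzf demo-1.0.tar.gz --strip-components=1 -C from-tarball
git -C demo archive v1.0 | tar -x -C from-git
diff -r from-tarball from-git && echo 'tarball matches tagged tree'
```

In the story above, exactly this kind of diff would have flagged the extra testcase that existed only in the tarball.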
I think the same issue is true for most package managers out there. I don't think there's any mechanism that ties e.g. what's on pypi to what is in any git repo. (Does anyone know if any package manager does that?)
@hanno I don't know rpm or pacman, but debian/ubuntu tie the package to a git tag, which is a pointer to a specific commit. But git tags are shoddy design and are not immutable: delete the tag, alter the code, recreate the tag with the same name. You'd have to grab the tagged source, build, then compare hashes to know the source didn't match. 100 to 1 that almost never happens. But it's massively more secure than windows/apple/android package handling, so not a true concern. But it is a weakness in git's design.
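The tag mutability described above is easy to demonstrate in a throwaway repo (file contents are illustrative):

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q
echo 'clean code' > code.c
git add code.c
git -c user.name=dev -c user.email=dev@example.org commit -qm 'one'
git tag v1.0

echo 'altered code' > code.c
git -c user.name=dev -c user.email=dev@example.org commit -qam 'two'

# Delete and recreate the tag with the same name, now pointing elsewhere:
git tag -d v1.0
git tag v1.0
git show v1.0:code.c    # the tag name is unchanged, the content is not
```

Signed tags (`git tag -s`) at least bind the tag to a key, though a tag re-signed by the same key would still verify; pinning the commit hash itself is the stronger check.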
@smxi I am pretty sure debian does not pull git sources for most of their packages.
@hanno I'm not sure where you got that idea, since that's exactly what they do. I talk to #unit193, the #debian #ubuntu #inxi packager, all the time, and that's exactly what he does. I can double-check with him if you want. Arch pacman packagers certainly pull from git then build. AUR is just direct live build scripts pulling from git. Rpm I don't follow, but I assume that's what they do. Unit193 has a tracker script to alert on new tagged releases. I'll ask what he does now.
@smxi @hanno there is no hard and fast rule for Debian packages and where the orig.tar.gz tarball comes from. Ideally these match what was released by upstream, but sometimes the Debian maintainer even repacks the tarball, so it doesn't even match what upstream released. And the upstream tarball is rarely a git tag - more often than not it is the output of running 'make distcheck' against the git repo, and like Hanno says, this may then include other bits.

@alexmurray @hanno While I'm fuzzy on the details, I only started tagging because packagers said the distro required it. It's been too long, but I want to say Fedora asked. Maybe Arch, but I don't remember. The point being reproducible builds based on a specific git commit.

Waiting to hear from the packagers, now I'm curious. Lintian has by FAR the strictest packaging guidelines, so it's doubtful any other package manager will be more strict.

I do the #tinycore package for #inxi and they don't use that method at all.