The supply chain attack on XZ Utils is fascinating. It does not appear to be a hack but rather an inside job. The malicious code was added by someone who has been co-maintaining the project for the past two years. There is a considerable amount of (presumably) legitimate and non-trivial changes associated with that person. From what I can tell at a quick glance, however, they have no public changes unrelated to xz.

Given the effort that went into hiding the backdoor, I’m fairly certain that it was supposed to operate undetected for a long time. It’s probably just luck that someone noticed the side-effects it caused, discovering it merely a month after it was planted.

I’m looking forward to a thorough analysis of the implant; hopefully it will allow conclusions about the intentions behind it. As things stand now, this could be a long-term operation by an APT, pushing their maintainer into a popular project which (like way too many open source projects) was constantly short on contributors. Obviously, monetary interests are also a possible explanation.

I see people arguing about this who clearly have no idea about the reality of open source projects. Enforcing code reviews, really? Most open source projects can consider themselves lucky if they have a single reliable contributor. Who is supposed to do these code reviews and where will they get the time?

With most open source projects, a single burst of useful contributions is all you need to be made a co-maintainer (talking from experience). Often enough you will even be offered to become the sole maintainer. The person behind the repository has no time, and they will happily delegate to whoever does.

I see someone suggesting that this backdoor has been built up piecewise over the course of a year. I did not verify, but this would make it a highly sophisticated and stealthy attack. Even with reviews, most open source projects would be unprepared to detect it. That one odd line in the build script standing out? It works, so nobody would bother to dig further.

The more important concern right now is: the same person has been driving xz releases since at least December 2022. It has to be verified that everything before xz 5.6.0 is really clean, otherwise this is very bad.

From the look of it, verifying that xz 5.4.6 for example can be trusted is going to be really tough. With versions 5.6.0 and 5.6.1 we already know that the code in the repository and the code in the tarball aren’t identical. So why don’t we download the tarballs for the previous versions and compare them to the repository?

Well, because they generally aren’t identical, and never have been from what I can tell. The tarballs contain a bunch of files generated with autoconf and aclocal. So there is a whole lot of autogenerated code, some of which has been messed with.
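To illustrate what such a tarball-versus-repository comparison could look like, here is a minimal sketch. It assumes the release tarball has already been extracted and the matching tag checked out; the function names and directory paths are hypothetical, and this is not the tooling anyone actually used to analyze xz.

```python
import hashlib
import sys
from pathlib import Path


def file_digests(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip git metadata, it never ships in a release tarball.
        if path.is_file() and ".git" not in relative.parts:
            digests[relative.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(tarball_dir: Path, repo_dir: Path) -> dict[str, list[str]]:
    """Report files that exist on one side only or differ in content."""
    tarball = file_digests(tarball_dir)
    repo = file_digests(repo_dir)
    shared = tarball.keys() & repo.keys()
    return {
        "only_in_tarball": sorted(tarball.keys() - repo.keys()),
        "only_in_repo": sorted(repo.keys() - tarball.keys()),
        "different": sorted(name for name in shared if tarball[name] != repo[name]),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: compare.py <extracted tarball dir> <repository checkout dir>
    report = compare_trees(Path(sys.argv[1]), Path(sys.argv[2]))
    for category, names in report.items():
        print(f"{category}: {len(names)} file(s)")
        for name in names:
            print(f"  {name}")
```

Everything listed under only_in_tarball would be the autogenerated material. A check like this can only point at those files, it cannot tell whether their content is legitimate.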

As I see it, some code has been added to the configure file after the legitimate code for AM_GNU_GETTEXT. This code invokes build-to-host.m4, a trojanized version of a legitimate script.

There is no such modification in the files for version 5.4.6 for example, but there are still lots of autogenerated files – way more code than can be realistically reviewed manually (and trust me: you don’t want to review this code). So in order to exclude the possibility of other manipulations, someone will need to attempt to reproduce these files with all the right versions of the build tools. And I’m just happy that this someone isn’t going to be me.

I don’t know whether pre-compiling tarballs by running autoconf is common practice. I suspect that it is, given how messy it is to get all the necessary dependencies in place to do it yourself. I would suggest using a reasonable build system but… what can possibly be reasonable about a C codebase in the year 2024?

GitHub took out the big hammer and disabled the entire xz repository and a bunch of others belonging to the project. I fail to see how this is going to help. People have been studying these repositories, looking for clues about what happened and whether we can still trust older versions. Now almost the entire history has become inaccessible.

Also, I realized that my statement above about the malicious contributor driving releases since December 2022 is likely incorrect. The date displayed by GitHub isn’t when the release artifacts were uploaded, it’s rather the date of the release tag. According to Web Archive, xz releases were moved from SourceForge to GitHub sometime between April 24 and May 6, 2023. This included some of the older releases as well.

@WPalant Most of the history is still accessible via the official mirror: https://git.tukaani.org/?p=xz.git;a=summary
@fdellwing @WPalant PRs and issue comments are also important information that has been lost(?)
@teivel @WPalant That is why I said "most". And regarding PRs: There is really not much to see...he just merged his own PRs.