I accidentally found a security issue while benchmarking postgres changes.

If you run debian testing, unstable or some other more "bleeding edge" distribution, I strongly recommend upgrading ASAP.

https://www.openwall.com/lists/oss-security/2024/03/29/4

oss-security - backdoor in upstream xz/liblzma leading to ssh server compromise

I was doing some micro-benchmarking at the time, needed to quiesce the system to reduce noise. Saw sshd processes were using a surprising amount of CPU, despite immediately failing because of wrong usernames etc. Profiled sshd, showing lots of cpu time in liblzma, with perf unable to attribute it to a symbol. Got suspicious. Recalled that I had seen an odd valgrind complaint in automated testing of postgres, a few weeks earlier, after package updates.

Really required a lot of coincidences.

One more aspect that I think emphasizes the number of coincidences that had to come together to find this:

I run a number "buildfarm" instances for automatic testing of postgres. Among them with valgrind. For some other test instance I had used -fno-omit-frame-pointer for some reason I do not remember. A year or so ago I moved all the test instances to a common base configuration, instead of duplicate configurations. I chose to make all of them use -fno-omit-frame-pointer.

Afaict valgrind would not have complained about the payload without -fno-omit-frame-pointer. It was because _get_cpuid() expected the stack frame to look a certain way.

Additionally, I chose to use debian unstable to find possible portability problems earlier. Without that valgrind would have had nothing to complain.

Without having seen the odd complaints in valgrind, I don't think I would have looked deeply enough when seeing the high cpu in sshd below _get_cpuid().

There are more coincidences that are even less interesting. But even the above should make it clear how unlikely it was that I found this thing.
Just to be clear: I didn't mean that I didn't do good - I did. I mean that we got unreasonably lucky here, and that we can't just bank on that going forward.
@AndresFreundTec but there are thousands of devs running all sorts of tests. The question is not "what were the odds that you missed it", but "what were the odds that everybody missed it"...
@HydrePrever @AndresFreundTec The odds that everyone missed it are extremely high. It actually triggered a bunch of test failures, which the attacker explained away as ifunc side effects, and a lot of people simply accepted that explanation.