I accidentally found a security issue while benchmarking postgres changes.
If you run debian testing, unstable or some other more "bleeding edge" distribution, I strongly recommend upgrading ASAP.
I was doing some micro-benchmarking at the time, and needed to quiesce the system to reduce noise. Saw that sshd processes were using a surprising amount of CPU, despite immediately failing because of wrong usernames etc. Profiled sshd, which showed lots of CPU time in liblzma, with perf unable to attribute it to a symbol. Got suspicious. Recalled that I had seen an odd valgrind complaint in automated testing of postgres, a few weeks earlier, after package updates.
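For readers curious what that profiling step looks like in practice, here is a rough sketch. The exact invocations are assumptions on my part (they need root and the linux perf tool, and the pgrep pattern is illustrative), not the precise commands used:

```shell
# Attach perf to a running sshd process (pid lookup is illustrative)
pid="$(pgrep -o sshd)"

# Live view of where CPU time is going; in this incident, a large share
# of samples fell inside liblzma without a resolvable symbol.
perf top -p "$pid"

# Or record for a few seconds and inspect afterwards.
perf record -g -p "$pid" -- sleep 5
perf report
```

The tell here was not any single number, but samples landing in a library (liblzma) that has no business burning CPU during failed ssh logins, with no symbol perf could attribute them to.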
Really required a lot of coincidences.
One more aspect that I think emphasizes the number of coincidences that had to come together to find this:
I run a number of "buildfarm" instances for automatic testing of postgres, among them one that runs under valgrind. For some other test instance I had used -fno-omit-frame-pointer, for a reason I do not remember. A year or so ago I moved all the test instances to a common base configuration, instead of duplicated configurations, and chose to make all of them use -fno-omit-frame-pointer.
Afaict valgrind would not have complained about the payload without -fno-omit-frame-pointer. It was because _get_cpuid() expected the stack frame to look a certain way.
Additionally, I chose to use debian unstable to find possible portability problems earlier. Without that, valgrind would have had nothing to complain about.
Without having seen the odd complaints in valgrind, I don't think I would have looked deeply enough when seeing the high CPU usage in sshd below _get_cpuid().
@gordonmessmer @HydrePrever @AndresFreundTec To be fair, test failures being considered "Flaky" isn't unheard of - as soon as I read that, I immediately thought of this StackExchange question: [ https://softwareengineering.stackexchange.com/questions/448510/how-to-get-flaky-tests-fixed-after-having-mitigated-their-flakiness ].
What was the over/under on someone saying "Yeah, that test just fails sometimes. We don't know why though. Run it 8 times, and if it fails more than 3, then it's probably *actually* an issue.", and *that* helping to get it past the radar of scrutiny?
@AndresFreundTec That was more than just good - you probably stopped a devastating attack on the whole industry. The open source community owes you a huge debt of gratitude. Had that exploit gotten into the wild it would have been awful. Catching it early was an immense win.
I hope Microsoft recognizes how much of a contribution you made to the entire industry.
@AndresFreundTec Hey dude, you really saved millions, thanks a lot.
(Luck or not you did a fantastic job).