I am seeing up to 30% wall-time fluctuation running the same code on the same data on the same VM type in GCE.

That seems crazy to me.

What's the worst variation you've seen? How do you deal with these things?

I mean, for any benchmarking, do you need to run the clickhouse tomato benchmark protocol (e.g. run two instances of the same software on the same VM so they're exposed to the same noisy-neighbor effects)?
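A rough sketch of that side-by-side idea, in Python. This is not taken from ClickHouse or any published protocol; `timed_run`, `paired_benchmark`, and the placeholder commands are hypothetical names made up for illustration. It starts both commands at the same moment each round and reports the ratio of their wall times, on the assumption that neighbor noise hits both roughly equally and mostly cancels in the ratio. It also assumes the VM has enough vCPUs that the two processes don't mainly fight each other.

```python
import statistics
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(cmd):
    """Run one command to completion and return its wall time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def paired_benchmark(cmd_a, cmd_b, rounds=5):
    """Run cmd_a and cmd_b side by side each round; return the b/a time ratios."""
    ratios = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(rounds):
            # Submit both before waiting on either, so they overlap in time.
            fut_a = pool.submit(timed_run, cmd_a)
            fut_b = pool.submit(timed_run, cmd_b)
            ratios.append(fut_b.result() / fut_a.result())
    return ratios


if __name__ == "__main__":
    # Placeholder workloads (assumption): swap in the two builds to compare.
    baseline = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]
    candidate = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]
    ratios = paired_benchmark(baseline, candidate)
    print(f"median candidate/baseline ratio: {statistics.median(ratios):.3f}")
```

With identical commands the median ratio should sit near 1.0; how far it wanders from that gives a feel for the noise floor of the setup itself.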

@HalvarFlake oh boy, don't get me started. I had to build a benchmarking system for GitLab and we had to buy sole tenants; the variability from noisy neighbors is just too high to do any realistic benchmarking.
@HalvarFlake very in line with what we measured; it heavily follows rush hour per geographic area. Yet another advantage for on-prem.
@HalvarFlake even measuring latency in a busy VM is not reliable: hypervisors usually amortize time drift, as we found out 10 years ago with our benchmark team, so wall-clock lies in VMs... If you need the p99, bare metal is your only real option. But do you really need it? Most want it (as we promised our clients) but don't really need it (as something fails hard), though.

@HalvarFlake link from back then, don't know how accurate this still is (2008)
Look at the tick counting section for an overview of the problem

https://www.vmware.com/docs/vmware_timekeeping
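
If wall-clock alone can't be trusted in a VM, one cheap cross-check is to record the guest's steal counter around each run and treat runs with a large delta as suspect. A minimal sketch, assuming a Linux guest whose hypervisor actually reports steal time (not all do, and a zero delta proves nothing); `parse_steal_ticks`, `read_steal_ticks`, and `run_with_steal` are hypothetical helper names, not part of any library.

```python
import time


def parse_steal_ticks(proc_stat_text):
    """Return the aggregate 'steal' counter from /proc/stat contents."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        # Aggregate line: cpu user nice system idle iowait irq softirq steal ...
        if fields and fields[0] == "cpu":
            return int(fields[8]) if len(fields) > 8 else 0
    raise ValueError("no aggregate 'cpu' line found")


def read_steal_ticks(path="/proc/stat"):
    """Read the current steal counter (in USER_HZ ticks) from the kernel."""
    with open(path) as f:
        return parse_steal_ticks(f.read())


def run_with_steal(workload):
    """Run workload(); return (wall_seconds, steal_ticks_delta)."""
    steal_before = read_steal_ticks()
    start = time.perf_counter()
    workload()
    wall = time.perf_counter() - start
    return wall, read_steal_ticks() - steal_before
```

The steal delta is system-wide, so it only flags that the hypervisor took CPU away from the guest during the run, not how much of that hit the benchmarked process.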