I am seeing up to 30% wall-time fluctuation running the same code on the same data on the same VM type in GCE.
That seems crazy to me.
What's the worst variation you've seen? How do you deal with these things?
I mean, for any benchmarking, do you need to run the ClickHouse tomato benchmark protocol (e.g. run two instances of the same software on the same VM so they're exposed to the same noisy-neighbor effects)?
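
To make the "same VM, same noise" idea concrete, here is a rough Python sketch of a paired comparison. It is not ClickHouse's actual tooling, and it swaps in a simpler technique: instead of running two instances at the same time, it times the two variants back to back in each round, in random order, and looks at the per-round ratio. The assumption is that noise which hits both runs in a round roughly cancels out in the ratio. The names `time_once`, `paired_benchmark`, and `summarize` are made up for illustration.

```python
import random
import statistics
import time


def time_once(fn):
    """Return wall-clock seconds for a single call to fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def paired_benchmark(fn_a, fn_b, rounds=20, seed=0):
    """Time fn_a and fn_b back to back in each round, in random order.

    Returns the list of per-round ratios time_b / time_a.
    Assumes each call takes long enough that time_a is never zero.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(rounds):
        # Randomise which variant goes first so warm-up or slow drift
        # does not consistently favour one side.
        if rng.random() < 0.5:
            t_a = time_once(fn_a)
            t_b = time_once(fn_b)
        else:
            t_b = time_once(fn_b)
            t_a = time_once(fn_a)
        ratios.append(t_b / t_a)
    return ratios


def summarize(ratios):
    """Median ratio plus the min/max spread across rounds."""
    return {
        "median": statistics.median(ratios),
        "min": min(ratios),
        "max": max(ratios),
    }


if __name__ == "__main__":
    # Hypothetical workload: the same function on both sides, so the
    # spread of the ratios shows how much noise the pairing leaves behind.
    def workload():
        return sum(range(200_000))

    print(summarize(paired_benchmark(workload, workload)))
```

Running it with the same workload on both sides is a cheap sanity check: if the ratios still swing widely around 1.0, the pairing is not absorbing the noise and the rounds probably need to be longer or more numerous.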