@tomrittervg @fluidlogic @andrew_shadura @trysdyn as a hypothetical, how would you adjust the telemetry so an adversary who could read your data couldn’t fingerprint your users with it?
I’d want to simulate it first, but from my work with OpenTelemetry I suspect it’d hurt less than you’d expect to send only a subset of signals with each ping. Send the random bitmask used for selection along with the data, in case some queries should only consider samples where all the metrics of interest are present.
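Roughly what I have in mind — a toy sketch, not any real OpenTelemetry API; the signal names and the 50% keep rate are made up:

```python
import random

# Assumed signal names and per-signal inclusion rate, purely for illustration.
SIGNALS = ["cpu_pct", "mem_mb", "uptime_s", "crash_count"]
KEEP_PROBABILITY = 0.5

def build_ping(values: dict) -> dict:
    """Independently pick each signal; send the selection bitmask too."""
    mask = 0
    payload = {}
    for i, name in enumerate(SIGNALS):
        if random.random() < KEEP_PROBABILITY:
            mask |= 1 << i
            payload[name] = values[name]
    return {"mask": mask, "signals": payload}

def has_all(ping: dict, *indices: int) -> bool:
    """True if this ping deliberately included every signal in `indices`.

    Queries that need several metrics together restrict themselves to
    pings whose mask has all the relevant bits set."""
    want = 0
    for i in indices:
        want |= 1 << i
    return ping["mask"] & want == want
```

The mask is what lets the backend distinguish “this signal was randomly dropped” from “this signal was zero/missing,” so cross-signal queries stay unbiased.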
That’d augment basic sampling. You’re no doubt familiar, but for those in the back: we can draw operationally useful and statistically defensible conclusions as long as we have enough samples per interval. If N=100K is strong enough, collecting more just embiggens our cloud bills without improving our results. So we can throw out 90%… or 99%, or more. Fun!
If you have 100M users, it might suffice to have each user send a ping on only 0.1% of the intervals. Or, to make fingerprinting harder, only 10% of the signals on 1% of the intervals? Maybe some signals you simply never send in combination with others? There might be a paper or two in that.
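The arithmetic works out neatly, as a back-of-envelope (N=100K as the assumed “strong enough” sample size from above):

```python
# Back-of-envelope check of the numbers in the thread; nothing here is
# OpenTelemetry-specific.
USERS = 100_000_000   # 100M users
TARGET_N = 100_000    # samples per interval deemed strong enough

# Plain interval sampling: every user pings on 0.1% of the intervals.
pings_per_interval = USERS // 1000   # 100,000 — exactly our target N

# Combined scheme: ping on 1% of intervals, each carrying 10% of signals.
pings = USERS // 100                 # 1,000,000 pings per interval
per_signal = pings // 10             # each signal appears in ~100,000 of them
# Same statistical power per signal, but any single ping exposes far
# fewer fields for an adversary to correlate.
```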