I have a test that generates random numbers as input for an algorithm. I can verify some general properties that always hold (“all the input is accounted for”) but I can also test some conditions that are probabilistically true (this should be at least 10, to 5σ). Thoughts?
Add this to CI
46.7%
“Flaky” tests are not OK
53.3%
Poll ended.
(To be clear, the goal here is that there are some outputs that are very very unlikely and probably indicate a bug in my code rather than something that occurred naturally. But they *can* happen. Think “this sort algorithm never moved any elements”–fails if input was sorted.)
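A minimal sketch of this kind of check (hypothetical names; the real algorithm and the 5σ bound are the poster's — coin flips stand in for the actual output being measured):

```python
import random

def probabilistic_check(trials=10_000):
    # Toy stand-in for the algorithm under test: flip `trials` fair
    # coins and assert the head count stays within 5 sigma of the mean.
    # By chance alone this fails only about once in a few million runs,
    # so a failure almost certainly indicates a broken generator --
    # but it *can* happen naturally.
    heads = sum(random.random() < 0.5 for _ in range(trials))
    mean = trials * 0.5
    sigma = (trials * 0.25) ** 0.5  # stdev of Binomial(n, 1/2)
    assert abs(heads - mean) < 5 * sigma, f"{heads}/{trials} heads"
```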
@saagar When the test "fails", can you verify whether the input is a special case? Having tests with random input for complex algorithms is definitely useful, to catch stuff no one thought about. Of course, it's useful… if you don't forget to print the input to the logs. ;)
@arroz In this case verifying that the input is special would be approximately the same as running the algorithm. Part of the reason for these test cases is that I could always compare against a second implementation but I don’t want to write one or steal it

@saagar Yeah, that doesn't help much. In the sorting example, verifying the special condition (sorted input) is trivial compared to sorting. But if the verification is as complex as the algo, you may be introducing errors in the tests as well.

Having a test suite with tests that fail occasionally in situations that are not really a test failure isn't great. You'll quickly get used to ignoring them and will eventually miss a real failure.

@arroz To be clear I can make it so that “occasionally” is “once in a thousand years” if I want
@saagar @arroz I assume controlling the seed for the randomness (so you'd have stable generation) isn't applicable?
@saagar at some point the probability of failure is dominated by the probability of CPU misbehaviour or data corruption. "No flake" is a physical impossibility.
@saagar
I prefer clear indicators in CI, because something that _sometimes_ fails the CI is frustrating.
When things are probabilistic (performance measurements, in my case) I tend to plot them on a graph, so that I can look at it every now and then and see if the trend is bad, or if the bad results persist rather than being a one-time thing.
@saagar seed the prng so it’s a deterministic test. If you change the code it might give a false fail but that’s better than no test and it won’t be flaky
@gnachman I guess this could work. I was against seeding the PRNG because it’s always testing the same thing, which kind of negates the point of testing against novel random data. But if the alternative is not testing at all this is probably better.
@saagar if you give it enough iterations it’s still testing “a lot”. This is probably more for emotional support than correctness but it’s harmless
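The seeded approach being suggested might look something like this (a sketch; `algorithm_under_test` is a hypothetical stand-in for the real code, and the seed range is arbitrary):

```python
import random

def algorithm_under_test(xs):
    # Hypothetical stand-in for the code actually being tested.
    return sorted(xs)

def test_deterministic():
    # Fixed seeds: every CI run sees the same "random" inputs, so any
    # failure is reproducible rather than flaky. Looping over many
    # seeds restores some of the coverage a single seeded run loses.
    for seed in range(100):
        rng = random.Random(seed)
        data = [rng.randint(0, 1000) for _ in range(500)]
        out = algorithm_under_test(data)
        assert out == sorted(data), f"failed for seed {seed}"
```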
@saagar Would it make sense to try rerunning the test a few times on getting the unlikely result? I figure something which is merely unlikely to happen once becomes practically impossible to occur several times in a row.
@Wowfunhappy That’s just driving the probability down. I can already do that in the test itself by picking an appropriate problem size
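For reference, the rerun idea as a small wrapper (a sketch; as noted above, all it does is drive a false-alarm rate of p down to p**attempts, and it can also mask real bugs that reproduce only intermittently):

```python
def retry_probabilistic(test, attempts=3):
    # Rerun a probabilistic test and only report failure if every
    # attempt fails in a row. A merely-unlikely outcome becomes
    # practically impossible to see `attempts` times consecutively.
    last_error = None
    for _ in range(attempts):
        try:
            return test()
        except AssertionError as e:
            last_error = e
    raise last_error
```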

@saagar But there's a point at which the probability is so infinitesimally small that it's irrelevant, right? I mean the test could also be messed up by cosmic rays flipping bits (perhaps so many in the wrong places that ECC doesn't save you), but you don't worry about that either.

If the chance of anyone ever running into this is smaller than, say, you winning the lottery jackpot for several weeks in a row, then stop worrying about it.

@Wowfunhappy This would be the second option in the poll