I have a test that generates random numbers as input for an algorithm. I can verify some general properties that always hold (“all the input is accounted for”) but I can also test some conditions that are probabilistically true (this should be at least 10, to 5σ). Thoughts?
Add this to CI: 46.7%
“Flaky” tests are not OK: 53.3%
Poll ended.
(To be clear, the goal here is that there are some outputs that are very very unlikely and probably indicate a bug in my code rather than something that occurred naturally. But they *can* happen. Think “this sort algorithm never moved any elements”–fails if input was sorted.)
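To put rough numbers on "very very unlikely", here is a minimal Python sketch. It assumes the 5σ condition is a one-sided bound on a normally distributed statistic, and that the sort input is a uniformly random permutation of distinct elements. Neither assumption is stated in the thread, and the function names are purely illustrative.

```python
import math

def one_sided_tail(sigmas: float) -> float:
    """P(Z > sigmas) for a standard normal Z (assumed model, not from the thread)."""
    return 0.5 * math.erfc(sigmas / math.sqrt(2))

def already_sorted_probability(n: int) -> float:
    """Chance that a uniformly random permutation of n distinct items is already sorted."""
    return 1 / math.factorial(n)

# Probability that a correct implementation trips a one-sided 5-sigma check.
print(f"{one_sided_tail(5):.3g}")              # about 2.87e-07

# Probability that a random 20-element input is already sorted.
print(f"{already_sorted_probability(20):.3g}")  # about 4.11e-19
```

Under these assumptions, a single 5σ check produces a false alarm roughly once in 3.5 million runs, while the "sort never moved anything" check on 20 elements is far rarer still.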
@saagar When the test "fails", can you verify whether the input is a special case? Having tests with random input for complex algorithms is definitely useful, to catch stuff no one thought about. Of course, it's useful… if you don't forget to print the input to the logs. ;)
@arroz In this case verifying that the input is special would be approximately the same as running the algorithm. Part of the reason for these test cases is that I could always compare against a second implementation but I don’t want to write one or steal it

@saagar Yeah, that doesn't help much. In the sorting example, verifying the special condition (sorted input) is trivial compared to sorting. But if the verification is as complex as the algo, you may be introducing errors in the tests as well.

Having a test suite with tests that occasionally fail in situations that are not really a test failure isn't great. You'll quickly get used to ignoring them, and you will eventually miss a real failure.

@arroz To be clear I can make it so that “occasionally” is “once in a thousand years” if I want
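One way to read "once in a thousand years" is as a budget on the per-run false-failure probability. The sketch below works backwards from an assumed CI load to that budget, and then to the one-sided normal threshold it implies. The figure of 100 runs per day, the normal model, and the function names are all assumptions made for illustration.

```python
from statistics import NormalDist

def max_failure_probability(runs_per_day: float, years_between_flakes: float) -> float:
    """Per-run false-failure probability that gives one expected flake in the given span."""
    total_runs = runs_per_day * 365.25 * years_between_flakes
    return 1 / total_runs

def required_sigmas(p: float) -> float:
    """One-sided threshold (in standard deviations) whose tail probability is p."""
    return -NormalDist().inv_cdf(p)

# Hypothetical load: 100 CI runs per day, one expected false failure per 1000 years.
p = max_failure_probability(runs_per_day=100, years_between_flakes=1000)
print(f"per-run budget: {p:.3g}")
print(f"threshold: {required_sigmas(p):.2f} sigma")
```

With these numbers the budget is about 2.7e-8 per run, which lands somewhat above 5σ; a busier CI system or several independent probabilistic checks per run would push the threshold higher.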
@saagar @arroz I assume controlling the seed for the randomness, so that you get stable generation, isn't applicable here?
@gnachman I guess this could work. I was against seeding the PRNG because it’s always testing the same thing, which kind of negates the point of testing against novel random data. But if the altern...
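A possible middle ground between a fixed seed and unreproducible randomness is to draw a fresh seed for every run and report it when a check fails. The following is a minimal sketch of that idea, not the approach anyone in the thread settled on; `make_input`, `run_randomized_check`, and the input shape (a list of floats) are hypothetical.

```python
import random
import secrets
from typing import Callable, List, Optional

def make_input(seed: int, size: int = 1000) -> List[float]:
    """Deterministically generate test input from a seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(size)]

def run_randomized_check(check: Callable[[List[float]], None],
                         seed: Optional[int] = None) -> int:
    """Run `check` on seeded random input; report the seed if it fails."""
    if seed is None:
        # Fresh seed per run, so each CI run exercises novel data.
        seed = secrets.randbits(64)
    data = make_input(seed)
    try:
        check(data)
    except AssertionError:
        # The seed alone is enough to regenerate the failing input.
        print(f"randomized check failed; reproduce with seed={seed}")
        raise
    return seed
```

A failing run can then be replayed with `run_randomized_check(check, seed=<logged value>)`, which makes it possible to tell a genuine bug from an unlucky draw without pinning every run to the same data.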

@saagar at some point the probability of failure is dominated by the probability of CPU misbehaviour or data corruption. "No flake" is a physical impossibility.