I have a test that generates random numbers as input for an algorithm. I can verify some general properties that always hold (“all the input is accounted for”) but I can also test some conditions that are probabilistically true (this should be at least 10, to 5σ). Thoughts?
Add this to CI
46.7%
“Flaky” tests are not OK
53.3%
Poll ended.
(To be clear, the goal here is that there are some outputs that are very very unlikely and probably indicate a bug in my code rather than something that occurred naturally. But they *can* happen. Think “this sort algorithm never moved any elements”–fails if input was sorted.)
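A minimal sketch of this kind of check (hypothetical names; the real algorithm and the 5σ bound are the poster's — coin flips stand in for the actual output being measured):

```python
import random

def probabilistic_check(trials=10_000):
    # Toy stand-in for the algorithm under test: flip `trials` fair
    # coins and assert the head count stays within 5 sigma of the mean.
    # By chance alone this fails only about once in a few million runs,
    # so a failure almost certainly indicates a broken generator --
    # but it *can* happen naturally.
    heads = sum(random.random() < 0.5 for _ in range(trials))
    mean = trials * 0.5
    sigma = (trials * 0.25) ** 0.5  # stdev of Binomial(n, 1/2)
    assert abs(heads - mean) < 5 * sigma, f"{heads}/{trials} heads"
```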
@saagar When the test "fails", can you verify whether the input is a special case? Having tests with random input for complex algorithms is definitely useful, to catch stuff no one thought about. Of course, it's useful… if you don't forget to print the input to the logs. ;)
@arroz In this case verifying that the input is special would be approximately the same as running the algorithm. Part of the reason for these test cases is that I could always compare against a second implementation but I don’t want to write one or steal it

@saagar Yeah, that doesn't help much. In the sorting example, verifying the special condition (sorted input) is trivial compared to sorting. But if the verification is as complex as the algo, you may be introducing errors in the tests as well.

Having a test suite with tests that fail occasionally in situations that are not really a test failure isn't great. You'll quickly get used to ignoring them and will eventually miss a real failure.

@arroz To be clear I can make it so that “occasionally” is “once in a thousand years” if I want
@saagar @arroz I assume controlling the seed for the randomness (so you'd have stable generation) isn't applicable?
@saagar at some point the probability of failure is dominated by the probability of CPU misbehaviour or data corruption. "No flake" is a physical impossibility.
@saagar
I prefer clear indicators in CI, because something that _sometimes_ fails the CI is frustrating.
When things are probabilistic (performance measurements, in my case) I tend to plot them on a graph, so that I can look at it every now and then and see if the trend is bad, or if the bad results persist rather than being a one-time thing.
@saagar seed the prng so it’s a deterministic test. If you change the code it might give a false fail but that’s better than no test and it won’t be flaky
@gnachman I guess this could work. I was against seeding the PRNG because it’s always testing the same thing, which kind of negates the point of testing against novel random data. But if the alternative is not testing at all this is probably better.
@saagar if you give it enough iterations it’s still testing “a lot”. This is probably more for emotional support than correctness but it’s harmless
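The seeded approach being suggested might look something like this (a sketch; `algorithm_under_test` is a hypothetical stand-in for the real code, and the seed range is arbitrary):

```python
import random

def algorithm_under_test(xs):
    # Hypothetical stand-in for the code actually being tested.
    return sorted(xs)

def test_deterministic():
    # Fixed seeds: every CI run sees the same "random" inputs, so any
    # failure is reproducible rather than flaky. Looping over many
    # seeds restores some of the coverage a single seeded run loses.
    for seed in range(100):
        rng = random.Random(seed)
        data = [rng.randint(0, 1000) for _ in range(500)]
        out = algorithm_under_test(data)
        assert out == sorted(data), f"failed for seed {seed}"
```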
@saagar Would it make sense to try rerunning the test a few times on getting the unlikely result? I figure something which is merely unlikely to happen once becomes practically impossible to occur several times in a row.
@Wowfunhappy That’s just driving the probability down. I can already do that in the test itself by picking an appropriate problem size
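For reference, the rerun idea as a small wrapper (a sketch; as noted above, all it does is drive a false-alarm rate of p down to p**attempts, and it can also mask real bugs that reproduce only intermittently):

```python
def retry_probabilistic(test, attempts=3):
    # Rerun a probabilistic test and only report failure if every
    # attempt fails in a row. A merely-unlikely outcome becomes
    # practically impossible to see `attempts` times consecutively.
    last_error = None
    for _ in range(attempts):
        try:
            return test()
        except AssertionError as e:
            last_error = e
    raise last_error
```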

@saagar But there's a point at which the probability is so infinitesimally small that it's irrelevant, right? I mean the test could also be messed up by cosmic rays flipping bits (perhaps so many in the wrong places that ECC doesn't save you), but you don't worry about that either.

If the chance of anyone ever running into this is smaller than, say, you winning the lottery jackpot for several weeks in a row, then stop worrying about it.

@Wowfunhappy This would be the second option in the poll