Saagar Jha Aug 16, 2024

I have a test that generates random numbers as input for an algorithm. I can verify some general properties that always hold (“all the input is accounted for”) but I can also test some conditions that are probabilistically true (this should be at least 10, to 5σ). Thoughts?

Add this to CI: 46.7%

“Flaky” tests are not OK: 53.3%

Poll ended at Aug 17, 2024 at 4:00am.

Saagar Jha

(To be clear, the goal here is that there are some outputs that are very very unlikely and probably indicate a bug in my code rather than something that occurred naturally. But they *can* happen. Think “this sort algorithm never moved any elements”–fails if input was sorted.)
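A minimal sketch of the two kinds of checks described above, using the sorting example. `my_sort`, `random_input`, and `check_sort` are hypothetical names, and `my_sort` is only a stand-in for whatever algorithm is actually under test. The always-true properties are hard assertions; the probabilistic one ("did anything move?") is returned so the caller can decide how to treat it.

```python
import random
from collections import Counter


def my_sort(items):
    # Hypothetical stand-in for the algorithm under test.
    return sorted(items)


def random_input(rng, n=50):
    # Fresh random data drawn from the given random.Random instance.
    return [rng.randrange(1_000_000) for _ in range(n)]


def check_sort(data):
    result = my_sort(list(data))

    # Properties that must always hold.
    assert Counter(result) == Counter(data), "input not fully accounted for"
    assert all(a <= b for a, b in zip(result, result[1:])), "output not sorted"

    # Property that is only probabilistically true: a sort that moved
    # nothing is suspicious, unless the input happened to be sorted already.
    moved = result != list(data)
    return moved


if __name__ == "__main__":
    rng = random.Random()
    data = random_input(rng)
    if not check_sort(data):
        print("suspicious: no elements moved for input", data)
```

Returning the probabilistic result instead of asserting on it leaves the policy question (fail CI, warn, or retry) to the caller, which is the part the poll is about.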

Miguel Arroz Aug 16, 2024

@saagar When the test "fails", can you verify if the input is a special case? Having tests with random input for complex algorithms is definitely useful, to catch stuff no one thought about. Of course, it's useful… if you don't forget to print the input to the logs. ;)

Saagar Jha Aug 16, 2024

@arroz In this case verifying that the input is special would be approximately the same as running the algorithm. Part of the reason for these test cases is that I could always compare against a second implementation but I don’t want to write one or steal it

Miguel Arroz Aug 16, 2024

@saagar Yeah, that doesn't help much. In the sorting example, verifying the special condition (sorted input) is trivial compared to sorting. But if the verification is as complex as the algo, you may be introducing errors in the tests as well.

Having a test suite with tests that fail occasionally in situations that are not really a test failure isn't great. You'll quickly get used to ignoring them and you will eventually miss a real failure.

Saagar Jha Aug 16, 2024

@arroz To be clear I can make it so that “occasionally” is “once in a thousand years” if I want
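A rough sketch of the arithmetic behind a claim like this, assuming the probabilistic check behaves like a one-sided normal tail and that CI runs are independent. The figure of 100 runs per day is an illustrative assumption, not something stated in the thread, and the function names are hypothetical.

```python
import math


def one_sided_tail(sigmas):
    """P(Z > sigmas) for a standard normal Z."""
    return 0.5 * math.erfc(sigmas / math.sqrt(2))


def years_between_false_failures(p_fail, runs_per_day):
    """Mean years between spurious failures, assuming independent runs."""
    return 1.0 / (p_fail * runs_per_day * 365.25)


def max_p_fail(target_years, runs_per_day):
    """Largest per-run false-failure probability that meets the target."""
    return 1.0 / (target_years * runs_per_day * 365.25)


if __name__ == "__main__":
    runs_per_day = 100  # assumed CI load, purely illustrative
    p = one_sided_tail(5)
    print(f"per-run false-failure probability at 5 sigma: {p:.3g}")
    print(f"mean years between false failures: "
          f"{years_between_false_failures(p, runs_per_day):.1f}")
    print(f"per-run probability needed for once in 1000 years: "
          f"{max_p_fail(1000, runs_per_day):.3g}")
```

The point of the calculation is that the acceptable per-run probability depends on how often the test runs, so a threshold that feels safe for a nightly job may not be for a suite that runs on every commit.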

Dominic Hopton Aug 16, 2024

@saagar @arroz I assume controlling the seed for the randomness isn’t applicable so you have stable generation?

Saagar Jha Aug 16, 2024

@grork @arroz Yeah it’s an option: https://federated.saagarjha.com/notice/Al1ijwVP4kKVx8gWvY

Saagar Jha

@gnachman I guess this could work. I was against seeding the PRNG because it’s always testing the same thing, which kind of negates the point of testing against novel random data. But if the altern...
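One possible middle ground, sketched here as an assumption rather than as what anyone in the thread actually settled on: draw a fresh seed on every run so the data stays novel, log it, and allow it to be pinned through an environment variable to replay a failure. The variable name `TEST_SEED` and the helper `make_test_rng` are hypothetical.

```python
import os
import random
import secrets


def make_test_rng(env_var="TEST_SEED"):
    """Return (seed, rng). Uses the seed from env_var if set, else a fresh one."""
    raw = os.environ.get(env_var)
    seed = int(raw) if raw is not None else secrets.randbits(64)
    # Log the seed so a failing run can be reproduced from the CI output.
    print(f"{env_var}={seed}")
    return seed, random.Random(seed)


if __name__ == "__main__":
    seed, rng = make_test_rng()
    data = [rng.randrange(1_000_000) for _ in range(10)]
    print("input:", data)
```

Rerunning with the logged value exported (for example `TEST_SEED=<logged value>`) regenerates the same input, which also covers the earlier point about not forgetting to print the input to the logs.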

Paul Khuong Aug 16, 2024

@saagar at some point the probability of failure is dominated by the probability of CPU misbehaviour or data corruption. "No flake" is a physical impossibility.

Saagar Jha Aug 16, 2024

@pkhuong Fair enough lol