The developer's trust in the correctness of the software depends on the reliability of test results. A test cases that shows two different results on 2 different occasions under identical conditons would be flaky and hence not reliable.
Cost model for flaky testsThe total cost \( C_{\text{total}} \) induced by flaky tests can be expressed as
Depending on which of the below strategies are chosen, specific cost factors from the above model will be in play.
- Heavy-rerun: Re-run the flaky tests many times before reporting a failure. Don't report the flaky tests.
- Immediate-fix: Developers investigate and repair flaky tests when they first occur.
- Quarantine-test: Flaky test are quarantined/ ignored to prevent them from slowing down the development process.
- Investigation-only: Developers are responsible for individually investigating the cause of pipeline failures and deciding whether the change can still be integrated.
Considering the costs exerted by flaky tests on the team in terms of chasing after false-positives and potentially delaying releases, it is worth examining whether flaky tests can be avoided and if can't how to deal with them effectively.