Flaky tests - Costs and Strategies†

Rerun, ignore or fix?

A developer's trust in the correctness of the software depends on the reliability of its test results. A test case that shows two different results on two different occasions under identical conditions is flaky and hence unreliable.
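The definition above can be turned into a simple check: run the same test several times without changing anything and see whether the outcomes differ. The following is a minimal sketch, not a definitive detector. The names `observed_outcomes` and `looks_flaky` and the default of 10 runs are assumptions made for illustration, and a test is treated as failed only when it raises `AssertionError`.

```python
from collections.abc import Callable


def observed_outcomes(test: Callable[[], None], runs: int = 10) -> set[bool]:
    """Run `test` `runs` times and collect the distinct pass/fail outcomes.

    Hypothetical helper: True means the run passed, False means it raised
    AssertionError. Other exceptions are deliberately not caught.
    """
    outcomes: set[bool] = set()
    for _ in range(runs):
        try:
            test()
            outcomes.add(True)   # the test passed on this run
        except AssertionError:
            outcomes.add(False)  # the test failed on this run
    return outcomes


def looks_flaky(test: Callable[[], None], runs: int = 10) -> bool:
    """A test looks flaky if it both passed and failed across identical runs."""
    return len(observed_outcomes(test, runs)) == 2
```

Note that the check is one-sided: seeing both outcomes shows the test is flaky, but passing every run does not prove it is stable, since a rare failure may simply not have occurred yet.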

Cost model for flaky tests

The total cost \( C_{total} \) induced by flaky tests can be expressed as

\[ C_{total} = C_{rerun} + C_{inv} + C_{repair} + C_{delay} + C_{manage} + C_{bugs} \]

where,

\[ \begin{aligned}
C_{rerun} &= \text{cost of re-running flaky tests,} \\
C_{inv} &= \text{cost of investigating flaky tests,} \\
C_{repair} &= \text{cost of fixing flaky tests,} \\
C_{delay} &= \text{cost of delaying release plans due to flaky tests,} \\
C_{manage} &= \text{cost of managing flaky tests, and} \\
C_{bugs} &= \text{cost of fixing production bugs due to ignored flaky tests.}
\end{aligned} \]
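As a rough illustration, the sum can be written as a small data structure with one field per cost term. This is only a sketch: the class name `FlakyTestCosts`, the choice of unit (for example, developer hours), and the numbers in the usage example are placeholders and are not taken from the referenced study.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FlakyTestCosts:
    """One field per term of the cost model; all values share one unit."""

    rerun: float = 0.0   # C_rerun: re-running flaky tests
    inv: float = 0.0     # C_inv: investigating flaky tests
    repair: float = 0.0  # C_repair: fixing flaky tests
    delay: float = 0.0   # C_delay: delayed release plans
    manage: float = 0.0  # C_manage: managing flaky tests
    bugs: float = 0.0    # C_bugs: production bugs from ignored flaky tests

    def total(self) -> float:
        """C_total: the plain sum of all six components."""
        return sum(getattr(self, f.name) for f in fields(self))


# Placeholder numbers, purely for illustration.
example = FlakyTestCosts(rerun=2.0, inv=5.0, repair=8.0)
print(example.total())  # 15.0
```

Terms that a given strategy does not incur can be left at their default of zero, so the same structure can describe each strategy's cost profile.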

Strategies to deal with flaky tests

Depending on which of the strategies below is chosen, specific cost factors from the model above come into play.

Heavy-rerun: Re-run the flaky tests many times before reporting a failure. Don't report the flaky tests.
Immediate-fix: Developers investigate and repair flaky tests when they first occur.
Quarantine-test: Flaky tests are quarantined or ignored to prevent them from slowing down the development process.
Investigation-only: Developers are responsible for individually investigating the cause of pipeline failures and deciding whether the change can still be integrated.
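To make the first strategy concrete, the Heavy-rerun idea can be sketched as a wrapper that retries a test and reports a failure only when every attempt fails. The function name `run_with_reruns` and the default of five attempts are assumptions chosen for illustration; real CI systems and test runners typically provide their own retry mechanisms.

```python
from collections.abc import Callable


def run_with_reruns(test: Callable[[], None], max_attempts: int = 5) -> tuple[bool, int]:
    """Run `test` until it passes or `max_attempts` is exhausted.

    Returns (passed, attempts_used). A failure is reported only if
    every attempt raised AssertionError.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            test()
            return True, attempt  # passed; the earlier failures go unreported
        except AssertionError:
            continue              # failed this time; try again
    return False, max_attempts    # failed on every attempt
```

Every attempt beyond the first adds to \( C_{rerun} \), and because intermittent failures are swallowed, a genuine but intermittent defect can be masked in the same way as a flaky test.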

Upshot

Considering the costs that flaky tests impose on the team, in terms of chasing false positives and potentially delaying releases, it is worth examining whether flaky tests can be avoided and, if they cannot, how to deal with them effectively.

Sudhir Shetty, May 31 2026.

† Reference

[2024] Cost of Flaky Tests in Continuous Integration: An Industrial Case Study