Flaky tests are tests that can both pass and fail across repeated executions without any change to the test code or the code under test. A resource-affected flaky test is one whose failure rate is statistically different when compute resources are constrained than when they are unconstrained.
Flaky tests are detrimental to developer productivity: a developer can be misled into looking for the cause of a failure in their most recent code changes when the real cause lies elsewhere, for example in thread interleaving or test ordering.
A 2024 paper found that 46.5% of flaky tests can be attributed to the unavailability of compute resources.
The results were obtained by running each test in the test suites of 52 open-source projects written in Java, JavaScript and Python 300 times under each of the configurations in the tables above.
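One way to decide whether a test's failure rate is "statistically different" between two configurations is to compare its failure counts over the 300 runs. The sketch below assumes a two-sided Fisher's exact test and a significance level of 0.05. Both are illustrative choices made here, not necessarily the method the paper used, and the function names are hypothetical.

```python
from math import comb

RUNS = 300  # runs per test per configuration, as described above


def fisher_exact_two_sided(fail_a, runs_a, fail_b, runs_b):
    """Two-sided Fisher's exact p-value for failures vs passes in two groups."""
    total_fail = fail_a + fail_b
    denom = comb(runs_a + runs_b, total_fail)

    def prob(k):
        # Hypergeometric probability that group A has exactly k of the failures.
        return comb(runs_a, k) * comb(runs_b, total_fail - k) / denom

    observed = prob(fail_a)
    lowest = max(0, total_fail - runs_b)
    highest = min(runs_a, total_fail)
    # Sum the probabilities of every table at least as unlikely as the observed one.
    return sum(
        p
        for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


def is_resource_affected(fail_unconstrained, fail_constrained, alpha=0.05):
    """Assumed rule: flag the test if the two failure rates differ significantly."""
    p_value = fisher_exact_two_sided(fail_unconstrained, RUNS, fail_constrained, RUNS)
    return p_value < alpha
```

For example, a test that fails 3 times out of 300 unconstrained and 45 times out of 300 constrained would be flagged, while one that fails 5 and 7 times would not.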
The results from running tests under the resource configurations defined in Table 1 are:
- 46.5% of all flaky tests were resource-affected, that is, their flakiness was triggered by constraining resources.
- Of those 46.5% of flaky tests, most lay in the 1-50 range on a 0-300 scale of resource sensitivity.
- The resource which triggered the most flakiness was CPU.
- No single resource configuration triggered the most flakiness.
The result from running tests under the resource configurations defined in Table 2 is:
- The most cost-effective configuration for detecting or preventing flakiness depends on the project.
Points to note and ponder:
- Configurations with a lower hourly rate can take longer to complete due to limited resources, resulting in a higher cost for each build compared to an expensive but fast configuration.
- Developers try to reduce flakiness by modifying the test rather than by increasing the resources available to it.
- Moving tests that run on over-powered resources to appropriately lower-powered resources can increase throughput or lower cost.
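The first point above comes down to simple arithmetic: cost per build is the hourly rate multiplied by the build duration. The sketch below uses made-up rates and durations, and hypothetical configuration names, purely to illustrate how a cheaper hourly rate can still produce a more expensive build. None of these figures come from the paper.

```python
def cost_per_build(hourly_rate_usd, build_minutes):
    """Cost of one build: hourly rate times duration in hours."""
    return hourly_rate_usd * build_minutes / 60


# Hypothetical configurations; rates and durations are illustrative only.
configs = {
    "small": {"hourly_rate_usd": 0.10, "build_minutes": 90},
    "large": {"hourly_rate_usd": 0.40, "build_minutes": 15},
}

costs = {name: cost_per_build(**spec) for name, spec in configs.items()}
cheapest = min(costs, key=costs.get)

for name, cost in costs.items():
    print(f"{name}: ${cost:.2f} per build")
print(f"cheapest per build: {cheapest}")
```

With these assumed numbers, the "small" configuration has a quarter of the hourly rate but takes six times as long, so each build costs more than on the "large" configuration.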