Why flaky tests are a problem you can't ignore

Monotonous as the task might be, QA must detect and fix the root causes of flaky tests. Invest the time and effort, or risk the dangers of an unreliable test automation suite.

Here's a software QA riddle: What kind of test sometimes fails and sometimes passes? A bad one.

Flaky tests, also called nondeterministic tests, are automated tests that pass or fail seemingly without cause, even when run with the same configuration and without changes to code, data or environment. These faulty assessments, which typically appear in integration- and GUI-level tests, can diminish confidence in the entire automation suite.

While flaky tests are not necessarily indicative of a defect, they are problematic and difficult to diagnose. Developers and QA professionals find identifying and fixing flaky tests to be some of the most tedious tasks related to automated test suite maintenance. When one occurs, the team might ignore it altogether, rather than sink time into investigating whether the failure points to a real bug. And not all nondeterministic tests flake equally; certain flaky tests should take precedence over others.

Testers can struggle to find the root cause of a flaky test in a timely manner. However, nondeterministic tests do have some common causes. And particular variables can tell QA professionals which flaky tests to prioritize and even which assessments to replace. QA teams can take practical approaches to identify flaky tests and fix the underlying issues.

What causes flaky tests?

Often, tests end up flaky because of how they were written. In UI testing, QA can use asynchronous calls to load dynamic data, but those tests can get flaky when testers use fixed sleep functions to wait for that data. If the sleep is too short, the test fails intermittently; if it is too long, it increases the run time of the test. Use callbacks or polling to mitigate some of these issues.
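As a rough illustration of polling in place of a fixed sleep, here is a minimal Python sketch. The `wait_until` helper and the `FakePage` class are hypothetical names invented for this example, not part of any particular UI framework; many automation tools ship their own wait utilities that serve the same purpose.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


class FakePage:
    """Hypothetical stand-in for a UI page whose data loads asynchronously."""

    def __init__(self, ready_after):
        self._ready_at = time.monotonic() + ready_after

    def rows(self):
        # Returns an empty list until the simulated load has finished.
        return ["row-1", "row-2"] if time.monotonic() >= self._ready_at else []


page = FakePage(ready_after=0.2)
# The test continues as soon as the data appears, instead of sleeping a fixed time.
rows = wait_until(page.rows, timeout=2.0)
```

The test waits only as long as the data actually takes to load, and it fails with a clear timeout error rather than an unrelated assertion when the data never arrives.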

Stale data from caching can also cause flaky tests, as can setup and cleanup issues, such as when the test environment does not return to its original state after a run. Tests that depend on the current time, or that gather events throughout the day, can become flaky when they run in different time zones, so consider time scenarios when designing them.
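One common way to take the clock out of the equation is to pass the time into the code under test rather than letting it read the system clock. The sketch below assumes a hypothetical `is_business_hours` function with opening hours defined in UTC; the function name and the hours are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def is_business_hours(now, open_hour=9, close_hour=17):
    """Return True if the timezone-aware `now` falls inside opening hours (UTC)."""
    if now.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Normalize to UTC so the result does not depend on where the test runs.
    utc_now = now.astimezone(timezone.utc)
    return open_hour <= utc_now.hour < close_hour


# The test supplies fixed instants instead of calling datetime.now().
noon_utc = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
same_instant_in_tokyo = noon_utc.astimezone(timezone(timedelta(hours=9)))
late_evening_utc = datetime(2024, 1, 15, 23, 0, tzinfo=timezone.utc)

open_at_noon = is_business_hours(noon_utc)
open_for_tokyo_clock = is_business_hours(same_instant_in_tokyo)
open_late = is_business_hours(late_evening_utc)
```

Because every input is a fixed, timezone-aware instant, the test gives the same answer on a laptop in one time zone and a build agent in another.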

Flakiness is not always a result of how a developer codes a test. Infrastructure issues -- node failures, unreliable network connections, database failures and bugs in the automation framework -- also throw off test results. Testing with third-party systems often contributes to flaky tests, as these environments are not under your control. If possible, stub the integrating systems to ensure the initial tests run deterministically before you run them against a third-party environment.
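To show what stubbing an integrating system can look like, the following sketch uses Python's standard `unittest.mock` module. The `checkout` function, the `payment_client` object and its `charge` method are assumptions made up for this example; a real third-party client will have its own interface.

```python
from unittest.mock import Mock


def checkout(cart_total, payment_client):
    """Charge the cart total and report whether the order succeeded."""
    response = payment_client.charge(amount=cart_total)
    return response["status"] == "approved"


# The stub replaces the hypothetical third-party payment client, so the test
# never touches the network and always receives the same response.
stub_client = Mock()
stub_client.charge.return_value = {"status": "approved"}

order_ok = checkout(42.50, stub_client)

# Verify that the code under test called the integration as expected.
stub_client.charge.assert_called_once_with(amount=42.50)
```

Once the stubbed version passes consistently, any failure that appears only against the real third-party environment is more likely to come from that environment than from the test logic.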

Which flaky tests to prioritize

All flaky tests take time to investigate and fix. Some flaky tests wreak more havoc than others, so QA teams should prioritize which ones they work on. Here are several criteria to help you prioritize flaky tests.

Level of business risk. Determine how much business risk a flaky test poses. Focus on tests that validate critical business workflows.

If a problematic test validates a feature that customers seldom use, then fixing it should be a low priority. During test maintenance, consider simply removing that flaky test. You might instead replace it with a new one.

Test timing. Test quality is most important for business-critical features. Prioritize the high-value but flaky tests that take place at key points in the release cycle. For example, nondeterministic tests in a continuous testing suite that disrupt a CD pipeline can affect release velocity.

Amount of effort required. Flaky tests are notoriously difficult to diagnose and fix, and many factors could contribute to the root cause of a test's flakiness. If remediation would be too much work, remove the test and design a new one instead.

Fix flaky tests

Once you identify flaky tests, isolate them from your reliable ones, and quarantine them in a separate test suite. Just one flaky test can potentially contaminate the entire suite, especially if the tests are not independent of one another. Flaky tests also cause bottlenecks in the CI pipeline if they're not removed from the main suite.

This separation -- without eliminating the flaky test entirely -- helps ensure that, when a test in the main suite fails, the team will investigate the result as a potential defect. Don't just ignore the results of the quarantined tests, though. If you fail to fix or replace them, you will open gaps in regression coverage.
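Quarantining can be sketched with Python's built-in `unittest` module. The `quarantined` decorator, the `split_suite` helper and the `CheckoutTests` class below are hypothetical names for illustration; most test frameworks offer tags or markers that achieve the same separation.

```python
import unittest


def quarantined(test_func):
    """Tag a test method as flaky so it can be routed to a separate suite."""
    test_func.quarantined = True
    return test_func


class CheckoutTests(unittest.TestCase):
    def test_total_is_summed(self):
        self.assertEqual(sum([2, 3]), 5)

    @quarantined
    def test_inventory_sync(self):
        # Placeholder body for a test that has been observed to flake.
        self.assertTrue(True)


def split_suite(test_case_class):
    """Return (stable_suite, quarantine_suite) for one TestCase class."""
    loader = unittest.TestLoader()
    stable, flaky = unittest.TestSuite(), unittest.TestSuite()
    for name in loader.getTestCaseNames(test_case_class):
        method = getattr(test_case_class, name)
        target = flaky if getattr(method, "quarantined", False) else stable
        target.addTest(test_case_class(name))
    return stable, flaky


stable_suite, quarantine_suite = split_suite(CheckoutTests)
# The stable suite gates the pipeline; the quarantine suite still runs and
# is reported, so its results are not lost while the tests await a fix.
stable_result = unittest.TextTestRunner(verbosity=0).run(stable_suite)
```

Keeping the quarantined tests runnable, rather than deleting or skipping them, makes it easier to confirm a fix and move each test back into the main suite.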

Create fewer flaky tests

When you improve your ability to write test cases, nondeterministic tests will become less of a problem. To design effective tests, pay close attention to the components that make up test cases.

Additionally, learn when it's smart to automate test cases. Be aware of how testable a piece of software is, and always look for easier ways to test it. A more efficient test suite makes flaky test remediation easier.

Eliminate the most obvious external causes of nondeterministic results first: rerun the test with a clean environment and system state, then stub third-party applications. Once you eliminate external issues, examine the automation script for issues with concurrency, time and asynchrony, and update it as needed. A tool like DeFlaker, or a rerun plugin for a test framework such as pytest, can accelerate the analysis process.
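Rerunning a test many times against unchanged code is a simple way to confirm that it is flaky before digging into causes. The sketch below is a hand-rolled illustration, not how DeFlaker or any pytest plugin works internally; `classify_test` and both sample tests are hypothetical. The flaky sample fails on every third call because it depends on shared state, which keeps the example itself deterministic.

```python
import itertools


def classify_test(test_func, runs=20):
    """Run `test_func` repeatedly and report whether its outcome is consistent."""
    outcomes = set()
    for _ in range(runs):
        try:
            test_func()
            outcomes.add("pass")
        except AssertionError:
            outcomes.add("fail")
    if outcomes == {"pass"}:
        return "stable-pass"
    if outcomes == {"fail"}:
        return "stable-fail"
    return "flaky"


_shared_counter = itertools.count(1)


def test_depends_on_shared_state():
    # Fails on every third call: the result depends on state left by earlier runs.
    assert next(_shared_counter) % 3 != 0


def test_always_passes():
    assert 1 + 1 == 2


flaky_verdict = classify_test(test_depends_on_shared_state, runs=6)
stable_verdict = classify_test(test_always_passes, runs=6)
```

A mixed result across identical runs points at the test or its environment rather than at the code under test, which tells the team where to look next.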
