A flaky test is a test that passes sometimes and fails sometimes, even though no code has changed.
The root cause of flaky tests is some sort of non-determinism, either in the test code or in the application code.
To understand why a CI test run is more susceptible to flakiness than a local test run, we can go through the root causes of flakiness one by one and consider how CI's susceptibility to each cause differs from a local environment's.
The root causes we'll examine (each explained in detail in this post) are leaked state, race conditions, network/third-party dependency, fixed time dependency, and randomness.
Leaked state
Sometimes one test leaks some sort of state (e.g. a change to a file or env var) into the global environment which interferes with later tests.
The reason a CI test run is more susceptible to leaked-state flakiness is clear. In a local environment you're usually running just one test file at a time, whereas in CI you're running the whole suite together. Running many tests together creates more opportunities for tests to interfere with each other.
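Here's a minimal sketch of what leaked state can look like (hypothetical tests in Python with pytest; the `is_feature_enabled` helper is made up for illustration):

```python
import os

def is_feature_enabled(name):
    # Hypothetical application code that reads a global env var.
    return os.environ.get(f"{name}_ENABLED") == "true"

# This test mutates the global environment and never restores it.
def test_feature_flag_enabled():
    os.environ["FEATURE_X_ENABLED"] = "true"
    assert is_feature_enabled("FEATURE_X")

# Run on its own, this test passes. Run after the test above, as it
# would be in a full CI run, the leaked env var makes it fail.
def test_feature_flag_disabled_by_default():
    assert not is_feature_enabled("FEATURE_X")
```

One way to prevent this particular leak is pytest's built-in `monkeypatch` fixture (`monkeypatch.setenv`), which restores the variable after the test finishes.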
Race conditions
A race condition is when the correct functioning of a program depends on two or more parallel actions completing in a certain sequence, but the actions sometimes complete in a different sequence, resulting in incorrect behavior.
One way that race conditions can arise is through performance differences. Let’s say there’s a process that times out after 5000ms. Most of the time the process completes in 4500ms, meaning no timeout. But sometimes it takes 5500ms to complete, meaning the process does time out.
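Here's a sketch of that timeout scenario (the simulated durations are assumptions for illustration):

```python
import random
import time

TIMEOUT_MS = 5000

def slow_process():
    # Simulates work that usually takes ~4.5s but occasionally ~5.5s,
    # e.g. when the machine is slower or under heavier load.
    duration_ms = random.choice([4500] * 9 + [5500])
    time.sleep(duration_ms / 1000)

def test_process_completes_before_timeout():
    start = time.monotonic()
    slow_process()
    elapsed_ms = (time.monotonic() - start) * 1000
    # Passes roughly 90% of the time and fails the rest: a flaky test.
    assert elapsed_ms < TIMEOUT_MS
```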
It’s very easy for differences to arise between a CI environment and a local environment in ways that affect performance. The OS is different, the memory and processor speed are different, and so on. These differences can mean that race conditions arise on CI that would not have arisen in a local environment.
Network/third-party dependency
Network dependency can lead to flaky tests for the simple reason that sometimes the network works and sometimes it doesn’t. Third-party dependency can lead to flaky tests because sometimes third-party services don’t behave deterministically. For example, the service can have an outage, or the service can rate-limit you.
This is the type of flakiness that should never occur because it’s not a good idea to hit the network in tests. Nonetheless, I have seen this type of flakiness occur in test suites where the developers didn’t know any better.
Part of the reason why CI test runs are more susceptible to this type of flakiness is that there are simply more at-bats. If a test makes a third-party request only once per day locally but 1,000 times per day on CI, there are of course more chances for the CI request to encounter a problem.
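To make the contrast concrete, here's a sketch of a flaky network-dependent test next to a stubbed, deterministic version (the URL and the `fetch_exchange_rate` function are hypothetical; the stubbing uses Python's standard `unittest.mock`):

```python
from unittest import mock
import urllib.request

def fetch_exchange_rate():
    # Hypothetical application code that calls a third-party API.
    with urllib.request.urlopen("https://api.example.com/rates/usd-eur") as resp:
        return float(resp.read())

# Flaky: fails whenever the network or the third-party service misbehaves.
def test_exchange_rate_flaky():
    assert fetch_exchange_rate() > 0

# Deterministic: the HTTP call is stubbed out, so the test exercises our
# code without depending on the outside world.
def test_exchange_rate_stubbed():
    fake_response = mock.MagicMock()
    fake_response.read.return_value = b"0.92"
    fake_response.__enter__.return_value = fake_response
    with mock.patch("urllib.request.urlopen", return_value=fake_response):
        assert fetch_exchange_rate() == 0.92
```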
Fixed time dependency
There are some tests that always pass at one time of day (or month or year) and always fail at another.
Here’s an excerpt about this from my other post about the causes of flaky tests:
This is common with tests that cross the boundary of a day (or month or year). Let’s say you have a test that creates an appointment that occurs four hours from the current time, and then asserts that that appointment is included on today’s list of appointments. That test will pass when it’s run at 8am because the appointment will appear at 12pm which is the same day. But the test will fail when it’s run at 10pm because four hours after 10pm is 2am which is the next day.
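Here's roughly what that appointment test might look like in code (the in-memory "model" and the `appointments_for_today` helper are hypothetical stand-ins for real application code):

```python
from datetime import datetime, timedelta

appointments = []

def create_appointment(starts_at):
    # Hypothetical stand-in for persisting an appointment record.
    appointments.append(starts_at)
    return starts_at

def appointments_for_today(now):
    # Hypothetical query: appointments whose date is today's date.
    return [a for a in appointments if a.date() == now.date()]

def test_upcoming_appointment_is_in_todays_list():
    now = datetime.now()
    appointment = create_appointment(now + timedelta(hours=4))
    # Passes at 8am (the appointment lands at 12pm, same day) but fails
    # at 10pm (four hours later is 2am, the next day).
    assert appointment in appointments_for_today(now)
```

Freezing the clock in tests (e.g. with a library like freezegun's `freeze_time`) makes this kind of test behave the same no matter what time it runs.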
CI test runs are more susceptible to fixed-time-dependency flakiness than local test runs for a few reasons. One is that CI test runs simply have more at-bats than local test runs. Another is that the CI environment's time zone settings might differ from the local environment's. A third is that while a local test environment is normally used only during typical working hours, a CI environment is often exercised across a broader stretch of the day, since developers kick off test runs from different time zones and on varying schedules.
Randomness
The final cause of flaky tests is randomness. As far as I know, the only way that CI test runs are more susceptible to flakiness due to randomness is the fact that CI test runs have more at-bats than local test runs.
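Here's a minimal sketch (hypothetical code; `sample_discount` is made up). Seeding the random number generator makes the behavior repeatable, which is also what lets you reproduce this kind of flake reliably:

```python
import random

def sample_discount():
    # Hypothetical application code that picks a random discount.
    return random.choice([5, 10, 15, 20])

# Flaky: passes for 3 of the 4 possible choices and fails for the fourth.
def test_discount_is_at_most_15():
    assert sample_discount() <= 15

# Deterministic: re-seeding the generator makes the "random" result repeatable.
def test_seeded_choice_is_repeatable():
    random.seed(42)
    first = sample_discount()
    random.seed(42)
    assert sample_discount() == first
```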
Takeaways
- A flaky test is a test that passes sometimes and fails sometimes, even though no code has changed.
- The root cause of flaky tests is some sort of non-determinism, either in the test code or in the application code.
- Whenever flakiness is more frequent in CI, the reason is that some difference between the CI test runs and the local runs makes flakiness more likely. And when flakiness is more likely, it's because one of the five specific causes of flaky tests has been made more likely.