End-to-End Testing: Fixing a Flaky Test and Avoiding Sleeps with Playwright
Explore a flaky test to uncover how a Web App's implementation can contribute to flakes. Uncover a user-facing bug in a Web App, learn what Playwright does to ensure reliable tests, and explore how to avoid sleeps to keep your tests fast and flake-free!
What to Expect
This post assumes you have some experience writing and running tests with Playwright Test.
In this post, I'll define what flaky means and why it can be so frustrating. I'll share a minimal (buggy) example Web App along with a flaky test case. I'll explain the root cause and how to fix the app (or the test, in case the App's behavior cannot be changed). I'll also explain why it's important to avoid await page.waitForTimeout(…) and other forms of static sleeps.
What's a Flaky Test?
A test that both passes and fails without any change to the test or code itself is considered flaky. The test might work locally and then immediately fail in a CI environment, or it may take a few CI runs to see it fail unexpectedly, and then pass again, and then fail, and then…you get the point!
In general, the outcome of a test can be impacted by:
- the Test Runner itself (but hopefully not!)
- the test's implementation
- the Web App's frontend code
- constraints of the backend (e.g. rate-limited, slow or unreliable REST API)
- CPU and memory load on the machine (e.g. CI) running the test
- and more…
When a test flakes, it's frustrating since it's not clear where the issue is and what needs to be fixed:
- Will the Web App's users also face the same issue and be unhappy? 😕
- Is it a bug in the App's code?
- Is it an improperly written test case?
- Did my commit/change/Pull Request just break something? Or was it an earlier change? The tests passed locally before I pushed!?!
- Is it the CI system?
- The Test Runner?
- A time-glitch in the universe?! 💥
An Example
Below is a minimal Web App I'll be testing. Load (or reload) the sample app, and then click around to use it. Each reload is like restarting the page/app from the start. (You can also open the App itself in a separate tab.)
This App allows a user to view their club membership by selecting an account number. Data is loaded asynchronously, and like many apps, the time it takes to load the data varies from one load to the next.
Try it out a few times in a row!
- Load (or reload) the app.
- Immediately select account B-002 from the dropdown.
- Observe the info (The Coding Club was started in 1970. Account # is B-002.) is displayed…or maybe something else happens. 😀
- Repeat this a few times. It might not always work! (After all, we're looking at a buggy app and learning about flakes!)
The below animation shows what you might've experienced if you tried out the live demo repeatedly:
- ✅ On the left side, in this particular run, once the UI is shown and the user immediately selects B-002, the expected account info is shown.
- ❌ However, in the run on the right side, once the UI is shown and the user immediately selects B-002, they get an error. 😢
The Test
Here's what the manual steps above look like as a Playwright Test:
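A minimal sketch of such a test is shown below. The URL is hypothetical, and the sketch assumes the account dropdown is the only <select> on the page; neither detail is taken from the demo app itself.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical URL: replace with wherever the sample app is served.
const APP_URL = 'https://example.com/club-membership/';

test('shows membership info for account B-002', async ({ page }) => {
  await page.goto(APP_URL);

  // Assumption: the account dropdown is the only <select> on the page.
  await page.locator('select').selectOption('B-002');

  // The text the user is expected to see after selecting B-002.
  await expect(
    page.getByText('The Coding Club was started in 1970. Account # is B-002.')
  ).toBeVisible();
});
```

Note that the test goes straight from navigation to selection, just like the "immediately select" step in the manual instructions.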
Verifying a Flake
You can run the test many times in a row (via Playwright's --repeat-each flag), to see the flake, too:
Debugging
After viewing some Playwright Traces and running a few times in debug mode, I see an issue. It looks like the option is selected before the Web App has fully loaded the requisite data asynchronously.
Sometimes, the in-app visual loading indicator reports it's still loading data (e.g. it says Loading… and not yet Loaded!) while I've already had an opportunity to click.
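The race can be reproduced in a small model. This is a hypothetical sketch, not the demo app's actual source: the class name, the error text, and the method names are all assumptions made for illustration.

```typescript
// Hypothetical model of the buggy app (not the demo's real code).
type Account = { id: string; club: string; founded: number };

class MembershipApp {
  // null until the asynchronous data load finishes
  private accounts: Map<string, Account> | null = null;
  status = 'Loading…';

  // Called when the asynchronous fetch completes.
  onDataLoaded(rows: Account[]): void {
    this.accounts = new Map(rows.map((row): [string, Account] => [row.id, row]));
    this.status = 'Loaded!';
  }

  // Called when the user (or Playwright) picks an option.
  // The bug: nothing prevents this from running before the data arrives.
  select(id: string): string {
    if (this.accounts === null) {
      return 'Error: account data is not loaded yet'; // assumed error text
    }
    const account = this.accounts.get(id);
    return account
      ? `The ${account.club} was started in ${account.founded}. Account # is ${account.id}.`
      : 'Error: unknown account';
  }
}

const app = new MembershipApp();
const early = app.select('B-002'); // selection wins the race
app.onDataLoaded([{ id: 'B-002', club: 'Coding Club', founded: 1970 }]);
const late = app.select('B-002'); // data wins the race

console.log(early); // Error: account data is not loaded yet
console.log(late); // The Coding Club was started in 1970. Account # is B-002.
```

The same input produces two different outcomes depending only on ordering, which is exactly what makes the test flaky.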
The Web App is instrumented with some logging, and if I forward its logs to the terminal:
I can see the race, too:
In this specific case, the flaky behavior becomes apparent quickly. In other cases, the issue may not be visually apparent, or the data loads quickly enough, often enough, that the issue goes unnoticed. However, think of Playwright as your fastest ⚡️ user (ever!), so it can help highlight races and bugs.
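One way to forward the app's console output to the terminal is Playwright's page.on('console', …) event. The sketch below assumes a hypothetical URL and that the dropdown is the only <select> on the page; the log prefix format is also just an illustration.

```typescript
import { test } from '@playwright/test';

test('forwards browser console output to the terminal', async ({ page }) => {
  // Register the listener before navigating so early messages aren't missed.
  page.on('console', (msg) => {
    console.log(`[browser ${msg.type()}] ${msg.text()}`);
  });

  await page.goto('https://example.com/club-membership/'); // hypothetical URL

  // Assumption: the account dropdown is the only <select> on the page.
  await page.locator('select').selectOption('B-002');
});
```

With the listener in place, whatever the app logs while loading appears interleaved with the test's own output, which makes the ordering of "selected" versus "data loaded" visible.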
Limits of Auto-Waiting
Since Playwright is the fastest ⚡️ user (ever!), it automatically performs waiting and actionability checks under the hood. This keeps the API nice to use, and your tests fast. Instead of time-based waiting, it performs checks and waits on events to fire in the browser, so it can move on as soon as requirements are met without delaying arbitrarily.
For example, await page.goto(…) not only waits for an HTTP Response for the main page, but it also waits for the load event. When clicking, selecting, and otherwise interacting with elements on the page, Playwright performs actionability checks on the elements to ensure it only does so at the first opportunity that makes sense (e.g. it ensures the element the Locator resolves to is actually visible, clickable, and enabled). Therefore, writing:
should be enough under nearly all circumstances.
So, based on this information, it might appear that something is not working with Playwright. As we'll see in a bit (spoiler!), I actually forgot to set the disabled attribute on the <select> while the page hydrated asynchronously with data (after DOMContentLoaded), so both the App's users and Playwright could accidentally interact with the control before it was ready!
Why Not Use page.waitForTimeout(…)?
It looks like the data is always loaded if we wait 10 seconds before selecting the option, so it might be tempting to "fix" this test by doing:
However, this creates more problems:
- It covers up a bug in our app. Are we going to tell the user to wait for 10 seconds before interacting with the app?
- It unnecessarily slows down every test. When our test passes, it can pass in <500ms. If we force Playwright to wait for 10 seconds each time, we've slowed every run of this test down by 20X or more. This adds up over many tests, slowing down the whole suite. Don't sprinkle in sleeps/waits!
- It's likely to still flake in the future. Today, 10 seconds might be enough, but maybe when the App is under load, it would need 11 seconds or more. In CI, where there aren't many CPUs available, things might run slower, too.
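The last two points can be made concrete with a little arithmetic. The sketch below is a simplified model, not how Playwright is implemented: the 100ms re-check interval, the 30 second timeout, and the 400ms "fast load" are assumptions, while the 10 and 11 second figures come from the discussion above.

```typescript
type WaitResult = { waitedMs: number; ready: boolean };

// A static sleep always costs the full duration, and only helps
// if the data happened to arrive within it.
function fixedSleep(readyAtMs: number, sleepMs: number): WaitResult {
  return { waitedMs: sleepMs, ready: readyAtMs <= sleepMs };
}

// Waiting on a condition re-checks readiness every pollMs and stops
// as soon as it holds (or gives up at timeoutMs).
function waitForCondition(readyAtMs: number, pollMs: number, timeoutMs: number): WaitResult {
  for (let t = 0; t <= timeoutMs; t += pollMs) {
    if (t >= readyAtMs) return { waitedMs: t, ready: true };
  }
  return { waitedMs: timeoutMs, ready: false };
}

// Fast load: data ready after 400ms.
const fastSleep = fixedSleep(400, 10_000);
const fastCondition = waitForCondition(400, 100, 30_000);

// Slow load (app under load): data ready after 11 seconds.
const slowSleep = fixedSleep(11_000, 10_000);
const slowCondition = waitForCondition(11_000, 100, 30_000);

console.log(fastSleep); // { waitedMs: 10000, ready: true }
console.log(fastCondition); // { waitedMs: 400, ready: true }
console.log(slowSleep); // { waitedMs: 10000, ready: false }
console.log(slowCondition); // { waitedMs: 11000, ready: true }
```

In this model the static sleep is slower in the common case and still fails in the slow case, while the condition-based wait is both faster and more tolerant.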
Workaround
In the case that we can't change the app itself (maybe there's no release for a while), we can use a Web-First Assertion to help Playwright wait on another signal:
Instead of using a fixed timeout, Playwright waits only until the UI shows the "Loaded!" message, which the user can also see!
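A sketch of that workaround follows. The URL, the <select> locator, and the 15 second upper bound are assumptions for illustration; the "Loaded!" text is the indicator described above.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical URL: replace with wherever the sample app is served.
const APP_URL = 'https://example.com/club-membership/';

test('selects B-002 once the app reports "Loaded!"', async ({ page }) => {
  await page.goto(APP_URL);

  // Web-first assertion: re-checks until the text is visible, then moves on.
  // The timeout is an assumed upper bound, not a fixed delay.
  await expect(page.getByText('Loaded!')).toBeVisible({ timeout: 15_000 });

  // Assumption: the account dropdown is the only <select> on the page.
  await page.locator('select').selectOption('B-002');

  await expect(
    page.getByText('The Coding Club was started in 1970. Account # is B-002.')
  ).toBeVisible();
});
```

The timeout only matters when something is wrong. When the indicator appears after 300ms, the assertion resolves after roughly 300ms.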
This stabilizes the test, but it does cover up a bug in the app itself since a user can still click without waiting. It's not ideal, but it's an option if the app under test cannot be changed. (Maybe you are using a third party UI component.)
If we run the above (with --repeat-each 20), we see it's now reliable (and quick)! Not only do all the tests pass, but most take <500ms, while a few, where the data takes longer to load, take a few seconds. None take 10 seconds!
In our demo app, there's a visual text indicator we can use to work around the flake; however, depending on what you are trying to work around or fix, you might instead wait for a different ready signal, like a Network Response coming from your API, or (under advanced circumstances) a Promise or other signal from within your app via await page.waitForFunction(…) or await expect.poll(…).
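The alternative signals mentioned above might look like the following sketch. The /api/accounts path and the window.appReady flag are hypothetical names invented for this example; the demo app may expose neither.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical URL: replace with wherever the sample app is served.
const APP_URL = 'https://example.com/club-membership/';

test('waits for the data request to finish', async ({ page }) => {
  // Start waiting before navigating so the response can't be missed.
  const accountsResponse = page.waitForResponse(
    (response) => response.url().includes('/api/accounts') && response.ok() // hypothetical endpoint
  );
  await page.goto(APP_URL);
  await accountsResponse;

  await page.locator('select').selectOption('B-002');
});

test('waits for an in-app readiness flag', async ({ page }) => {
  await page.goto(APP_URL);

  // Option 1: resolve once the (hypothetical) flag is set inside the page.
  await page.waitForFunction(() => (window as any).appReady === true);

  // Option 2: the same check expressed as a polling assertion.
  await expect
    .poll(() => page.evaluate(() => (window as any).appReady === true))
    .toBe(true);

  await page.locator('select').selectOption('B-002');
});
```

One caveat with the network signal: a response arriving does not guarantee the UI has finished processing it, so a signal the user can actually see is usually the safer choice.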
Actual Fix
In this case, if we fix the app itself, simply by keeping a disabled attribute on the <select> (i.e. <select disabled>) until the App is actually ready for the user to start selecting, none of the above workarounds are required, and Playwright's actionability checks take care of the work for us. Most importantly, we're fixing a bug in our app that a user could hit.
The diff on the Web App code:
The deployed update:
If you reload it a few times, in the case that it takes a while to load data, you can observe the <select> is disabled, letting you know it's not interactable.
Playwright won't click on it until it's enabled, and if a user happens to click the disabled element, the click is ignored without an error:
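That behavior can be modeled in a few lines. This is a hypothetical model, not the demo app's source and not Playwright's implementation: the class, the step-based "clock", and the A-001 account are invented for illustration.

```typescript
// Hypothetical model of the fixed control.
class AccountSelect {
  disabled = true; // the fix: start disabled while data loads
  value: string | null = null;
  private options: string[] = [];

  // Called once the asynchronous data load has finished.
  ready(options: string[]): void {
    this.options = options;
    this.disabled = false;
  }

  // A user's attempt: interaction with a disabled control is ignored.
  userSelect(option: string): boolean {
    if (this.disabled || !this.options.includes(option)) return false;
    this.value = option;
    return true;
  }
}

// An automated attempt in the spirit of an actionability check:
// re-check "enabled" after each simulated step and act at the first chance.
function selectWhenEnabled(
  control: AccountSelect,
  option: string,
  step: () => void,
  maxSteps: number
): number {
  for (let steps = 0; steps <= maxSteps; steps++) {
    if (!control.disabled) {
      control.userSelect(option);
      return steps;
    }
    step(); // let the simulated app make progress
  }
  throw new Error(`control still disabled after ${maxSteps} steps`);
}

const control = new AccountSelect();
const ignored = control.userSelect('B-002'); // too early: ignored, no error

let remaining = 3; // data "arrives" after three simulated steps
const stepsWaited = selectWhenEnabled(
  control,
  'B-002',
  () => {
    if (--remaining === 0) control.ready(['A-001', 'B-002']);
  },
  10
);

console.log(ignored, stepsWaited, control.value); // false 3 B-002
```

The early attempt is harmless, and the automated attempt waits exactly as long as the app needs, with no duration written into the test.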
Our original, concise, and idiomatic test should now work (notice how there are no extra await expect(…)s; we just go ahead and take our action):
If we run the test 100 times, we can see it passes every time. Much of the time it finishes in <500ms (when the App is hydrated quickly), and sometimes it takes a few seconds (when the App takes longer to load data), but we never explicitly specified a wait time:
Awesome! 🎉
Additional Notes
Web Apps, between their complex frontends, backends, and CI systems, sometimes have flakes that are unavoidable or not worth debugging. In these cases, you can consider using Playwright's built-in Retry settings to automatically retry tests.
However, this can cover up actual issues, so be sure to use this sparingly and/or monitor which tests are flaky. If one particular test is consistently flaky, it's better to just disable it (via test.skip(…) or test.fixme(…)): it's either a bad test or something needs to be fixed in the app (as was the case with our sample app).
If you have a large test suite (e.g. 100s or 1000s of tests), and a different seemingly random test flakes each CI run, the retries option can be helpful.
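As a sketch, retries can be enabled in playwright.config.ts. The retry count of 2 and the CI-only condition are assumptions for illustration, not a recommendation from this post.

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Assumed policy: retry failed tests up to twice on CI, never locally,
  // so flakes stay visible while developing.
  retries: process.env.CI ? 2 : 0,
});
```

A test that fails and then passes on a retry is reported as flaky rather than simply passed, which helps with monitoring which tests need attention.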