End-to-End Testing: Fixing a Flaky Test and Avoiding Sleeps with Playwright

What to Expect

This post assumes you have some experience writing and running tests with Playwright Test.

In this post, I'll define what "flaky" means and why flakiness can be so frustrating. I'll share a minimal (buggy) example Web App along with a flaky test case. I'll explain the root cause and how to fix the app (or the test, in case the App's behavior cannot be changed). Finally, I'll explain why it's important to avoid await page.waitForTimeout(…) and other forms of static sleeps.

What's a Flaky Test?

A test that both passes and fails without any change to the test or the code under test is considered flaky. The test might work locally and then immediately fail in a CI environment, or it may take a few CI runs to see it fail unexpectedly, and then pass again, and then fail, and then… you get the point!

In general, the outcome of a test can be impacted by:

the Test Runner itself (but hopefully not!)
the test's implementation
the Web App's frontend code
constraints of the backend (e.g. rate-limited, slow or unreliable REST API)
CPU and memory load on the machine (e.g. CI) running the test
and more…

When a test flakes, it's frustrating since it's not clear where the issue is and what needs to be fixed:

Will the Web App's users also face the same issue and be unhappy? 😕
Is it a bug in the App's code?
Is it an improperly written test case?
Did my commit/change/Pull Request just break something? Or was it an earlier change? The tests passed locally before I pushed!?!
Is it the CI system?
The Test Runner?
A time-glitch in the universe?! 💥

An Example

Below is a minimal Web App I'll be testing. Click the button to load (or reload) the sample app, and then click around to use it. Each time you click the button, it's like reloading the page/app from the start. (You can also open the App itself in a separate tab.)

This App allows a user to view their club membership by selecting an account number. Data is loaded asynchronously, and like in many apps, the time it takes to load the data varies.

Try it out a few times in a row!

Click the button to load the app.
Immediately select account B-002 from the dropdown.
Observe the info (The Coding Club was started in 1970. Account # is B-002.) is displayed…or maybe something else happens. 😀
Repeat this a few times. It might not always work! (After all, we're looking at a buggy app and learning about flakes!)

The animation below shows what you might've experienced if you tried out the live demo repeatedly:

✅ On the left side, in this particular run, once the UI is shown and the user immediately selects B-002, the expected account info is shown.
❌ However, in the run on the right side, once the UI is shown and the user immediately selects B-002, they get an error. 😢

The Test

Here's what the manual steps above look like as a Playwright Test:

account-viewer.spec.ts

import { test, expect } from "@playwright/test";
 
test("should allow the user to view account details", async ({ page }) => {
  await page.goto("https://pw.rwoll.dev/flaky-submissions/bad.html");
  await page.locator("text=Choose an Account").selectOption("B-002");
  await expect(page.locator("id=info")).toContainText(
    "Coding Club was started in 1970."
  );
});

Verifying a Flake

You can run the test many times in a row (via Playwright's --repeat-each flag) to see the flake for yourself:

$ npx playwright test --repeat-each 20
[…truncated logs for brevity in post…]
    Error: expect(received).toContainText(expected)
 
    Expected string: "Coding Club was started in 1970."
    Received string: "ERROR: B-002 was NOT found!"
[…truncated logs for brevity in post…]
  7 failed
    tests/account-viewer.spec.ts:3:1 › should allow the user to view account details ===============
  13 passed (8s)

Debugging

After viewing some Playwright Traces and running a few times in debug mode, I see an issue. It looks like the option is selected before the Web App has fully loaded the requisite data asynchronously.

Sometimes, the in-app visual loading indicator reports it's still loading data (e.g. it says Loading… and not yet Loaded!) while I've already had an opportunity to click.

The Web App is instrumented with some logging, and if I forward its logs to the terminal:

import { test, expect } from "@playwright/test";
 
test("should allow the user to view account details", async ({ page }) => {
  page.on("console", (m) => console.log("BROWSER: ", m.text()));
  await page.goto("https://pw.rwoll.dev/flaky-submissions/bad.html");
  await page.locator("text=Choose an Account").selectOption("B-002");
  console.log(">>> MADE SELECTION");
  await expect(page.locator("id=info")).toContainText(
    "Coding Club was started in 1970."
  );
});

I can see the race, too:

BROWSER:  DOMContentLoaded emitted!
BROWSER:  Registering select listener…
BROWSER:  Retrieving accounts…
BROWSER:  Selection changed… B-002
>>> MADE SELECTION
BROWSER:  Accounts fully loaded!

In this specific case, the flaky behavior becomes apparent quickly. In other cases, the issue may not be visually apparent, or the data may load quickly enough, often enough, that the issue goes unnoticed. However, think of Playwright as your fastest ⚡️ user (ever!): it can help highlight races and bugs.
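
To make the race concrete, here's a minimal sketch of how an app like this might end up in that state. It isn't the demo's real source: the AccountViewer class, its method names, and the 50ms delay are all assumptions for illustration. The selection handler is available right away, but the data it needs only arrives later.

```typescript
// Hypothetical model of the buggy app. Names and timings are assumptions,
// not the demo's actual source.
type Account = { club: string; founded: number };

class AccountViewer {
  private accounts = new Map<string, Account>();

  // Simulates the asynchronous data load that starts after DOMContentLoaded.
  async loadAccounts(delayMs: number): Promise<void> {
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    this.accounts.set("B-002", { club: "The Coding Club", founded: 1970 });
  }

  // The change listener exists before the data does, so nothing stops it
  // from running against an empty map.
  onSelect(id: string): string {
    const account = this.accounts.get(id);
    if (!account) return `ERROR: ${id} was NOT found!`;
    return `${account.club} was started in ${account.founded}. Account # is ${id}.`;
  }
}

async function demo(): Promise<{ early: string; late: string }> {
  const viewer = new AccountViewer();
  const loading = viewer.loadAccounts(50);
  const early = viewer.onSelect("B-002"); // selection wins the race
  await loading;
  const late = viewer.onSelect("B-002"); // selection after the data arrived
  return { early, late };
}

demo().then(({ early, late }) => {
  console.log(early); // ERROR: B-002 was NOT found!
  console.log(late); // The Coding Club was started in 1970. Account # is B-002.
});
```

In this model, whoever calls onSelect before loadAccounts resolves gets the error, which is the same ordering the forwarded logs show.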

Limits of Auto-Waiting

Since Playwright is the fastest ⚡️ user (ever!), it automatically performs waiting and actionability checks under the hood. This keeps the API nice to use and your tests fast. Instead of time-based waiting, it performs checks and waits for events to fire in the browser, so it can move on as soon as requirements are met without delaying arbitrarily.

For example, await page.goto(…) not only waits for an HTTP Response for the main page, but it also waits for the load event. When clicking, selecting, and otherwise interacting with elements on the page, Playwright performs actionability checks on the elements to ensure it only does so at the first opportunity that makes sense (e.g. it ensures the Locator and the element it resolves to are actually visible, clickable, and enabled). Therefore, writing:

await page.locator("text=Choose an Account").selectOption("B-002");

should be enough under nearly all circumstances.

So, based on this information, it might appear that something is not working with Playwright. However, as we'll see in a bit (spoiler!), I actually forgot to initially set the disabled attribute on the <select> while the page hydrated asynchronously with data after DOMContentLoaded. As a result, both the App's users and Playwright could accidentally interact with the control before it was ready!

Why Not Use page.waitForTimeout(…)?

It looks like the data is always loaded if we wait 10 seconds before selecting the option, so it might be tempting to "fix" this test by doing:

import { test, expect } from "@playwright/test";
 
test("should allow the user to view account details", async ({ page }) => {
  await page.goto("https://pw.rwoll.dev/flaky-submissions/bad.html");
  await page.waitForTimeout(10_000); // 🚨 👎 BAD! Don't do this!!!!
  await page.locator("text=Choose an Account").selectOption("B-002");
  await expect(page.locator("id=info")).toContainText(
    "Coding Club was started in 1970."
  );
});

However, this creates more problems:

It covers up a bug in our app. Are we going to tell the user to wait for 10 seconds before interacting with the app?
It unnecessarily slows down every test. When our test passes, it can pass in under 500ms. If we force Playwright to wait for 10 seconds each time, we've slowed every run of this test down by 20X or more. This adds up over many tests, slowing down the whole suite. Don't sprinkle in sleeps/waits!
It's likely to still flake in the future. Today, 10 seconds might be enough, but maybe when the App is under load, it would need 11 seconds or more. In CI, where there aren't many CPUs available, things might run slower, too.
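
For a rough sense of the cost, here's a back-of-the-envelope calculation. The suite size, the worker count, and the assumption that tests spread evenly across workers are made up for illustration; only the 500ms and 10 second figures come from the example.

```typescript
// Illustrative arithmetic only. The suite size and worker count are assumptions.

// How many times slower one test becomes once a fixed sleep is added.
function slowdownFactor(baseMs: number, sleepMs: number): number {
  return (baseMs + sleepMs) / baseMs;
}

// Rough extra wall-clock time for a suite, assuming every test contains the
// sleep and tests are spread evenly across the workers.
function addedSuiteSeconds(tests: number, sleepMs: number, workers: number): number {
  return (Math.ceil(tests / workers) * sleepMs) / 1000;
}

console.log(slowdownFactor(500, 10_000)); // 21
console.log(addedSuiteSeconds(200, 10_000, 10)); // 200
```

So a test that could pass in half a second ends up roughly 20 times slower, and a hypothetical 200-test suite on 10 workers would spend more than three extra minutes doing nothing but sleeping.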

Workaround

In the case that we can't change the app itself (maybe there's no release for a while), we can use a Web-First Assertion to help Playwright wait on another signal:

import { test, expect } from "@playwright/test";
 
test("should allow the user to view account details", async ({ page }) => {
  await page.goto("https://pw.rwoll.dev/flaky-submissions/bad.html");
  await expect(page.locator("id=status")).toContainText("Loaded!"); // ⚠️ This is a last resort to stabilize the test if you cannot change the app itself
  await page.locator("text=Choose an Account").selectOption("B-002");
  await expect(page.locator("id=info")).toContainText(
    "Coding Club was started in 1970."
  );
});

Instead of using a fixed timeout, Playwright will wait only until the UI shows the "Loaded!" message, which the user can also see!

This stabilizes the test, but it does cover up a bug in the app itself, since a user can still click without waiting. It's not ideal, but it's an option if the app under test cannot be changed. (Maybe you are using a third-party UI component.)

If we run the above (with --repeat-each 20), we see it's now reliable (and quick)! Not only do all the tests pass, but most take <500ms while a few, where the data takes longer to load, take a few seconds—but none take 10 seconds!

In our demo app, there's a visual text indicator we can use to work around the flake. However, depending on what you are trying to work around or fix, you might instead wait for a different ready signal, like a network response coming from your API, or (under advanced circumstances) a Promise or other signal from within your app via await page.waitForFunction(…) or await expect.poll(…).
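
Conceptually, this kind of condition-based waiting just re-checks a condition at short intervals and moves on as soon as it holds. The sketch below is not how Playwright implements it; waitUntil, its default timings, and the simulated status text are hypothetical and only illustrate the idea.

```typescript
// Conceptual sketch only. This is not Playwright's implementation.
// Resolves with the elapsed time as soon as `condition` returns true,
// and rejects if it is still false once `timeoutMs` has passed.
async function waitUntil(
  condition: () => boolean,
  timeoutMs = 5_000,
  intervalMs = 20
): Promise<number> {
  const start = Date.now();
  while (!condition()) {
    if (Date.now() - start >= timeoutMs) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  return Date.now() - start;
}

async function demo(): Promise<number> {
  // Hypothetical stand-in for the app's status indicator.
  let statusText = "Loading…";
  setTimeout(() => {
    statusText = "Loaded!";
  }, 100);
  return waitUntil(() => statusText === "Loaded!");
}

demo().then((elapsedMs) => {
  console.log(`Ready after about ${elapsedMs}ms, not a fixed 10000ms`);
});
```

The wait takes only as long as the signal takes to show up, and it fails loudly if the signal never arrives, which is the behavior you want from whichever ready signal you pick.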

Actual Fix

In this case, we can fix the app itself simply by putting a disabled attribute on the <select> until the App is actually ready for the user to start selecting. Then none of the above workarounds are required, and Playwright's actionability checks take care of the work for us. Most importantly, we're fixing a bug in our app that a user could hit.
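
Starting out disabled is only half of the fix: something has to re-enable the control once the data has arrived. I'm not showing the demo's real source here; hydrate, Disableable, and the A-001 account are hypothetical names in a minimal sketch of that idea.

```typescript
// Hypothetical app-side logic. The demo's real source may differ.
// Any object with a `disabled` flag works here, including an HTMLSelectElement.
interface Disableable {
  disabled: boolean;
}

// Keeps the control disabled while `load` is in flight and enables it only
// after the data has arrived. If `load` rejects, the control stays disabled.
async function hydrate<T>(control: Disableable, load: () => Promise<T>): Promise<T> {
  control.disabled = true;
  const data = await load();
  control.disabled = false; // only now can users (and Playwright) interact
  return data;
}

// Usage with a plain object standing in for the <select> element.
const accountSelect: Disableable = { disabled: true };
hydrate(accountSelect, async () => ["A-001", "B-002"]).then((ids) => {
  console.log(accountSelect.disabled, ids); // false [ 'A-001', 'B-002' ]
});
```

Because the enabled state now tracks readiness, there's no window in which a selection can run against missing data.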

The diff on the Web App code:

bad.html vs. better.html

-     <select name="accounts" id="account-select">
+     <select name="accounts" id="account-select" disabled>

The deployed update:

If you reload it a few times, in the case that it takes a while to load data, you can observe that the <select> is disabled, letting you know it's not interactable.

Playwright won't interact with it until it's enabled, and if a user happens to click the disabled element, the click is ignored without an error:

Our original, concise, and idiomatic test should now work (notice how there are no extra await expect(…) calls; we just go ahead and take our action):

import { test, expect } from "@playwright/test";
 
test("should allow the user to view account details", async ({ page }) => {
  // This is a different URL than we've been using. It contains the patched app.
  await page.goto("https://pw.rwoll.dev/flaky-submissions/better.html");
  await page.locator("text=Choose an Account").selectOption("B-002");
  await expect(page.locator("id=info")).toContainText(
    "Coding Club was started in 1970."
  );
});

If we run the test 100 times, we can see it passes every time. Much of the time it finishes in <500ms (when the App is hydrated quickly), and sometimes it takes a few seconds (when the App takes longer to load data), but we never explicitly specified a time:

$ npx playwright test --repeat-each 100
 
Running 100 tests using 10 workers
[…truncated logs for brevity in post…]
  ✓  tests/example.spec.ts:3:1 › should allow the user to view account detai (939ms)
  …
  ✓  tests/example.spec.ts:3:1 › should allow the user to view account details (6s)
  ✓  tests/example.spec.ts:3:1 › should allow the user to view account detai (420ms)
 
100 passed (33s)

Awesome! 🎉

Additional Notes

Web Apps, with their complex frontends, backends, and CI systems, sometimes have flakes that are unavoidable or not worth debugging. In these cases, you can consider using Playwright's built-in Retry settings to automatically retry tests.

However, this can cover up actual issues, so be sure to use this sparingly and/or monitor which tests are flaky. If one particular test is consistently flaky, it's better to just disable it (via test.skip(…) or test.fixme(…)): it's either a bad test or something needs to be fixed in the app (as was the case with our sample app).

If you have a large test suite (e.g. 100s or 1000s of tests), and a different, seemingly random test flakes each CI run, the retries option can be helpful.
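
As a sketch of what that could look like, here's a playwright.config.ts that retries only in CI. The retry count of 2 and the choice to record a trace on the first retry are my assumptions, so tune them for your project.

```typescript
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Retry failed tests in CI only. The count of 2 is an arbitrary choice.
  retries: process.env.CI ? 2 : 0,
  use: {
    // Record a trace when a test is retried, to help debug the flake later.
    trace: "on-first-retry",
  },
});
```

A test that fails and then passes on a retry is reported as flaky rather than simply passed, which makes it easier to keep an eye on repeat offenders.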
