Flaky Tests
A flaky test is a test that sometimes passes and sometimes fails without any changes to the code. Flaky tests are one of the biggest challenges in test automation, eroding developer confidence and slowing down delivery.
What Makes a Test Flaky?
Flaky tests typically fail due to non-deterministic behavior. Common causes include:
Race Conditions
Tests that don't properly wait for asynchronous operations:
// Flaky: assumes the button is already rendered and clickable
await page.click('button');
// Better: wait explicitly for the element before interacting with it
await page.waitForSelector('button', { state: 'visible' });
await page.click('button');
External Dependencies
Tests that rely on external services, APIs, or databases that may be slow or unavailable:
- Network timeouts
- Third-party API rate limits
- Database connection issues
- Slow CI infrastructure
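One way to remove these dependencies in unit and integration tests is to program against an interface and substitute an in-memory stub for the live client. This is an illustrative sketch; the `UserService` type, `getUserName`, and the stub are hypothetical names, not part of any particular library:

```typescript
// Hypothetical service interface: in production this would hit the network.
type UserService = {
  fetchUser: (id: number) => Promise<{ id: number; name: string }>;
};

// Flaky: a live client depends on network latency, rate limits, and uptime.
// const live: UserService = {
//   fetchUser: (id) => fetch(`/api/users/${id}`).then((r) => r.json()),
// };

// Deterministic: an in-memory stub returns canned data instantly.
const stub: UserService = {
  fetchUser: async (id) => ({ id, name: 'Test User' }),
};

// Code under test accepts the interface, so tests can inject the stub.
async function getUserName(svc: UserService, id: number): Promise<string> {
  const user = await svc.fetchUser(id);
  return user.name;
}
```

Because the stub never touches the network, the test's outcome depends only on the code under test.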
Shared State
Tests that share state with other tests or don't properly clean up:
- Database records left over from previous tests
- Global variables modified by other tests
- Browser storage not cleared between tests
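A simple defense is to give every test a fresh fixture and tear it down even when the test throws. The sketch below uses a `Map` as a stand-in for a database; `withFreshDb` is an illustrative helper, not a real library API:

```typescript
// Isolated pattern: each test body receives its own fresh store and the
// store is cleaned up afterwards, even if the test throws.
function withFreshDb<T>(testBody: (store: Map<string, string>) => T): T {
  const store = new Map<string, string>(); // fresh state per test
  try {
    return testBody(store);
  } finally {
    store.clear(); // cleanup always runs
  }
}
```

Because no state survives between invocations, tests cannot influence one another through leftover records.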
Time-Dependent Logic
Tests that depend on the current time or date:
// Flaky: passes only when the test happens to run in the morning
expect(getGreeting()).toBe('Good morning');
// Better: mock the clock so the result is deterministic
jest.useFakeTimers().setSystemTime(new Date('2024-01-01T09:00:00'));
expect(getGreeting()).toBe('Good morning');
Random Data
Tests that use random data without proper seeding:
// Flaky: random data may hit edge cases that won't reproduce on rerun
const user = generateRandomUser();
// Better: Use deterministic test data
const user = { name: 'Test User', email: 'test@example.com' };
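If you do want varied data, seed the randomness so every run produces the same sequence. A minimal sketch using a linear congruential generator; `makeSeededRandom` and `generateSeededUser` are illustrative names:

```typescript
// A seeded linear congruential generator: same seed, same sequence, every run.
function makeSeededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    // LCG constants from Numerical Recipes; output is in [0, 1).
    state = (state * 1664525 + 1013904223) >>> 0;
    return state / 2 ** 32;
  };
}

// "Random" user data that is fully determined by the seed.
function generateSeededUser(rand: () => number): { name: string; age: number } {
  const names = ['Ada', 'Grace', 'Alan'];
  return {
    name: names[Math.floor(rand() * names.length)],
    age: 18 + Math.floor(rand() * 50),
  };
}
```

Two runs with the same seed generate identical users, so a failure can always be reproduced.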
How Spekra Detects Flaky Tests
Spekra identifies flaky tests through multiple methods:
Retry Detection
When a test fails and then passes on retry within the same run, it's marked as flaky. This is the most common detection method for Playwright tests.
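The retry signature can be sketched as a small classifier over a run's attempts. This is an illustration of the idea, not Spekra's actual implementation; `Outcome` and `classifyRun` are hypothetical names:

```typescript
type Outcome = 'passed' | 'failed';

// A fail-then-pass sequence within one run is the classic flaky signature;
// all passes is stable, and a failing final attempt is a genuine failure.
function classifyRun(attempts: Outcome[]): 'passed' | 'failed' | 'flaky' {
  const finalAttempt = attempts[attempts.length - 1];
  if (finalAttempt === 'failed') return 'failed';
  return attempts.some((a) => a === 'failed') ? 'flaky' : 'passed';
}
```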
Cross-Run Analysis
Spekra tracks test results across multiple runs. If a test passes in some runs and fails in others (with no code changes), it's flagged as potentially flaky.
Statistical Analysis
For tests with enough history, Spekra calculates a stability score based on the consistency of results. Tests with low stability are highlighted for investigation.
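One plausible way to score stability is the fraction of runs that agree with the majority outcome; Spekra's actual formula is not specified here, so treat this as an illustration only:

```typescript
// Illustrative stability score: 1.0 means perfectly consistent results,
// and values near 0.5 mean the test passes and fails about equally often.
function stabilityScore(results: boolean[]): number {
  if (results.length === 0) return 1;
  const passes = results.filter(Boolean).length;
  return Math.max(passes, results.length - passes) / results.length;
}
```

A test that alternates between pass and fail scores 0.5, the least stable possible result, and would be flagged for investigation.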
The Impact of Flaky Tests
Flaky tests are expensive
A single flaky test can waste hours of developer time investigating phantom failures and re-running CI pipelines.
Flaky tests cause several problems:
- Lost developer time - Investigating failures that aren't real bugs
- Reduced confidence - Developers start ignoring test failures
- Slower delivery - Re-running pipelines to get a "green" build
- Hidden bugs - Real failures get dismissed as flakiness
Best Practices
1. Fix Flaky Tests Immediately
Don't let flaky tests accumulate. When a test is marked as flaky:
- Investigate the root cause
- Fix the underlying issue
- If it can't be fixed quickly, quarantine it
2. Use Explicit Waits
Never use arbitrary sleep() or wait() calls. Use explicit waits that check for conditions:
// Bad
await page.waitForTimeout(2000);
// Good
await page.waitForSelector('[data-testid="loaded"]');
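Outside of Playwright, the same principle applies: poll for a condition with a deadline instead of sleeping for a fixed duration. A minimal sketch; `waitFor` is an illustrative helper, not a library API:

```typescript
// Polls `condition` until it returns true or the timeout expires.
// Unlike a fixed sleep, this resolves as soon as the condition holds.
async function waitFor(
  condition: () => boolean,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!condition()) {
    if (Date.now() > deadline) {
      throw new Error('waitFor: condition not met before timeout');
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The timeout bounds how long a genuinely broken condition can stall the suite, while a fast condition adds almost no delay.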
3. Isolate Test Data
Each test should create and clean up its own data. Never rely on data from other tests.
4. Mock External Services
For unit and integration tests, mock external APIs and services to ensure deterministic behavior.
5. Monitor Stability Metrics
Use Spekra's stability metrics to track your test suite health over time and catch new flaky tests early.
Next Steps
- Stability metrics - Understanding reliability, stability, and severity scores
- Test identity - How Spekra tracks tests across changes
- Flaky tests view - Using the Spekra dashboard to manage flaky tests