Flaky Tests
A flaky test is a test that sometimes passes and sometimes fails without any changes to the code. Flaky tests are one of the biggest challenges in test automation, eroding developer confidence and slowing down delivery.
What Makes a Test Flaky?
Flaky tests typically fail due to non-deterministic behavior. Common causes include:
Race Conditions
Tests that don't properly wait for asynchronous operations:
// Flaky: assumes the button is already rendered and clickable
await page.click('button');
// Better: wait explicitly for the element before interacting with it
await page.waitForSelector('button', { state: 'visible' });
await page.click('button');
External Dependencies
Tests that rely on external services, APIs, or databases that may be slow or unavailable:
- Network timeouts
- Third-party API rate limits
- Database connection issues
- Slow CI infrastructure
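One way to remove these dependencies in unit and integration tests is to program against an interface and substitute an in-memory stub for the live client. This is an illustrative sketch; the `UserService` type, `getUserName`, and the stub are hypothetical names, not part of any particular library:

```typescript
// Hypothetical service interface: in production this would hit the network.
type UserService = {
  fetchUser: (id: number) => Promise<{ id: number; name: string }>;
};

// Flaky: a live client depends on network latency, rate limits, and uptime.
// const live: UserService = {
//   fetchUser: (id) => fetch(`/api/users/${id}`).then((r) => r.json()),
// };

// Deterministic: an in-memory stub returns canned data instantly.
const stub: UserService = {
  fetchUser: async (id) => ({ id, name: 'Test User' }),
};

// Code under test accepts the interface, so tests can inject the stub.
async function getUserName(svc: UserService, id: number): Promise<string> {
  const user = await svc.fetchUser(id);
  return user.name;
}
```

Because the stub never touches the network, the test's outcome depends only on the code under test.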
Shared State
Tests that share state with other tests or don't properly clean up:
- Database records left over from previous tests
- Global variables modified by other tests
- Browser storage not cleared between tests
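A simple defense is to give every test a fresh fixture and tear it down even when the test throws. The sketch below uses a `Map` as a stand-in for a database; `withFreshDb` is an illustrative helper, not a real library API:

```typescript
// Isolated pattern: each test body receives its own fresh store and the
// store is cleaned up afterwards, even if the test throws.
function withFreshDb<T>(testBody: (store: Map<string, string>) => T): T {
  const store = new Map<string, string>(); // fresh state per test
  try {
    return testBody(store);
  } finally {
    store.clear(); // cleanup always runs
  }
}
```

Because no state survives between invocations, tests cannot influence one another through leftover records.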
Time-Dependent Logic
Tests that depend on the current time or date:
// Flaky: passes only when the test happens to run in the morning
expect(getGreeting()).toBe('Good morning');
// Better: mock the clock so the result is deterministic
jest.useFakeTimers().setSystemTime(new Date('2024-01-01T09:00:00'));
expect(getGreeting()).toBe('Good morning');
Random Data
Tests that use random data without proper seeding:
// Flaky: random data may hit edge cases that won't reproduce on rerun
const user = generateRandomUser();
// Better: Use deterministic test data
const user = { name: 'Test User', email: 'test@example.com' };
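If you do want varied data, seed the randomness so every run produces the same sequence. A minimal sketch using a linear congruential generator; `makeSeededRandom` and `generateSeededUser` are illustrative names:

```typescript
// A seeded linear congruential generator: same seed, same sequence, every run.
function makeSeededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    // LCG constants from Numerical Recipes; output is in [0, 1).
    state = (state * 1664525 + 1013904223) >>> 0;
    return state / 2 ** 32;
  };
}

// "Random" user data that is fully determined by the seed.
function generateSeededUser(rand: () => number): { name: string; age: number } {
  const names = ['Ada', 'Grace', 'Alan'];
  return {
    name: names[Math.floor(rand() * names.length)],
    age: 18 + Math.floor(rand() * 50),
  };
}
```

Two runs with the same seed generate identical users, so a failure can always be reproduced.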
How Spekra Detects Flaky Tests
Spekra identifies flaky tests through multiple methods:
Retry Detection
When a test fails and then passes on retry within the same run, it's marked as flaky. This is the most common detection method for Playwright tests.
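The retry signature can be sketched as a small classifier over a run's attempts. This is an illustration of the idea, not Spekra's actual implementation; `Outcome` and `classifyRun` are hypothetical names:

```typescript
type Outcome = 'passed' | 'failed';

// A fail-then-pass sequence within one run is the classic flaky signature;
// all passes is stable, and a failing final attempt is a genuine failure.
function classifyRun(attempts: Outcome[]): 'passed' | 'failed' | 'flaky' {
  const finalAttempt = attempts[attempts.length - 1];
  if (finalAttempt === 'failed') return 'failed';
  return attempts.some((a) => a === 'failed') ? 'flaky' : 'passed';
}
```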
Cross-Run Analysis
Spekra tracks test results across multiple runs. If a test passes in some runs and fails in others (with no code changes), it's flagged as potentially flaky.
Statistical Analysis
For tests with enough history, Spekra calculates a stability score based on the consistency of results. Tests with low stability are highlighted for investigation.
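One plausible way to score stability is the fraction of runs that agree with the majority outcome; Spekra's actual formula is not specified here, so treat this as an illustration only:

```typescript
// Illustrative stability score: 1.0 means perfectly consistent results,
// and values near 0.5 mean the test passes and fails about equally often.
function stabilityScore(results: boolean[]): number {
  if (results.length === 0) return 1;
  const passes = results.filter(Boolean).length;
  return Math.max(passes, results.length - passes) / results.length;
}
```

A test that alternates between pass and fail scores 0.5, the least stable possible result, and would be flagged for investigation.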
The Impact of Flaky Tests
Flaky tests are expensive
A single flaky test can waste hours of developer time investigating phantom failures and re-running CI pipelines.
Flaky tests cause several problems:
- Lost developer time - Investigating failures that aren't real bugs
- Reduced confidence - Developers start ignoring test failures
- Slower delivery - Re-running pipelines to get a "green" build
- Hidden bugs - Real failures get dismissed as flakiness
Best Practices
1. Fix Flaky Tests Immediately
Don't let flaky tests accumulate. When a test is marked as flaky:
- Investigate the root cause
- Fix the underlying issue
- If it can't be fixed quickly, quarantine it
2. Use Explicit Waits
Never use arbitrary sleep() or wait() calls. Use explicit waits that check for conditions:
// Bad
await page.waitForTimeout(2000);
// Good
await page.waitForSelector('[data-testid="loaded"]');
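Outside of Playwright, the same principle applies: poll for a condition with a deadline instead of sleeping for a fixed duration. A minimal sketch; `waitFor` is an illustrative helper, not a library API:

```typescript
// Polls `condition` until it returns true or the timeout expires.
// Unlike a fixed sleep, this resolves as soon as the condition holds.
async function waitFor(
  condition: () => boolean,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!condition()) {
    if (Date.now() > deadline) {
      throw new Error('waitFor: condition not met before timeout');
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The timeout bounds how long a genuinely broken condition can stall the suite, while a fast condition adds almost no delay.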
3. Isolate Test Data
Each test should create and clean up its own data. Never rely on data from other tests.
4. Mock External Services
For unit and integration tests, mock external APIs and services to ensure deterministic behavior.
5. Monitor Stability Metrics
Use Spekra's stability metrics to track your test suite health over time and catch new flaky tests early.
Next Steps
- Stability metrics - Understanding reliability, stability, and severity scores
- Test identity - How Spekra tracks tests across changes
- Flaky tests view - Using the Spekra dashboard to manage flaky tests