Stability Metrics
Spekra calculates several metrics to help you understand the health of your test suite. These metrics help prioritize which flaky tests to fix first and track improvement over time.
Core Metrics
Reliability
What it measures: How often a test behaves consistently within a run, passing on every attempt or failing on every attempt.
Formula: (Consistent Runs / Total Runs) × 100
A test is considered consistent if it passes all attempts in a run, or fails all attempts in a run. A run where a test fails then passes on retry is inconsistent.
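As a minimal sketch of that calculation, assuming each run is recorded as the list of its attempt outcomes (the data shape and function name are illustrative, not Spekra's API):

```python
def reliability(runs: list[list[bool]]) -> float:
    """Percentage of runs where every attempt agreed (all passed or all failed).

    `runs` is a list of runs; each run is the list of attempt outcomes
    (True = pass, False = fail). Illustrative helper, not Spekra's API.
    """
    consistent = sum(1 for attempts in runs if all(attempts) or not any(attempts))
    return 100.0 * consistent / len(runs)

# Four runs: two clean passes, two fail-then-pass retries.
runs = [[True], [False, True], [True], [False, True]]
print(reliability(runs))  # 50.0 -- each retried run counts as inconsistent
```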
| Reliability | Interpretation |
|---|---|
| 95-100% | Excellent - Very stable test |
| 80-94% | Good - Occasional flakiness |
| 60-79% | Poor - Frequently flaky |
| Below 60% | Critical - Needs immediate attention |
Stability
What it measures: How consistently a test passes across runs over time.
Formula: (Passing Runs / Total Runs) × 100
Unlike reliability, stability looks only at each run's final outcome: a test that fails and then passes on retry counts as a passing run.
A test can have high stability but low reliability if it frequently needs retries to pass.
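Using the same illustrative data shape as the reliability sketch above, stability considers only the last attempt of each run:

```python
def stability(runs: list[list[bool]]) -> float:
    """Percentage of runs whose final attempt passed (retried runs count as passing)."""
    passing = sum(1 for attempts in runs if attempts[-1])
    return 100.0 * passing / len(runs)

# Same four runs as in the reliability sketch: reliability was 50.0 for this data.
runs = [[True], [False, True], [True], [False, True]]
print(stability(runs))  # 100.0 -- every run eventually passed
```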
Severity
What it measures: The impact of a flaky test based on frequency and recency.
Severity is a composite score that considers:
- How often the test fails
- How recently it has failed
- Whether failures are increasing or decreasing
- The impact on CI pipeline (blocking vs. non-blocking)
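Spekra's exact scoring is not documented here; the sketch below only illustrates how a composite like this could weigh those factors. All weights, thresholds, and parameter names are invented for the example:

```python
def severity(fail_rate: float, days_since_last_failure: int,
             trend: float, blocking: bool) -> str:
    """Toy composite score; every weight and bucket boundary is made up for illustration.

    fail_rate: fraction of runs that failed (0.0-1.0)
    days_since_last_failure: recency of the most recent failure
    trend: positive if failures are increasing, negative if decreasing
    blocking: whether failures block the CI pipeline
    """
    score = 0.0
    score += 50 * fail_rate                                   # how often it fails
    score += 25 * max(0, 1 - days_since_last_failure / 30)    # how recently it failed
    score += 15 * max(0.0, min(trend, 1.0))                   # whether it is getting worse
    score += 10 if blocking else 0                            # CI pipeline impact
    if score >= 60:
        return "Critical"
    if score >= 40:
        return "High"
    if score >= 20:
        return "Medium"
    return "Low"

print(severity(fail_rate=0.4, days_since_last_failure=1, trend=0.2, blocking=True))  # High
```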
| Severity | Action |
|---|---|
| Critical | Fix immediately - blocking releases |
| High | Fix this sprint - major disruption |
| Medium | Schedule fix - noticeable impact |
| Low | Monitor - minimal impact |
Understanding the Metrics
Why Both Reliability and Stability?
Consider these two scenarios:
Scenario A: Low Reliability, High Stability
- Test fails on first attempt 50% of the time
- Always passes on retry
- Reliability: 50%, Stability: 100%
This test is flaky but not blocking. It wastes CI time with retries but doesn't prevent deployments.
Scenario B: High Reliability, Low Stability
- Test almost always produces consistent results within a run
- Sometimes fails consistently across entire runs
- Reliability: 95%, Stability: 70%
This test might have a real bug that only manifests under certain conditions (e.g., time-dependent, environment-dependent).
Time Windows
Metrics are calculated over different time windows:
- 7 days - Recent performance (default view)
- 30 days - Medium-term trends
- 90 days - Long-term patterns
Use shorter windows to see recent changes; longer windows to identify persistent problems.
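Conceptually, each window is just a filter applied to the run history before the metrics above are calculated. A minimal sketch, assuming each run carries a completion timestamp (the field names are illustrative, not Spekra's data model):

```python
from datetime import datetime, timedelta, timezone

def runs_in_window(runs: list[dict], days: int) -> list[dict]:
    """Keep only runs whose `finished_at` falls inside the last `days` days."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    return [run for run in runs if run["finished_at"] >= cutoff]

# The same calculations can then be compared across windows, e.g.:
# reliability(runs_in_window(all_runs, 7)) vs. reliability(runs_in_window(all_runs, 90))
```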
Using Metrics Effectively
Prioritizing Fixes
Use this priority order:
1. High Severity + Low Reliability - Active problem, causing immediate pain
2. High Severity + Low Stability - Likely a real bug masquerading as flakiness
3. Low Severity + Low Reliability - Annoying but not blocking
4. Low Severity + Low Stability - Monitor for patterns
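One way to encode that ordering as a sort key, assuming each test record carries its severity label and the two percentages (the field names are hypothetical, not Spekra's export format):

```python
SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

def fix_priority(test: dict) -> tuple:
    """Sort key: highest severity first, then lowest reliability, then lowest stability."""
    return (SEVERITY_RANK[test["severity"]], test["reliability"], test["stability"])

tests = [
    {"name": "checkout_flow", "severity": "High", "reliability": 55.0, "stability": 90.0},
    {"name": "login_smoke", "severity": "Low", "reliability": 70.0, "stability": 99.0},
    {"name": "payment_retry", "severity": "High", "reliability": 92.0, "stability": 68.0},
]
for test in sorted(tests, key=fix_priority):
    print(test["name"])
# checkout_flow (active pain), payment_retry (possible real bug), login_smoke
```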
Setting Team Goals
Reasonable targets for a healthy test suite:
| Metric | Target |
|---|---|
| Suite Reliability | > 95% |
| Suite Stability | > 98% |
| Critical Severity Tests | 0 |
| High Severity Tests | < 5 |
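If you want to enforce these targets as a CI gate, a small check could compare current suite metrics against them. This is a sketch using the numbers from the table above; the metrics dict is whatever your own reporting step produces:

```python
TARGETS = {
    "suite_reliability": 95.0,      # percent, higher is better
    "suite_stability": 98.0,        # percent, higher is better
    "critical_severity_tests": 0,   # count, lower is better
    "high_severity_tests": 5,       # count, must stay below this
}

def check_targets(metrics: dict) -> list[str]:
    """Return human-readable target violations; an empty list means all targets are met."""
    violations = []
    if metrics["suite_reliability"] < TARGETS["suite_reliability"]:
        violations.append(f"Suite reliability {metrics['suite_reliability']:.1f}% < 95%")
    if metrics["suite_stability"] < TARGETS["suite_stability"]:
        violations.append(f"Suite stability {metrics['suite_stability']:.1f}% < 98%")
    if metrics["critical_severity_tests"] > TARGETS["critical_severity_tests"]:
        violations.append(f"{metrics['critical_severity_tests']} critical-severity tests (target: 0)")
    if metrics["high_severity_tests"] >= TARGETS["high_severity_tests"]:
        violations.append(f"{metrics['high_severity_tests']} high-severity tests (target: < 5)")
    return violations

print(check_targets({"suite_reliability": 96.2, "suite_stability": 97.1,
                     "critical_severity_tests": 0, "high_severity_tests": 2}))
# ['Suite stability 97.1% < 98%']
```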
Tracking Improvement
Use the Spekra dashboard to track metrics over time:
- Monitor weekly/monthly trends
- Celebrate improvements
- Catch regressions early
- Set alerts for threshold violations
Next Steps
- Flaky tests - What causes flaky tests
- Test identity - How tests are tracked across changes
- Dashboard overview - Viewing metrics in the platform