Stability Metrics
Spekra calculates several metrics to help you understand the health of your test suite. These metrics help prioritize which flaky tests to fix first and track improvement over time.
Core Metrics
Reliability
What it measures: How often a test behaves consistently within a run, passing on every attempt or failing on every attempt.
Formula: (Consistent Runs / Total Runs) × 100
A test is considered consistent if it passes all attempts in a run, or fails all attempts in a run. A run where a test fails then passes on retry is inconsistent.
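As a minimal sketch of that calculation, assuming each run is recorded as the list of its attempt outcomes (the data shape and function name are illustrative, not Spekra's API):

```python
def reliability(runs: list[list[bool]]) -> float:
    """Percentage of runs where every attempt agreed (all passed or all failed).

    `runs` is a list of runs; each run is the list of attempt outcomes
    (True = pass, False = fail). Illustrative helper, not Spekra's API.
    """
    consistent = sum(1 for attempts in runs if all(attempts) or not any(attempts))
    return 100.0 * consistent / len(runs)

# Four runs: two clean passes, two fail-then-pass retries.
runs = [[True], [False, True], [True], [False, True]]
print(reliability(runs))  # 50.0 -- each retried run counts as inconsistent
```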
| Reliability | Interpretation |
|---|---|
| 95-100% | Excellent - Very stable test |
| 80-94% | Good - Occasional flakiness |
| 60-79% | Poor - Frequently flaky |
| Below 60% | Critical - Needs immediate attention |
Stability
What it measures: How consistently a test passes across runs over time.
Formula: (Passing Runs / Total Runs) × 100
Unlike reliability, stability looks only at each run's final outcome: a test that fails and then passes on retry counts as a passing run.
A test can have high stability but low reliability if it frequently needs retries to pass.
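Using the same illustrative data shape as the reliability sketch above, stability considers only the last attempt of each run:

```python
def stability(runs: list[list[bool]]) -> float:
    """Percentage of runs whose final attempt passed (retried runs count as passing)."""
    passing = sum(1 for attempts in runs if attempts[-1])
    return 100.0 * passing / len(runs)

# Same four runs as in the reliability sketch: reliability was 50.0 for this data.
runs = [[True], [False, True], [True], [False, True]]
print(stability(runs))  # 100.0 -- every run eventually passed
```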
Severity
What it measures: The impact of a flaky test based on frequency and recency.
Severity is a composite score that considers:
- How often the test fails
- How recently it has failed
- Whether failures are increasing or decreasing
- The impact on CI pipeline (blocking vs. non-blocking)
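Spekra's exact scoring is not documented here; the sketch below only illustrates how a composite like this could weigh those factors. All weights, thresholds, and parameter names are invented for the example:

```python
def severity(fail_rate: float, days_since_last_failure: int,
             trend: float, blocking: bool) -> str:
    """Toy composite score; every weight and bucket boundary is made up for illustration.

    fail_rate: fraction of runs that failed (0.0-1.0)
    days_since_last_failure: recency of the most recent failure
    trend: positive if failures are increasing, negative if decreasing
    blocking: whether failures block the CI pipeline
    """
    score = 0.0
    score += 50 * fail_rate                                   # how often it fails
    score += 25 * max(0, 1 - days_since_last_failure / 30)    # how recently it failed
    score += 15 * max(0.0, min(trend, 1.0))                   # whether it is getting worse
    score += 10 if blocking else 0                            # CI pipeline impact
    if score >= 60:
        return "Critical"
    if score >= 40:
        return "High"
    if score >= 20:
        return "Medium"
    return "Low"

print(severity(fail_rate=0.4, days_since_last_failure=1, trend=0.2, blocking=True))  # High
```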
| Severity | Action |
|---|---|
| Critical | Fix immediately - blocking releases |
| High | Fix this sprint - major disruption |
| Medium | Schedule fix - noticeable impact |
| Low | Monitor - minimal impact |
Understanding the Metrics
Why Both Reliability and Stability?
Consider these two scenarios:
Scenario A: Low Reliability, High Stability
- Test fails on first attempt 50% of the time
- Always passes on retry
- Reliability: 50%, Stability: 100%
This test is flaky but not blocking. It wastes CI time with retries but doesn't prevent deployments.
Scenario B: High Reliability, Low Stability
- Test almost always produces consistent results within a run
- Sometimes fails consistently across entire runs
- Reliability: 95%, Stability: 70%
This test might have a real bug that only manifests under certain conditions (e.g., time-dependent, environment-dependent).
Time Windows
Metrics are calculated over different time windows:
- 7 days - Recent performance (default view)
- 30 days - Medium-term trends
- 90 days - Long-term patterns
Use shorter windows to see recent changes; longer windows to identify persistent problems.
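Conceptually, each window is just a filter applied to the run history before the metrics above are calculated. A minimal sketch, assuming each run carries a completion timestamp (the field names are illustrative, not Spekra's data model):

```python
from datetime import datetime, timedelta, timezone

def runs_in_window(runs: list[dict], days: int) -> list[dict]:
    """Keep only runs whose `finished_at` falls inside the last `days` days."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    return [run for run in runs if run["finished_at"] >= cutoff]

# The same calculations can then be compared across windows, e.g.:
# reliability(runs_in_window(all_runs, 7)) vs. reliability(runs_in_window(all_runs, 90))
```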
Using Metrics Effectively
Prioritizing Fixes
Use this priority order:
1. High Severity + Low Reliability - Active problem, causing immediate pain
2. High Severity + Low Stability - Likely a real bug masquerading as flakiness
3. Low Severity + Low Reliability - Annoying but not blocking
4. Low Severity + Low Stability - Monitor for patterns
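One way to encode that ordering as a sort key, assuming each test record carries its severity label and the two percentages (the field names are hypothetical, not Spekra's export format):

```python
SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

def fix_priority(test: dict) -> tuple:
    """Sort key: highest severity first, then lowest reliability, then lowest stability."""
    return (SEVERITY_RANK[test["severity"]], test["reliability"], test["stability"])

tests = [
    {"name": "checkout_flow", "severity": "High", "reliability": 55.0, "stability": 90.0},
    {"name": "login_smoke", "severity": "Low", "reliability": 70.0, "stability": 99.0},
    {"name": "payment_retry", "severity": "High", "reliability": 92.0, "stability": 68.0},
]
for test in sorted(tests, key=fix_priority):
    print(test["name"])
# checkout_flow (active pain), payment_retry (possible real bug), login_smoke
```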
Setting Team Goals
Reasonable targets for a healthy test suite:
| Metric | Target |
|---|---|
| Suite Reliability | > 95% |
| Suite Stability | > 98% |
| Critical Severity Tests | 0 |
| High Severity Tests | < 5 |
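If you want to enforce these targets as a CI gate, a small check could compare current suite metrics against them. This is a sketch using the numbers from the table above; the metrics dict is whatever your own reporting step produces:

```python
TARGETS = {
    "suite_reliability": 95.0,      # percent, higher is better
    "suite_stability": 98.0,        # percent, higher is better
    "critical_severity_tests": 0,   # count, lower is better
    "high_severity_tests": 5,       # count, must stay below this
}

def check_targets(metrics: dict) -> list[str]:
    """Return human-readable target violations; an empty list means all targets are met."""
    violations = []
    if metrics["suite_reliability"] < TARGETS["suite_reliability"]:
        violations.append(f"Suite reliability {metrics['suite_reliability']:.1f}% < 95%")
    if metrics["suite_stability"] < TARGETS["suite_stability"]:
        violations.append(f"Suite stability {metrics['suite_stability']:.1f}% < 98%")
    if metrics["critical_severity_tests"] > TARGETS["critical_severity_tests"]:
        violations.append(f"{metrics['critical_severity_tests']} critical-severity tests (target: 0)")
    if metrics["high_severity_tests"] >= TARGETS["high_severity_tests"]:
        violations.append(f"{metrics['high_severity_tests']} high-severity tests (target: < 5)")
    return violations

print(check_targets({"suite_reliability": 96.2, "suite_stability": 97.1,
                     "critical_severity_tests": 0, "high_severity_tests": 2}))
# ['Suite stability 97.1% < 98%']
```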
Tracking Improvement
Use the Spekra dashboard to track metrics over time:
- Monitor weekly/monthly trends
- Celebrate improvements
- Catch regressions early
- Set alerts for threshold violations
Next Steps
- Flaky tests - What causes flaky tests
- Test identity - How tests are tracked across changes
- Dashboard overview - Viewing metrics in the platform