Stability Metrics

Spekra calculates several metrics to help you understand the health of your test suite. These metrics help prioritize which flaky tests to fix first and track improvement over time.

Core Metrics

Reliability

What it measures: How often a test behaves consistently within a run, either passing every attempt or failing every attempt.

Formula: (Consistent Runs / Total Runs) × 100

A test is considered consistent if it passes all attempts in a run, or fails all attempts in a run. A run where a test fails then passes on retry is inconsistent.
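
To make the formula concrete, here is a minimal TypeScript sketch. The TestRun shape and the reliability helper are hypothetical illustrations, not part of any Spekra reporter API.

  // Hypothetical shape: one record per run, listing the outcome of each attempt.
  type Attempt = "pass" | "fail";
  interface TestRun {
    attempts: Attempt[];
  }

  // Reliability = (Consistent Runs / Total Runs) × 100.
  // A run is consistent when every attempt passed or every attempt failed.
  function reliability(runs: TestRun[]): number {
    if (runs.length === 0) return 100;
    const consistent = runs.filter(
      (run) =>
        run.attempts.every((a) => a === "pass") ||
        run.attempts.every((a) => a === "fail")
    ).length;
    return (consistent / runs.length) * 100;
  }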

Reliability | Interpretation
95-100%     | Excellent - Very stable test
80-94%      | Good - Occasional flakiness
60-79%      | Poor - Frequently flaky
Below 60%   | Critical - Needs immediate attention

Stability

What it measures: How consistently a test passes across runs over time.

Formula: (Passing Runs / Total Runs) × 100

Unlike reliability, stability only looks at the final outcome. A test that fails then passes on retry is counted as passing.

A test can have high stability but low reliability if it frequently needs retries to pass.
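
A matching sketch, reusing the hypothetical TestRun shape from the reliability example above: only the final attempt of each run decides the outcome.

  // Same hypothetical shape as in the reliability sketch above.
  type Attempt = "pass" | "fail";
  interface TestRun {
    attempts: Attempt[];
  }

  // Stability = (Passing Runs / Total Runs) × 100.
  // A run counts as passing if its final attempt passed, even after failed retries.
  function stability(runs: TestRun[]): number {
    if (runs.length === 0) return 100;
    const passing = runs.filter(
      (run) => run.attempts[run.attempts.length - 1] === "pass"
    ).length;
    return (passing / runs.length) * 100;
  }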

Severity

What it measures: The impact of a flaky test based on frequency and recency.

Severity is a composite score that considers:

  • How often the test fails
  • How recently it has failed
  • Whether failures are increasing or decreasing
  • The impact on the CI pipeline (blocking vs. non-blocking)

Severity | Action
Critical | Fix immediately - blocking releases
High     | Fix this sprint - major disruption
Medium   | Schedule fix - noticeable impact
Low      | Monitor - minimal impact
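
Spekra does not publish the exact weighting on this page, so the sketch below is only an assumption of how such a composite could be combined; the input fields and weights are illustrative.

  // Hypothetical inputs and weights - not Spekra's actual formula.
  interface SeverityInputs {
    failureRate: number;          // 0..1 - how often the test fails
    daysSinceLastFailure: number; // recency of the most recent failure
    trend: number;                // > 0 failures increasing, < 0 decreasing
    blocksPipeline: boolean;      // does a failure block the pipeline?
  }

  function severityScore(s: SeverityInputs): number {
    const recency = Math.max(0, 1 - s.daysSinceLastFailure / 30); // fades over ~30 days
    const trend = Math.max(0, Math.min(1, 0.5 + s.trend));        // clamp to 0..1
    const blocking = s.blocksPipeline ? 1 : 0.5;
    // Weighted blend scaled to 0..100; thresholds then map the score to the levels above.
    return 100 * blocking * (0.5 * s.failureRate + 0.3 * recency + 0.2 * trend);
  }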

Understanding the Metrics

Why Both Reliability and Stability?

Consider these two scenarios:

Scenario A: Low Reliability, High Stability

  • Test fails on first attempt 50% of the time
  • Always passes on retry
  • Reliability: 50%, Stability: 100%

This test is flaky but not blocking. It wastes CI time with retries but doesn't prevent deployments.

Scenario B: High Reliability, Low Stability

  • Test nearly always produces consistent results within a run
  • Sometimes fails consistently across entire runs
  • Reliability: 95%, Stability: 70%

This test might have a real bug that only manifests under certain conditions (e.g., time-dependent, environment-dependent).
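
Running scenario A through the reliability and stability sketches above reproduces the split; the run data is made up for illustration.

  // Scenario A as data: 10 runs - 5 pass outright, 5 fail once and pass on retry.
  const scenarioA: TestRun[] = [
    ...Array.from({ length: 5 }, (): TestRun => ({ attempts: ["pass"] })),
    ...Array.from({ length: 5 }, (): TestRun => ({ attempts: ["fail", "pass"] })),
  ];

  reliability(scenarioA); // 50  - only half the runs were consistent
  stability(scenarioA);   // 100 - every run ultimately passed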

Time Windows

Metrics are calculated over different time windows:

  • 7 days - Recent performance (default view)
  • 30 days - Medium-term trends
  • 90 days - Long-term patterns

Use shorter windows to see recent changes; longer windows to identify persistent problems.
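
If you compute these metrics yourself from exported run data, a window is just a filter on the run timestamp before applying the formulas above; the startedAt field is an assumption about your own data shape, not a Spekra field.

  // Hypothetical timed run record; windowDays would be 7, 30, or 90.
  interface TimedRun extends TestRun {
    startedAt: Date;
  }

  function runsInWindow(runs: TimedRun[], windowDays: number): TimedRun[] {
    const cutoff = Date.now() - windowDays * 24 * 60 * 60 * 1000;
    return runs.filter((run) => run.startedAt.getTime() >= cutoff);
  }

  // e.g. reliability(runsInWindow(allRuns, 7)) for the 7-day view.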

Using Metrics Effectively

Prioritizing Fixes

Use this priority order:

  1. High Severity + Low Reliability - Active problem, causing immediate pain
  2. High Severity + Low Stability - Likely a real bug masquerading as flakiness
  3. Low Severity + Low Reliability - Annoying but not blocking
  4. Low Severity + Low Stability - Monitor for patterns
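
In code, that ordering is just a sort: severity first, then the lower reliability, then the lower stability. A sketch, assuming a per-test summary shape that is not part of any Spekra API:

  // Hypothetical per-test summary used only for this example.
  interface TestSummary {
    name: string;
    severity: "critical" | "high" | "medium" | "low";
    reliability: number; // percent
    stability: number;   // percent
  }

  const severityRank = { critical: 0, high: 1, medium: 2, low: 3 };

  // Worst first: highest severity, then lowest reliability, then lowest stability.
  function fixOrder(tests: TestSummary[]): TestSummary[] {
    return [...tests].sort(
      (a, b) =>
        severityRank[a.severity] - severityRank[b.severity] ||
        a.reliability - b.reliability ||
        a.stability - b.stability
    );
  }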

Setting Team Goals

Reasonable targets for a healthy test suite:

Metric                  | Target
Suite Reliability       | > 95%
Suite Stability         | > 98%
Critical Severity Tests | 0
High Severity Tests     | < 5
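
These targets can double as a simple gate in a scheduled job; the suite-level rollup below is an assumed shape you would build from your own aggregation or export, not a specific Spekra endpoint.

  // Hypothetical suite-level rollup checked against the targets above.
  interface SuiteMetrics {
    reliability: number;           // percent
    stability: number;             // percent
    criticalSeverityTests: number;
    highSeverityTests: number;
  }

  function meetsTargets(m: SuiteMetrics): boolean {
    return (
      m.reliability > 95 &&
      m.stability > 98 &&
      m.criticalSeverityTests === 0 &&
      m.highSeverityTests < 5
    );
  }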

Tracking Improvement

Use the Spekra dashboard to track metrics over time:

  • Monitor weekly/monthly trends
  • Celebrate improvements
  • Catch regressions early
  • Set alerts for threshold violations

Next Steps

  • Flaky tests - What causes flaky tests
  • Test identity - How tests are tracked across changes
  • Dashboard overview - Viewing metrics in the platform
