playbook/outfitter-agents/plugins/outfitter/skills/tdd/references/quality-metrics.md

12 KiB
Raw Permalink Blame History

Test Quality Metrics

Comprehensive guide to measuring and improving test quality through coverage and mutation testing.

Coverage Metrics

Line Coverage

Percentage of code lines executed during test runs.

Target: ≥80% overall, ≥90% for critical paths

TypeScript/Bun:

bun test --coverage

# Output
Coverage Summary:
  Statements   : 85.2% ( 1420/1667 )
  Branches     : 78.5% ( 314/400 )
  Functions    : 82.1% ( 156/190 )
  Lines        : 85.2% ( 1420/1667 )

Rust:

# Using cargo-tarpaulin
cargo tarpaulin --out Html --output-dir coverage/

# Using cargo-llvm-cov
cargo llvm-cov --html

Branch Coverage

Percentage of decision branches (if/else, switch, ternary) executed.

Target: ≥75%

Example showing uncovered branch:

function divide(a: number, b: number): number {
  if (b === 0) {  // Branch covered
    throw new Error('Division by zero')
  }
  return a / b    // Branch covered
}

// Test only covers success path
test('divides numbers', () => {
  expect(divide(10, 2)).toBe(5)
})

// Coverage: 50% branches (only success branch covered)

Fix with both branches:

test('divides numbers', () => {
  expect(divide(10, 2)).toBe(5)
})

test('throws on division by zero', () => {
  expect(() => divide(10, 0)).toThrow('Division by zero')
})

// Coverage: 100% branches

Function Coverage

Percentage of functions called during tests.

Target: ≥80%

Uncovered functions often indicate:

  • Dead code that should be removed
  • Missing test cases
  • Helper functions only used in uncovered paths

Interpreting Coverage

High coverage ≠ high quality. Coverage shows what's tested, not how well.

Example of misleading coverage:

function processPayment(amount: number): Result {
  if (amount <= 0) {
    return { type: 'error', code: 'INVALID_AMOUNT' }
  }

  const result = chargeCard(amount)
  return { type: 'success', transactionId: result.id }
}

// Bad test with 100% coverage
test('processes payment', () => {
  processPayment(100)
  processPayment(-10)
})

// No assertions! 100% coverage but 0% verification

Coverage shows code was executed, not that it was verified correct.

Mutation Testing

Mutation testing verifies test quality by introducing small bugs and checking if tests catch them.

How It Works

  1. Mutant Generation: Tool mutates source code (e.g., ===!==, +-)
  2. Test Execution: Run tests against each mutant
  3. Classification:
    • Killed: Test fails (good — test caught the bug)
    • Survived: Test passes (bad — test missed the bug)
    • Timeout: Mutant caused infinite loop
    • No Coverage: Line not executed by tests

Mutation Score

Mutation Score = (Killed Mutants / Total Mutants) × 100%

Target: ≥75%

TypeScript Mutation Testing

Using Stryker:

Install:

bun add -d @stryker-mutator/core @stryker-mutator/typescript-checker

Configuration (stryker.conf.json):

{
  "mutator": "typescript",
  "packageManager": "bun",
  "reporters": ["html", "clear-text", "progress"],
  "testRunner": "bun",
  "coverageAnalysis": "perTest",
  "mutate": [
    "src/**/*.ts",
    "!src/**/*.test.ts",
    "!src/**/*.spec.ts"
  ],
  "thresholds": {
    "high": 80,
    "low": 60,
    "break": 50
  }
}

Run:

bun x stryker run

# Output
Mutation testing complete:
  Killed: 78
  Survived: 12
  Timeout: 2
  No Coverage: 8
  Mutation Score: 78.0%

Common Mutations:

Original Mutant Catches
=== !== Equality assertions
> >= Boundary tests
+ - Arithmetic verification
&& `
true false Boolean verification
0 1 Zero handling
return x return undefined Return value tests

Example Analysis:

function calculateDiscount(price: number, isPremium: boolean): number {
  if (isPremium) {
    return price * 0.8  // 20% discount
  }
  return price
}

// Weak test
test('calculates discount', () => {
  calculateDiscount(100, true)
  calculateDiscount(100, false)
})

// Mutation: 0.8 → 0.9
// Status: Survived (no assertion)

Fix with assertions:

test('applies 20% discount for premium users', () => {
  expect(calculateDiscount(100, true)).toBe(80)
})

test('no discount for regular users', () => {
  expect(calculateDiscount(100, false)).toBe(100)
})

// Mutation: 0.8 → 0.9
// Status: Killed (test fails with 90 !== 80)

Rust Mutation Testing

Using cargo-mutants:

Install:

cargo install cargo-mutants

Run:

cargo mutants

# Output
Mutation testing results:
  caught: 45
  missed: 5
  timeout: 1
  unviable: 2
  score: 90.0%

Common Mutations:

Original Mutant Catches
== != Equality tests
> >= Boundary tests
&& `
Some(x) None Option handling
Ok(x) Err(...) Result handling
+ - Arithmetic verification

Example:

fn calculate_discount(price: i32, is_premium: bool) -> i32 {
    if is_premium {
        price * 80 / 100  // 20% discount
    } else {
        price
    }
}

// Weak test
#[test]
fn test_discount() {
    calculate_discount(100, true);
    calculate_discount(100, false);
}

// Mutation: 80 → 90
// Status: missed (no assertion)

Fix:

#[test]
fn applies_discount_for_premium() {
    assert_eq!(calculate_discount(100, true), 80);
}

#[test]
fn no_discount_for_regular() {
    assert_eq!(calculate_discount(100, false), 100);
}

// Mutation: 80 → 90
// Status: caught (assertion fails)

Quality Standards Matrix

Metric Minimum Good Excellent
Line Coverage 70% 80% 90%
Branch Coverage 65% 75% 85%
Function Coverage 75% 85% 95%
Mutation Score 60% 75% 85%
Test Execution Time <10s <5s <2s

Improving Test Quality

Weak Assertion Detection

Problem: Tests execute code but don't verify results

// ❌ Weak - no verification
test('processes order', () => {
  processOrder({ items: [item1, item2] })
})

Solution:

// ✓ Strong - verifies result
test('processes order', () => {
  const result = processOrder({ items: [item1, item2] })
  expect(result.type).toBe('success')
  expect(result.total).toBe(150)
})

Missing Edge Cases

Use mutation testing to find gaps:

function validateAge(age: number): boolean {
  return age >= 18  // Mutant: >= → >
}

// Current test
test('validates age', () => {
  expect(validateAge(20)).toBe(true)
  expect(validateAge(16)).toBe(false)
})

// Mutation survived: >= → >
// Missing: boundary test for exactly 18

Add boundary test:

test('accepts exactly 18', () => {
  expect(validateAge(18)).toBe(true)
})

// Now mutation is caught

Test Redundancy

Multiple tests verifying same thing:

// Redundant tests
test('validates positive number', () => {
  expect(isPositive(5)).toBe(true)
})

test('validates another positive number', () => {
  expect(isPositive(10)).toBe(true)
})

test('validates yet another positive number', () => {
  expect(isPositive(100)).toBe(true)
})

Consolidate:

test.each([5, 10, 100])('validates positive number %i', (num) => {
  expect(isPositive(num)).toBe(true)
})

Continuous Quality Monitoring

CI/CD Integration

TypeScript:

# .github/workflows/test.yml
- name: Run tests with coverage
  run: bun test --coverage

- name: Check coverage thresholds
  run: |
    coverage=$(bun test --coverage --json | jq '.coverage.total.statements.pct')
    if (( $(echo "$coverage < 80" | bc -l) )); then
      echo "Coverage $coverage% below 80% threshold"
      exit 1
    fi    

- name: Run mutation testing
  run: bun x stryker run
  # Fail if mutation score < 75%

Rust:

# .github/workflows/test.yml
- name: Run tests with coverage
  run: cargo tarpaulin --fail-under 80

- name: Run mutation testing
  run: cargo mutants
  continue-on-error: true  # Warning only initially

Tracking Over Time

Monitor trends:

# Generate coverage badge
coverage=$(bun test --coverage --json | jq '.coverage.total.statements.pct')
echo "Coverage: $coverage%" > coverage.txt

# Track mutation score
mutation=$(bun x stryker run --json | jq '.mutationScore')
echo "Mutation Score: $mutation%" > mutation.txt

Advanced Techniques

Differential Coverage

Only measure coverage on changed code:

# Get changed files
git diff --name-only main... > changed.txt

# Run coverage on changed files
bun test --coverage --changed-files changed.txt

Coverage Ratcheting

Prevent coverage from decreasing:

# Save current coverage
current=$(bun test --coverage --json | jq '.coverage.total.statements.pct')
echo "$current" > .baseline-coverage

# On future runs, compare
baseline=$(cat .baseline-coverage)
if (( $(echo "$current < $baseline" | bc -l) )); then
  echo "Coverage decreased from $baseline% to $current%"
  exit 1
fi

Mutation Testing Optimization

Run only on changed code:

# Stryker incremental mode
bun x stryker run --incremental

# cargo-mutants on specific files
cargo mutants --file src/auth/mod.rs

Common Pitfalls

Pitfall 1: Chasing 100% Coverage

Problem: Diminishing returns past 90%, testing trivial code

// Trivial getter - not worth testing
class User {
  get email(): string {
    return this._email
  }
}

Solution: Focus on behavior, not line count. Exclude trivial code from coverage requirements.

Pitfall 2: Gaming Metrics

Problem: Tests that execute code without verification

// ❌ High coverage, zero value
test('calls all functions', () => {
  func1()
  func2()
  func3()
})

Solution: Use mutation testing to catch weak assertions.

Pitfall 3: Slow Mutation Testing

Problem: Full mutation testing takes hours

Solution: Run incrementally or in CI only:

# Local: Quick feedback on changed files
bun x stryker run --mutate "src/auth/**/*.ts"

# CI: Full suite
bun x stryker run

Quality Metrics Dashboard

Example report format:

Test Quality Report
===================

Coverage:
  Statements: 85.2% ░░░░░░░░▓▓
  Branches:   78.5% ░░░░░░░▓▓▓
  Functions:  82.1% ░░░░░░░░▓▓

Mutation Testing:
  Score:      78.0% ░░░░░░░▓▓▓
  Killed:     78
  Survived:   12
  No Cov:     8

Performance:
  Unit Tests: 2.3s  ✓
  Total:      8.7s  ✓

Status: ✓ All thresholds met

Actionable Improvement Plan

  1. Week 1: Establish baseline

    • Run coverage analysis
    • Run mutation testing
    • Document current state
  2. Week 2-3: Fix critical gaps

    • Add tests for uncovered critical paths
    • Fix survived mutants in high-risk code
    • Target 80% coverage, 75% mutation score
  3. Week 4: Automate

    • Add CI coverage checks
    • Set up coverage ratcheting
    • Schedule weekly mutation testing
  4. Ongoing: Maintain

    • Review coverage on each PR
    • Run mutation testing monthly
    • Gradually raise thresholds

Resources

TypeScript:

Rust: