playbook/skills/thirdparty/_shared/test-decay-risks.md

11 KiB

Test Decay Risk Reference

Six patterns that cause test suites to degrade. Apply the Iron Law to each finding.


Risk T1: Test Obscurity

Diagnostic question: How much effort does it take to understand what this test verifies?

Unclear test intent breeds distrust, missed failures, and duplicates — one step from an abandoned suite.

Symptoms

  • Assertion Roulette: multiple assertions with no message string — when one fails, it is impossible to determine which behavior broke without reading every assertion
  • Mystery Guest: test depends on external state (files, database rows, shared fixtures) that is not visible in the test body
  • Test names that do not express the scenario and expected outcome (e.g., test1, shouldWork, testLogin, testUserService)
  • General Fixture: an oversized setUp or beforeEach shared by unrelated tests, making each test's preconditions invisible
  • Test body requires reading production code to understand what is being verified

Sources

Symptom Book Principle / Smell
Assertion Roulette Meszaros — xUnit Test Patterns Assertion Roulette (p.224)
Mystery Guest Meszaros — xUnit Test Patterns Mystery Guest (p.411)
General Fixture Meszaros — xUnit Test Patterns General Fixture (p.316)
Test naming Osherove — The Art of Unit Testing method_scenario_expected naming convention

Severity Guide

  • 🔴 Critical: no test name in the file describes the behavior being tested; all assertions lack messages
  • 🟡 Warning: multiple Mystery Guests; several ambiguous test names
  • 🟢 Suggestion: minor naming issues; isolated General Fixture

What Not to Flag

  • Multiple assertions are acceptable when they describe one coherent behavior and fail with a clear story
  • Shared setup is fine when every initialized value is relevant to nearly every test
  • Concise test names are acceptable if scenario and expected outcome are still obvious

Risk T2: Test Brittleness

Diagnostic question: Do tests break when you refactor without changing behavior?

Brittle tests punish refactoring — eventually developers stop refactoring and the codebase stagnates to protect the suite.

Symptoms

  • Tests assert on private method results, internal state, or implementation details rather than observable behavior
  • Eager Test: one test method verifies multiple unrelated behaviors; any single change causes it to fail regardless of which behavior was touched
  • Over-specified: assertions enforce mock call order or exact parameter values that are irrelevant to the behavior being tested
  • Renaming or extracting a method causes 5 or more tests to fail even though no behavior changed
  • Erratic Test: a test produces different results across runs without any change to production code — caused by race conditions, time-dependent logic, random data, or shared mutable state between tests

Sources

Symptom Book Principle / Smell
Eager Test Meszaros — xUnit Test Patterns Eager Test (p.228)
Erratic Test Meszaros — xUnit Test Patterns Erratic Test
Implementation coupling Osherove — The Art of Unit Testing Test isolation principle
Orthogonality violation Hunt & Thomas — The Pragmatic Programmer Ch. 2: Orthogonality

Severity Guide

  • 🔴 Critical: refactoring with no behavior change causes test failures; > 5 tests coupled to a single implementation detail
  • 🟡 Warning: Eager Tests common across the suite; moderate implementation-detail assertions
  • 🟢 Suggestion: isolated over-specification in non-critical tests

What Not to Flag

  • Verifying an externally observable event or emitted command is not implementation coupling
  • One test with several assertions is acceptable when all assertions support one behavior claim
  • A fake or in-memory adapter is not brittleness if the test still asserts behavior, not wiring

Risk T3: Test Duplication

Diagnostic question: Is the same test scenario expressed in more than one place?

Duplicated tests must change in multiple places and create false confidence without testing distinct behavior.

Symptoms

  • Test Code Duplication: same setup or assertion logic copy-pasted across multiple tests without extraction into a shared helper
  • Lazy Test: multiple tests verifying identical behavior with no differentiation in input, state, or expected output
  • Same boundary condition tested identically at unit, integration, and E2E level — three copies with no layer differentiation
  • Test helper functions or fixtures duplicated across test files instead of shared

Sources

Symptom Book Principle / Smell
Test Code Duplication Meszaros — xUnit Test Patterns Test Code Duplication (p.213)
Lazy Test Meszaros — xUnit Test Patterns Lazy Test (p.232)
DRY violation in tests Hunt & Thomas — The Pragmatic Programmer DRY: Don't Repeat Yourself

Severity Guide

  • 🔴 Critical: core business scenario fully duplicated across all three test layers with no differentiation
  • 🟡 Warning: common scenario setup repeated in 5 or more tests without extraction
  • 🟢 Suggestion: minor helper duplication; isolated Lazy Tests

What Not to Flag

  • The same scenario may appear at unit and integration level when each layer verifies a distinct risk
  • Small local setup duplication can be clearer than an over-abstracted fixture maze
  • Similar assertions against different domain rules are not Lazy Tests if the business intent differs

Risk T4: Mock Abuse

Diagnostic question: Is the test more complex than the behavior it tests?

Mock abuse produces tests that pass while verifying nothing — production code can be fully broken as long as the mocks are wired up.

Symptoms

  • Mock setup code is longer than the test logic itself
  • Primary assertion is expect(mock).toHaveBeenCalledWith(...) — the test verifies that a mock was called, not that any real behavior occurred
  • Test-only methods added to production classes for lifecycle management in tests
  • Single unit test uses more than 3 mocks
  • Incomplete Mock: mock object missing fields that downstream code will access, causing silent failures only visible in integration
  • Hard-Coded Test Data: test data has no resemblance to real data shapes or constraints

Sources

Symptom Book Principle / Smell
Mock count > 3 Osherove — The Art of Unit Testing Mock usage guidelines
Testing mock behavior Meszaros — xUnit Test Patterns Behavior Verification (p.544)
Test-only production methods Feathers — Working Effectively with Legacy Code Ch. 3: Sensing and Separation
Hard-Coded Test Data Meszaros — xUnit Test Patterns Hard-Coded Test Data (p.534)
Incomplete Mock Osherove — The Art of Unit Testing Mock completeness requirement

Severity Guide

  • 🔴 Critical: mock setup > 50% of test code; production class has methods only called from tests
  • 🟡 Warning: mocks consistently > 3 per test; primary assertions are mock call verifications
  • 🟢 Suggestion: isolated Incomplete Mocks; minor Hard-Coded Test Data

What Not to Flag

  • A small number of mocks around nondeterministic dependencies is acceptable when assertions still verify behavior
  • Fakes and spies used to observe state transitions are not mock abuse by default
  • One interaction assertion may be appropriate when the interaction itself is the behavior under test

Risk T5: Coverage Illusion

Diagnostic question: Does the test suite actually protect against the failures that matter?

Coverage measures execution, not verification. 90% line coverage can still miss every critical failure mode — teams stop looking because the number says "covered."

Symptoms

  • High line coverage but error-handling branches, boundary conditions, and exception paths have no corresponding tests
  • Happy-path only: no sad paths, no null/empty/zero inputs, no concurrency edge cases
  • Legacy code areas are being actively modified with no tests present (Feathers: "legacy code is code without tests")
  • Coverage percentage treated as a sign-off criterion; critical change paths remain untested
  • Tests assert on return values but not on important side effects such as database writes, event publications, or state transitions

Sources

Symptom Book Principle / Smell
Legacy code = no tests Feathers — Working Effectively with Legacy Code Ch. 1: "Legacy code is code without tests"
Change coverage vs line coverage Google — How Google Tests Software Ch. 11: Testing at Google Scale
Happy-path only Osherove — The Art of Unit Testing Test completeness principle

Severity Guide

  • 🔴 Critical: legacy code area actively being modified with no tests; error-handling paths entirely absent
  • 🟡 Warning: coverage > 80% but edge and exception paths are systematically absent
  • 🟢 Suggestion: a few non-critical paths missing sad-path tests

What Not to Flag

  • High line coverage is useful when paired with branch, boundary, and change-path coverage
  • A new module may have limited coverage early if it is still private and low-risk
  • Side-effect assertions may live in integration tests rather than unit tests without implying a gap

Risk T6: Architecture Mismatch

Diagnostic question: Does the test suite structure reflect the system's actual risk profile?

Wrong suite shape is slow and expensive — not from bad tests, but from using the wrong type at the wrong layer.

Symptoms

  • Inverted test pyramid: E2E or integration test count exceeds unit test count, causing a slow and fragile suite
  • Legacy code with no seam points: no interfaces, dependency injection, or seams exist, making it impossible to test in isolation without modifying production code
  • Legacy areas being modified have no Characterization Tests to capture current behavior before changes are made
  • Full suite execution time exceeds 10 minutes (indicates architectural problem, not a performance problem — too many slow tests)
  • High-risk and low-risk paths are tested at identical density; no risk-based prioritization in test distribution

Sources

Symptom Book Principle / Smell
Inverted pyramid Google — How Google Tests Software 70:20:10 unit:integration:E2E ratio
No seam points Feathers — Working Effectively with Legacy Code Ch. 4: Seam Model
Missing Characterization Tests Feathers — Working Effectively with Legacy Code Ch. 13: Characterization Tests
Suite execution time Meszaros — xUnit Test Patterns Slow Tests (p. 253)

Severity Guide

  • 🔴 Critical: legacy code being modified has no seams and no characterization tests; pyramid fully inverted
  • 🟡 Warning: suite execution > 10 minutes; integration/E2E count exceeds unit tests
  • 🟢 Suggestion: localized pyramid ratio deviation; a few legacy areas missing characterization tests

What Not to Flag

  • Deviating from 70:20:10 can be justified by platform constraints or product risk
  • A suite heavy on integration tests can still be healthy if feedback is fast and purposefully layered
  • A small number of critical-path E2E tests is desirable, not a smell