247 lines
11 KiB
Markdown
247 lines
11 KiB
Markdown
# Test Decay Risk Reference
|
|
|
|
Six patterns that cause test suites to degrade. Apply the Iron Law to each finding.
|
|
|
|
---
|
|
|
|
## Risk T1: Test Obscurity
|
|
|
|
**Diagnostic question:** How much effort does it take to understand what this test verifies?
|
|
|
|
Unclear test intent breeds distrust, missed failures, and duplicates — one step from an abandoned suite.
|
|
|
|
### Symptoms
|
|
|
|
- Assertion Roulette: multiple assertions with no message string — when one fails, it is
|
|
impossible to determine which behavior broke without reading every assertion
|
|
- Mystery Guest: test depends on external state (files, database rows, shared fixtures)
|
|
that is not visible in the test body
|
|
- Test names that do not express the scenario and expected outcome
|
|
(e.g., `test1`, `shouldWork`, `testLogin`, `testUserService`)
|
|
- General Fixture: an oversized setUp or beforeEach shared by unrelated tests, making
|
|
each test's preconditions invisible
|
|
- Test body requires reading production code to understand what is being verified
|
|
|
|
### Sources
|
|
|
|
| Symptom | Book | Principle / Smell |
|
|
|---------|------|-------------------|
|
|
| Assertion Roulette | Meszaros — xUnit Test Patterns | Assertion Roulette (p.224) |
|
|
| Mystery Guest | Meszaros — xUnit Test Patterns | Mystery Guest (p.411) |
|
|
| General Fixture | Meszaros — xUnit Test Patterns | General Fixture (p.316) |
|
|
| Test naming | Osherove — The Art of Unit Testing | method_scenario_expected naming convention |
|
|
|
|
### Severity Guide
|
|
|
|
- 🔴 Critical: no test name in the file describes the behavior being tested; all assertions lack messages
|
|
- 🟡 Warning: multiple Mystery Guests; several ambiguous test names
|
|
- 🟢 Suggestion: minor naming issues; isolated General Fixture
|
|
|
|
### What Not to Flag
|
|
|
|
- Multiple assertions are acceptable when they describe one coherent behavior and fail with a clear story
|
|
- Shared setup is fine when every initialized value is relevant to nearly every test
|
|
- Concise test names are acceptable if scenario and expected outcome are still obvious
|
|
|
|
---
|
|
|
|
## Risk T2: Test Brittleness
|
|
|
|
**Diagnostic question:** Do tests break when you refactor without changing behavior?
|
|
|
|
Brittle tests punish refactoring — eventually developers stop refactoring and the codebase stagnates to protect the suite.
|
|
|
|
### Symptoms
|
|
|
|
- Tests assert on private method results, internal state, or implementation details
|
|
rather than observable behavior
|
|
- Eager Test: one test method verifies multiple unrelated behaviors; any single change
|
|
causes it to fail regardless of which behavior was touched
|
|
- Over-specified: assertions enforce mock call order or exact parameter values that are
|
|
irrelevant to the behavior being tested
|
|
- Renaming or extracting a method causes 5 or more tests to fail even though no behavior changed
|
|
- Erratic Test: a test produces different results across runs without any change to
|
|
production code — caused by race conditions, time-dependent logic, random data, or
|
|
shared mutable state between tests
|
|
|
|
### Sources
|
|
|
|
| Symptom | Book | Principle / Smell |
|
|
|---------|------|-------------------|
|
|
| Eager Test | Meszaros — xUnit Test Patterns | Eager Test (p.228) |
|
|
| Erratic Test | Meszaros — xUnit Test Patterns | Erratic Test |
|
|
| Implementation coupling | Osherove — The Art of Unit Testing | Test isolation principle |
|
|
| Orthogonality violation | Hunt & Thomas — The Pragmatic Programmer | Ch. 2: Orthogonality |
|
|
|
|
### Severity Guide
|
|
|
|
- 🔴 Critical: refactoring with no behavior change causes test failures; > 5 tests coupled to a single implementation detail
|
|
- 🟡 Warning: Eager Tests common across the suite; moderate implementation-detail assertions
|
|
- 🟢 Suggestion: isolated over-specification in non-critical tests
|
|
|
|
### What Not to Flag
|
|
|
|
- Verifying an externally observable event or emitted command is not implementation coupling
|
|
- One test with several assertions is acceptable when all assertions support one behavior claim
|
|
- A fake or in-memory adapter is not brittleness if the test still asserts behavior, not wiring
|
|
|
|
---
|
|
|
|
## Risk T3: Test Duplication
|
|
|
|
**Diagnostic question:** Is the same test scenario expressed in more than one place?
|
|
|
|
Duplicated tests must change in multiple places and create false confidence without testing distinct behavior.
|
|
|
|
### Symptoms
|
|
|
|
- Test Code Duplication: same setup or assertion logic copy-pasted across multiple tests
|
|
without extraction into a shared helper
|
|
- Lazy Test: multiple tests verifying identical behavior with no differentiation in input,
|
|
state, or expected output
|
|
- Same boundary condition tested identically at unit, integration, and E2E level —
|
|
three copies with no layer differentiation
|
|
- Test helper functions or fixtures duplicated across test files instead of shared
|
|
|
|
### Sources
|
|
|
|
| Symptom | Book | Principle / Smell |
|
|
|---------|------|-------------------|
|
|
| Test Code Duplication | Meszaros — xUnit Test Patterns | Test Code Duplication (p.213) |
|
|
| Lazy Test | Meszaros — xUnit Test Patterns | Lazy Test (p.232) |
|
|
| DRY violation in tests | Hunt & Thomas — The Pragmatic Programmer | DRY: Don't Repeat Yourself |
|
|
|
|
### Severity Guide
|
|
|
|
- 🔴 Critical: core business scenario fully duplicated across all three test layers with no differentiation
|
|
- 🟡 Warning: common scenario setup repeated in 5 or more tests without extraction
|
|
- 🟢 Suggestion: minor helper duplication; isolated Lazy Tests
|
|
|
|
### What Not to Flag
|
|
|
|
- The same scenario may appear at unit and integration level when each layer verifies a distinct risk
|
|
- Small local setup duplication can be clearer than an over-abstracted fixture maze
|
|
- Similar assertions against different domain rules are not Lazy Tests if the business intent differs
|
|
|
|
---
|
|
|
|
## Risk T4: Mock Abuse
|
|
|
|
**Diagnostic question:** Is the test more complex than the behavior it tests?
|
|
|
|
Mock abuse produces tests that pass while verifying nothing — production code can be fully broken as long as the mocks are wired up.
|
|
|
|
### Symptoms
|
|
|
|
- Mock setup code is longer than the test logic itself
|
|
- Primary assertion is `expect(mock).toHaveBeenCalledWith(...)` — the test verifies
|
|
that a mock was called, not that any real behavior occurred
|
|
- Test-only methods added to production classes for lifecycle management in tests
|
|
- Single unit test uses more than 3 mocks
|
|
- Incomplete Mock: mock object missing fields that downstream code will access,
|
|
causing silent failures only visible in integration
|
|
- Hard-Coded Test Data: test data has no resemblance to real data shapes or constraints
|
|
|
|
### Sources
|
|
|
|
| Symptom | Book | Principle / Smell |
|
|
|---------|------|-------------------|
|
|
| Mock count > 3 | Osherove — The Art of Unit Testing | Mock usage guidelines |
|
|
| Testing mock behavior | Meszaros — xUnit Test Patterns | Behavior Verification (p.544) |
|
|
| Test-only production methods | Feathers — Working Effectively with Legacy Code | Ch. 3: Sensing and Separation |
|
|
| Hard-Coded Test Data | Meszaros — xUnit Test Patterns | Hard-Coded Test Data (p.534) |
|
|
| Incomplete Mock | Osherove — The Art of Unit Testing | Mock completeness requirement |
|
|
|
|
### Severity Guide
|
|
|
|
- 🔴 Critical: mock setup > 50% of test code; production class has methods only called from tests
|
|
- 🟡 Warning: mocks consistently > 3 per test; primary assertions are mock call verifications
|
|
- 🟢 Suggestion: isolated Incomplete Mocks; minor Hard-Coded Test Data
|
|
|
|
### What Not to Flag
|
|
|
|
- A small number of mocks around nondeterministic dependencies is acceptable when assertions still verify behavior
|
|
- Fakes and spies used to observe state transitions are not mock abuse by default
|
|
- One interaction assertion may be appropriate when the interaction itself is the behavior under test
|
|
|
|
---
|
|
|
|
## Risk T5: Coverage Illusion
|
|
|
|
**Diagnostic question:** Does the test suite actually protect against the failures that matter?
|
|
|
|
Coverage measures execution, not verification. 90% line coverage can still miss every critical failure mode — teams stop looking because the number says "covered."
|
|
|
|
### Symptoms
|
|
|
|
- High line coverage but error-handling branches, boundary conditions, and exception paths
|
|
have no corresponding tests
|
|
- Happy-path only: no sad paths, no null/empty/zero inputs, no concurrency edge cases
|
|
- Legacy code areas are being actively modified with no tests present
|
|
(Feathers: "legacy code is code without tests")
|
|
- Coverage percentage treated as a sign-off criterion; critical change paths remain untested
|
|
- Tests assert on return values but not on important side effects such as database writes,
|
|
event publications, or state transitions
|
|
|
|
### Sources
|
|
|
|
| Symptom | Book | Principle / Smell |
|
|
|---------|------|-------------------|
|
|
| Legacy code = no tests | Feathers — Working Effectively with Legacy Code | Ch. 1: "Legacy code is code without tests" |
|
|
| Change coverage vs line coverage | Google — How Google Tests Software | Ch. 11: Testing at Google Scale |
|
|
| Happy-path only | Osherove — The Art of Unit Testing | Test completeness principle |
|
|
|
|
### Severity Guide
|
|
|
|
- 🔴 Critical: legacy code area actively being modified with no tests; error-handling paths entirely absent
|
|
- 🟡 Warning: coverage > 80% but edge and exception paths are systematically absent
|
|
- 🟢 Suggestion: a few non-critical paths missing sad-path tests
|
|
|
|
### What Not to Flag
|
|
|
|
- High line coverage is useful when paired with branch, boundary, and change-path coverage
|
|
- A new module may have limited coverage early if it is still private and low-risk
|
|
- Side-effect assertions may live in integration tests rather than unit tests without implying a gap
|
|
|
|
---
|
|
|
|
## Risk T6: Architecture Mismatch
|
|
|
|
**Diagnostic question:** Does the test suite structure reflect the system's actual risk profile?
|
|
|
|
Wrong suite shape is slow and expensive — not from bad tests, but from using the wrong type at the wrong layer.
|
|
|
|
### Symptoms
|
|
|
|
- Inverted test pyramid: E2E or integration test count exceeds unit test count,
|
|
causing a slow and fragile suite
|
|
- Legacy code with no seam points: no interfaces, dependency injection, or seams exist,
|
|
making it impossible to test in isolation without modifying production code
|
|
- Legacy areas being modified have no Characterization Tests to capture current behavior
|
|
before changes are made
|
|
- Full suite execution time exceeds 10 minutes (indicates architectural problem,
|
|
not a performance problem — too many slow tests)
|
|
- High-risk and low-risk paths are tested at identical density;
|
|
no risk-based prioritization in test distribution
|
|
|
|
### Sources
|
|
|
|
| Symptom | Book | Principle / Smell |
|
|
|---------|------|-------------------|
|
|
| Inverted pyramid | Google — How Google Tests Software | 70:20:10 unit:integration:E2E ratio |
|
|
| No seam points | Feathers — Working Effectively with Legacy Code | Ch. 4: Seam Model |
|
|
| Missing Characterization Tests | Feathers — Working Effectively with Legacy Code | Ch. 13: Characterization Tests |
|
|
| Suite execution time | Meszaros — xUnit Test Patterns | Slow Tests (p. 253) |
|
|
|
|
### Severity Guide
|
|
|
|
- 🔴 Critical: legacy code being modified has no seams and no characterization tests; pyramid fully inverted
|
|
- 🟡 Warning: suite execution > 10 minutes; integration/E2E count exceeds unit tests
|
|
- 🟢 Suggestion: localized pyramid ratio deviation; a few legacy areas missing characterization tests
|
|
|
|
### What Not to Flag
|
|
|
|
- Deviating from 70:20:10 can be justified by platform constraints or product risk
|
|
- A suite heavy on integration tests can still be healthy if feedback is fast and purposefully layered
|
|
- A small number of critical-path E2E tests is desirable, not a smell
|