# Test Quality Review Guide — Mode 4

**Purpose:** Diagnose the health of a test suite using six test-space decay risks.
Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy.

---

## Before You Start: Build the Test Suite Map

Before scanning for any risk, map the current test suite structure:

```
Unit tests:        X files, ~N tests
Integration tests: X files, ~N tests
E2E tests:         X files, ~N tests
Ratio:             Unit X%  :  Integration X%  :  E2E X%
Coverage areas:    [modules with tests] vs [modules without tests]
```

If you cannot access test files directly, ask the user **one question** — choose the
most relevant:
1. "Which module is hardest to test or has the least coverage?"
2. "When you make a change, how often do unrelated tests break?"
3. "Is there a part of the codebase your team avoids touching because it has no tests?"

After one answer, proceed. Do not ask more than one question.

---

## Analysis Process

Work through these five steps in order.

### Step 1: Scan for Test Obscurity

*Scan this first — the most visible risk and the one that determines whether the suite
is maintainable at all.*

Look for:
- Read 5–10 test names at random: can each one communicate subject + scenario + expected
  outcome without opening the test body?
- Are there tests where a failure gives no clue which behavior broke (multiple assertions,
  no message strings)?
- Does any test depend on external state (files, database rows, env variables, shared mutable
  fixtures) that is invisible from within the test body?
- Is there a single massive setUp or beforeEach that every test inherits regardless of
  what it actually needs?

If all test names are clear and setups are minimal → no finding.

### Step 2a: Scan for Test Brittleness

*Brittle tests break on refactors that do not change observable behavior — they test
implementation, not contracts.*

Look for:
- Ask (or check git history): did any recent refactor cause test failures with no
  behavior change?
- Are there test methods where the name contains "and" or that assert on 3 or more
  unrelated behaviors (Eager Test)?
- Do assertions specify mock call order or exact parameter values that are irrelevant
  to the observable behavior?
- Are tests coupled to private methods or internal state directly?

If brittleness is systemic (most tests in the file break on a rename) → 🔴 Critical.
If isolated (1–2 brittle tests) → 🟢 Suggestion.

### Step 2b: Scan for Mock Abuse

*Mock Abuse produces tests that pass regardless of whether the real behavior is correct.
Scan this separately from brittleness — over-mocking is often the cause of brittleness,
but it is a distinct problem worth its own finding.*

**Sample 3–5 tests once for both steps 2a and 2b together** — read each test body and
check brittleness signals and mock-setup ratio in the same pass, then write separate
findings if both problems are present.

Look for:
- Is mock setup code longer than the assertion logic in the sampled tests?
- Are the primary assertions `expect(mock).toHaveBeenCalledWith(...)` rather than
  assertions on outputs, state, or observable events?
- Are there methods in production classes that are only called from test files
  (test-induced design damage)?
- Does any single test create more than 3 mock objects?

If mock setup-to-assertion ratio exceeds 3:1 → 🟡 Warning.
If production methods exist only for test access → 🔴 Critical (architecture is being
distorted by the test suite).

### Step 3: Scan for Test Duplication

Look for:
- Is the same setup block (same variables initialized the same way) repeated across
  5 or more test files without a shared helper?
- Are there multiple tests that pass identical inputs and assert identical outputs
  with no differentiation (Lazy Test)?
- Is the same business scenario covered at unit, integration, and E2E level with no
  difference in what each layer is testing?

If duplication is systemic (10 or more instances) → Critical.
If localized (3–5 instances) → Warning.

### Step 4: Scan for Coverage Illusion and Architecture Mismatch

Look for Coverage Illusion:
- Pick the most recently modified core module. Are its error-handling branches and
  null/boundary inputs covered by tests?
- Are there legacy areas (old functions, no test files nearby) that are actively
  being changed?
- Do the tests assert on side effects (DB writes, events emitted, state transitions)
  or only on return values?

**Characterization Test check:** If legacy code is being modified without existing tests,
the team needs Characterization Tests before making the change — not after. Look for
this pattern and flag it when absent.

A Characterization Test locks in current behavior (right or wrong) so future changes
do not silently regress it. Template:
```
test("characterize: [module].[method] given [input], returns [current output]") {
  // Call the code under test with realistic inputs
  // Assert on whatever it currently returns — even if you suspect the output is wrong
  // Add a comment: "This captures current behavior, not necessarily correct behavior"
}
```
Source: Feathers — Working Effectively with Legacy Code, Ch. 13: Characterization Tests

Look for Architecture Mismatch:
- Compare the suite map from the start: is the ratio close to 70% unit / 20% integration / 10% E2E?
- Are high-risk modules tested at higher density than trivial utilities?

**Test suite performance:** A slow test suite is a first-class maintainability risk — it
breaks the fast-feedback loop and causes developers to skip running tests locally.
- If the full suite runtime is known and > 10 minutes → 🟡 Warning
- If the full suite runtime is > 30 minutes or unknown → 🔴 Critical (unknown suite time
  means nobody is running it regularly)
- If tests that could be unit tests are integration tests, that is a Performance Mismatch:
  each misclassified test adds seconds of avoidable wait time

Source: Meszaros — xUnit Test Patterns, Slow Tests (p. 253)

### Step 5: Apply Iron Law, Output Report

Apply the Iron Law format from `../_shared/common.md` to each finding.

Use the standard Report Template. Mode: Test Quality Review.
Include the Test Suite Map as a code block immediately before the `## Findings` heading, labeled "Test Suite Map".