📦 deps(skills): sync thirdparty skills

2026-05-29 08:33:55 +00:00 · 2026-05-29 08:33:55 +00:00 · b4c88b32be
parent b7eb57f5da
commit b4c88b32be
25 changed files with 3033 additions and 0 deletions
--- a/skills/thirdparty/.sources/brooks-lint.list
+++ b/skills/thirdparty/.sources/brooks-lint.list
@ -0,0 +1,7 @@
+_shared
+brooks-audit
+brooks-debt
+brooks-health
+brooks-review
+brooks-sweep
+brooks-test
--- a/skills/thirdparty/.sources/codebase-migrate.list
+++ b/skills/thirdparty/.sources/codebase-migrate.list
@ -0,0 +1 @@
+codebase-migrate
--- a/skills/thirdparty/.sources/codebase-recon.list
+++ b/skills/thirdparty/.sources/codebase-recon.list
@ -0,0 +1 @@
+codebase-recon
--- a/skills/thirdparty/_shared/common.md
+++ b/skills/thirdparty/_shared/common.md
@ -0,0 +1,240 @@
+# Brooks-Lint — Shared Framework
+
+Code and test quality diagnosis using principles from twelve classic software engineering books.
+Use `source-coverage.md` to keep those sources grounded in real evidence, exceptions, and tradeoffs.
+
+## The Iron Law
+
+```
+NEVER suggest fixes before completing risk diagnosis.
+EVERY finding must follow: Symptom → Source → Consequence → Remedy.
+```
+
+Violating this law produces reviews that list rule violations without explaining why they
+matter. A finding without a consequence and a remedy is not a finding — it is noise.
+
+> **On-demand sections (skip unless the condition applies):**
+> - "Remedy Mode" — only when user passes `--fix` or asks to fix findings
+> - "Post-Report Triage" — only in interactive sessions after the report is output
+> - "History Tracking" — only after the Health Score is computed
+
+## Project Config
+
+Before executing the review, attempt to read `.brooks-lint.yaml` from the project root.
+If the file exists, parse and apply its settings before proceeding.
+If the file does not exist, continue with defaults (all risks enabled, no ignores).
+
+In a multi-mode session, re-read only if the user says the config has changed.
+
+### Supported settings
+
+**`disable`** — list of risk codes to skip entirely. Findings for disabled risks are
+silently omitted from the report and do not affect the Health Score.
+Valid codes: `R1` `R2` `R3` `R4` `R5` `R6` `T1` `T2` `T3` `T4` `T5` `T6`
+
+**`severity`** — override the severity of a specific risk for this project.
+Valid values: `critical` `warning` `suggestion`
+Example: `R1: suggestion` means every R1 finding is downgraded to Suggestion regardless
+of what the guide says.
+
+**`ignore`** — list of glob patterns. Files matching any pattern are excluded from
+analysis. Findings that arise solely from ignored files are omitted.
+Common entries: `**/*.generated.*`, `**/vendor/**`, `**/migrations/**`
+
+**`focus`** — non-empty list of risk codes to evaluate; all others are skipped.
+Omit this key (or leave it empty) to evaluate all non-disabled risks.
+Cannot be combined with a non-empty `disable` list.
+
+**Minimal example:**
+```yaml
+version: 1
+disable:
+  - T5
+severity:
+  R1: suggestion
+ignore:
+  - "**/*.generated.*"
+```
+
+If `.brooks-lint.yaml` contains a `custom_risks` map, read `custom-risks-guide.md`
+from the `_shared/` directory for loading and scanning instructions.
+
+### Config Validation
+
+Before applying, check for errors and mention each in the report:
+- Invalid risk code (not R1–R6, T1–T6, or a defined `Cx` code): skip it, note `"Config warning: X is not a valid risk code"`
+- Invalid severity value (not `critical`/`warning`/`suggestion`): skip it, note the error
+- Both `disable` and `focus` are non-empty: treat as a config error, ignore both, note it
+
+If the YAML fails to parse entirely, skip config loading and proceed with defaults.
+
+### Config Reporting
+
+If a config file was found and applied, add this line immediately after the **Scope** line
+in the report:
+`Config: .brooks-lint.yaml applied (N risks disabled, M paths ignored)`
+
+Include N and M even if zero. Omit this line if no config file was found.
+
+---
+
+## Auto Scope Detection
+
+When no files or code are specified, detect scope automatically:
+
+**PR Review:** `git diff --cached` → `git diff` → `git diff main...HEAD` → ask user.
+
+**Architecture Audit / Tech Debt:** Entire project by default. `--since=<ref>`: run `git diff <ref>...HEAD --name-only`, analyze only modules containing changed files; note "Incremental audit — modules touched since <ref>".
+
+**Test Quality:** All test files by default. If a diff exists, prioritize test files co-located with changed production files (`src/foo.ts` → `src/foo.test.ts`).
+
+**Health Dashboard:** Entire project by default. If user provides a path, scope all dimension sub-scans to that path.
+
+**Scope line:** Always state what was detected — e.g., `Scope: staged changes (3 files)` or `Scope: branch changes vs main (12 files)`.
+
+---
+
+## The Six Decay Risks
+
+Navigation index only — canonical definitions (symptoms, severity guides, sources, "What Not
+to Flag" guards) live in `decay-risks.md`. Do not duplicate or edit diagnostic questions here;
+update `decay-risks.md` directly. Book-level coverage, exceptions, and tradeoffs are in
+`source-coverage.md`.
+
+| Risk | Diagnostic Question |
+|------|---------------------|
+| Cognitive Overload | How much mental effort to understand this? |
+| Change Propagation | How many unrelated things break on one change? |
+| Knowledge Duplication | Is the same decision expressed in multiple places? |
+| Accidental Complexity | Is the code more complex than the problem? |
+| Dependency Disorder | Do dependencies flow in a consistent direction? |
+| Domain Model Distortion | Does the code faithfully represent the domain? |
+
+---
+
+## Report Template
+
+**Language rule:** Output the report in the same language the user is using. Translate the
+per-finding content and the one-sentence verdict to match the user's language. Keep the
+following in English: Iron Law field labels (Symptom / Source / Consequence / Remedy),
+book titles, principle and smell names (e.g. "Shotgun Surgery", "Divergent Change"),
+and fixed structural headers from the template below (`Findings`, `Summary`,
+`Module Dependency Graph`, `Critical`, `Warning`, `Suggestion`).
+
+````
+# Brooks-Lint Review
+
+**Mode:** [PR Review / Architecture Audit / Tech Debt Assessment / Test Quality Review]
+**Scope:** [file(s), directory, or description of what was reviewed]
+**Health Score:** XX/100
+
+[One sentence overall verdict]
+
+---
+
+## Module Dependency Graph
+
+<!-- Mode 2 (Architecture Audit) ONLY — omit this section for other modes -->
+<!-- classDef colors: see architecture-guide.md Step 1 Rule 6 -->
+
+```mermaid
+graph TD
+    ...
+```
+
+---
+
+## Findings
+
+<!-- Sort all findings by severity: Critical first, then Warning, then Suggestion -->
+<!-- If no findings in a severity tier, omit that tier's heading -->
+
+### 🔴 Critical
+
+**[Risk Name] — [Short descriptive title]**
+Symptom: [exactly what was observed in the code]
+Source: [Book title — Principle or Smell name]
+Consequence: [what breaks or gets worse if this is not fixed]
+Remedy: [concrete, specific action]
+
+### 🟡 Warning
+
+**[Risk Name] — [Short descriptive title]**
+Symptom: ...
+Source: ...
+Consequence: ...
+Remedy: ...
+
+### 🟢 Suggestion
+
+**[Risk Name] — [Short descriptive title]**
+Symptom: ...
+Source: ...
+Consequence: ...
+Remedy: ...
+
+---
+
+## Summary
+
+[2–3 sentences: what is the most important action, and what is the overall trend]
+````
+
+## Remedy Mode
+
+When the user passes `--fix` or asks to "fix the findings", read
+`remedy-guide.md` from the `_shared/` directory before writing the report.
+
+## Health Score Calculation
+
+Base score: 100
+Deductions:
+- Each 🔴 Critical finding: −15
+- Each 🟡 Warning finding: −5
+- Each 🟢 Suggestion finding: −1
+Floor: 0 (score cannot go below 0)
+
+## History Tracking
+
+After generating the Health Score, attempt to append a record to `.brooks-lint-history.json`
+in the project root.
+
+**Append logic:**
+1. Read the file (or start with empty array if it doesn't exist)
+2. Append: `{ date, mode, score, findings: { critical, warning, suggestion }, scope }`
+3. Write the file back
+
+**Trend display:** If the history file exists and contains at least one prior record for
+the same mode, add a Trend line after the Health Score in the report:
+
+  **Trend:** 85 → 82 (−3) over last 3 runs
+
+Show the most recent prior score and the delta. If delta is 0: "Stable at 82".
+If this is the first run for this mode: "First run — no trend data".
+
+## Post-Report Triage (Optional)
+
+**Guard:** Interactive sessions only — skip in CI/headless mode.
+
+After reporting Warning or Suggestion findings, offer:
+> Would you like to triage these findings? (accept / dismiss / defer / skip)
+
+For each finding one at a time (lowest severity first): show title, ask `[a]ccept / [d]ismiss / [f]defer / [s]kip`; wait for reply before moving to the next.
+
+**Dismiss:** ask one-line reason → append to `.brooks-lint.yaml` under `suppress:` → downgraded to info in future runs.
+
+**Defer:** same as dismiss, add `expires: YYYY-MM-DD` (default 90 days) → resurfaces at original severity after expiry.
+
+**Suppress matching at scan time:** for each `suppress:` entry, match `risk` code and file `pattern` against findings.
+- Both match → downgrade to info (not counted in Health Score, shown under collapsed "Suppressed" section).
+- `expires` is past → ignore entry, finding resurfaces. Note in Summary: "N suppressed findings have expired and are now active again."
+
+## Reference Files
+
+Read on demand:
+
+| File | When to Read |
+|------|-------------|
+| `source-coverage.md` | At the start of every review, before writing findings |
+| `decay-risks.md` | Before any production-code review or architecture/debt assessment |
+| `test-decay-risks.md` | Before any test review and before the PR Review "Quick Test Check" step |
--- a/skills/thirdparty/_shared/custom-risks-guide.md
+++ b/skills/thirdparty/_shared/custom-risks-guide.md
@ -0,0 +1,48 @@
+# Custom Risk Loading Guide
+
+When `.brooks-lint.yaml` contains a `custom_risks` map, this guide governs how those
+risks are loaded and scanned. Custom risks use `Cx` codes (C1, C2, …) — no conflict with
+the standard R1–R6 and T1–T6 namespaces.
+
+---
+
+## Loading
+
+1. For each entry in `custom_risks`, validate that it has:
+   - `name` — non-empty string
+   - `question` — the diagnostic question to ask
+   - `symptoms` — non-empty list of symptom patterns
+   - `severity` — map with at least one of: `critical`, `warning`, `suggestion`
+
+2. Register each valid entry as a `Cx` code alongside R1–R6 / T1–T6. Once loaded,
+   `Cx` codes become valid targets for `disable`, `focus`, and `severity` fields in
+   the same config file.
+
+3. Report any validation errors as config warnings (do not abort the review):
+   - Missing required field: `"Config warning: C1 missing 'symptoms'"`
+   - Invalid code format (must be `C` followed by digits): skip, note error
+   - Code conflicts with R/T namespace: skip, note error
+
+---
+
+## Scanning
+
+During the analysis, treat each custom risk as an additional step after the standard
+process:
+
+- Use `question` as the diagnostic question
+- Use `symptoms` as the symptom lookup list
+- Use the `severity` map for tier classification
+- Apply the Iron Law: `Source` field should be `"[Project-defined risk] — <risk name>"`
+- Include custom risk findings in the Health Score (same deduction rules as R/T codes)
+- In the report, custom findings appear after standard findings under a
+  **### Project-Specific Risks** sub-heading
+
+---
+
+## Config Validation additions
+
+The following codes are valid in `disable`, `focus`, and `severity`:
+- Standard: `R1`–`R6`, `T1`–`T6`
+- Custom: any `Cx` code defined in `custom_risks`
+- Any other code: skip it and emit `"Config warning: X is not a valid risk code"`
--- a/skills/thirdparty/_shared/decay-risks.md
+++ b/skills/thirdparty/_shared/decay-risks.md
@ -0,0 +1,294 @@
+# Decay Risk Reference
+
+Six patterns that cause software to degrade. Apply the Iron Law to each finding.
+
+---
+
+## Risk 1: Cognitive Overload
+
+**Diagnostic question:** How much mental effort does a human need to understand this?
+
+Cognitive load beyond working memory causes mistakes, avoidance, and blocks the refactoring that would fix it.
+
+### Symptoms
+
+- Function longer than 20 lines where multiple levels of abstraction are mixed together
+- Nesting depth greater than 3 levels
+- Parameter list with more than 4 parameters
+- Magic numbers or unexplained constants
+- Variable names that require reading the implementation to understand (e.g., `d`, `tmp2`, `flag`)
+- Boolean expressions with 3 or more conditions combined
+- Train-wreck chains: `a.getB().getC().doD()`
+- Code names that do not match what the business calls the same concept
+- Flag Arguments: a boolean parameter that makes the function do two fundamentally different
+  things depending on its value — a sign the function has two responsibilities
+- Primitive Obsession: domain concepts represented as primitive types (`String email`,
+  `int orderId`, `double money`) rather than purpose-built value types — forces callers to know
+  which string is an email and which is a name
+- Shallow module: the interface or documentation of a component is more complex relative to
+  the functionality it provides
+
+### Sources
+
+| Symptom | Book | Principle / Smell |
+|---------|------|-------------------|
+| Long Method | Fowler — Refactoring | Long Method |
+| Long Parameter List | Fowler — Refactoring | Long Parameter List |
+| Message Chains | Fowler — Refactoring | Message Chains |
+| Flag Arguments | Fowler — Refactoring | Flag Arguments |
+| Primitive Obsession | Fowler — Refactoring | Primitive Obsession |
+| Function length and nesting | McConnell — Code Complete | Ch. 7: High-Quality Routines |
+| Variable naming | McConnell — Code Complete | Ch. 11: The Power of Variable Names |
+| Magic numbers | McConnell — Code Complete | Ch. 12: Fundamental Data Types |
+| Domain name mismatch | Evans — Domain-Driven Design | Ubiquitous Language |
+| Shallow Module | Ousterhout — A Philosophy of Software Design | Ch. 4: Modules Should Be Deep |
+
+### Severity Guide
+
+- 🔴 Critical: function > 50 lines, nesting > 5, or virtually no meaningful names
+- 🟡 Warning: function 20–50 lines, nesting 4–5, some unclear names
+- 🟢 Suggestion: minor naming issues, 1–2 magic numbers, isolated train-wreck chains
+
+### What Not to Flag
+
+- Linear code with clear names and guard clauses is not automatically high cognitive load
+- Internal implementation detail hidden behind a deep, simple module boundary is not a shallow-module problem
+- Domain-specific terminology should not be flagged if it matches how experts actually speak
+
+---
+
+## Risk 2: Change Propagation
+
+**Diagnostic question:** How many unrelated things break when you change one thing?
+
+Each change ripples to unrelated modules, slowing velocity and multiplying regression risk.
+
+### Symptoms
+
+- Modifying one feature requires touching more than 3 files in unrelated modules
+- One class changes for multiple different business reasons (e.g., `UserService` changes for
+  billing logic AND notification logic AND profile logic)
+- A method uses more data from another class than from its own class
+- Two classes know each other's internal state directly
+- Changing one module requires recompiling or retesting many unrelated modules
+- **Hyrum's Law**: with sufficient callers, every observable behavior — including
+  implementation details, error message text, coincidental call ordering, and undocumented
+  side effects — becomes an implicit contract that callers depend on, even though it was
+  never guaranteed by the declared API
+- **Orthogonality violation**: changing one dimension of a feature forces edits in
+  unrelated dimensions — adding a new payment type should not require touching logging,
+  caching, or notification code, but in a non-orthogonal design it does
+- Information Leakage: a design decision (e.g., a file format, protocol detail, or data
+  shape) is encoded in more than one module, so changing it requires coordinated edits
+  in multiple places even though only one module "owns" the concept
+
+### Sources
+
+| Symptom | Book | Principle / Smell |
+|---------|------|-------------------|
+| Shotgun Surgery | Fowler — Refactoring | Shotgun Surgery |
+| Divergent Change | Fowler — Refactoring | Divergent Change |
+| Feature Envy | Fowler — Refactoring | Feature Envy |
+| Inappropriate Intimacy | Fowler — Refactoring | Inappropriate Intimacy |
+| Orthogonality violation | Hunt & Thomas — The Pragmatic Programmer | Ch. 2: Orthogonality |
+| DIP violation | Martin — Clean Architecture | Dependency Inversion Principle |
+| High change propagation radius | Brooks — The Mythical Man-Month | Ch. 2: Brooks's Law (communication overhead) |
+| Hyrum's Law | Winters et al. — Software Engineering at Google | Ch. 1: Hyrum's Law |
+| Information Leakage | Ousterhout — A Philosophy of Software Design | Ch. 5: Information Hiding and Leakage |
+
+### Severity Guide
+
+- 🔴 Critical: one change touches > 5 files, or there is a structural dependency inversion (domain depends on infrastructure)
+- 🟡 Warning: one change touches 3–5 files, mild coupling between modules
+- 🟢 Suggestion: minor coupling, easily isolatable
+
+### What Not to Flag
+
+- A composition root wiring concrete dependencies is not a DIP violation by itself
+- A stable public API with intentionally supported behavior is not automatically Hyrum's Law debt
+- Similar edits inside one bounded context may be normal coordinated change, not shotgun surgery
+
+---
+
+## Risk 3: Knowledge Duplication
+
+**Diagnostic question:** Is the same decision expressed in more than one place?
+
+Multiple copies drift apart silently. DRY is about decisions, not code lines.
+
+### Symptoms
+
+- Same logic copy-pasted across multiple files or functions
+- Same concept named differently in different parts of the codebase
+  (e.g., `user`, `account`, `member`, `customer` all referring to the same domain entity)
+- Parallel class hierarchies that must change in sync
+  (e.g., adding a new payment type requires adding a class in 3 different hierarchies)
+- Configuration values repeated as literals in multiple places
+- Two modules that implement the same algorithm independently
+
+### Sources
+
+| Symptom | Book | Principle / Smell |
+|---------|------|-------------------|
+| Code duplication | Fowler — Refactoring | Duplicate Code |
+| Parallel Inheritance | Fowler — Refactoring | Parallel Inheritance Hierarchies |
+| DRY violation | Hunt & Thomas — The Pragmatic Programmer | DRY: Don't Repeat Yourself |
+| Inconsistent naming | Evans — Domain-Driven Design | Ubiquitous Language |
+| Alternative Classes | Fowler — Refactoring | Alternative Classes with Different Interfaces |
+
+### Severity Guide
+
+- 🔴 Critical: core business logic duplicated across modules, or same domain concept named 3+ different ways
+- 🟡 Warning: utility code duplicated, naming inconsistent within a subsystem
+- 🟢 Suggestion: minor literal duplication, single naming inconsistency
+
+### What Not to Flag
+
+- Repetition across separate bounded contexts is not automatically duplicate knowledge
+- Temporary duplication during an active extraction or migration is not necessarily debt
+- Shared protocol constants repeated at explicit boundaries may be acceptable when local ownership is clearer
+
+---
+
+## Risk 4: Accidental Complexity
+
+**Diagnostic question:** Is the code more complex than the problem it solves?
+
+Accidental complexity accumulates addition by addition until developers fight scaffolding more than solving the problem.
+
+### Symptoms
+
+- Abstractions built "for future use" with no current consumer
+  (e.g., a plugin system for a use case that has only one known implementation)
+- Classes that barely justify their existence (wrap a single method call)
+- Classes that only delegate to another class without adding behavior (pure middle-men)
+- Second attempt at a system that is significantly more elaborate than the first,
+  adding generality for requirements that do not yet exist
+- Switch statements that signal missing polymorphism
+- Configuration options that have never been changed from their defaults
+- Framework code larger than the application it powers
+- Code grown under sustained tactical shortcuts: each workaround seemed small, but
+  accumulated shortcuts mean every new feature requires fighting the existing structure
+
+### Sources
+
+| Symptom | Book | Principle / Smell |
+|---------|------|-------------------|
+| Speculative Generality | Fowler — Refactoring | Speculative Generality |
+| Lazy Class | Fowler — Refactoring | Lazy Class |
+| Middle Man | Fowler — Refactoring | Middle Man |
+| Switch Statements | Fowler — Refactoring | Switch Statements |
+| Second System Effect | Brooks — The Mythical Man-Month | Ch. 5: The Second-System Effect |
+| YAGNI violations | McConnell — Code Complete | Ch. 5: Design in Construction |
+| Over-engineering | Hunt & Thomas — The Pragmatic Programmer | Topic 4: Good-Enough Software |
+| Tactical programming debt | Ousterhout — A Philosophy of Software Design | Ch. 3: Strategic vs. Tactical Programming |
+
+### Severity Guide
+
+- 🔴 Critical: an entire subsystem built around a speculative requirement, or framework overhead dominates domain logic
+- 🟡 Warning: several unnecessary abstractions or wrapper classes, unused configuration systems
+- 🟢 Suggestion: one or two lazy classes or middle-man patterns in non-critical paths
+
+### What Not to Flag
+
+- A switch over an external protocol, wire format, or closed enum is not automatically missing polymorphism
+- Thin wrappers that absorb vendor churn or hide instability may be justified
+- A larger second version is not second-system effect unless the added generality exceeds present needs
+
+---
+
+## Risk 5: Dependency Disorder
+
+**Diagnostic question:** Do dependencies flow in a consistent, predictable direction?
+
+When business logic depends on infrastructure, infrastructure changes cascade into domain changes. Cycles prevent isolation.
+
+### Symptoms
+
+- Circular dependencies between modules or packages
+- High-level business logic directly imports from low-level infrastructure
+  (e.g., a domain service imports from a specific database driver)
+- Stable, widely-used components depend on unstable, frequently-changing ones
+- Abstract components depending on concrete implementations
+- Law of Demeter violations: `order.getCustomer().getAddress().getCity()`
+- Module fan-out greater than 5 (imports from more than 5 other modules)
+- A module implements an interface but only uses a subset of its methods, or must
+  provide stub implementations for methods it does not need (ISP violation: fat interface
+  forces unwanted dependencies on callers)
+- The system feels like "one mind did not design this" — different modules use
+  incompatible architectural patterns with no clear rule for which to use where
+- Direct version-pinned dependencies on transitive packages (diamond dependency risk);
+  upgrading one library requires coordinating multiple unrelated teams or repositories
+
+### Sources
+
+| Symptom | Book | Principle / Smell |
+|---------|------|-------------------|
+| Dependency cycles | Martin — Clean Architecture | Acyclic Dependencies Principle (ADP) |
+| DIP violation | Martin — Clean Architecture | Dependency Inversion Principle (DIP) |
+| Instability direction | Martin — Clean Architecture | Stable Dependencies Principle (SDP) |
+| Abstraction mismatch | Martin — Clean Architecture | Stable Abstractions Principle (SAP) |
+| ISP violation | Martin — Clean Architecture | Interface Segregation Principle (ISP) |
+| Conceptual integrity | Brooks — The Mythical Man-Month | Ch. 4: Conceptual Integrity |
+| Law of Demeter | Hunt & Thomas — The Pragmatic Programmer | Ch. 5: Decoupling and the Law of Demeter |
+| SOLID violations | Martin — Clean Architecture | Single Responsibility, Open/Closed Principles |
+| Diamond dependency / upgrade blockage | Winters et al. — Software Engineering at Google | Ch. 21: Dependency Management |
+
+### Severity Guide
+
+- 🔴 Critical: dependency cycles present, or domain layer directly depends on infrastructure layer
+- 🟡 Warning: several SDP or DIP violations but no cycles; conceptual inconsistency across modules
+- 🟢 Suggestion: minor Demeter violations, slightly elevated fan-out in isolated modules
+
+### What Not to Flag
+
+- High fan-out in an orchestration layer or composition root is not automatically disorder
+- Adapter modules may depend on both domain and infrastructure when they explicitly translate across the boundary
+- A stable facade over many leaf dependencies can be healthy if dependency policy is clear
+
+---
+
+## Risk 6: Domain Model Distortion
+
+**Diagnostic question:** Does the code faithfully represent the problem it is solving?
+
+Code that mismatches business language forces mental translation. Over time it models schemas instead of the domain, with logic bleeding into service layers.
+
+### Symptoms
+
+- Business logic scattered across service layers while domain objects have only getters and setters
+  (anemic domain model)
+- Code variable, class, or method names that do not match what business stakeholders call the concept
+- A class whose only purpose is to hold data with no behavior (pure data bag)
+- A subclass that ignores or overrides most of its parent's behavior (refuses the inheritance)
+- Bounded context boundaries crossed without any translation or anti-corruption layer
+- Methods that are more interested in the data of another class than their own
+  (domain logic in the wrong place)
+- A subclass overrides most parent methods with incompatible behavior or throws exceptions
+  where the parent contract guarantees success (LSP violation: substitution breaks callers)
+- Value Objects treated as Entities: a concept defined entirely by its attributes (e.g., Money,
+  Email, Address) is given a mutable ID and lifecycle instead of being replaced when changed
+
+### Sources
+
+| Symptom | Book | Principle / Smell |
+|---------|------|-------------------|
+| Anemic Domain Model | Evans — Domain-Driven Design | Domain Model pattern |
+| Ubiquitous Language drift | Evans — Domain-Driven Design | Ubiquitous Language |
+| Bounded context violation | Evans — Domain-Driven Design | Bounded Context |
+| Data Class | Fowler — Refactoring | Data Class |
+| Refused Bequest | Fowler — Refactoring | Refused Bequest |
+| Feature Envy | Fowler — Refactoring | Feature Envy |
+| LSP violation | Martin — Clean Architecture | Liskov Substitution Principle (LSP) |
+
+### Severity Guide
+
+- 🔴 Critical: domain logic entirely in service layer, domain objects are pure data bags with no behavior
+- 🟡 Warning: partial anemia, some naming inconsistency between code and domain language
+- 🟢 Suggestion: minor naming drift in non-core areas, isolated cases of Feature Envy
+
+### What Not to Flag
+
+- CRUD-heavy workflows may legitimately use transaction scripts instead of rich domain objects
+- DTOs, persistence records, and API payload models are allowed to be data-only
+- Shared infrastructure language should not be mistaken for domain drift if the business model itself is simple
--- a/skills/thirdparty/_shared/remedy-guide.md
+++ b/skills/thirdparty/_shared/remedy-guide.md
@ -0,0 +1,37 @@
+# Remedy Guide — Actionable Fix Mode
+
+When `--fix` is active, enhance every finding's Remedy field to be directly actionable:
+
+## Remedy Enhancement Rules
+
+For each finding, the Remedy must include:
+1. **Target**: exact file path and function/class name
+2. **Action**: specific refactoring operation (e.g., "Extract lines 45-67 into a new
+   function `calculateShippingCost(items, config)`")
+3. **Rationale**: one sentence explaining why this specific fix (not just "refactor")
+
+## Fixability Classification
+
+Classify each finding after writing the enhanced Remedy:
+
+| Tier | Criteria | Report label |
+|------|---------|-------------|
+| Quick fix | Single-file, mechanical: rename, extract constant, reorder imports | `[quick-fix]` |
+| Guided fix | Requires a design choice: where to split, what interface shape | `[guided]` |
+| Manual | Cross-module, needs domain knowledge or team discussion | `[manual]` |
+
+Append the label to the finding title: `**R1 — Long function in OrderService [quick-fix]**`
+
+## Output Addition
+
+After the standard report, add a **Fix Summary** section:
+
+| Finding | Tier | Target File | Action |
+|---------|------|------------|--------|
+| R1 — Long function | quick-fix | src/order.ts:45 | Extract `calculateTotal()` |
+| R5 — Circular dep | manual | src/models/ ↔ src/services/ | Introduce interface boundary |
+
+## What NOT to do
+- Do NOT modify any files. Phase 1 is diagnosis + actionable plan only.
+- Do NOT generate diffs or code blocks. The Remedy text IS the deliverable.
+- Do NOT re-score. The Health Score reflects current state, not projected state.
--- a/skills/thirdparty/_shared/source-coverage.md
+++ b/skills/thirdparty/_shared/source-coverage.md
@ -0,0 +1,248 @@
+---
+books:
+  - The Mythical Man-Month
+  - Code Complete
+  - Refactoring
+  - Clean Architecture
+  - The Pragmatic Programmer
+  - Domain-Driven Design
+  - A Philosophy of Software Design
+  - Software Engineering at Google
+  - xUnit Test Patterns
+  - The Art of Unit Testing
+  - Working Effectively with Legacy Code
+  - How Google Tests Software
+---
+
+# Source Coverage Matrix
+
+Use this file after selecting a mode and before writing findings.
+It exists to prevent shallow "book-name citation" reviews.
+
+## Review Discipline
+
+- Cite a book only when the observed symptom actually matches that book's principle.
+- A threshold crossing is a hint, not a verdict. Check context, intent, and blast radius.
+- Look for justified tradeoffs before flagging a smell as debt.
+- Prefer concrete architectural or domain consequences over abstract style complaints.
+- If two books pull in different directions, state the tradeoff instead of pretending there is no tension.
+
+---
+
+## Frederick Brooks — *The Mythical Man-Month*
+
+**Encoded today**
+- Change propagation as communication overhead
+- Second-System Effect
+- Conceptual Integrity
+
+**Do not ignore**
+- Whether the design shows a single coherent idea or competing local optimizations
+- Whether cross-team coordination cost is becoming part of feature cost
+
+**Do not over-flag**
+- Large systems are not automatically second systems
+- Multi-module designs are acceptable when they preserve conceptual integrity
+
+---
+
+## Steve McConnell — *Code Complete*
+
+**Encoded today**
+- Routine length, nesting, naming, and magic numbers
+- Construction-phase YAGNI checks
+- Defensive programming and error-handling discipline (guard clauses, input validation,
+  explicit error paths, assertions for invariants)
+
+**Do not ignore**
+- Whether low-level readability choices compound into operational risk
+- Whether missing error handling makes failure modes invisible to maintainers
+
+**Do not over-flag**
+- Small, explicit guard clauses are not cognitive overload
+- A long routine may be acceptable when it is linear, well-named, and single-purpose
+
+---
+
+## Martin Fowler — *Refactoring*
+
+**Encoded today**
+- Long Method, Long Parameter List, Message Chains
+- Shotgun Surgery, Divergent Change, Feature Envy, Inappropriate Intimacy
+- Duplicate Code, Speculative Generality, Lazy Class, Middle Man, Data Class
+- Flag Arguments: boolean parameters that split a function into two behaviors
+- Primitive Obsession: domain concepts expressed as raw primitive types instead of value types
+
+**Do not ignore**
+- Whether the code smell is local or systemic
+- Whether a refactoring target has a natural home in the model
+
+**Do not over-flag**
+- Temporary duplication during an active extraction is not always debt
+- A data-focused structure is acceptable when it is intentionally a DTO or boundary record
+
+---
+
+## Robert C. Martin — *Clean Architecture*
+
+**Encoded today**
+- DIP, ADP, SDP, SAP, and layering direction
+- ISP: fat interfaces that force callers to depend on methods they do not use
+- LSP: subclasses that break the behavioral contract of their parent type
+- SRP and OCP: classes with multiple reasons to change; modules closed to modification
+  but open to extension via abstraction
+
+**Do not ignore**
+- Policy vs detail boundaries
+- Whether dependency arrows preserve replaceability and testability
+
+**Do not over-flag**
+- Composition roots may depend on concrete infrastructure by design
+- Thin adapter layers can import both directions when they are explicitly boundary glue
+
+---
+
+## Andrew Hunt & David Thomas — *The Pragmatic Programmer*
+
+**Encoded today**
+- Orthogonality
+- DRY
+- Law of Demeter
+
+**Do not ignore**
+- Whether knowledge duplication is really duplicated decision-making
+- Whether coupling is accidental or a deliberate local simplification
+
+**Do not over-flag**
+- Similar code in different bounded contexts is not automatically a DRY violation
+- Direct object access inside a cohesive aggregate is not always a Demeter problem
+
+---
+
+## Eric Evans — *Domain-Driven Design*
+
+**Encoded today**
+- Ubiquitous Language
+- Bounded Context
+- Anemic Domain Model
+- Entity vs Value Object: objects with identity and lifecycle vs. objects defined solely by
+  their attributes (Money, Email, Address should be immutable value types, not mutable entities)
+- Aggregate Roots: who owns the invariant boundary; cross-aggregate access only through the root
+
+**Do not ignore**
+- Aggregate boundaries, invariant ownership, and anti-corruption layers
+- Whether names match the business language used by experts
+
+**Do not over-flag**
+- CRUD-heavy workflows may legitimately use transaction scripts
+- Thin entities are acceptable when the domain itself is simple
+
+---
+
+## John Ousterhout — *A Philosophy of Software Design*
+
+**Encoded today**
+- Deep vs shallow modules
+- Strategic vs tactical programming
+- Information Leakage: a design decision encoded in more than one module, creating
+  change coupling even when no explicit import exists between the modules
+
+**Do not ignore**
+- Interface complexity relative to hidden complexity
+- Whether repeated tactical patches are raising long-term cognitive load
+- Whether a "helper" exposes internal design decisions that callers should not know
+
+**Do not over-flag**
+- Internal implementation complexity is fine when the interface stays simple
+- A small wrapper is acceptable when it meaningfully absorbs volatility
+
+---
+
+## Titus Winters, Tom Manshreck, Hyrum Wright — *Software Engineering at Google*
+
+**Encoded today**
+- Hyrum's Law
+- Dependency management and upgrade blockage
+- Code sustainability: whether code as written can be maintained, migrated, and upgraded
+  over a multi-year horizon without heroic effort
+- Backward compatibility: whether API changes preserve existing callers or force
+  coordinated upgrades across the organization
+
+**Do not ignore**
+- De facto APIs created by observable behavior
+- The maintenance cost of exposing too much surface area
+- Whether the dependency graph will allow independent upgrades over time
+
+**Do not over-flag**
+- A stable public API is not a liability if it is intentionally supported
+- Fan-out alone is not disorder when dependency policy is explicit and governed
+
+---
+
+## Gerard Meszaros — *xUnit Test Patterns*
+
+**Encoded today**
+- Assertion Roulette, Mystery Guest, General Fixture
+- Eager Test, Lazy Test, Test Code Duplication, Behavior Verification
+- Erratic Test: tests that produce non-deterministic results due to shared state,
+  time dependence, or ordering assumptions between tests
+
+**Do not ignore**
+- Whether test failures are diagnosable
+- Whether the suite shape amplifies maintenance cost
+
+**Do not over-flag**
+- Multiple assertions are acceptable when they express one behavior with one failure story
+- Shared fixtures are acceptable when every field is relevant to the scenario
+
+---
+
+## Roy Osherove — *The Art of Unit Testing*
+
+**Encoded today**
+- Test naming discipline
+- Test isolation
+- Mock usage guidelines
+- Completeness of edge-path tests
+
+**Do not ignore**
+- Whether tests verify behavior rather than wiring
+- Whether seams are used to simplify tests, or production code is being contorted for testability
+
+**Do not over-flag**
+- A mock is acceptable when the dependency is nondeterministic and the assertion still verifies behavior
+- Naming conventions are guidance; clarity is the goal
+
+---
+
+## Michael Feathers — *Working Effectively with Legacy Code*
+
+**Encoded today**
+- Legacy code as code without tests
+- Sensing and Separation
+- Seams
+- Characterization Tests
+
+**Do not ignore**
+- Whether the team can change a risky area safely today
+- Whether the code offers any seam for isolating behavior under change
+
+**Do not over-flag**
+- Untested code is not automatically legacy if it is stable and not under active change
+- Characterization tests are most important before modifying unclear existing behavior
+
+---
+
+## Google Engineering — *How Google Tests Software*
+
+**Encoded today**
+- Change coverage vs line coverage
+- Pyramid shape and suite portfolio economics
+
+**Do not ignore**
+- Whether the suite reflects business risk, not just percentages
+- Whether expensive tests dominate feedback loops
+
+**Do not over-flag**
+- A non-70:20:10 ratio can be healthy when justified by platform constraints or product risk
+- High coverage is useful when paired with meaningful branch and change protection
--- a/skills/thirdparty/_shared/test-decay-risks.md
+++ b/skills/thirdparty/_shared/test-decay-risks.md
@ -0,0 +1,246 @@
+# Test Decay Risk Reference
+
+Six patterns that cause test suites to degrade. Apply the Iron Law to each finding.
+
+---
+
+## Risk T1: Test Obscurity
+
+**Diagnostic question:** How much effort does it take to understand what this test verifies?
+
+Unclear test intent breeds distrust, missed failures, and duplicates — one step from an abandoned suite.
+
+### Symptoms
+
+- Assertion Roulette: multiple assertions with no message string — when one fails, it is
+  impossible to determine which behavior broke without reading every assertion
+- Mystery Guest: test depends on external state (files, database rows, shared fixtures)
+  that is not visible in the test body
+- Test names that do not express the scenario and expected outcome
+  (e.g., `test1`, `shouldWork`, `testLogin`, `testUserService`)
+- General Fixture: an oversized setUp or beforeEach shared by unrelated tests, making
+  each test's preconditions invisible
+- Test body requires reading production code to understand what is being verified
+
+### Sources
+
+| Symptom | Book | Principle / Smell |
+|---------|------|-------------------|
+| Assertion Roulette | Meszaros — xUnit Test Patterns | Assertion Roulette (p.224) |
+| Mystery Guest | Meszaros — xUnit Test Patterns | Mystery Guest (p.411) |
+| General Fixture | Meszaros — xUnit Test Patterns | General Fixture (p.316) |
+| Test naming | Osherove — The Art of Unit Testing | method_scenario_expected naming convention |
+
+### Severity Guide
+
+- 🔴 Critical: no test name in the file describes the behavior being tested; all assertions lack messages
+- 🟡 Warning: multiple Mystery Guests; several ambiguous test names
+- 🟢 Suggestion: minor naming issues; isolated General Fixture
+
+### What Not to Flag
+
+- Multiple assertions are acceptable when they describe one coherent behavior and fail with a clear story
+- Shared setup is fine when every initialized value is relevant to nearly every test
+- Concise test names are acceptable if scenario and expected outcome are still obvious
+
+---
+
+## Risk T2: Test Brittleness
+
+**Diagnostic question:** Do tests break when you refactor without changing behavior?
+
+Brittle tests punish refactoring — eventually developers stop refactoring and the codebase stagnates to protect the suite.
+
+### Symptoms
+
+- Tests assert on private method results, internal state, or implementation details
+  rather than observable behavior
+- Eager Test: one test method verifies multiple unrelated behaviors; any single change
+  causes it to fail regardless of which behavior was touched
+- Over-specified: assertions enforce mock call order or exact parameter values that are
+  irrelevant to the behavior being tested
+- Renaming or extracting a method causes 5 or more tests to fail even though no behavior changed
+- Erratic Test: a test produces different results across runs without any change to
+  production code — caused by race conditions, time-dependent logic, random data, or
+  shared mutable state between tests
+
+### Sources
+
+| Symptom | Book | Principle / Smell |
+|---------|------|-------------------|
+| Eager Test | Meszaros — xUnit Test Patterns | Eager Test (p.228) |
+| Erratic Test | Meszaros — xUnit Test Patterns | Erratic Test |
+| Implementation coupling | Osherove — The Art of Unit Testing | Test isolation principle |
+| Orthogonality violation | Hunt & Thomas — The Pragmatic Programmer | Ch. 2: Orthogonality |
+
+### Severity Guide
+
+- 🔴 Critical: refactoring with no behavior change causes test failures; > 5 tests coupled to a single implementation detail
+- 🟡 Warning: Eager Tests common across the suite; moderate implementation-detail assertions
+- 🟢 Suggestion: isolated over-specification in non-critical tests
+
+### What Not to Flag
+
+- Verifying an externally observable event or emitted command is not implementation coupling
+- One test with several assertions is acceptable when all assertions support one behavior claim
+- A fake or in-memory adapter is not brittleness if the test still asserts behavior, not wiring
+
+---
+
+## Risk T3: Test Duplication
+
+**Diagnostic question:** Is the same test scenario expressed in more than one place?
+
+Duplicated tests must change in multiple places and create false confidence without testing distinct behavior.
+
+### Symptoms
+
+- Test Code Duplication: same setup or assertion logic copy-pasted across multiple tests
+  without extraction into a shared helper
+- Lazy Test: multiple tests verifying identical behavior with no differentiation in input,
+  state, or expected output
+- Same boundary condition tested identically at unit, integration, and E2E level —
+  three copies with no layer differentiation
+- Test helper functions or fixtures duplicated across test files instead of shared
+
+### Sources
+
+| Symptom | Book | Principle / Smell |
+|---------|------|-------------------|
+| Test Code Duplication | Meszaros — xUnit Test Patterns | Test Code Duplication (p.213) |
+| Lazy Test | Meszaros — xUnit Test Patterns | Lazy Test (p.232) |
+| DRY violation in tests | Hunt & Thomas — The Pragmatic Programmer | DRY: Don't Repeat Yourself |
+
+### Severity Guide
+
+- 🔴 Critical: core business scenario fully duplicated across all three test layers with no differentiation
+- 🟡 Warning: common scenario setup repeated in 5 or more tests without extraction
+- 🟢 Suggestion: minor helper duplication; isolated Lazy Tests
+
+### What Not to Flag
+
+- The same scenario may appear at unit and integration level when each layer verifies a distinct risk
+- Small local setup duplication can be clearer than an over-abstracted fixture maze
+- Similar assertions against different domain rules are not Lazy Tests if the business intent differs
+
+---
+
+## Risk T4: Mock Abuse
+
+**Diagnostic question:** Is the test more complex than the behavior it tests?
+
+Mock abuse produces tests that pass while verifying nothing — production code can be fully broken as long as the mocks are wired up.
+
+### Symptoms
+
+- Mock setup code is longer than the test logic itself
+- Primary assertion is `expect(mock).toHaveBeenCalledWith(...)` — the test verifies
+  that a mock was called, not that any real behavior occurred
+- Test-only methods added to production classes for lifecycle management in tests
+- Single unit test uses more than 3 mocks
+- Incomplete Mock: mock object missing fields that downstream code will access,
+  causing silent failures only visible in integration
+- Hard-Coded Test Data: test data has no resemblance to real data shapes or constraints
+
+### Sources
+
+| Symptom | Book | Principle / Smell |
+|---------|------|-------------------|
+| Mock count > 3 | Osherove — The Art of Unit Testing | Mock usage guidelines |
+| Testing mock behavior | Meszaros — xUnit Test Patterns | Behavior Verification (p.544) |
+| Test-only production methods | Feathers — Working Effectively with Legacy Code | Ch. 3: Sensing and Separation |
+| Hard-Coded Test Data | Meszaros — xUnit Test Patterns | Hard-Coded Test Data (p.534) |
+| Incomplete Mock | Osherove — The Art of Unit Testing | Mock completeness requirement |
+
+### Severity Guide
+
+- 🔴 Critical: mock setup > 50% of test code; production class has methods only called from tests
+- 🟡 Warning: mocks consistently > 3 per test; primary assertions are mock call verifications
+- 🟢 Suggestion: isolated Incomplete Mocks; minor Hard-Coded Test Data
+
+### What Not to Flag
+
+- A small number of mocks around nondeterministic dependencies is acceptable when assertions still verify behavior
+- Fakes and spies used to observe state transitions are not mock abuse by default
+- One interaction assertion may be appropriate when the interaction itself is the behavior under test
+
+---
+
+## Risk T5: Coverage Illusion
+
+**Diagnostic question:** Does the test suite actually protect against the failures that matter?
+
+Coverage measures execution, not verification. 90% line coverage can still miss every critical failure mode — teams stop looking because the number says "covered."
+
+### Symptoms
+
+- High line coverage but error-handling branches, boundary conditions, and exception paths
+  have no corresponding tests
+- Happy-path only: no sad paths, no null/empty/zero inputs, no concurrency edge cases
+- Legacy code areas are being actively modified with no tests present
+  (Feathers: "legacy code is code without tests")
+- Coverage percentage treated as a sign-off criterion; critical change paths remain untested
+- Tests assert on return values but not on important side effects such as database writes,
+  event publications, or state transitions
+
+### Sources
+
+| Symptom | Book | Principle / Smell |
+|---------|------|-------------------|
+| Legacy code = no tests | Feathers — Working Effectively with Legacy Code | Ch. 1: "Legacy code is code without tests" |
+| Change coverage vs line coverage | Google — How Google Tests Software | Ch. 11: Testing at Google Scale |
+| Happy-path only | Osherove — The Art of Unit Testing | Test completeness principle |
+
+### Severity Guide
+
+- 🔴 Critical: legacy code area actively being modified with no tests; error-handling paths entirely absent
+- 🟡 Warning: coverage > 80% but edge and exception paths are systematically absent
+- 🟢 Suggestion: a few non-critical paths missing sad-path tests
+
+### What Not to Flag
+
+- High line coverage is useful when paired with branch, boundary, and change-path coverage
+- A new module may have limited coverage early if it is still private and low-risk
+- Side-effect assertions may live in integration tests rather than unit tests without implying a gap
+
+---
+
+## Risk T6: Architecture Mismatch
+
+**Diagnostic question:** Does the test suite structure reflect the system's actual risk profile?
+
+Wrong suite shape is slow and expensive — not from bad tests, but from using the wrong type at the wrong layer.
+
+### Symptoms
+
+- Inverted test pyramid: E2E or integration test count exceeds unit test count,
+  causing a slow and fragile suite
+- Legacy code with no seam points: no interfaces, dependency injection, or seams exist,
+  making it impossible to test in isolation without modifying production code
+- Legacy areas being modified have no Characterization Tests to capture current behavior
+  before changes are made
+- Full suite execution time exceeds 10 minutes (indicates architectural problem,
+  not a performance problem — too many slow tests)
+- High-risk and low-risk paths are tested at identical density;
+  no risk-based prioritization in test distribution
+
+### Sources
+
+| Symptom | Book | Principle / Smell |
+|---------|------|-------------------|
+| Inverted pyramid | Google — How Google Tests Software | 70:20:10 unit:integration:E2E ratio |
+| No seam points | Feathers — Working Effectively with Legacy Code | Ch. 4: Seam Model |
+| Missing Characterization Tests | Feathers — Working Effectively with Legacy Code | Ch. 13: Characterization Tests |
+| Suite execution time | Meszaros — xUnit Test Patterns | Slow Tests (p. 253) |
+
+### Severity Guide
+
+- 🔴 Critical: legacy code being modified has no seams and no characterization tests; pyramid fully inverted
+- 🟡 Warning: suite execution > 10 minutes; integration/E2E count exceeds unit tests
+- 🟢 Suggestion: localized pyramid ratio deviation; a few legacy areas missing characterization tests
+
+### What Not to Flag
+
+- Deviating from 70:20:10 can be justified by platform constraints or product risk
+- A suite heavy on integration tests can still be healthy if feedback is fast and purposefully layered
+- A small number of critical-path E2E tests is desirable, not a smell
--- a/skills/thirdparty/brooks-audit/SKILL.md
+++ b/skills/thirdparty/brooks-audit/SKILL.md
@ -0,0 +1,42 @@
+---
+name: brooks-audit
+description: >
+  Architecture audit that maps module dependencies, checks layering integrity, and
+  flags structural decay across a codebase, drawing on twelve classic engineering books.
+  Triggers when: user asks to audit architecture, review folder/module structure,
+  check for circular imports, understand how the codebase is organized, or asks
+  "does this follow clean architecture?", "why does everything depend on everything?",
+  "are our layers correct?", "where should this code live?".
+  Also triggers for onboarding requests: "explain this codebase to a new developer"
+  or "give me a codebase tour" (use onboarding mode).
+  Do NOT trigger for: PR-level code review (use brooks-review) or line-level refactoring
+  questions — this skill analyzes structural/module-level concerns, not individual functions.
+---
+
+# Brooks-Lint — Architecture Audit
+
+## Setup
+
+1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
+2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
+3. Read `../_shared/decay-risks.md` for symptom definitions and source attributions
+4. Read `architecture-guide.md` in this directory for the audit framework
+
+## Process
+
+**Onboarding mode:** If the user asks for an onboarding report, codebase tour, or
+"explain this codebase to a new developer", read `onboarding-guide.md` from this
+directory and follow it instead of `architecture-guide.md`. This mode explains rather
+than diagnoses — no Health Score, no Iron Law findings.
+
+**If the user has not specified files or a directory to audit:** apply Auto Scope
+Detection from `../_shared/common.md` to determine the audit scope before proceeding.
+
+1. Gather codebase context and draw the module dependency graph as Mermaid (Steps 0–1 of the guide)
+2. Scan for each decay risk in the order specified (Steps 2–4 of the guide)
+3. Assign node colors in the Mermaid diagram based on findings (red/yellow/green) — after Step 4
+4. Run the Testability Seam Assessment (Step 5 of the guide)
+5. Run the Conway's Law check (Step 6 of the guide)
+6. Output using the Report Template from common.md — Mermaid graph FIRST, then Findings
+
+**Mode line in report:** `Architecture Audit`
--- a/skills/thirdparty/brooks-audit/architecture-guide.md
+++ b/skills/thirdparty/brooks-audit/architecture-guide.md
@ -0,0 +1,195 @@
+# Architecture Audit Guide — Mode 2
+
+**Purpose:** Analyze the module and dependency structure of a system for decay risks that
+operate at the architectural level. Every finding must follow the Iron Law:
+Symptom → Source → Consequence → Remedy.
+
+**Monorepo note:** Treat each deployable service or library as a top-level module. Draw
+dependencies between services, not between their internal packages. Apply the Conway's Law
+check at the service ownership level. Within a single service, apply standard module-level analysis.
+
+---
+
+## Analysis Process
+
+Work through these six steps in order.
+
+### Step 0: Gather Codebase Context
+
+Before drawing anything, establish what you can see.
+
+**If the user provided a full directory tree or pasted relevant file contents:** skip the
+proactive reading below and proceed to Step 1.
+
+**Otherwise, proactively read the project using these tools:**
+
+1. **Top-level structure** — glob top two levels to identify module boundaries:
+   ```
+   Glob: **/*(depth 2, directories only)
+   ```
+2. **Entry points** — read the package manifest or main config file (e.g., `package.json`,
+   `go.mod`, `pom.xml`, `Cargo.toml`, `pyproject.toml`) to confirm language, framework,
+   and declared dependencies.
+3. **Dependency edges** — grep import statements to discover inter-module calls. Run once
+   per language present; limit to the first 200 matches to avoid token overrun:
+   ```
+   Grep: "^\s*(import|from|require\(|use )" across *.ts|*.py|*.go|*.rs|*.java
+   ```
+4. **Large modules** — for any top-level directory with > 10 files, read the file matching
+   `index.*`, `main.*`, or `__init__.*` to understand its stated responsibility.
+
+**Stop when you can answer all three:**
+- What are the top-level modules (names and count)?
+- Which modules import from which other modules?
+- Which module has the highest fan-in or fan-out?
+
+If the project has > 100 top-level files or > 4 levels of nesting, note which areas were
+sampled vs. inferred, and flag this in the report scope line.
+
+### Step 1: Draw the Module Dependency Graph (Mermaid)
+
+Before evaluating any risk, map the dependencies as a Mermaid diagram. Use this format:
+
+````mermaid
+graph TD
+  subgraph UI
+    WebApp
+    MobileApp
+  end
+
+  subgraph Domain
+    AuthService
+    OrderService
+    PaymentService
+  end
+
+  subgraph Infrastructure
+    Database
+    MessageQueue
+  end
+
+  WebApp --> AuthService
+  WebApp --> OrderService
+  MobileApp --> AuthService
+  MobileApp --> OrderService
+  OrderService --> PaymentService
+  OrderService --> Database
+  OrderService --> MessageQueue
+  PaymentService --> Database
+  AuthService -.->|circular| OrderService
+
+  classDef critical fill:#ff6b6b,stroke:#c92a2a,color:#fff
+  classDef warning fill:#ffd43b,stroke:#e67700
+  classDef clean fill:#51cf66,stroke:#2b8a3e,color:#fff
+
+  class PaymentService critical
+  class OrderService warning
+  class Database,MessageQueue,AuthService,WebApp,MobileApp clean
+````
+
+Draw the graph structure first — nodes, subgraphs, and edges — without any `classDef` or
+`class` lines. You cannot assign colors until you have completed the risk scan in Steps 2–4.
+
+**After completing Step 4**, return to this graph and add the `classDef` and `class` lines
+based on findings. The example above shows the final colored output.
+
+Rules:
+1. **Nodes** — Use top-level directories or services as nodes, not individual files
+2. **Grouping** — One `subgraph` per architectural layer or top-level directory (e.g., UI, Domain, Infrastructure)
+3. **Edges** — Solid arrows (`-->`) point FROM the depending module TO the dependency; use dotted arrows with label (`-.->|circular|`) for circular dependencies. If no circular dependencies exist, use only solid arrows
+4. **Node limit** — Keep the graph to ~50 nodes maximum; collapse low-risk leaf modules into their parent if needed
+5. **Fan-out** — For any node with fan-out > 5, use a descriptive label: `HighFanOutModule["ModuleName (fan-out: 7)"]`
+6. **Colors** — Apply `classDef` colors AFTER completing Steps 2-4: `critical` (red `#ff6b6b`) for nodes with Critical findings, `warning` (yellow `#ffd43b`) for Warning findings, `clean` (green `#51cf66`) for nodes with no findings or only Suggestions. If no findings at all, classify all nodes as `clean`
+7. **Direction** — Default to `graph TD` (top-down); use `graph LR` only if the architecture is clearly a left-to-right pipeline
+
+### Step 2: Scan for Dependency Disorder
+
+*The most architecturally consequential risk — scan this first.*
+
+Look for:
+- Circular dependencies (any `-.->|circular|` edge in the map above)
+- Arrows flowing upward (high-level domain depending on low-level infrastructure)
+- Stable, widely-depended-on modules that import from frequently-changing modules
+- Modules with fan-out > 5
+- Absence of a clear layering rule (no consistent answer to "what depends on what?")
+
+### Step 3: Scan for Domain Model Distortion
+
+Look for:
+- Do module names match the business domain vocabulary?
+- Is there a layer called "services" that contains all the business logic while domain objects
+  are pure data structures?
+- Are there modules that cross bounded context boundaries (e.g., billing logic in the user module)?
+- Is there an anti-corruption layer where external systems interface with the domain?
+
+### Step 4: Scan for Remaining Four Risks
+
+Check each in turn:
+
+**Knowledge Duplication:**
+- Are there multiple modules implementing the same concept independently?
+- Does the same domain concept appear under different names in different modules?
+
+**Accidental Complexity:**
+- Are there entire layers in the architecture that do not add value?
+- Are there modules whose responsibility cannot be stated in one sentence?
+
+**Change Propagation:**
+- Which modules are "blast radius hotspots"? (A change here requires changes in many other modules)
+- Does the dependency map reveal why certain features are slow to develop?
+
+**Cognitive Overload:**
+- Can the module responsibility of each module be stated in one sentence from its name alone?
+- Would a new developer know which module to add a new feature to?
+
+### Step 5: Testability Seam Assessment
+
+A *seam* is a place in the architecture where behavior can be altered without editing source
+code — typically an interface, a configuration point, or a dependency injection boundary.
+Seam density is a proxy for testability and evolvability.
+
+Scan for:
+- **No seam at the infrastructure boundary**: can you replace a real database, file system,
+  or HTTP client with a test double without editing the module under test? If not, the
+  architecture forces integration tests where unit tests would suffice.
+- **Seam collapse**: a module that was once testable in isolation has had its seams removed
+  (e.g., direct constructor instantiation replaced a dependency injection point, or a global
+  singleton replaced an injected collaborator).
+- **Missing seam in legacy areas**: modules without an obvious injection point or interface
+  boundary — any change requires touching the entire call stack to substitute behavior.
+
+If all modules have clear seams at their infrastructure boundaries → no finding.
+
+If seams are absent or collapsed: flag as 🟡 Warning with a Remedy pointing to the specific
+module and the injection point that needs to be restored or introduced.
+
+Source: Feathers — Working Effectively with Legacy Code, Ch. 4: The Seam Model
+
+### Step 6: Conway's Law Check
+
+After the six-risk scan, assess the relationship between architecture and team structure:
+
+- Does the module/service structure reflect the team structure?
+  (Conway's Law: "Organizations design systems that mirror their communication structure")
+- If yes: is this intentional design or accidental coupling?
+- A mismatch that causes cross-team coordination overhead for every feature is 🔴 Critical.
+- A mismatch that is theoretical but not yet causing pain is 🟡 Warning.
+- If team structure is unknown, note this as context missing and skip the check.
+
+**Calibration examples:**
+- 🔴 Critical: the Payments module is owned by Team A but contains auth logic owned by Team B —
+  every Payments change requires a sync meeting with Team B
+- 🟡 Warning: two separate teams own the `utils/` and `helpers/` directories which do the same
+  things — theoretically painful but not yet causing release coordination issues
+- Not a finding: a single team owns a monorepo with multiple logical modules — Conway's Law
+  misalignment requires *separate teams* to be meaningful
+
+---
+
+## Output
+
+Use the standard Report Template from `../_shared/common.md`. Mode: Architecture Audit.
+
+Place the Mermaid dependency graph FIRST under "Module Dependency Graph". Reference
+relevant node names in findings. Add `classDef` color assignments LAST, after all
+findings are identified.
--- a/skills/thirdparty/brooks-audit/onboarding-guide.md
+++ b/skills/thirdparty/brooks-audit/onboarding-guide.md
@ -0,0 +1,89 @@
+# Codebase Onboarding Guide
+
+**Purpose:** Produce a newcomer-friendly tour of the codebase. This is NOT a diagnostic
+report — no Health Score, no Iron Law findings. Focus on explanation and orientation.
+
+---
+
+## Process
+
+### Step 1: Map the Territory
+
+- Read top-level structure (same as architecture-guide Step 0)
+- Output: a plain-language overview of what each top-level module does (one sentence each)
+- Group into layers: "Things users interact with", "Business logic", "Infrastructure"
+
+### Step 2: Draw the Dependency Map
+
+Draw the same Mermaid dependency graph as architecture audit Step 1, but color nodes by
+**recommended reading order** using a DISTINCT palette from the severity palette
+(which uses red/yellow/green). This avoids confusing "red = danger" with "red = read last":
+
+- 🔵 Blue (`#339af0`): start here — entry points, core domain
+- 🟣 Purple (`#9775fa`): read next — supporting modules
+- ⚪ Gray (`#ced4da`): read last — infrastructure, generated code, utilities
+
+Add numbered labels: `CoreModule["1. CoreModule"]`
+
+```
+classDef start fill:#339af0,color:#fff
+classDef next fill:#9775fa,color:#fff
+classDef last fill:#ced4da
+```
+
+### Step 3: Highlight Key Conventions
+
+Identify and document patterns the codebase follows:
+- Naming conventions (file naming, class naming, variable naming)
+- Directory organization pattern (feature-based? layer-based? hybrid?)
+- Error handling pattern (exceptions? result types? error codes?)
+- Testing convention (co-located? separate directory? naming pattern?)
+- Dependency injection pattern (if any)
+
+### Step 4: Mark Danger Zones
+
+For each module with known complexity or coupling issues, add a brief warning:
+- "OrderService: high complexity, only modify with full test suite running"
+- "legacy/: no tests, use Characterization Tests before changing"
+
+Do NOT use Iron Law format — use plain warnings. This is orientation, not diagnosis.
+
+### Step 5: Build a Domain Glossary
+
+Extract 10-15 key domain terms from code (class names, method names, constants) and map
+them to plain-language definitions. This applies Evans's Ubiquitous Language as documentation.
+
+### Step 6: Suggest First Tasks
+
+Based on the dependency map, suggest 2-3 low-risk areas where a new developer could make
+their first contribution: modules with good test coverage, clear boundaries, low coupling.
+
+---
+
+## Output Template
+
+```
+# Codebase Tour: [Project Name]
+
+## Overview
+[2-3 sentence summary of what the project does and its tech stack]
+
+## Module Map
+[Mermaid graph with reading-order colors]
+
+## Module Guide
+[One paragraph per top-level module: what it does, what it depends on, key files to read]
+
+## Conventions
+[Bullet list of patterns this codebase follows]
+
+## Danger Zones
+[Bullet list of areas to be careful with, or "None identified" if the codebase is clean]
+
+## Domain Glossary
+| Term | Meaning |
+|------|---------|
+
+## Suggested First Tasks
+[2-3 concrete suggestions for a new developer's first PR]
+```
--- a/skills/thirdparty/brooks-debt/SKILL.md
+++ b/skills/thirdparty/brooks-debt/SKILL.md
@ -0,0 +1,35 @@
+---
+name: brooks-debt
+description: >
+  Tech debt assessment that identifies, classifies, and prioritizes maintainability
+  problems — helping teams build a refactoring roadmap — drawing on twelve classic
+  engineering books.
+  Triggers when: user asks about tech debt, refactoring priorities, what to clean up
+  first, or asks "why is this so hard to change?", "where's the most painful part?",
+  "what should we fix first?", "how do I justify refactoring to management?",
+  "why is our velocity dropping?".
+  Do NOT trigger for: server health checks, HTTP /health endpoints, Kubernetes probes,
+  database health, or application uptime — "health" in those contexts is infrastructure,
+  not code quality. Also not for single-function refactoring questions.
+---
+
+# Brooks-Lint — Tech Debt Assessment
+
+## Setup
+
+1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
+2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
+3. Read `../_shared/decay-risks.md` for symptom definitions and source attributions
+4. Read `debt-guide.md` in this directory for the debt classification framework
+
+## Process
+
+**If the user has not described the codebase or pointed to specific areas:** apply Auto
+Scope Detection from `../_shared/common.md` to determine the assessment scope before proceeding.
+
+1. Scan for all six decay risks (Step 1 of the guide); list every finding before scoring
+2. Apply the Pain × Spread priority formula and classify debt intent (Steps 2–3 of the guide)
+3. Group findings by decay risk (Step 4 of the guide)
+4. Output using the Report Template from common.md, plus the Debt Summary Table
+
+**Mode line in report:** `Tech Debt Assessment`
--- a/skills/thirdparty/brooks-debt/debt-guide.md
+++ b/skills/thirdparty/brooks-debt/debt-guide.md
@ -0,0 +1,125 @@
+# Tech Debt Assessment Guide — Mode 3
+
+**Purpose:** Identify, classify, and prioritize technical debt across the entire codebase.
+Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy.
+
+---
+
+## Evidence Gathering
+
+If you have insufficient evidence to assess the codebase, ask the user ONE question —
+choose the single question most relevant to what you already know:
+
+1. "Which part of the codebase takes the longest to modify for a typical feature?"
+2. "Which module do developers avoid touching, and why?"
+3. "Which parts of the system have the fewest tests and the most bugs?"
+4. "Is there a module that only one person fully understands?"
+
+After one answer, proceed. Do not ask more than one question.
+If the user declines or says they don't know, proceed with available evidence and note
+which areas could not be assessed.
+
+---
+
+## Analysis Process
+
+Work through these four steps in order.
+
+### Step 1: Full Decay Risk Scan
+
+Scan for all six decay risks across the entire codebase. List every finding before scoring
+any of them. This prevents anchoring on early findings and missing systemic patterns.
+
+For each risk, look for:
+
+**Cognitive Overload:** Are there widespread naming problems, deeply nested logic, or
+excessively long functions spread across many modules?
+
+**Change Propagation:** Which modules cause the most ripple effects when changed?
+Are there modules that everyone must modify when adding a new feature?
+
+**Knowledge Duplication:** How many times is the same concept implemented independently?
+Is the domain vocabulary consistent across the codebase?
+
+**Accidental Complexity:** Are there architectural layers or abstractions that add no value?
+Is the infrastructure overhead proportional to the problem being solved?
+
+**Dependency Disorder:** Are there dependency cycles? Does domain logic depend on infrastructure?
+Are there modules with no clear layering position?
+
+**Domain Model Distortion:** Is business logic in the right layer?
+Do code names match business names? Are domain objects anemic?
+
+### Step 2: Score Each Finding with Pain × Spread
+
+After listing all findings, score each one:
+
+**Pain score (1–3):** How much does this slow down development today?
+- 3: Developers actively avoid touching this area; it causes bugs on most changes
+  *(e.g., "nobody wants to touch the billing module because it always breaks something")*
+- 2: This area is noticeably slower to work in than the rest of the codebase
+  *(e.g., "adding a field takes 2–3x longer here than elsewhere")*
+- 1: This is a quality issue but not currently causing active pain
+  *(e.g., "inconsistent naming, but we always know what we mean")*
+
+**Spread score (1–3):** How many files, modules, or developers does this affect?
+- 3: Affects 5+ modules or all developers on the team
+  *(e.g., "every new feature touches the God class in core/")*
+- 2: Affects 2–4 modules or a subset of the team
+  *(e.g., "the auth and notification modules are tightly coupled")*
+- 1: Isolated to one module or one developer's area
+  *(e.g., "legacy parser that only one person maintains")*
+
+**Priority = Pain × Spread** (max 9)
+
+| Priority | Classification | Action |
+|----------|---------------|--------|
+| 7–9 | Critical debt | Address in next sprint |
+| 4–6 | Scheduled debt | Plan within quarter |
+| 1–3 | Monitored debt | Log and watch |
+
+### Step 3: Classify Debt Intent
+
+After scoring, classify each finding as intentional or accidental:
+
+**Intentional debt** — a conscious shortcut taken to meet a deadline, with the expectation
+of paying it back. The team knows about it. It may be legitimate (a strategic prototype,
+a known temporary workaround during a migration).
+
+**Accidental debt** — degradation that accumulated without a deliberate decision: the team
+did not choose it and may not even know it exists. This is the kind Ward Cunningham's
+original definition warned against — not a tactical trade-off, but structural erosion.
+
+Mark each finding with `[intentional]` or `[accidental]` in the Debt Summary Table.
+Intentional debt with no visible payback plan — no linked ticket, no code comment, no
+documented decision — should be treated as accidental for prioritization purposes.
+Focus remediation energy on accidental debt first; intentional debt at least has an owner.
+
+### Step 4: Group by Decay Risk
+
+Report findings grouped by risk type, not by file or module.
+Grouping by risk reveals systemic patterns:
+- "Change Propagation is systemic" → architectural intervention needed
+- "Cognitive Overload is isolated" → localized refactoring sufficient
+
+---
+
+## Output
+
+Use the standard Report Template from `../_shared/common.md`. Mode: Tech Debt Assessment.
+
+After Findings, append a Debt Summary Table:
+
+```
+## Debt Summary
+| Risk | Findings | Avg Priority | Classification | Intent |
+|------|----------|-------------|----------------|--------|
+| Cognitive Overload      | N | X.X | Monitored/Scheduled/Critical | intentional/accidental |
+| Change Propagation      | N | X.X | ... | ... |
+| Knowledge Duplication   | N | X.X | ... | ... |
+| Accidental Complexity   | N | X.X | ... | ... |
+| Dependency Disorder     | N | X.X | ... | ... |
+| Domain Model Distortion | N | X.X | ... | ... |
+
+**Recommended focus:** [risks with highest average priority]
+```
--- a/skills/thirdparty/brooks-health/SKILL.md
+++ b/skills/thirdparty/brooks-health/SKILL.md
@ -0,0 +1,37 @@
+---
+name: brooks-health
+description: >
+  Combined codebase health dashboard that scores a project across all four quality
+  dimensions — PR quality, architecture, tech debt, and test quality — in a single
+  pass, drawing on twelve classic engineering books.
+  Triggers when: user wants an overall quality assessment, asks "how healthy is this
+  codebase?", "run all the checks", "give me a big-picture quality report", "I need a
+  health score before the release", "what's the overall state of our code?", or wants
+  to onboard a new team with a quality overview.
+  Do NOT trigger for: server health checks, HTTP health endpoints, Kubernetes
+  liveness/readiness probes, database health, or application uptime. Also do not
+  trigger when the user specifically requests only one dimension — use the
+  corresponding focused skill instead (brooks-review / brooks-audit /
+  brooks-debt / brooks-test).
+---
+
+# Brooks-Lint — Health Dashboard
+
+## Setup
+
+1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
+2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
+3. Read `../_shared/decay-risks.md` for production risk symptom definitions
+4. Read `../_shared/test-decay-risks.md` for test risk symptom definitions
+5. Read `health-guide.md` in this directory for the dashboard orchestration process
+
+## Process
+
+**If the user has not specified a project or directory:** apply Auto Scope Detection
+from `../_shared/common.md` to determine the review scope before proceeding.
+
+1. Run abbreviated scans across all four dimensions (Step 1 of the guide)
+2. Compute per-dimension and composite Health Scores with weighting (Step 2 of the guide)
+3. Output the Health Dashboard using the dashboard report template (Step 3 of the guide)
+
+**Mode line in report:** `Health Dashboard`
--- a/skills/thirdparty/brooks-health/health-guide.md
+++ b/skills/thirdparty/brooks-health/health-guide.md
@ -0,0 +1,89 @@
+# Health Dashboard Guide — Mode 5
+
+**Purpose:** Produce a cross-dimensional health dashboard for the codebase.
+Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy.
+
+---
+
+## Analysis Process
+
+### Step 1: Run Lightweight Scan Across Four Dimensions
+
+For each dimension, run an abbreviated scan using the decay-risks definitions
+from `_shared/`. Do NOT read the individual mode guide files — use the abbreviated
+checklists below. Cap each dimension at 3 findings; for Debt: cap at 2 per risk code and 3 across all risk codes.
+
+**PR dimension (if changes exist):**
+- Apply Auto Scope Detection (common.md)
+- Scan for R2 (Change Propagation) and R1 (Cognitive Overload) in the diff
+
+**Architecture dimension:**
+- Gather codebase context: read top-level structure, entry points, import statements
+- Draw a Mermaid dependency graph (follow standard graph rules from common.md)
+- Scan for R5 (Dependency Disorder): circular deps, upward flows, fan-out > 5
+- INCLUDE the Mermaid graph in output
+
+**Debt dimension:**
+- Scan for all six decay risks (R1-R6) across the codebase
+- Skip Pain × Spread scoring (use severity tier only)
+
+**Test dimension:**
+- Build the Test Suite Map (unit/integration/E2E counts)
+- Scan for T1 (Test Obscurity) and T2 (Test Brittleness) in test files
+
+### Step 2: Compute Dashboard Scores
+
+Each dimension gets its own Health Score (base 100, same deduction rules from common.md).
+Composite score = weighted average of dimension scores:
+
+| Dimension | Weight | Rationale |
+|-----------|--------|-----------|
+| PR (code quality) | 0.25 | Only applies if changes exist; skip if no diff |
+| Architecture | 0.30 | Structural issues have highest blast radius |
+| Debt | 0.25 | Systemic but slower-moving |
+| Test | 0.20 | Supporting signal |
+
+If PR dimension is skipped (no changes), redistribute its 0.25 weight proportionally
+across the remaining three dimensions by dividing each remaining weight by
+(1 − 0.25) = 0.75. Compute redistribution dynamically — do not hardcode the values.
+
+**Redistributed weights (PR skipped):**
+
+| Dimension | Base Weight | Redistributed Weight |
+|-----------|------------|---------------------|
+| Architecture | 0.30 | 0.30 / 0.75 = 0.40 |
+| Debt | 0.25 | 0.25 / 0.75 = 0.33 |
+| Test | 0.20 | 0.20 / 0.75 = 0.27 |
+
+### Step 3: Output Dashboard
+
+Use the dashboard report template below instead of the standard common.md template.
+
+---
+
+## Dashboard Report Template
+
+````markdown
+# Brooks-Lint Health Dashboard
+
+**Mode:** Health Dashboard
+**Scope:** [project name or directory]
+**Composite Score:** XX/100
+
+| Dimension | Score | Top Finding |
+|-----------|-------|------------|
+| Code Quality | XX/100 | [one-line summary or "Clean"] |
+| Architecture | XX/100 | [one-line summary or "Clean"] |
+| Tech Debt | XX/100 | [one-line summary or "Clean"] |
+| Test Quality | XX/100 | [one-line summary or "Clean"] |
+
+## Module Dependency Graph
+[Mermaid graph from architecture scan]
+
+## Top Findings (max 5 across all dimensions)
+[Standard Iron Law format, sorted by severity]
+
+## Recommendation
+[One paragraph: what to fix first, which dimension needs the most attention,
+ suggest running the full individual skill for the worst dimension]
+````
--- a/skills/thirdparty/brooks-review/SKILL.md
+++ b/skills/thirdparty/brooks-review/SKILL.md
@ -0,0 +1,37 @@
+---
+name: brooks-review
+description: >
+  PR code review that surfaces decay risks, design smells, and maintainability
+  issues with concrete Symptom → Source → Consequence → Remedy findings, drawing
+  on twelve classic engineering books.
+  Triggers when: user asks to review code, check a PR, shares a diff or pastes
+  code asking "does this look right?" / "any issues here?" / "ready to merge?",
+  or asks for feedback on a function, class, or file.
+  Also triggers when user mentions: code smells / refactoring / clean architecture /
+  DDD / domain-driven design / SOLID principles / Hyrum's Law / deep modules /
+  tactical programming / conceptual integrity / Brooks's Law / Mythical Man-Month /
+  second system effect.
+  Do NOT trigger for: questions about how to write code from scratch, language syntax
+  questions, or framework/tool questions where no existing code is shared.
+---
+
+# Brooks-Lint — PR Review
+
+## Setup
+
+1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
+2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
+3. Read `../_shared/decay-risks.md` for symptom definitions and source attributions
+4. Read `pr-review-guide.md` in this directory for the analysis process
+
+## Process
+
+**If the user has not specified files or pasted code:** apply Auto Scope Detection
+from `../_shared/common.md` to determine the review scope before proceeding.
+
+1. Understand the review scope, then scan for each decay risk in the order specified (Steps 1–6 of the guide)
+2. Run the Quick Test Check (Step 7 of the guide) — skip for docs-only or non-production changes
+3. Apply the Iron Law to every finding
+4. Output using the Report Template from common.md
+
+**Mode line in report:** `PR Review`
--- a/skills/thirdparty/brooks-review/pr-review-guide.md
+++ b/skills/thirdparty/brooks-review/pr-review-guide.md
@ -0,0 +1,163 @@
+# PR Review Guide — Mode 1
+
+**Purpose:** Analyze a code diff or specific files for decay risks that are directly visible
+in the changed code. Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy.
+
+---
+
+## Before You Start
+
+**Auto-generated files:** If the diff contains generated files (protobuf stubs, OpenAPI clients,
+ORM migrations, lock files, minified bundles), skip those files entirely. Generated code reflects
+tool choices, not developer decisions. Note in the report which files were skipped and why.
+
+**Scope calibration:** Adjust analysis depth based on PR size before starting.
+
+| PR Size | Approach |
+|---------|----------|
+| < 50 lines | Focus on Steps 1–3 only; run Step 6a only if imports changed; run Step 6b if any class, method, or variable was renamed or introduced |
+| 50–300 lines | Full process, all steps |
+| > 300 lines | Full process; note in the Scope line that review is sampled — cover the highest-risk areas rather than every file |
+
+For PRs > 500 lines: flag in the Summary that a PR this size is itself a Change Propagation signal. A change that cannot be reviewed in one pass suggests tangled responsibilities.
+
+---
+
+## Analysis Process
+
+Work through these seven steps in order. Do not skip steps.
+
+### Step 1: Understand the scope
+
+Read the diff or files and answer:
+- What is the stated purpose of this change?
+- Which files were modified?
+- Flag immediately if the PR changes more than 10 unrelated files — that itself is a
+  🟡 Warning: Change Propagation (a PR that touches many unrelated things is a sign
+  that responsibilities are tangled).
+
+### Step 2: Scan for Change Propagation
+
+*Scan this first — it is the most visible risk in a diff.*
+
+Look for:
+- Does this change touch files in modules that have no conceptual connection to the stated purpose?
+- Does any modified class change for more than one business reason in this diff?
+- Does any method use more data from another class than from its own?
+
+If the diff shows no cross-module changes beyond what the feature requires → skip, no finding.
+
+### Step 3: Scan for Cognitive Overload
+
+Look for:
+- Are any new or modified functions longer than 20 lines?
+- Is there nesting deeper than 3 levels in new or modified code?
+- Are there more than 4 parameters in any new function signature?
+- Are there magic numbers or unexplained constants in new code?
+- Do new variable or function names require reading the implementation to understand?
+- Are there train-wreck chains (3+ method calls chained)?
+
+### Step 4: Scan for Knowledge Duplication
+
+Look for:
+- Does this change introduce logic that already exists elsewhere in the codebase?
+- Does this change introduce a new name for a concept that already has a name?
+- Does this change add a class to a hierarchy that has a parallel in another module?
+
+### Step 5: Scan for Accidental Complexity
+
+Look for:
+- Does this change add an abstraction with only one concrete use?
+- Does this change add a class that only wraps another class or delegates everything?
+- Does this change add configuration options or extension points that serve no current requirement?
+
+### Step 6a: Scan for Dependency Disorder
+
+- Do any new imports create a dependency from a high-level module to a low-level one?
+  (e.g., domain service now imports a database driver or HTTP client)
+- Do any new imports introduce a cycle between modules?
+- Does any new interface force callers to depend on methods they do not use?
+
+If no new imports and no structural changes → skip, no finding.
+
+### Step 6b: Scan for Domain Model Distortion
+
+- Do new class or variable names match the language the business uses for the same concept?
+- Does any new class hold only data with no behavior (pure data bag), where behavior was expected?
+- Does any new method put logic that belongs to the domain in a service or utility layer?
+
+---
+
+## Severity Calibration
+
+Apply the Iron Law format from `../_shared/common.md`. Each risk in `../_shared/decay-risks.md` has its own Severity
+Guide with numeric thresholds — use those as the primary reference. When a finding sits
+on the boundary between two tiers, use this as a tiebreaker:
+- 🔴 Critical — actively breaking velocity or creating production risk *today*
+- 🟡 Warning — will if left unaddressed through the next few features
+- 🟢 Suggestion — worth fixing when nearby, not urgent
+
+When multiple findings exist, list Critical items first. If there are more than 5 findings,
+add a one-line "Recommended fix order" at the end of the Findings section.
+
+---
+
+## Step 7: Quick Test Check
+
+*Run this last. Three signals only — this is not a full Mode 4 review.*
+
+If the diff contains only generated files, configuration, or documentation with no
+production logic changes → skip Step 7 entirely.
+
+**Signal 1: Do tests exist for the changed behavior?**
+
+- Does the diff modify production code?
+- Are corresponding test file changes included in the diff?
+- If new public behavior was added with no new tests:
+  → 🟡 Warning: Coverage Illusion — new behavior is untested
+  → Source: Feathers — Working Effectively with Legacy Code, Ch. 1
+- If the change is a pure refactor and existing tests cover the behavior → no finding.
+
+**Signal 2: Quick Mock Abuse sniff**
+
+Only check if the diff includes test file changes.
+
+- Is mock setup code in new/modified tests obviously longer than the test logic?
+- Are the primary assertions `expect(mock).toHaveBeenCalledWith(...)` with no behavior verification?
+- Does the diff add any methods to production classes that are only called from test files?
+
+If any of these are true:
+  → 🟡 Warning: Mock Abuse — test complexity exceeds behavior complexity
+  → Source: Osherove — The Art of Unit Testing, mock usage guidelines
+
+**Signal 3: Quick Test Obscurity sniff**
+
+Only check if the diff includes test file changes.
+
+- Do new test names express scenario and expected outcome?
+  (Pattern: `methodName_scenario_expectedResult` or equivalent)
+- Are there new tests with multiple assertions and no message strings on any of them?
+
+If test names are vague or assertions lack messages:
+  → 🟢 Suggestion: Test Obscurity — test intent is unclear from the test name or assertions
+  → Source: Meszaros — xUnit Test Patterns, Assertion Roulette (p.224)
+
+**Output rule:**
+
+If all three signals are clean → write no Test findings. Proceed directly to the report.
+
+If findings exist → add them to the Findings section using the standard Iron Law format.
+Label the risk as the test decay risk name (e.g., "Coverage Illusion", "Mock Abuse",
+"Test Obscurity").
+
+> **Note:** Step 7 is a fast check, not a full test audit. When systemic test problems
+> are found, note in the Summary: "Consider running `/brooks-lint:brooks-test` for a
+> complete test quality diagnosis."
+
+---
+
+## Output
+
+Use the standard Report Template from `../_shared/common.md`.
+Mode: PR Review
+Scope: list the files reviewed (excluding skipped generated files).
--- a/skills/thirdparty/brooks-sweep/SKILL.md
+++ b/skills/thirdparty/brooks-sweep/SKILL.md
@ -0,0 +1,38 @@
+---
+name: brooks-sweep
+description: >
+  Full-sweep mode: runs a unified analysis across all quality dimensions — code decay,
+  architecture, tech debt, and test quality — then applies fixes directly to the
+  codebase. Safe changes are auto-applied; risky changes are confirmed before
+  execution. Drawing on twelve classic engineering books.
+  Triggers when: user wants to "fix everything", "sweep the codebase", "auto-fix all
+  issues", "run all checks and fix them", "clean up the whole project", or asks for
+  a single command that both diagnoses and remediates quality problems.
+  Do NOT trigger for: read-only audits or health reports where the user only wants
+  findings without code changes; single-dimension reviews (use the focused skill
+  instead: brooks-review / brooks-audit / brooks-debt / brooks-test); server health
+  checks, HTTP /health endpoints, Kubernetes probes, or application uptime.
+---
+
+# Brooks-Lint — Full Sweep & Auto-Fix
+
+## Setup
+
+1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
+2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
+3. Read `../_shared/decay-risks.md` for production risk symptom definitions
+4. Read `../_shared/test-decay-risks.md` for test risk symptom definitions
+5. Read `sweep-guide.md` in this directory for the unified scan and fix process
+
+## Process
+
+**If the user has not specified a project or directory:** apply Auto Scope Detection
+from `../_shared/common.md` to determine the review scope before proceeding.
+
+1. Show pre-flight consent notice and wait for the user's one-time approval (Step 0 of the guide)
+2. Enumerate scope and initialize the `unresolvable` / `non_critical_rounds` / `fix_log` state (Step 1 of the guide)
+3. Run the four dimensions in sequence — review, test, debt, audit — each scanning, classifying, applying Safe + Extended-Safe fixes, and verifying via the project test command (Steps 2–5 of the guide)
+4. Iterate: re-scan modified files + same-module + static consumers; converge on a clean round, retire 3-retry failures to the `unresolvable` set, cap non-critical rounds at 3 (Step 6 of the guide)
+5. Aggregate residual and unresolvable items and output the Full Sweep Report (Steps 7–8 of the guide)
+
+**Mode line in report:** `Full Sweep`
--- a/skills/thirdparty/brooks-sweep/sweep-guide.md
+++ b/skills/thirdparty/brooks-sweep/sweep-guide.md
@ -0,0 +1,264 @@
+# Brooks-Lint — Full Sweep Guide
+
+Sequential autonomous pipeline: **review → test → debt → audit**. Fixes findings
+in place, iterates until clean or capped, reports residuals. One interaction point:
+Step 0 (pre-flight consent) — after approval the pipeline runs hands-free until Step 8.
+
+Every finding follows the Iron Law: **Symptom → Source → Consequence → Remedy**.
+
+---
+
+### Step 0 — Pre-flight consent gate
+
+**Goal:** State scope, cost, and irreversibility up front; get explicit consent
+once so later steps never have to ask.
+
+0a. Estimate the file count using `git ls-files | wc -l` if in a git repo, or
+   `find . -type f -not -path '*/.git/*' -not -path '*/node_modules/*' -not -path '*/.venv/*' -not -path '*/build/*' -not -path '*/dist/*' -not -path '*/vendor/*' -not -path '*/target/*' | wc -l` otherwise. Order-of-magnitude is enough.
+
+0b. Show this notice verbatim with the estimate filled in. Do not paraphrase —
+   the user is agreeing to this exact scope.
+
+   ```
+   ⚠️  /brooks-sweep — Full Repository Sweep & Auto-Fix
+
+   Scope:    Four analysis dimensions run in sequence — PR code decay (R1–R6),
+             test quality (T1–T6), tech debt, architecture. Edits are made in
+             place inside the detected project scope.
+   Estimated files in scope: ~N
+
+   Order:    brooks-review → brooks-test → brooks-debt → brooks-audit.
+             Each dimension scans, queues, and fixes before the next starts.
+
+   Autonomy: Fully autonomous. Safe single-file fixes apply directly. Multi-file
+             fixes that have test coverage AND do not break a public interface
+             also apply directly. High-risk fixes (public API break, cross-module
+             structural change, or no test coverage) are NOT applied — they are
+             recorded in the residual report for human review.
+
+   Iteration: After each dimension pass, modified files + same-module + static
+             consumers are re-scanned. A finding that fails to fix 3 times is
+             retired to the unresolvable set and never re-queued. Non-critical
+             rounds cap at 3 iterations; critical findings iterate until
+             resolved or retired.
+
+   Git impact: The pipeline edits files. It does NOT commit, push, or amend.
+             If you have uncommitted work you want to preserve, commit or stash
+             first.
+
+   Proceed with full autonomous sweep? [Y/n]
+   ```
+
+0c. Parse the reply (first match wins, evaluate rules in order):
+   1. **Hard negation** (`no`, `n`, `abort`, `cancel`, `取消`, `不要`): abort with "Aborted before scan — no files modified."
+   2. **Consent** (`Y`, `yes`, `ok`, `sure`, `proceed`, `go`, `continue`, `好`, `好的`, `行`, `可以`): proceed to Step 1.
+   3. **Soft pause** (`wait`, `hold on`, `等一下`, `等我`, `let me`): acknowledge in one line ("Understood, waiting"), then wait for the user's next message and re-evaluate from rule 1.
+   4. **Question**: answer it, then re-show the notice once and wait for the next reply. If the next reply is not Consent (rule 2) — whether a second question, another pause, or anything else — abort with "Aborted — did not receive consent after clarification."
+
+0d. After consent, do not ask further questions until Step 8.
+
+---
+
+### Step 1 — Scope enumeration and state init
+
+1a. Apply Auto Scope Detection from `../_shared/common.md` if the user did not
+   specify files or a directory. Otherwise honor the user's explicit scope.
+
+1b. Read `.brooks-lint.yaml` from the project root if present. Apply `disable`,
+   `severity`, `ignore`, `focus`, and `custom_risks` per common.md. Record the
+   applied config values and reuse them across all iteration rounds — do not
+   re-read the file in Step 6 even if files were modified.
+
+1c. Initialize pipeline state (persists across all rounds):
+
+   - **`unresolvable`** (set): findings retired after 3 failed attempts — keyed by `(file, line_range, risk_code)`; `signature` breaks ties. Never re-queued.
+   - **`non_critical_rounds`** (int, 0): incremented each round producing Warning/Suggestion; reset on clean round.
+   - **`fix_log`** (list): each fix with file, line range, risk code, description, and outcome (`applied` / `reverted` / `retired`).
+
+1d. Record the final scope file list in the Fix Report output buffer for Step 8.
+
+---
+
+### Step 2 — brooks-review pass (R1–R6 code decay)
+
+Scan every file in scope against all R-series risks defined in
+`../_shared/decay-risks.md`.
+
+2a. For each R-risk, apply its symptom checklist. Record each hit as a finding
+   with: risk code, file + approximate line range, Symptom, Source,
+   Consequence, Remedy, Severity (Critical / Warning / Suggestion), and
+   **Fix-Class** (see Step 2b).
+
+2b. Assign Fix-Class per finding:
+
+   | Class | Criteria |
+   |-------|----------|
+   | **Safe** | Single-file AND fully local: rename a non-exported symbol, extract a constant, remove dead code, add a null guard at a leaf, add a test scaffold for an untested pure function. Any change that modifies or removes an exported symbol is NOT Safe even if in one file. |
+   | **Extended-Safe** | Multi-file but (a) a project test command exists and passes pre-fix, AND (b) the change does not rename, remove, or alter the signature of any publicly exported symbol, AND (c) touches ≤ 5 files in this pass. |
+   | **Residual** | Public API break, cross-service boundary change, no test coverage to fall back on, or remedy ambiguous. NOT applied — carried to the Step 8 residual report. |
+
+2c. Skip any finding that matches an entry in the `unresolvable` set.
+
+2d. Apply every Safe and Extended-Safe fix in this dimension, lowest risk
+   within each severity tier first. For each fix: Edit or Write, then append
+   one row to `fix_log` with outcome `applied`. If two fixes touch overlapping
+   line ranges in the same file, apply higher-severity first, re-read the file,
+   then apply the next.
+
+2e. After all fixes in this dimension, run the project test/lint command if one
+   exists (`package.json` scripts, `pytest`, `cargo test`, `go test ./...`, etc.).
+   If tests fail: revert fixes from this dimension in reverse order one at a
+   time, re-running the test command after each revert, until tests pass.
+   Mark each reverted fix with outcome `reverted` in `fix_log` and promote the
+   finding to **Residual**. If no test command is found, note this once in the
+   report and continue.
+
+2f. Record dimension summary: N scanned, M Safe applied, K Extended-Safe applied,
+   R reverted, P Residual.
+
+---
+
+### Step 3 — brooks-test pass (T1–T6 test decay)
+
+Scan test files (and untested production code) against T-series risks defined
+in `../_shared/test-decay-risks.md`.
+
+Follow the same sub-steps as Step 2 (classify → apply → verify → summarize),
+using T-prefix risk codes. For production files with no test coverage at all,
+record as T2 (Missing Tests). A test scaffold that adds a pure-function test is
+**Safe**; adding tests that require new test infrastructure is **Residual**.
+
+---
+
+### Step 4 — brooks-debt pass (tech debt accumulation)
+
+Re-classify R-findings through a debt lens — same symptoms at accumulation scale:
+repeated duplication, layered workarounds, stale `TODO`/`FIXME` clusters, dead
+flags. Score each with **Pain (1–3) × Spread (1–3)**; total 7–9 = Critical,
+4–6 = Warning, 1–3 = Suggestion. Apply a severity bump for pattern-level
+occurrences (isolated Suggestion → 4+ modules Warning).
+
+Follow the same sub-steps as Step 2. Debt findings often span multiple files
+and are more likely to land in Extended-Safe or Residual than Safe.
+
+---
+
+### Step 5 — brooks-audit pass (architecture integrity)
+
+Scan the full scope for architecture-level issues. The dependency-direction
+symptoms (inverted dependencies, circular imports, cross-domain coupling) are
+defined in `../_shared/decay-risks.md` Risk 5 — use that checklist. Step 5
+additionally covers architecture-only concerns that R5 does not: missing
+abstraction layers, god modules, leaked infrastructure inside domain code,
+and seam-boundary violations.
+
+Most architecture findings are **Residual** by definition — they require human
+judgment on module boundaries. A few are Extended-Safe (e.g. extract a shared
+constant used in 3+ modules into a new module that nothing else imports yet).
+Do not auto-refactor module layouts, rename packages, or change public exports.
+
+Follow the same sub-steps as Step 2.
+
+---
+
+### Step 6 — Iteration loop
+
+**Goal:** Re-scan what the fixes touched and converge. Stop on clean round,
+cap, or no progress.
+
+6a. Build the re-scan scope:
+   - every file modified in Steps 2–5 of the current round, PLUS
+   - every file in the same module as a modified file, PLUS
+   - every file that statically imports from a modified file.
+
+   Do not re-scan files whose dependencies were not touched. On monorepos
+   where a "module" may span hundreds of files, narrow the same-module bucket
+   to files that import from or are imported by a modified file (direct
+   dependency graph only).
+
+6b. Re-run Steps 2–5 on the re-scan scope. For each new finding in this round:
+   - If it matches an entry in `unresolvable` → skip.
+   - Else if 🔴 Critical → queue and fix; Critical findings iterate until
+     resolved OR retired (3 failed attempts → `unresolvable`).
+   - Else 🟡 Warning / 🟢 Suggestion → queue and fix, subject to cap below.
+
+6c. Classify the round after all fixes attempted:
+   - **Clean round** (no new findings outside `unresolvable`): pipeline
+     converged → proceed to Step 7.
+   - **Critical-only round**: do NOT increment `non_critical_rounds`; return
+     to 6a.
+   - **Mixed or non-critical round** (any Warning / Suggestion produced):
+     increment `non_critical_rounds` by 1. If it reaches the cap (default 3,
+     or `sweep.max_iterations` from `.brooks-lint.yaml`), proceed to Step 7
+     with remaining non-critical findings recorded as
+     `"Unresolved — iteration cap reached"`. Otherwise return to 6a.
+
+6d. Fix-retry rule: if a single finding fails verification (Step 2e) 3 times
+   across any combination of rounds, retire it to `unresolvable` with reason
+   `"3-retry budget exhausted"` and stop attempting it.
+
+---
+
+### Step 7 — Residual aggregation
+
+Collect everything that was NOT fixed in place, de-duplicated:
+
+- All Residual-class findings from Steps 2–5 (first round + re-scan rounds)
+- All `unresolvable` entries with their retirement reason
+- All iteration-cap residuals from Step 6c
+
+Sort Critical → Warning → Suggestion. Within each severity, list file path,
+risk code, Symptom (one line), Remedy (one line), and the reason it was not
+applied (`public API break` / `no test coverage` / `3-retry budget` /
+`iteration cap`).
+
+---
+
+### Step 8 — Sweep report
+
+Output the final report. Use the standard Report Template from
+`../_shared/common.md` with these additions:
+
+```
+# Brooks-Lint — Full Sweep Report
+Mode: Full Sweep | Scope: <files or directory>
+Config: .brooks-lint.yaml applied (N risks disabled, M paths ignored)   # omit if no config
+
+## Dimension Summary
+| Dimension | Scanned | Safe Applied | Extended Applied | Reverted | Residual |
+|-----------|---------|--------------|------------------|----------|----------|
+| Review (R1–R6) | ... | ... | ... | ... | ... |
+| Test (T1–T6)   | ... | ... | ... | ... | ... |
+| Debt           | ... | ... | ... | ... | ... |
+| Audit          | ... | ... | ... | ... | ... |
+
+## Iteration History
+Round 1: <classification — clean / critical-only / mixed>, <N> new findings
+Round 2: ...
+Stopped at: clean round | iteration cap | no outstanding criticals
+
+## Fix Log
+| # | File | Lines | Risk | Outcome  | Change |
+|---|------|-------|------|----------|--------|
+| 1 | ...  | ...   | R2   | applied  | Extract repeated constant |
+| 2 | ...  | ...   | T4   | reverted | Test regression; promoted to Residual |
+...
+
+## Health Score Delta
+Before: <estimated score>/100  →  After: <estimated score>/100
+(Re-run /brooks-health for an exact recalculation.)
+
+## Residual Items  (<K> not applied)
+<Iron Law entries, sorted Critical → Suggestion, with "Not applied because: ..." line>
+
+## Summary
+- Total findings detected: <N>
+- Fixed this sweep: <M>
+- Residual (needs human review): <K>
+- Unresolvable (3-retry exhausted): <U>
+```
+
+If there are zero residual items and zero unresolvable entries, end with:
+**"Sweep complete — codebase is clean."**
+
+**Mode line in report:** `Full Sweep`
--- a/skills/thirdparty/brooks-test/SKILL.md
+++ b/skills/thirdparty/brooks-test/SKILL.md
@ -0,0 +1,36 @@
+---
+name: brooks-test
+description: >
+  Test quality review drawing on twelve classic engineering books — with primary focus
+  on xUnit Test Patterns, The Art of Unit Testing, How Google Tests Software, and
+  Working Effectively with Legacy Code — that diagnoses structural problems in an
+  existing test suite: brittleness, mock abuse, coverage illusions, slow execution,
+  poor readability.
+  Triggers when: user asks about test quality, shares test files for review, or
+  expresses frustration: "tests keep breaking whenever I change anything", "our tests
+  take forever", "I can't understand what this test is doing", "tests pass but bugs
+  still reach production", "we have too many mocks".
+  Do NOT trigger for: writing new tests from scratch (use the regular test-writing
+  workflow) or testing framework/syntax questions — this skill reviews an existing
+  suite for structural quality problems, not individual test authoring.
+---
+
+# Brooks-Lint — Test Quality Review
+
+## Setup
+
+1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
+2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
+3. Read `../_shared/test-decay-risks.md` for test-space symptom definitions and source attributions
+4. Read `test-guide.md` in this directory for the test quality review framework
+
+## Process
+
+**If the user has not shared test files or pointed to a test directory:** apply Auto
+Scope Detection from `../_shared/common.md` to determine the review scope before proceeding.
+
+1. Build the test suite map (guide's "Before You Start" section)
+2. Scan for each test decay risk in the order specified (Steps 1–4 of the guide)
+3. Apply the Iron Law and output using the Report Template (Step 5 of the guide)
+
+**Mode line in report:** `Test Quality Review`
--- a/skills/thirdparty/brooks-test/test-guide.md
+++ b/skills/thirdparty/brooks-test/test-guide.md
@ -0,0 +1,147 @@
+# Test Quality Review Guide — Mode 4
+
+**Purpose:** Diagnose the health of a test suite using six test-space decay risks.
+Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy.
+
+---
+
+## Before You Start: Build the Test Suite Map
+
+Before scanning for any risk, map the current test suite structure:
+
+```
+Unit tests:        X files, ~N tests
+Integration tests: X files, ~N tests
+E2E tests:         X files, ~N tests
+Ratio:             Unit X%  :  Integration X%  :  E2E X%
+Coverage areas:    [modules with tests] vs [modules without tests]
+```
+
+If you cannot access test files directly, ask the user **one question** — choose the
+most relevant:
+1. "Which module is hardest to test or has the least coverage?"
+2. "When you make a change, how often do unrelated tests break?"
+3. "Is there a part of the codebase your team avoids touching because it has no tests?"
+
+After one answer, proceed. Do not ask more than one question.
+
+---
+
+## Analysis Process
+
+Work through these five steps in order.
+
+### Step 1: Scan for Test Obscurity
+
+*Scan this first — the most visible risk and the one that determines whether the suite
+is maintainable at all.*
+
+Look for:
+- Read 5–10 test names at random: can each one communicate subject + scenario + expected
+  outcome without opening the test body?
+- Are there tests where a failure gives no clue which behavior broke (multiple assertions,
+  no message strings)?
+- Does any test depend on external state (files, database rows, env variables, shared mutable
+  fixtures) that is invisible from within the test body?
+- Is there a single massive setUp or beforeEach that every test inherits regardless of
+  what it actually needs?
+
+If all test names are clear and setups are minimal → no finding.
+
+### Step 2a: Scan for Test Brittleness
+
+*Brittle tests break on refactors that do not change observable behavior — they test
+implementation, not contracts.*
+
+Look for:
+- Ask (or check git history): did any recent refactor cause test failures with no
+  behavior change?
+- Are there test methods where the name contains "and" or that assert on 3 or more
+  unrelated behaviors (Eager Test)?
+- Do assertions specify mock call order or exact parameter values that are irrelevant
+  to the observable behavior?
+- Are tests coupled to private methods or internal state directly?
+
+If brittleness is systemic (most tests in the file break on a rename) → 🔴 Critical.
+If isolated (1–2 brittle tests) → 🟢 Suggestion.
+
+### Step 2b: Scan for Mock Abuse
+
+*Mock Abuse produces tests that pass regardless of whether the real behavior is correct.
+Scan this separately from brittleness — over-mocking is often the cause of brittleness,
+but it is a distinct problem worth its own finding.*
+
+**Sample 3–5 tests once for both steps 2a and 2b together** — read each test body and
+check brittleness signals and mock-setup ratio in the same pass, then write separate
+findings if both problems are present.
+
+Look for:
+- Is mock setup code longer than the assertion logic in the sampled tests?
+- Are the primary assertions `expect(mock).toHaveBeenCalledWith(...)` rather than
+  assertions on outputs, state, or observable events?
+- Are there methods in production classes that are only called from test files
+  (test-induced design damage)?
+- Does any single test create more than 3 mock objects?
+
+If mock setup-to-assertion ratio exceeds 3:1 → 🟡 Warning.
+If production methods exist only for test access → 🔴 Critical (architecture is being
+distorted by the test suite).
+
+### Step 3: Scan for Test Duplication
+
+Look for:
+- Is the same setup block (same variables initialized the same way) repeated across
+  5 or more test files without a shared helper?
+- Are there multiple tests that pass identical inputs and assert identical outputs
+  with no differentiation (Lazy Test)?
+- Is the same business scenario covered at unit, integration, and E2E level with no
+  difference in what each layer is testing?
+
+If duplication is systemic (10 or more instances) → Critical.
+If localized (3–5 instances) → Warning.
+
+### Step 4: Scan for Coverage Illusion and Architecture Mismatch
+
+Look for Coverage Illusion:
+- Pick the most recently modified core module. Are its error-handling branches and
+  null/boundary inputs covered by tests?
+- Are there legacy areas (old functions, no test files nearby) that are actively
+  being changed?
+- Do the tests assert on side effects (DB writes, events emitted, state transitions)
+  or only on return values?
+
+**Characterization Test check:** If legacy code is being modified without existing tests,
+the team needs Characterization Tests before making the change — not after. Look for
+this pattern and flag it when absent.
+
+A Characterization Test locks in current behavior (right or wrong) so future changes
+do not silently regress it. Template:
+```
+test("characterize: [module].[method] given [input], returns [current output]") {
+  // Call the code under test with realistic inputs
+  // Assert on whatever it currently returns — even if you suspect the output is wrong
+  // Add a comment: "This captures current behavior, not necessarily correct behavior"
+}
+```
+Source: Feathers — Working Effectively with Legacy Code, Ch. 13: Characterization Tests
+
+Look for Architecture Mismatch:
+- Compare the suite map from the start: is the ratio close to 70% unit / 20% integration / 10% E2E?
+- Are high-risk modules tested at higher density than trivial utilities?
+
+**Test suite performance:** A slow test suite is a first-class maintainability risk — it
+breaks the fast-feedback loop and causes developers to skip running tests locally.
+- If the full suite runtime is known and > 10 minutes → 🟡 Warning
+- If the full suite runtime is > 30 minutes or unknown → 🔴 Critical (unknown suite time
+  means nobody is running it regularly)
+- If tests that could be unit tests are integration tests, that is a Performance Mismatch:
+  each misclassified test adds seconds of avoidable wait time
+
+Source: Meszaros — xUnit Test Patterns, Slow Tests (p. 253)
+
+### Step 5: Apply Iron Law, Output Report
+
+Apply the Iron Law format from `../_shared/common.md` to each finding.
+
+Use the standard Report Template. Mode: Test Quality Review.
+Include the Test Suite Map as a code block immediately before the `## Findings` heading, labeled "Test Suite Map".
--- a/skills/thirdparty/codebase-migrate/SKILL.md
+++ b/skills/thirdparty/codebase-migrate/SKILL.md
@ -0,0 +1,128 @@
+---
+name: codebase-migrate
+description: Run large codebase migrations and multi-file refactors. Uses the Composio CLI to coordinate issue tracking, batched PRs, and CI verification while the agent executes the transforms locally across hundreds of files.
+metadata:
+  short-description: Codebase migrations + multi-file refactors
+---
+
+# Codebase Migrate
+
+Coordinate framework upgrades, API renames, config rewrites, and structural refactors across hundreds of files. Local edits are driven by the agent; the [Composio CLI](https://docs.composio.dev/docs/cli) handles the surrounding ceremony: tracking issues, per-batch PRs, and CI verification.
+
+## When to Use
+
+- Framework upgrade (React 17 → 19, Node 18 → 22, Django 4 → 5).
+- API rename across a monorepo (e.g., `getUserById` → `users.byId`).
+- Config/format migration (webpack → vite, eslint → biome, jest → vitest).
+- Any "change 200 files the same way" task that needs to ship in reviewable slices.
+
+## Prereqs
+
+```bash
+curl -fsSL https://composio.dev/install | bash
+composio login
+composio link github        # for PRs + CI status
+composio link linear        # or jira — for migration tracking
+```
+
+Local tools the agent will use directly: `git`, `rg`, `jscodeshift`/`ts-morph`/`comby`/`ast-grep` (language-appropriate), and your test runner.
+
+## Planning Phase
+
+1. **Define the transform precisely.** Bad: "migrate to vitest." Good: "replace `jest.mock` with `vi.mock`, swap `jest.fn()` for `vi.fn()`, rename `jest.config.js` → `vitest.config.ts` using template X."
+2. **Scope the blast radius:**
+   ```bash
+   rg -l 'jest\.(mock|fn|spyOn)' | wc -l
+   rg -l 'from "jest"' | sort
+   ```
+3. **File a tracking issue:**
+   ```bash
+   composio execute LINEAR_CREATE_ISSUE -d '{
+     "teamId":"TEAM_ID",
+     "title":"Migrate test runner: jest → vitest",
+     "description":"Batches of ~25 files. Checkpoint after each PR lands green."
+   }'
+   ```
+
+## Execute in Reviewable Batches
+
+Loop: pick N files → transform → test → PR → wait for green → merge → next batch.
+
+```bash
+# Batch helper: first 25 untouched files matching the pattern
+BATCH=$(rg -l 'jest\.mock' | grep -v done.list | head -25)
+echo "$BATCH" > batch.list
+```
+
+The agent runs the codemod on `batch.list`, then:
+
+```bash
+git checkout -b migrate/vitest-batch-03
+xargs < batch.list codemod-runner   # e.g. jscodeshift / ts-morph / comby
+npm test -- --changed
+git add -A && git commit -m "migrate(test): jest → vitest (batch 3)"
+git push -u origin migrate/vitest-batch-03
+
+composio execute GITHUB_CREATE_A_PULL_REQUEST -d '{
+  "owner":"acme","repo":"app",
+  "head":"migrate/vitest-batch-03","base":"main",
+  "title":"migrate(test): jest → vitest (batch 3)",
+  "body":"Part of LIN-482. 25 files. Codemod: `transforms/jest-to-vitest.ts`."
+}'
+```
+
+Then poll CI and merge when green:
+
+```bash
+composio execute GITHUB_LIST_WORKFLOW_RUNS_FOR_A_REPOSITORY \
+  -d '{"owner":"acme","repo":"app","branch":"migrate/vitest-batch-03"}'
+```
+
+## Workflow Script
+
+`scripts/migrate-batch.ts`, run per batch via `composio run --file scripts/migrate-batch.ts -- --batch 3`:
+
+```ts
+const batch = process.argv[process.argv.indexOf("--batch") + 1];
+
+const pr = await execute("GITHUB_CREATE_A_PULL_REQUEST", {
+  owner: "acme", repo: "app",
+  head: `migrate/vitest-batch-${batch}`, base: "main",
+  title: `migrate(test): jest → vitest (batch ${batch})`,
+  body: `Part of LIN-482. See transforms/jest-to-vitest.ts.`
+});
+
+await execute("LINEAR_CREATE_COMMENT", {
+  issueId: "LIN-482",
+  body: `Opened PR #${pr.number}: ${pr.html_url}`
+});
+```
+
+## Safety Rails
+
+- **One transform per PR.** Never mix a rename with a format change.
+- **Keep a `done.list`** of files already migrated so the next batch skips them.
+- **Run the full test suite on the last batch**, even if per-batch PRs ran `--changed`.
+- **Codemod first, hand-edit second.** If the codemod misses 3 files, patch them manually and note it in the PR body.
+- **Roll back per-batch**, not globally. Each PR should revert cleanly.
+
+## Verification Loop
+
+After each merge:
+
+```bash
+rg 'jest\.(mock|fn|spyOn)' | wc -l     # should trend to 0
+npm test                                # full suite
+composio execute GITHUB_LIST_WORKFLOW_RUNS_FOR_A_REPOSITORY \
+  -d '{"owner":"acme","repo":"app","branch":"main","event":"push"}' \
+  | jq '.workflow_runs[0].conclusion'
+```
+
+## Troubleshooting
+
+- **Codemod regex catches too much** → switch to AST-based tooling (`ast-grep`, `ts-morph`) for structural matches.
+- **Tests pass locally, CI fails** → pin Node/Python version parity; check `.nvmrc` / `pyproject.toml`.
+- **PR too big to review** → cut batch size in half; maintainers won't review 800-line diffs.
+- **Conflicts between batches** → rebase the open batch before merging the current one; never force-push merged batches.
+
+Full CLI reference: [docs.composio.dev/docs/cli](https://docs.composio.dev/docs/cli)
--- a/skills/thirdparty/codebase-recon/SKILL.md
+++ b/skills/thirdparty/codebase-recon/SKILL.md
@ -0,0 +1,266 @@
+---
+name: codebase-recon
+description: This skill should be used when analyzing codebases, understanding architecture, or when "analyze", "investigate", "explore code", or "understand architecture" are mentioned.
+metadata:
+  version: "1.0.0"
+---
+
+# Codebase Analysis
+
+Evidence-based investigation → findings → confidence-tracked conclusions.
+
+## Steps
+
+1. Gather evidence from multiple sources (code, docs, tests, history)
+2. Track confidence level as investigation progresses
+3. Based on findings:
+   - If pattern analysis needed → load the `outfitter:patterns` skill
+   - If root cause investigation → load the `outfitter:find-root-causes` skill
+   - If ready to report → load the `outfitter:report-findings` skill
+4. Deliver findings with confidence level and caveats
+
+<when_to_use>
+
+- Codebase exploration and understanding
+- Architecture analysis and mapping
+- Pattern extraction and recognition
+- Technical research within code
+- Performance or security analysis
+
+NOT for: wild guessing, assumptions without evidence, conclusions before investigation
+
+</when_to_use>
+
+<confidence>
+
+| Bar | Lvl | Name | Action |
+|-----|-----|------|--------|
+| `░░░░░` | 0 | Gathering | Collect initial evidence |
+| `▓░░░░` | 1 | Surveying | Broad scan, surface patterns |
+| `▓▓░░░` | 2 | Investigating | Deep dive, verify patterns |
+| `▓▓▓░░` | 3 | Analyzing | Cross-reference, fill gaps |
+| `▓▓▓▓░` | 4 | Synthesizing | Connect findings, high confidence |
+| `▓▓▓▓▓` | 5 | Concluded | Deliver findings |
+
+*Calibration: 0=0–19%, 1=20–39%, 2=40–59%, 3=60–74%, 4=75–89%, 5=90–100%*
+
+Start honest. Clear codebase + focused question → level 2–3. Vague or complex → level 0–1.
+
+At level 4: "High confidence in findings. One more angle would reach full certainty. Continue or deliver now?"
+
+Below level 5: include `△ Caveats` section.
+
+</confidence>
+
+<principles>
+
+## Core Methodology
+
+**Evidence over assumption** — investigate when you can, guess only when you must.
+
+**Multi-source gathering** — code, docs, tests, history, web research, runtime behavior.
+
+**Multiple angles** — examine from different perspectives before concluding.
+
+**Document gaps** — flag uncertainty with △, track what's unknown.
+
+**Show your work** — findings include supporting evidence, not just conclusions.
+
+**Calibrate confidence** — distinguish fact from inference from assumption.
+
+</principles>
+
+<evidence_gathering>
+
+## Source Priority
+
+1. **Direct observation** — read code, run searches, examine files
+2. **Documentation** — official docs, inline comments, ADRs
+3. **Tests** — reveal intended behavior and edge cases
+4. **History** — git log, commit messages, PR discussions
+5. **External research** — library docs, Stack Overflow, RFCs
+6. **Inference** — logical deduction from available evidence
+7. **Assumption** — clearly flagged when other sources unavailable
+
+## Investigation Patterns
+
+**Start broad, then narrow:**
+- File tree → identify relevant areas
+- Search patterns → locate specific code
+- Code structure → understand without full content
+- Read targeted files → examine implementation
+- Cross-reference → verify understanding
+
+**Layer evidence:**
+- What does the code do? (direct observation)
+- Why was it written this way? (history, comments)
+- How does it fit the system? (architecture, dependencies)
+- What are the edge cases? (tests, error handling)
+
+**Follow the trail:**
+- Function calls → trace execution paths
+- Imports/exports → map dependencies
+- Test files → understand usage patterns
+- Error messages → reveal assumptions
+- Comments → capture historical context
+
+</evidence_gathering>
+
+<output_format>
+
+## During Investigation
+
+After each evidence-gathering step emit:
+
+- **Confidence:** {BAR} {NAME}
+- **Found:** { key discoveries }
+- **Patterns:** { emerging themes }
+- **Gaps:** { what's still unclear }
+- **Next:** { investigation direction }
+
+## At Delivery (Level 5)
+
+### Findings
+
+{ numbered list of discoveries with supporting evidence }
+
+1. {FINDING} — evidence: {SOURCE}
+2. {FINDING} — evidence: {SOURCE}
+
+### Patterns
+
+{ recurring themes or structures identified }
+
+### Implications
+
+{ what findings mean for the question at hand }
+
+### Confidence Assessment
+
+Overall: {BAR} {PERCENTAGE}%
+
+High confidence areas:
+- {AREA} — {REASON}
+
+Lower confidence areas:
+- {AREA} — {REASON}
+
+### Supporting Evidence
+
+- Code: { file paths and line ranges }
+- Docs: { references }
+- Tests: { relevant test files }
+- History: { commit SHAs if relevant }
+- External: { URLs if applicable }
+
+## Below Level 5
+
+### △ Caveats
+
+**Assumptions:**
+- {ASSUMPTION} — { why necessary, impact if wrong }
+
+**Gaps:**
+- {GAP} — { what's missing, how to fill }
+
+**Unknowns:**
+- {UNKNOWN} — { noted for future investigation }
+
+</output_format>
+
+<specialized_techniques>
+
+Load skills for specialized analysis (see Steps section):
+
+- **Pattern analysis** → `outfitter:patterns`
+- **Root cause investigation** → `outfitter:find-root-causes`
+- **Research synthesis** → `outfitter:report-findings`
+- **Architecture analysis** → see [architecture-analysis.md](references/architecture-analysis.md)
+
+</specialized_techniques>
+
+<workflow>
+
+Loop: Gather → Analyze → Update Confidence → Next step
+
+1. **Calibrate starting confidence** — what do we already know?
+2. **Identify evidence sources** — where can we look?
+3. **Gather systematically** — collect from multiple angles
+4. **Cross-reference findings** — verify patterns hold
+5. **Flag uncertainties** — mark gaps with △
+6. **Synthesize conclusions** — connect evidence to insights
+7. **Deliver with confidence level** — clear about certainty
+
+At each step:
+- Document what you found (evidence)
+- Note what it means (interpretation)
+- Track what's still unclear (gaps)
+- Update confidence bar
+
+</workflow>
+
+<validation>
+
+Before concluding (level 4+):
+
+**Check evidence quality:**
+- ✓ Multiple sources confirm pattern?
+- ✓ Direct observation vs inference clearly marked?
+- ✓ Assumptions explicitly flagged?
+- ✓ Counter-examples considered?
+
+**Check completeness:**
+- ✓ Original question fully addressed?
+- ✓ Edge cases explored?
+- ✓ Alternative explanations ruled out?
+- ✓ Known unknowns documented?
+
+**Check deliverable:**
+- ✓ Findings supported by evidence?
+- ✓ Confidence calibrated honestly?
+- ✓ Caveats section included if <100%?
+- ✓ Next steps clear if incomplete?
+
+</validation>
+
+<rules>
+
+ALWAYS:
+- Investigate before concluding
+- Cite evidence sources with file paths/URLs
+- Use confidence bars to track certainty
+- Flag assumptions and gaps with △
+- Cross-reference from multiple angles
+- Document investigation trail
+- Distinguish fact from inference
+- Include caveats below level 5
+
+NEVER:
+- Guess when you can investigate
+- State assumptions as facts
+- Conclude from single source
+- Hide uncertainty or gaps
+- Skip validation checks
+- Deliver without confidence assessment
+- Conflate evidence with interpretation
+
+</rules>
+
+<references>
+
+Core methodology:
+- [confidence.md](../pathfinding/references/confidence.md) — confidence calibration (shared with pathfinding)
+
+Micro-skills (load as needed):
+- `outfitter:patterns` — extracting and validating patterns
+- `outfitter:find-root-causes` — systematic problem diagnosis
+- `outfitter:report-findings` — multi-source research synthesis
+
+Local references:
+- [architecture-analysis.md](references/architecture-analysis.md) — system structure mapping
+
+Related skills:
+- `outfitter:pathfinding` — clarifying requirements before analysis
+- `outfitter:debugging` — structured bug investigation
+
+</references>
--- a/skills/thirdparty/codebase-recon/references/architecture-analysis.md
+++ b/skills/thirdparty/codebase-recon/references/architecture-analysis.md
@ -0,0 +1,220 @@
+# Architecture Analysis
+
+Techniques for analyzing system structure, dependencies, and component relationships.
+
+## Dependency Mapping
+
+### Forward Dependencies
+
+What a component relies on:
+1. **Direct imports** — explicit dependencies in code
+2. **Indirect references** — called through interfaces
+3. **Runtime dependencies** — configuration, environment
+4. **Data dependencies** — shared state, databases
+
+### Reverse Dependencies
+
+What relies on this component:
+1. **Direct dependents** — explicit imports from other modules
+2. **Interface consumers** — components using this API
+3. **Side effect consumers** — code relying on mutations
+4. **Event subscribers** — listeners for this component's events
+
+### Circular Dependencies
+
+Red flags:
+- A imports B, B imports A
+- Longer cycles: A → B → C → A
+- Implicit cycles through shared state
+
+Resolution strategies:
+- Extract shared code to separate module
+- Introduce interface/abstraction layer
+- Invert dependency direction
+- Break into smaller components
+
+## Layer Identification
+
+### Detecting Layers
+
+Look for:
+- **Directional flow** — data/control flows one way
+- **Abstraction levels** — concrete → abstract as you ascend
+- **Responsibility clustering** — similar concerns grouped
+- **Interface boundaries** — clear contracts between groups
+
+### Common Layer Patterns
+
+**Three-tier**:
+- Presentation (UI, API endpoints)
+- Business logic (domain, workflows)
+- Data access (repositories, queries)
+
+**Hexagonal/Clean**:
+- Core domain (entities, business rules)
+- Application layer (use cases, orchestration)
+- Infrastructure (frameworks, external services)
+- Interfaces (controllers, adapters)
+
+**Microservices**:
+- Service boundary (API gateway)
+- Service logic (domain per service)
+- Data layer (per-service database)
+- Cross-cutting (auth, logging, monitoring)
+
+### Layer Violations
+
+Violations indicate architectural drift:
+- Lower layer imports higher layer
+- Business logic in presentation layer
+- Data access code in domain entities
+- Infrastructure concerns leaking into core
+
+## Interface Analysis
+
+### Contract Definition
+
+Examine:
+- **Input types** — what does it accept?
+- **Output types** — what does it return?
+- **Error modes** — what can fail, how?
+- **Side effects** — mutations, I/O, state changes
+- **Invariants** — what must always be true?
+
+### API Quality
+
+Strong interfaces show:
+- **Cohesion** — methods belong together
+- **Minimal surface** — small, focused API
+- **Clear contracts** — types tell the story
+- **Stability** — changes don't cascade
+- **Composability** — works well with others
+
+Weak interfaces show:
+- **Kitchen sink** — unrelated methods bundled
+- **Leaky abstractions** — implementation details exposed
+- **Unstable** — frequent breaking changes
+- **Rigid** — hard to extend or compose
+
+## Component Relationships
+
+### Relationship Types
+
+**Composition**:
+- Component owns sub-components
+- Lifecycles coupled
+- Strong cohesion
+- Example: `Page` owns `Header`, `Footer`
+
+**Aggregation**:
+- Component references others
+- Independent lifecycles
+- Loose coupling
+- Example: `ShoppingCart` references `Product`
+
+**Dependency**:
+- Uses another component's interface
+- No ownership
+- Can be swapped
+- Example: `AuthService` uses `Database`
+
+**Association**:
+- Knows about but doesn't own
+- Weak relationship
+- Often bidirectional
+- Example: `User` ↔ `Post` (many-to-many)
+
+### Coupling Analysis
+
+**Low coupling** (good):
+- Communicate through interfaces
+- Few shared assumptions
+- Changes localized
+- Easy to test in isolation
+
+**High coupling** (risky):
+- Direct field access
+- Shared mutable state
+- Knowledge of implementation
+- Changes ripple widely
+
+## Architectural Pattern Recognition
+
+### Layered Architecture
+
+Indicators:
+- Unidirectional dependencies (top → bottom)
+- Each layer uses only layer below
+- Clear separation of concerns
+
+Trade-offs:
+- ✓ Simple, well-understood
+- ✓ Easy to enforce rules
+- ✗ Can become rigid
+- ✗ Performance overhead
+
+### Event-Driven Architecture
+
+Indicators:
+- Pub/sub or message queues
+- Decoupled components
+- Asynchronous communication
+- Event sourcing patterns
+
+Trade-offs:
+- ✓ Scalable, resilient
+- ✓ Loose coupling
+- ✗ Harder to reason about flow
+- ✗ Eventual consistency challenges
+
+### Microservices
+
+Indicators:
+- Service per bounded context
+- Independent deployment
+- API-based communication
+- Decentralized data
+
+Trade-offs:
+- ✓ Independent scaling
+- ✓ Technology diversity
+- ✗ Distributed system complexity
+- ✗ Operational overhead
+
+## Analysis Workflow
+
+### Top-Down
+
+Start broad, narrow focus:
+1. **System boundaries** — what's in scope?
+2. **Major components** — high-level modules
+3. **Component interactions** — how they communicate
+4. **Internal structure** — zoom into each component
+5. **Implementation** — code-level details
+
+### Bottom-Up
+
+Start specific, build understanding:
+1. **Entry point** — main(), server start, UI root
+2. **Call graph** — trace execution paths
+3. **Cluster calls** — group related functionality
+4. **Extract components** — identify logical boundaries
+5. **Map relationships** — connect the pieces
+
+### Targeted
+
+Focus on specific concern:
+1. **Define question** — what are you trying to understand?
+2. **Identify relevant code** — where does this happen?
+3. **Trace dependencies** — what does it touch?
+4. **Analyze impact** — what would changing this affect?
+5. **Document findings** — capture insights
+
+## Documentation Extraction
+
+From architecture analysis, capture:
+- **Component diagram** — boxes and arrows
+- **Dependency graph** — what imports what
+- **Layer diagram** — abstraction levels
+- **Sequence diagrams** — interaction flows
+- **Decision records** — why this structure?