📦 deps(skills): sync thirdparty skills

This commit is contained in:
ci[bot] 2026-05-29 08:33:55 +00:00
parent b7eb57f5da
commit b4c88b32be
25 changed files with 3033 additions and 0 deletions

View File

@ -0,0 +1,7 @@
_shared
brooks-audit
brooks-debt
brooks-health
brooks-review
brooks-sweep
brooks-test

View File

@ -0,0 +1 @@
codebase-migrate

View File

@ -0,0 +1 @@
codebase-recon

240
skills/thirdparty/_shared/common.md vendored Normal file
View File

@ -0,0 +1,240 @@
# Brooks-Lint — Shared Framework
Code and test quality diagnosis using principles from twelve classic software engineering books.
Use `source-coverage.md` to keep those sources grounded in real evidence, exceptions, and tradeoffs.
## The Iron Law
```
NEVER suggest fixes before completing risk diagnosis.
EVERY finding must follow: Symptom → Source → Consequence → Remedy.
```
Violating this law produces reviews that list rule violations without explaining why they
matter. A finding without a consequence and a remedy is not a finding — it is noise.
> **On-demand sections (skip unless the condition applies):**
> - "Remedy Mode" — only when user passes `--fix` or asks to fix findings
> - "Post-Report Triage" — only in interactive sessions after the report is output
> - "History Tracking" — only after the Health Score is computed
## Project Config
Before executing the review, attempt to read `.brooks-lint.yaml` from the project root.
If the file exists, parse and apply its settings before proceeding.
If the file does not exist, continue with defaults (all risks enabled, no ignores).
In a multi-mode session, re-read only if the user says the config has changed.
### Supported settings
**`disable`** — list of risk codes to skip entirely. Findings for disabled risks are
silently omitted from the report and do not affect the Health Score.
Valid codes: `R1` `R2` `R3` `R4` `R5` `R6` `T1` `T2` `T3` `T4` `T5` `T6`
**`severity`** — override the severity of a specific risk for this project.
Valid values: `critical` `warning` `suggestion`
Example: `R1: suggestion` means every R1 finding is downgraded to Suggestion regardless
of what the guide says.
**`ignore`** — list of glob patterns. Files matching any pattern are excluded from
analysis. Findings that arise solely from ignored files are omitted.
Common entries: `**/*.generated.*`, `**/vendor/**`, `**/migrations/**`
**`focus`** — non-empty list of risk codes to evaluate; all others are skipped.
Omit this key (or leave it empty) to evaluate all non-disabled risks.
Cannot be combined with a non-empty `disable` list.
**Minimal example:**
```yaml
version: 1
disable:
- T5
severity:
R1: suggestion
ignore:
- "**/*.generated.*"
```
If `.brooks-lint.yaml` contains a `custom_risks` map, read `custom-risks-guide.md`
from the `_shared/` directory for loading and scanning instructions.
### Config Validation
Before applying, check for errors and mention each in the report:
- Invalid risk code (not R1R6, T1T6, or a defined `Cx` code): skip it, note `"Config warning: X is not a valid risk code"`
- Invalid severity value (not `critical`/`warning`/`suggestion`): skip it, note the error
- Both `disable` and `focus` are non-empty: treat as a config error, ignore both, note it
If the YAML fails to parse entirely, skip config loading and proceed with defaults.
### Config Reporting
If a config file was found and applied, add this line immediately after the **Scope** line
in the report:
`Config: .brooks-lint.yaml applied (N risks disabled, M paths ignored)`
Include N and M even if zero. Omit this line if no config file was found.
---
## Auto Scope Detection
When no files or code are specified, detect scope automatically:
**PR Review:** `git diff --cached``git diff``git diff main...HEAD` → ask user.
**Architecture Audit / Tech Debt:** Entire project by default. `--since=<ref>`: run `git diff <ref>...HEAD --name-only`, analyze only modules containing changed files; note "Incremental audit — modules touched since <ref>".
**Test Quality:** All test files by default. If a diff exists, prioritize test files co-located with changed production files (`src/foo.ts` → `src/foo.test.ts`).
**Health Dashboard:** Entire project by default. If user provides a path, scope all dimension sub-scans to that path.
**Scope line:** Always state what was detected — e.g., `Scope: staged changes (3 files)` or `Scope: branch changes vs main (12 files)`.
---
## The Six Decay Risks
Navigation index only — canonical definitions (symptoms, severity guides, sources, "What Not
to Flag" guards) live in `decay-risks.md`. Do not duplicate or edit diagnostic questions here;
update `decay-risks.md` directly. Book-level coverage, exceptions, and tradeoffs are in
`source-coverage.md`.
| Risk | Diagnostic Question |
|------|---------------------|
| Cognitive Overload | How much mental effort to understand this? |
| Change Propagation | How many unrelated things break on one change? |
| Knowledge Duplication | Is the same decision expressed in multiple places? |
| Accidental Complexity | Is the code more complex than the problem? |
| Dependency Disorder | Do dependencies flow in a consistent direction? |
| Domain Model Distortion | Does the code faithfully represent the domain? |
---
## Report Template
**Language rule:** Output the report in the same language the user is using. Translate the
per-finding content and the one-sentence verdict to match the user's language. Keep the
following in English: Iron Law field labels (Symptom / Source / Consequence / Remedy),
book titles, principle and smell names (e.g. "Shotgun Surgery", "Divergent Change"),
and fixed structural headers from the template below (`Findings`, `Summary`,
`Module Dependency Graph`, `Critical`, `Warning`, `Suggestion`).
````
# Brooks-Lint Review
**Mode:** [PR Review / Architecture Audit / Tech Debt Assessment / Test Quality Review]
**Scope:** [file(s), directory, or description of what was reviewed]
**Health Score:** XX/100
[One sentence overall verdict]
---
## Module Dependency Graph
<!-- Mode 2 (Architecture Audit) ONLY — omit this section for other modes -->
<!-- classDef colors: see architecture-guide.md Step 1 Rule 6 -->
```mermaid
graph TD
...
```
---
## Findings
<!-- Sort all findings by severity: Critical first, then Warning, then Suggestion -->
<!-- If no findings in a severity tier, omit that tier's heading -->
### 🔴 Critical
**[Risk Name] — [Short descriptive title]**
Symptom: [exactly what was observed in the code]
Source: [Book title — Principle or Smell name]
Consequence: [what breaks or gets worse if this is not fixed]
Remedy: [concrete, specific action]
### 🟡 Warning
**[Risk Name] — [Short descriptive title]**
Symptom: ...
Source: ...
Consequence: ...
Remedy: ...
### 🟢 Suggestion
**[Risk Name] — [Short descriptive title]**
Symptom: ...
Source: ...
Consequence: ...
Remedy: ...
---
## Summary
[23 sentences: what is the most important action, and what is the overall trend]
````
## Remedy Mode
When the user passes `--fix` or asks to "fix the findings", read
`remedy-guide.md` from the `_shared/` directory before writing the report.
## Health Score Calculation
Base score: 100
Deductions:
- Each 🔴 Critical finding: 15
- Each 🟡 Warning finding: 5
- Each 🟢 Suggestion finding: 1
Floor: 0 (score cannot go below 0)
## History Tracking
After generating the Health Score, attempt to append a record to `.brooks-lint-history.json`
in the project root.
**Append logic:**
1. Read the file (or start with empty array if it doesn't exist)
2. Append: `{ date, mode, score, findings: { critical, warning, suggestion }, scope }`
3. Write the file back
**Trend display:** If the history file exists and contains at least one prior record for
the same mode, add a Trend line after the Health Score in the report:
**Trend:** 85 → 82 (3) over last 3 runs
Show the most recent prior score and the delta. If delta is 0: "Stable at 82".
If this is the first run for this mode: "First run — no trend data".
## Post-Report Triage (Optional)
**Guard:** Interactive sessions only — skip in CI/headless mode.
After reporting Warning or Suggestion findings, offer:
> Would you like to triage these findings? (accept / dismiss / defer / skip)
For each finding one at a time (lowest severity first): show title, ask `[a]ccept / [d]ismiss / [f]defer / [s]kip`; wait for reply before moving to the next.
**Dismiss:** ask one-line reason → append to `.brooks-lint.yaml` under `suppress:` → downgraded to info in future runs.
**Defer:** same as dismiss, add `expires: YYYY-MM-DD` (default 90 days) → resurfaces at original severity after expiry.
**Suppress matching at scan time:** for each `suppress:` entry, match `risk` code and file `pattern` against findings.
- Both match → downgrade to info (not counted in Health Score, shown under collapsed "Suppressed" section).
- `expires` is past → ignore entry, finding resurfaces. Note in Summary: "N suppressed findings have expired and are now active again."
## Reference Files
Read on demand:
| File | When to Read |
|------|-------------|
| `source-coverage.md` | At the start of every review, before writing findings |
| `decay-risks.md` | Before any production-code review or architecture/debt assessment |
| `test-decay-risks.md` | Before any test review and before the PR Review "Quick Test Check" step |

View File

@ -0,0 +1,48 @@
# Custom Risk Loading Guide
When `.brooks-lint.yaml` contains a `custom_risks` map, this guide governs how those
risks are loaded and scanned. Custom risks use `Cx` codes (C1, C2, …) — no conflict with
the standard R1R6 and T1T6 namespaces.
---
## Loading
1. For each entry in `custom_risks`, validate that it has:
- `name` — non-empty string
- `question` — the diagnostic question to ask
- `symptoms` — non-empty list of symptom patterns
- `severity` — map with at least one of: `critical`, `warning`, `suggestion`
2. Register each valid entry as a `Cx` code alongside R1R6 / T1T6. Once loaded,
`Cx` codes become valid targets for `disable`, `focus`, and `severity` fields in
the same config file.
3. Report any validation errors as config warnings (do not abort the review):
- Missing required field: `"Config warning: C1 missing 'symptoms'"`
- Invalid code format (must be `C` followed by digits): skip, note error
- Code conflicts with R/T namespace: skip, note error
---
## Scanning
During the analysis, treat each custom risk as an additional step after the standard
process:
- Use `question` as the diagnostic question
- Use `symptoms` as the symptom lookup list
- Use the `severity` map for tier classification
- Apply the Iron Law: `Source` field should be `"[Project-defined risk] — <risk name>"`
- Include custom risk findings in the Health Score (same deduction rules as R/T codes)
- In the report, custom findings appear after standard findings under a
**### Project-Specific Risks** sub-heading
---
## Config Validation additions
The following codes are valid in `disable`, `focus`, and `severity`:
- Standard: `R1``R6`, `T1``T6`
- Custom: any `Cx` code defined in `custom_risks`
- Any other code: skip it and emit `"Config warning: X is not a valid risk code"`

294
skills/thirdparty/_shared/decay-risks.md vendored Normal file
View File

@ -0,0 +1,294 @@
# Decay Risk Reference
Six patterns that cause software to degrade. Apply the Iron Law to each finding.
---
## Risk 1: Cognitive Overload
**Diagnostic question:** How much mental effort does a human need to understand this?
Cognitive load beyond working memory causes mistakes, avoidance, and blocks the refactoring that would fix it.
### Symptoms
- Function longer than 20 lines where multiple levels of abstraction are mixed together
- Nesting depth greater than 3 levels
- Parameter list with more than 4 parameters
- Magic numbers or unexplained constants
- Variable names that require reading the implementation to understand (e.g., `d`, `tmp2`, `flag`)
- Boolean expressions with 3 or more conditions combined
- Train-wreck chains: `a.getB().getC().doD()`
- Code names that do not match what the business calls the same concept
- Flag Arguments: a boolean parameter that makes the function do two fundamentally different
things depending on its value — a sign the function has two responsibilities
- Primitive Obsession: domain concepts represented as primitive types (`String email`,
`int orderId`, `double money`) rather than purpose-built value types — forces callers to know
which string is an email and which is a name
- Shallow module: the interface or documentation of a component is more complex relative to
the functionality it provides
### Sources
| Symptom | Book | Principle / Smell |
|---------|------|-------------------|
| Long Method | Fowler — Refactoring | Long Method |
| Long Parameter List | Fowler — Refactoring | Long Parameter List |
| Message Chains | Fowler — Refactoring | Message Chains |
| Flag Arguments | Fowler — Refactoring | Flag Arguments |
| Primitive Obsession | Fowler — Refactoring | Primitive Obsession |
| Function length and nesting | McConnell — Code Complete | Ch. 7: High-Quality Routines |
| Variable naming | McConnell — Code Complete | Ch. 11: The Power of Variable Names |
| Magic numbers | McConnell — Code Complete | Ch. 12: Fundamental Data Types |
| Domain name mismatch | Evans — Domain-Driven Design | Ubiquitous Language |
| Shallow Module | Ousterhout — A Philosophy of Software Design | Ch. 4: Modules Should Be Deep |
### Severity Guide
- 🔴 Critical: function > 50 lines, nesting > 5, or virtually no meaningful names
- 🟡 Warning: function 2050 lines, nesting 45, some unclear names
- 🟢 Suggestion: minor naming issues, 12 magic numbers, isolated train-wreck chains
### What Not to Flag
- Linear code with clear names and guard clauses is not automatically high cognitive load
- Internal implementation detail hidden behind a deep, simple module boundary is not a shallow-module problem
- Domain-specific terminology should not be flagged if it matches how experts actually speak
---
## Risk 2: Change Propagation
**Diagnostic question:** How many unrelated things break when you change one thing?
Each change ripples to unrelated modules, slowing velocity and multiplying regression risk.
### Symptoms
- Modifying one feature requires touching more than 3 files in unrelated modules
- One class changes for multiple different business reasons (e.g., `UserService` changes for
billing logic AND notification logic AND profile logic)
- A method uses more data from another class than from its own class
- Two classes know each other's internal state directly
- Changing one module requires recompiling or retesting many unrelated modules
- **Hyrum's Law**: with sufficient callers, every observable behavior — including
implementation details, error message text, coincidental call ordering, and undocumented
side effects — becomes an implicit contract that callers depend on, even though it was
never guaranteed by the declared API
- **Orthogonality violation**: changing one dimension of a feature forces edits in
unrelated dimensions — adding a new payment type should not require touching logging,
caching, or notification code, but in a non-orthogonal design it does
- Information Leakage: a design decision (e.g., a file format, protocol detail, or data
shape) is encoded in more than one module, so changing it requires coordinated edits
in multiple places even though only one module "owns" the concept
### Sources
| Symptom | Book | Principle / Smell |
|---------|------|-------------------|
| Shotgun Surgery | Fowler — Refactoring | Shotgun Surgery |
| Divergent Change | Fowler — Refactoring | Divergent Change |
| Feature Envy | Fowler — Refactoring | Feature Envy |
| Inappropriate Intimacy | Fowler — Refactoring | Inappropriate Intimacy |
| Orthogonality violation | Hunt & Thomas — The Pragmatic Programmer | Ch. 2: Orthogonality |
| DIP violation | Martin — Clean Architecture | Dependency Inversion Principle |
| High change propagation radius | Brooks — The Mythical Man-Month | Ch. 2: Brooks's Law (communication overhead) |
| Hyrum's Law | Winters et al. — Software Engineering at Google | Ch. 1: Hyrum's Law |
| Information Leakage | Ousterhout — A Philosophy of Software Design | Ch. 5: Information Hiding and Leakage |
### Severity Guide
- 🔴 Critical: one change touches > 5 files, or there is a structural dependency inversion (domain depends on infrastructure)
- 🟡 Warning: one change touches 35 files, mild coupling between modules
- 🟢 Suggestion: minor coupling, easily isolatable
### What Not to Flag
- A composition root wiring concrete dependencies is not a DIP violation by itself
- A stable public API with intentionally supported behavior is not automatically Hyrum's Law debt
- Similar edits inside one bounded context may be normal coordinated change, not shotgun surgery
---
## Risk 3: Knowledge Duplication
**Diagnostic question:** Is the same decision expressed in more than one place?
Multiple copies drift apart silently. DRY is about decisions, not code lines.
### Symptoms
- Same logic copy-pasted across multiple files or functions
- Same concept named differently in different parts of the codebase
(e.g., `user`, `account`, `member`, `customer` all referring to the same domain entity)
- Parallel class hierarchies that must change in sync
(e.g., adding a new payment type requires adding a class in 3 different hierarchies)
- Configuration values repeated as literals in multiple places
- Two modules that implement the same algorithm independently
### Sources
| Symptom | Book | Principle / Smell |
|---------|------|-------------------|
| Code duplication | Fowler — Refactoring | Duplicate Code |
| Parallel Inheritance | Fowler — Refactoring | Parallel Inheritance Hierarchies |
| DRY violation | Hunt & Thomas — The Pragmatic Programmer | DRY: Don't Repeat Yourself |
| Inconsistent naming | Evans — Domain-Driven Design | Ubiquitous Language |
| Alternative Classes | Fowler — Refactoring | Alternative Classes with Different Interfaces |
### Severity Guide
- 🔴 Critical: core business logic duplicated across modules, or same domain concept named 3+ different ways
- 🟡 Warning: utility code duplicated, naming inconsistent within a subsystem
- 🟢 Suggestion: minor literal duplication, single naming inconsistency
### What Not to Flag
- Repetition across separate bounded contexts is not automatically duplicate knowledge
- Temporary duplication during an active extraction or migration is not necessarily debt
- Shared protocol constants repeated at explicit boundaries may be acceptable when local ownership is clearer
---
## Risk 4: Accidental Complexity
**Diagnostic question:** Is the code more complex than the problem it solves?
Accidental complexity accumulates addition by addition until developers fight scaffolding more than solving the problem.
### Symptoms
- Abstractions built "for future use" with no current consumer
(e.g., a plugin system for a use case that has only one known implementation)
- Classes that barely justify their existence (wrap a single method call)
- Classes that only delegate to another class without adding behavior (pure middle-men)
- Second attempt at a system that is significantly more elaborate than the first,
adding generality for requirements that do not yet exist
- Switch statements that signal missing polymorphism
- Configuration options that have never been changed from their defaults
- Framework code larger than the application it powers
- Code grown under sustained tactical shortcuts: each workaround seemed small, but
accumulated shortcuts mean every new feature requires fighting the existing structure
### Sources
| Symptom | Book | Principle / Smell |
|---------|------|-------------------|
| Speculative Generality | Fowler — Refactoring | Speculative Generality |
| Lazy Class | Fowler — Refactoring | Lazy Class |
| Middle Man | Fowler — Refactoring | Middle Man |
| Switch Statements | Fowler — Refactoring | Switch Statements |
| Second System Effect | Brooks — The Mythical Man-Month | Ch. 5: The Second-System Effect |
| YAGNI violations | McConnell — Code Complete | Ch. 5: Design in Construction |
| Over-engineering | Hunt & Thomas — The Pragmatic Programmer | Topic 4: Good-Enough Software |
| Tactical programming debt | Ousterhout — A Philosophy of Software Design | Ch. 3: Strategic vs. Tactical Programming |
### Severity Guide
- 🔴 Critical: an entire subsystem built around a speculative requirement, or framework overhead dominates domain logic
- 🟡 Warning: several unnecessary abstractions or wrapper classes, unused configuration systems
- 🟢 Suggestion: one or two lazy classes or middle-man patterns in non-critical paths
### What Not to Flag
- A switch over an external protocol, wire format, or closed enum is not automatically missing polymorphism
- Thin wrappers that absorb vendor churn or hide instability may be justified
- A larger second version is not second-system effect unless the added generality exceeds present needs
---
## Risk 5: Dependency Disorder
**Diagnostic question:** Do dependencies flow in a consistent, predictable direction?
When business logic depends on infrastructure, infrastructure changes cascade into domain changes. Cycles prevent isolation.
### Symptoms
- Circular dependencies between modules or packages
- High-level business logic directly imports from low-level infrastructure
(e.g., a domain service imports from a specific database driver)
- Stable, widely-used components depend on unstable, frequently-changing ones
- Abstract components depending on concrete implementations
- Law of Demeter violations: `order.getCustomer().getAddress().getCity()`
- Module fan-out greater than 5 (imports from more than 5 other modules)
- A module implements an interface but only uses a subset of its methods, or must
provide stub implementations for methods it does not need (ISP violation: fat interface
forces unwanted dependencies on callers)
- The system feels like "one mind did not design this" — different modules use
incompatible architectural patterns with no clear rule for which to use where
- Direct version-pinned dependencies on transitive packages (diamond dependency risk);
upgrading one library requires coordinating multiple unrelated teams or repositories
### Sources
| Symptom | Book | Principle / Smell |
|---------|------|-------------------|
| Dependency cycles | Martin — Clean Architecture | Acyclic Dependencies Principle (ADP) |
| DIP violation | Martin — Clean Architecture | Dependency Inversion Principle (DIP) |
| Instability direction | Martin — Clean Architecture | Stable Dependencies Principle (SDP) |
| Abstraction mismatch | Martin — Clean Architecture | Stable Abstractions Principle (SAP) |
| ISP violation | Martin — Clean Architecture | Interface Segregation Principle (ISP) |
| Conceptual integrity | Brooks — The Mythical Man-Month | Ch. 4: Conceptual Integrity |
| Law of Demeter | Hunt & Thomas — The Pragmatic Programmer | Ch. 5: Decoupling and the Law of Demeter |
| SOLID violations | Martin — Clean Architecture | Single Responsibility, Open/Closed Principles |
| Diamond dependency / upgrade blockage | Winters et al. — Software Engineering at Google | Ch. 21: Dependency Management |
### Severity Guide
- 🔴 Critical: dependency cycles present, or domain layer directly depends on infrastructure layer
- 🟡 Warning: several SDP or DIP violations but no cycles; conceptual inconsistency across modules
- 🟢 Suggestion: minor Demeter violations, slightly elevated fan-out in isolated modules
### What Not to Flag
- High fan-out in an orchestration layer or composition root is not automatically disorder
- Adapter modules may depend on both domain and infrastructure when they explicitly translate across the boundary
- A stable facade over many leaf dependencies can be healthy if dependency policy is clear
---
## Risk 6: Domain Model Distortion
**Diagnostic question:** Does the code faithfully represent the problem it is solving?
Code that mismatches business language forces mental translation. Over time it models schemas instead of the domain, with logic bleeding into service layers.
### Symptoms
- Business logic scattered across service layers while domain objects have only getters and setters
(anemic domain model)
- Code variable, class, or method names that do not match what business stakeholders call the concept
- A class whose only purpose is to hold data with no behavior (pure data bag)
- A subclass that ignores or overrides most of its parent's behavior (refuses the inheritance)
- Bounded context boundaries crossed without any translation or anti-corruption layer
- Methods that are more interested in the data of another class than their own
(domain logic in the wrong place)
- A subclass overrides most parent methods with incompatible behavior or throws exceptions
where the parent contract guarantees success (LSP violation: substitution breaks callers)
- Value Objects treated as Entities: a concept defined entirely by its attributes (e.g., Money,
Email, Address) is given a mutable ID and lifecycle instead of being replaced when changed
### Sources
| Symptom | Book | Principle / Smell |
|---------|------|-------------------|
| Anemic Domain Model | Evans — Domain-Driven Design | Domain Model pattern |
| Ubiquitous Language drift | Evans — Domain-Driven Design | Ubiquitous Language |
| Bounded context violation | Evans — Domain-Driven Design | Bounded Context |
| Data Class | Fowler — Refactoring | Data Class |
| Refused Bequest | Fowler — Refactoring | Refused Bequest |
| Feature Envy | Fowler — Refactoring | Feature Envy |
| LSP violation | Martin — Clean Architecture | Liskov Substitution Principle (LSP) |
### Severity Guide
- 🔴 Critical: domain logic entirely in service layer, domain objects are pure data bags with no behavior
- 🟡 Warning: partial anemia, some naming inconsistency between code and domain language
- 🟢 Suggestion: minor naming drift in non-core areas, isolated cases of Feature Envy
### What Not to Flag
- CRUD-heavy workflows may legitimately use transaction scripts instead of rich domain objects
- DTOs, persistence records, and API payload models are allowed to be data-only
- Shared infrastructure language should not be mistaken for domain drift if the business model itself is simple

View File

@ -0,0 +1,37 @@
# Remedy Guide — Actionable Fix Mode
When `--fix` is active, enhance every finding's Remedy field to be directly actionable:
## Remedy Enhancement Rules
For each finding, the Remedy must include:
1. **Target**: exact file path and function/class name
2. **Action**: specific refactoring operation (e.g., "Extract lines 45-67 into a new
function `calculateShippingCost(items, config)`")
3. **Rationale**: one sentence explaining why this specific fix (not just "refactor")
## Fixability Classification
Classify each finding after writing the enhanced Remedy:
| Tier | Criteria | Report label |
|------|---------|-------------|
| Quick fix | Single-file, mechanical: rename, extract constant, reorder imports | `[quick-fix]` |
| Guided fix | Requires a design choice: where to split, what interface shape | `[guided]` |
| Manual | Cross-module, needs domain knowledge or team discussion | `[manual]` |
Append the label to the finding title: `**R1 — Long function in OrderService [quick-fix]**`
## Output Addition
After the standard report, add a **Fix Summary** section:
| Finding | Tier | Target File | Action |
|---------|------|------------|--------|
| R1 — Long function | quick-fix | src/order.ts:45 | Extract `calculateTotal()` |
| R5 — Circular dep | manual | src/models/ ↔ src/services/ | Introduce interface boundary |
## What NOT to do
- Do NOT modify any files. Phase 1 is diagnosis + actionable plan only.
- Do NOT generate diffs or code blocks. The Remedy text IS the deliverable.
- Do NOT re-score. The Health Score reflects current state, not projected state.

View File

@ -0,0 +1,248 @@
---
books:
- The Mythical Man-Month
- Code Complete
- Refactoring
- Clean Architecture
- The Pragmatic Programmer
- Domain-Driven Design
- A Philosophy of Software Design
- Software Engineering at Google
- xUnit Test Patterns
- The Art of Unit Testing
- Working Effectively with Legacy Code
- How Google Tests Software
---
# Source Coverage Matrix
Use this file after selecting a mode and before writing findings.
It exists to prevent shallow "book-name citation" reviews.
## Review Discipline
- Cite a book only when the observed symptom actually matches that book's principle.
- A threshold crossing is a hint, not a verdict. Check context, intent, and blast radius.
- Look for justified tradeoffs before flagging a smell as debt.
- Prefer concrete architectural or domain consequences over abstract style complaints.
- If two books pull in different directions, state the tradeoff instead of pretending there is no tension.
---
## Frederick Brooks — *The Mythical Man-Month*
**Encoded today**
- Change propagation as communication overhead
- Second-System Effect
- Conceptual Integrity
**Do not ignore**
- Whether the design shows a single coherent idea or competing local optimizations
- Whether cross-team coordination cost is becoming part of feature cost
**Do not over-flag**
- Large systems are not automatically second systems
- Multi-module designs are acceptable when they preserve conceptual integrity
---
## Steve McConnell — *Code Complete*
**Encoded today**
- Routine length, nesting, naming, and magic numbers
- Construction-phase YAGNI checks
- Defensive programming and error-handling discipline (guard clauses, input validation,
explicit error paths, assertions for invariants)
**Do not ignore**
- Whether low-level readability choices compound into operational risk
- Whether missing error handling makes failure modes invisible to maintainers
**Do not over-flag**
- Small, explicit guard clauses are not cognitive overload
- A long routine may be acceptable when it is linear, well-named, and single-purpose
---
## Martin Fowler — *Refactoring*
**Encoded today**
- Long Method, Long Parameter List, Message Chains
- Shotgun Surgery, Divergent Change, Feature Envy, Inappropriate Intimacy
- Duplicate Code, Speculative Generality, Lazy Class, Middle Man, Data Class
- Flag Arguments: boolean parameters that split a function into two behaviors
- Primitive Obsession: domain concepts expressed as raw primitive types instead of value types
**Do not ignore**
- Whether the code smell is local or systemic
- Whether a refactoring target has a natural home in the model
**Do not over-flag**
- Temporary duplication during an active extraction is not always debt
- A data-focused structure is acceptable when it is intentionally a DTO or boundary record
---
## Robert C. Martin — *Clean Architecture*
**Encoded today**
- DIP, ADP, SDP, SAP, and layering direction
- ISP: fat interfaces that force callers to depend on methods they do not use
- LSP: subclasses that break the behavioral contract of their parent type
- SRP and OCP: classes with multiple reasons to change; modules closed to modification
but open to extension via abstraction
**Do not ignore**
- Policy vs detail boundaries
- Whether dependency arrows preserve replaceability and testability
**Do not over-flag**
- Composition roots may depend on concrete infrastructure by design
- Thin adapter layers can import both directions when they are explicitly boundary glue
---
## Andrew Hunt & David Thomas — *The Pragmatic Programmer*
**Encoded today**
- Orthogonality
- DRY
- Law of Demeter
**Do not ignore**
- Whether knowledge duplication is really duplicated decision-making
- Whether coupling is accidental or a deliberate local simplification
**Do not over-flag**
- Similar code in different bounded contexts is not automatically a DRY violation
- Direct object access inside a cohesive aggregate is not always a Demeter problem
---
## Eric Evans — *Domain-Driven Design*
**Encoded today**
- Ubiquitous Language
- Bounded Context
- Anemic Domain Model
- Entity vs Value Object: objects with identity and lifecycle vs. objects defined solely by
their attributes (Money, Email, Address should be immutable value types, not mutable entities)
- Aggregate Roots: who owns the invariant boundary; cross-aggregate access only through the root
**Do not ignore**
- Aggregate boundaries, invariant ownership, and anti-corruption layers
- Whether names match the business language used by experts
**Do not over-flag**
- CRUD-heavy workflows may legitimately use transaction scripts
- Thin entities are acceptable when the domain itself is simple
---
## John Ousterhout — *A Philosophy of Software Design*
**Encoded today**
- Deep vs shallow modules
- Strategic vs tactical programming
- Information Leakage: a design decision encoded in more than one module, creating
change coupling even when no explicit import exists between the modules
**Do not ignore**
- Interface complexity relative to hidden complexity
- Whether repeated tactical patches are raising long-term cognitive load
- Whether a "helper" exposes internal design decisions that callers should not know
**Do not over-flag**
- Internal implementation complexity is fine when the interface stays simple
- A small wrapper is acceptable when it meaningfully absorbs volatility
---
## Titus Winters, Tom Manshreck, Hyrum Wright — *Software Engineering at Google*
**Encoded today**
- Hyrum's Law
- Dependency management and upgrade blockage
- Code sustainability: whether code as written can be maintained, migrated, and upgraded
over a multi-year horizon without heroic effort
- Backward compatibility: whether API changes preserve existing callers or force
coordinated upgrades across the organization
**Do not ignore**
- De facto APIs created by observable behavior
- The maintenance cost of exposing too much surface area
- Whether the dependency graph will allow independent upgrades over time
**Do not over-flag**
- A stable public API is not a liability if it is intentionally supported
- Fan-out alone is not disorder when dependency policy is explicit and governed
---
## Gerard Meszaros — *xUnit Test Patterns*
**Encoded today**
- Assertion Roulette, Mystery Guest, General Fixture
- Eager Test, Lazy Test, Test Code Duplication, Behavior Verification
- Erratic Test: tests that produce non-deterministic results due to shared state,
time dependence, or ordering assumptions between tests
**Do not ignore**
- Whether test failures are diagnosable
- Whether the suite shape amplifies maintenance cost
**Do not over-flag**
- Multiple assertions are acceptable when they express one behavior with one failure story
- Shared fixtures are acceptable when every field is relevant to the scenario
---
## Roy Osherove — *The Art of Unit Testing*
**Encoded today**
- Test naming discipline
- Test isolation
- Mock usage guidelines
- Completeness of edge-path tests
**Do not ignore**
- Whether tests verify behavior rather than wiring
- Whether seams are used to simplify tests, or production code is being contorted for testability
**Do not over-flag**
- A mock is acceptable when the dependency is nondeterministic and the assertion still verifies behavior
- Naming conventions are guidance; clarity is the goal
---
## Michael Feathers — *Working Effectively with Legacy Code*
**Encoded today**
- Legacy code as code without tests
- Sensing and Separation
- Seams
- Characterization Tests
**Do not ignore**
- Whether the team can change a risky area safely today
- Whether the code offers any seam for isolating behavior under change
**Do not over-flag**
- Untested code is not automatically legacy if it is stable and not under active change
- Characterization tests are most important before modifying unclear existing behavior
---
## Google Engineering — *How Google Tests Software*
**Encoded today**
- Change coverage vs line coverage
- Pyramid shape and suite portfolio economics
**Do not ignore**
- Whether the suite reflects business risk, not just percentages
- Whether expensive tests dominate feedback loops
**Do not over-flag**
- A non-70:20:10 ratio can be healthy when justified by platform constraints or product risk
- High coverage is useful when paired with meaningful branch and change protection

View File

@ -0,0 +1,246 @@
# Test Decay Risk Reference
Six patterns that cause test suites to degrade. Apply the Iron Law to each finding.
---
## Risk T1: Test Obscurity
**Diagnostic question:** How much effort does it take to understand what this test verifies?
Unclear test intent breeds distrust, missed failures, and duplicates — one step from an abandoned suite.
### Symptoms
- Assertion Roulette: multiple assertions with no message string — when one fails, it is
impossible to determine which behavior broke without reading every assertion
- Mystery Guest: test depends on external state (files, database rows, shared fixtures)
that is not visible in the test body
- Test names that do not express the scenario and expected outcome
(e.g., `test1`, `shouldWork`, `testLogin`, `testUserService`)
- General Fixture: an oversized setUp or beforeEach shared by unrelated tests, making
each test's preconditions invisible
- Test body requires reading production code to understand what is being verified
### Sources
| Symptom | Book | Principle / Smell |
|---------|------|-------------------|
| Assertion Roulette | Meszaros — xUnit Test Patterns | Assertion Roulette (p.224) |
| Mystery Guest | Meszaros — xUnit Test Patterns | Mystery Guest (p.411) |
| General Fixture | Meszaros — xUnit Test Patterns | General Fixture (p.316) |
| Test naming | Osherove — The Art of Unit Testing | method_scenario_expected naming convention |
### Severity Guide
- 🔴 Critical: no test name in the file describes the behavior being tested; all assertions lack messages
- 🟡 Warning: multiple Mystery Guests; several ambiguous test names
- 🟢 Suggestion: minor naming issues; isolated General Fixture
### What Not to Flag
- Multiple assertions are acceptable when they describe one coherent behavior and fail with a clear story
- Shared setup is fine when every initialized value is relevant to nearly every test
- Concise test names are acceptable if scenario and expected outcome are still obvious
---
## Risk T2: Test Brittleness
**Diagnostic question:** Do tests break when you refactor without changing behavior?
Brittle tests punish refactoring — eventually developers stop refactoring and the codebase stagnates to protect the suite.
### Symptoms
- Tests assert on private method results, internal state, or implementation details
rather than observable behavior
- Eager Test: one test method verifies multiple unrelated behaviors; any single change
causes it to fail regardless of which behavior was touched
- Over-specified: assertions enforce mock call order or exact parameter values that are
irrelevant to the behavior being tested
- Renaming or extracting a method causes 5 or more tests to fail even though no behavior changed
- Erratic Test: a test produces different results across runs without any change to
production code — caused by race conditions, time-dependent logic, random data, or
shared mutable state between tests
### Sources
| Symptom | Book | Principle / Smell |
|---------|------|-------------------|
| Eager Test | Meszaros — xUnit Test Patterns | Eager Test (p.228) |
| Erratic Test | Meszaros — xUnit Test Patterns | Erratic Test |
| Implementation coupling | Osherove — The Art of Unit Testing | Test isolation principle |
| Orthogonality violation | Hunt & Thomas — The Pragmatic Programmer | Ch. 2: Orthogonality |
### Severity Guide
- 🔴 Critical: refactoring with no behavior change causes test failures; > 5 tests coupled to a single implementation detail
- 🟡 Warning: Eager Tests common across the suite; moderate implementation-detail assertions
- 🟢 Suggestion: isolated over-specification in non-critical tests
### What Not to Flag
- Verifying an externally observable event or emitted command is not implementation coupling
- One test with several assertions is acceptable when all assertions support one behavior claim
- A fake or in-memory adapter is not brittleness if the test still asserts behavior, not wiring
---
## Risk T3: Test Duplication
**Diagnostic question:** Is the same test scenario expressed in more than one place?
Duplicated tests must change in multiple places and create false confidence without testing distinct behavior.
### Symptoms
- Test Code Duplication: same setup or assertion logic copy-pasted across multiple tests
without extraction into a shared helper
- Lazy Test: multiple tests verifying identical behavior with no differentiation in input,
state, or expected output
- Same boundary condition tested identically at unit, integration, and E2E level —
three copies with no layer differentiation
- Test helper functions or fixtures duplicated across test files instead of shared
### Sources
| Symptom | Book | Principle / Smell |
|---------|------|-------------------|
| Test Code Duplication | Meszaros — xUnit Test Patterns | Test Code Duplication (p.213) |
| Lazy Test | Meszaros — xUnit Test Patterns | Lazy Test (p.232) |
| DRY violation in tests | Hunt & Thomas — The Pragmatic Programmer | DRY: Don't Repeat Yourself |
### Severity Guide
- 🔴 Critical: core business scenario fully duplicated across all three test layers with no differentiation
- 🟡 Warning: common scenario setup repeated in 5 or more tests without extraction
- 🟢 Suggestion: minor helper duplication; isolated Lazy Tests
### What Not to Flag
- The same scenario may appear at unit and integration level when each layer verifies a distinct risk
- Small local setup duplication can be clearer than an over-abstracted fixture maze
- Similar assertions against different domain rules are not Lazy Tests if the business intent differs
---
## Risk T4: Mock Abuse
**Diagnostic question:** Is the test more complex than the behavior it tests?
Mock abuse produces tests that pass while verifying nothing — production code can be fully broken as long as the mocks are wired up.
### Symptoms
- Mock setup code is longer than the test logic itself
- Primary assertion is `expect(mock).toHaveBeenCalledWith(...)` — the test verifies
that a mock was called, not that any real behavior occurred
- Test-only methods added to production classes for lifecycle management in tests
- Single unit test uses more than 3 mocks
- Incomplete Mock: mock object missing fields that downstream code will access,
causing silent failures only visible in integration
- Hard-Coded Test Data: test data has no resemblance to real data shapes or constraints
### Sources
| Symptom | Book | Principle / Smell |
|---------|------|-------------------|
| Mock count > 3 | Osherove — The Art of Unit Testing | Mock usage guidelines |
| Testing mock behavior | Meszaros — xUnit Test Patterns | Behavior Verification (p.544) |
| Test-only production methods | Feathers — Working Effectively with Legacy Code | Ch. 3: Sensing and Separation |
| Hard-Coded Test Data | Meszaros — xUnit Test Patterns | Hard-Coded Test Data (p.534) |
| Incomplete Mock | Osherove — The Art of Unit Testing | Mock completeness requirement |
### Severity Guide
- 🔴 Critical: mock setup > 50% of test code; production class has methods only called from tests
- 🟡 Warning: mocks consistently > 3 per test; primary assertions are mock call verifications
- 🟢 Suggestion: isolated Incomplete Mocks; minor Hard-Coded Test Data
### What Not to Flag
- A small number of mocks around nondeterministic dependencies is acceptable when assertions still verify behavior
- Fakes and spies used to observe state transitions are not mock abuse by default
- One interaction assertion may be appropriate when the interaction itself is the behavior under test
---
## Risk T5: Coverage Illusion
**Diagnostic question:** Does the test suite actually protect against the failures that matter?
Coverage measures execution, not verification. 90% line coverage can still miss every critical failure mode — teams stop looking because the number says "covered."
### Symptoms
- High line coverage but error-handling branches, boundary conditions, and exception paths
have no corresponding tests
- Happy-path only: no sad paths, no null/empty/zero inputs, no concurrency edge cases
- Legacy code areas are being actively modified with no tests present
(Feathers: "legacy code is code without tests")
- Coverage percentage treated as a sign-off criterion; critical change paths remain untested
- Tests assert on return values but not on important side effects such as database writes,
event publications, or state transitions
### Sources
| Symptom | Book | Principle / Smell |
|---------|------|-------------------|
| Legacy code = no tests | Feathers — Working Effectively with Legacy Code | Ch. 1: "Legacy code is code without tests" |
| Change coverage vs line coverage | Google — How Google Tests Software | Ch. 11: Testing at Google Scale |
| Happy-path only | Osherove — The Art of Unit Testing | Test completeness principle |
### Severity Guide
- 🔴 Critical: legacy code area actively being modified with no tests; error-handling paths entirely absent
- 🟡 Warning: coverage > 80% but edge and exception paths are systematically absent
- 🟢 Suggestion: a few non-critical paths missing sad-path tests
### What Not to Flag
- High line coverage is useful when paired with branch, boundary, and change-path coverage
- A new module may have limited coverage early if it is still private and low-risk
- Side-effect assertions may live in integration tests rather than unit tests without implying a gap
---
## Risk T6: Architecture Mismatch
**Diagnostic question:** Does the test suite structure reflect the system's actual risk profile?
Wrong suite shape is slow and expensive — not from bad tests, but from using the wrong type at the wrong layer.
### Symptoms
- Inverted test pyramid: E2E or integration test count exceeds unit test count,
causing a slow and fragile suite
- Legacy code with no seam points: no interfaces, dependency injection, or seams exist,
making it impossible to test in isolation without modifying production code
- Legacy areas being modified have no Characterization Tests to capture current behavior
before changes are made
- Full suite execution time exceeds 10 minutes (indicates architectural problem,
not a performance problem — too many slow tests)
- High-risk and low-risk paths are tested at identical density;
no risk-based prioritization in test distribution
### Sources
| Symptom | Book | Principle / Smell |
|---------|------|-------------------|
| Inverted pyramid | Google — How Google Tests Software | 70:20:10 unit:integration:E2E ratio |
| No seam points | Feathers — Working Effectively with Legacy Code | Ch. 4: Seam Model |
| Missing Characterization Tests | Feathers — Working Effectively with Legacy Code | Ch. 13: Characterization Tests |
| Suite execution time | Meszaros — xUnit Test Patterns | Slow Tests (p. 253) |
### Severity Guide
- 🔴 Critical: legacy code being modified has no seams and no characterization tests; pyramid fully inverted
- 🟡 Warning: suite execution > 10 minutes; integration/E2E count exceeds unit tests
- 🟢 Suggestion: localized pyramid ratio deviation; a few legacy areas missing characterization tests
### What Not to Flag
- Deviating from 70:20:10 can be justified by platform constraints or product risk
- A suite heavy on integration tests can still be healthy if feedback is fast and purposefully layered
- A small number of critical-path E2E tests is desirable, not a smell

42
skills/thirdparty/brooks-audit/SKILL.md vendored Normal file
View File

@ -0,0 +1,42 @@
---
name: brooks-audit
description: >
Architecture audit that maps module dependencies, checks layering integrity, and
flags structural decay across a codebase, drawing on twelve classic engineering books.
Triggers when: user asks to audit architecture, review folder/module structure,
check for circular imports, understand how the codebase is organized, or asks
"does this follow clean architecture?", "why does everything depend on everything?",
"are our layers correct?", "where should this code live?".
Also triggers for onboarding requests: "explain this codebase to a new developer"
or "give me a codebase tour" (use onboarding mode).
Do NOT trigger for: PR-level code review (use brooks-review) or line-level refactoring
questions — this skill analyzes structural/module-level concerns, not individual functions.
---
# Brooks-Lint — Architecture Audit
## Setup
1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
3. Read `../_shared/decay-risks.md` for symptom definitions and source attributions
4. Read `architecture-guide.md` in this directory for the audit framework
## Process
**Onboarding mode:** If the user asks for an onboarding report, codebase tour, or
"explain this codebase to a new developer", read `onboarding-guide.md` from this
directory and follow it instead of `architecture-guide.md`. This mode explains rather
than diagnoses — no Health Score, no Iron Law findings.
**If the user has not specified files or a directory to audit:** apply Auto Scope
Detection from `../_shared/common.md` to determine the audit scope before proceeding.
1. Gather codebase context and draw the module dependency graph as Mermaid (Steps 01 of the guide)
2. Scan for each decay risk in the order specified (Steps 24 of the guide)
3. Assign node colors in the Mermaid diagram based on findings (red/yellow/green) — after Step 4
4. Run the Testability Seam Assessment (Step 5 of the guide)
5. Run the Conway's Law check (Step 6 of the guide)
6. Output using the Report Template from common.md — Mermaid graph FIRST, then Findings
**Mode line in report:** `Architecture Audit`

View File

@ -0,0 +1,195 @@
# Architecture Audit Guide — Mode 2
**Purpose:** Analyze the module and dependency structure of a system for decay risks that
operate at the architectural level. Every finding must follow the Iron Law:
Symptom → Source → Consequence → Remedy.
**Monorepo note:** Treat each deployable service or library as a top-level module. Draw
dependencies between services, not between their internal packages. Apply the Conway's Law
check at the service ownership level. Within a single service, apply standard module-level analysis.
---
## Analysis Process
Work through these six steps in order.
### Step 0: Gather Codebase Context
Before drawing anything, establish what you can see.
**If the user provided a full directory tree or pasted relevant file contents:** skip the
proactive reading below and proceed to Step 1.
**Otherwise, proactively read the project using these tools:**
1. **Top-level structure** — glob top two levels to identify module boundaries:
```
Glob: **/*(depth 2, directories only)
```
2. **Entry points** — read the package manifest or main config file (e.g., `package.json`,
`go.mod`, `pom.xml`, `Cargo.toml`, `pyproject.toml`) to confirm language, framework,
and declared dependencies.
3. **Dependency edges** — grep import statements to discover inter-module calls. Run once
per language present; limit to the first 200 matches to avoid token overrun:
```
Grep: "^\s*(import|from|require\(|use )" across *.ts|*.py|*.go|*.rs|*.java
```
4. **Large modules** — for any top-level directory with > 10 files, read the file matching
`index.*`, `main.*`, or `__init__.*` to understand its stated responsibility.
**Stop when you can answer all three:**
- What are the top-level modules (names and count)?
- Which modules import from which other modules?
- Which module has the highest fan-in or fan-out?
If the project has > 100 top-level files or > 4 levels of nesting, note which areas were
sampled vs. inferred, and flag this in the report scope line.
### Step 1: Draw the Module Dependency Graph (Mermaid)
Before evaluating any risk, map the dependencies as a Mermaid diagram. Use this format:
````mermaid
graph TD
subgraph UI
WebApp
MobileApp
end
subgraph Domain
AuthService
OrderService
PaymentService
end
subgraph Infrastructure
Database
MessageQueue
end
WebApp --> AuthService
WebApp --> OrderService
MobileApp --> AuthService
MobileApp --> OrderService
OrderService --> PaymentService
OrderService --> Database
OrderService --> MessageQueue
PaymentService --> Database
AuthService -.->|circular| OrderService
classDef critical fill:#ff6b6b,stroke:#c92a2a,color:#fff
classDef warning fill:#ffd43b,stroke:#e67700
classDef clean fill:#51cf66,stroke:#2b8a3e,color:#fff
class PaymentService critical
class OrderService warning
class Database,MessageQueue,AuthService,WebApp,MobileApp clean
````
Draw the graph structure first — nodes, subgraphs, and edges — without any `classDef` or
`class` lines. You cannot assign colors until you have completed the risk scan in Steps 24.
**After completing Step 4**, return to this graph and add the `classDef` and `class` lines
based on findings. The example above shows the final colored output.
Rules:
1. **Nodes** — Use top-level directories or services as nodes, not individual files
2. **Grouping** — One `subgraph` per architectural layer or top-level directory (e.g., UI, Domain, Infrastructure)
3. **Edges** — Solid arrows (`-->`) point FROM the depending module TO the dependency; use dotted arrows with label (`-.->|circular|`) for circular dependencies. If no circular dependencies exist, use only solid arrows
4. **Node limit** — Keep the graph to ~50 nodes maximum; collapse low-risk leaf modules into their parent if needed
5. **Fan-out** — For any node with fan-out > 5, use a descriptive label: `HighFanOutModule["ModuleName (fan-out: 7)"]`
6. **Colors** — Apply `classDef` colors AFTER completing Steps 2-4: `critical` (red `#ff6b6b`) for nodes with Critical findings, `warning` (yellow `#ffd43b`) for Warning findings, `clean` (green `#51cf66`) for nodes with no findings or only Suggestions. If no findings at all, classify all nodes as `clean`
7. **Direction** — Default to `graph TD` (top-down); use `graph LR` only if the architecture is clearly a left-to-right pipeline
### Step 2: Scan for Dependency Disorder
*The most architecturally consequential risk — scan this first.*
Look for:
- Circular dependencies (any `-.->|circular|` edge in the map above)
- Arrows flowing upward (high-level domain depending on low-level infrastructure)
- Stable, widely-depended-on modules that import from frequently-changing modules
- Modules with fan-out > 5
- Absence of a clear layering rule (no consistent answer to "what depends on what?")
### Step 3: Scan for Domain Model Distortion
Look for:
- Do module names match the business domain vocabulary?
- Is there a layer called "services" that contains all the business logic while domain objects
are pure data structures?
- Are there modules that cross bounded context boundaries (e.g., billing logic in the user module)?
- Is there an anti-corruption layer where external systems interface with the domain?
### Step 4: Scan for Remaining Four Risks
Check each in turn:
**Knowledge Duplication:**
- Are there multiple modules implementing the same concept independently?
- Does the same domain concept appear under different names in different modules?
**Accidental Complexity:**
- Are there entire layers in the architecture that do not add value?
- Are there modules whose responsibility cannot be stated in one sentence?
**Change Propagation:**
- Which modules are "blast radius hotspots"? (A change here requires changes in many other modules)
- Does the dependency map reveal why certain features are slow to develop?
**Cognitive Overload:**
- Can the module responsibility of each module be stated in one sentence from its name alone?
- Would a new developer know which module to add a new feature to?
### Step 5: Testability Seam Assessment
A *seam* is a place in the architecture where behavior can be altered without editing source
code — typically an interface, a configuration point, or a dependency injection boundary.
Seam density is a proxy for testability and evolvability.
Scan for:
- **No seam at the infrastructure boundary**: can you replace a real database, file system,
or HTTP client with a test double without editing the module under test? If not, the
architecture forces integration tests where unit tests would suffice.
- **Seam collapse**: a module that was once testable in isolation has had its seams removed
(e.g., direct constructor instantiation replaced a dependency injection point, or a global
singleton replaced an injected collaborator).
- **Missing seam in legacy areas**: modules without an obvious injection point or interface
boundary — any change requires touching the entire call stack to substitute behavior.
If all modules have clear seams at their infrastructure boundaries → no finding.
If seams are absent or collapsed: flag as 🟡 Warning with a Remedy pointing to the specific
module and the injection point that needs to be restored or introduced.
Source: Feathers — Working Effectively with Legacy Code, Ch. 4: The Seam Model
### Step 6: Conway's Law Check
After the six-risk scan, assess the relationship between architecture and team structure:
- Does the module/service structure reflect the team structure?
(Conway's Law: "Organizations design systems that mirror their communication structure")
- If yes: is this intentional design or accidental coupling?
- A mismatch that causes cross-team coordination overhead for every feature is 🔴 Critical.
- A mismatch that is theoretical but not yet causing pain is 🟡 Warning.
- If team structure is unknown, note this as context missing and skip the check.
**Calibration examples:**
- 🔴 Critical: the Payments module is owned by Team A but contains auth logic owned by Team B —
every Payments change requires a sync meeting with Team B
- 🟡 Warning: two separate teams own the `utils/` and `helpers/` directories which do the same
things — theoretically painful but not yet causing release coordination issues
- Not a finding: a single team owns a monorepo with multiple logical modules — Conway's Law
misalignment requires *separate teams* to be meaningful
---
## Output
Use the standard Report Template from `../_shared/common.md`. Mode: Architecture Audit.
Place the Mermaid dependency graph FIRST under "Module Dependency Graph". Reference
relevant node names in findings. Add `classDef` color assignments LAST, after all
findings are identified.

View File

@ -0,0 +1,89 @@
# Codebase Onboarding Guide
**Purpose:** Produce a newcomer-friendly tour of the codebase. This is NOT a diagnostic
report — no Health Score, no Iron Law findings. Focus on explanation and orientation.
---
## Process
### Step 1: Map the Territory
- Read top-level structure (same as architecture-guide Step 0)
- Output: a plain-language overview of what each top-level module does (one sentence each)
- Group into layers: "Things users interact with", "Business logic", "Infrastructure"
### Step 2: Draw the Dependency Map
Draw the same Mermaid dependency graph as architecture audit Step 1, but color nodes by
**recommended reading order** using a DISTINCT palette from the severity palette
(which uses red/yellow/green). This avoids confusing "red = danger" with "red = read last":
- 🔵 Blue (`#339af0`): start here — entry points, core domain
- 🟣 Purple (`#9775fa`): read next — supporting modules
- ⚪ Gray (`#ced4da`): read last — infrastructure, generated code, utilities
Add numbered labels: `CoreModule["1. CoreModule"]`
```
classDef start fill:#339af0,color:#fff
classDef next fill:#9775fa,color:#fff
classDef last fill:#ced4da
```
### Step 3: Highlight Key Conventions
Identify and document patterns the codebase follows:
- Naming conventions (file naming, class naming, variable naming)
- Directory organization pattern (feature-based? layer-based? hybrid?)
- Error handling pattern (exceptions? result types? error codes?)
- Testing convention (co-located? separate directory? naming pattern?)
- Dependency injection pattern (if any)
### Step 4: Mark Danger Zones
For each module with known complexity or coupling issues, add a brief warning:
- "OrderService: high complexity, only modify with full test suite running"
- "legacy/: no tests, use Characterization Tests before changing"
Do NOT use Iron Law format — use plain warnings. This is orientation, not diagnosis.
### Step 5: Build a Domain Glossary
Extract 10-15 key domain terms from code (class names, method names, constants) and map
them to plain-language definitions. This applies Evans's Ubiquitous Language as documentation.
### Step 6: Suggest First Tasks
Based on the dependency map, suggest 2-3 low-risk areas where a new developer could make
their first contribution: modules with good test coverage, clear boundaries, low coupling.
---
## Output Template
```
# Codebase Tour: [Project Name]
## Overview
[2-3 sentence summary of what the project does and its tech stack]
## Module Map
[Mermaid graph with reading-order colors]
## Module Guide
[One paragraph per top-level module: what it does, what it depends on, key files to read]
## Conventions
[Bullet list of patterns this codebase follows]
## Danger Zones
[Bullet list of areas to be careful with, or "None identified" if the codebase is clean]
## Domain Glossary
| Term | Meaning |
|------|---------|
## Suggested First Tasks
[2-3 concrete suggestions for a new developer's first PR]
```

35
skills/thirdparty/brooks-debt/SKILL.md vendored Normal file
View File

@ -0,0 +1,35 @@
---
name: brooks-debt
description: >
Tech debt assessment that identifies, classifies, and prioritizes maintainability
problems — helping teams build a refactoring roadmap — drawing on twelve classic
engineering books.
Triggers when: user asks about tech debt, refactoring priorities, what to clean up
first, or asks "why is this so hard to change?", "where's the most painful part?",
"what should we fix first?", "how do I justify refactoring to management?",
"why is our velocity dropping?".
Do NOT trigger for: server health checks, HTTP /health endpoints, Kubernetes probes,
database health, or application uptime — "health" in those contexts is infrastructure,
not code quality. Also not for single-function refactoring questions.
---
# Brooks-Lint — Tech Debt Assessment
## Setup
1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
3. Read `../_shared/decay-risks.md` for symptom definitions and source attributions
4. Read `debt-guide.md` in this directory for the debt classification framework
## Process
**If the user has not described the codebase or pointed to specific areas:** apply Auto
Scope Detection from `../_shared/common.md` to determine the assessment scope before proceeding.
1. Scan for all six decay risks (Step 1 of the guide); list every finding before scoring
2. Apply the Pain × Spread priority formula and classify debt intent (Steps 23 of the guide)
3. Group findings by decay risk (Step 4 of the guide)
4. Output using the Report Template from common.md, plus the Debt Summary Table
**Mode line in report:** `Tech Debt Assessment`

View File

@ -0,0 +1,125 @@
# Tech Debt Assessment Guide — Mode 3
**Purpose:** Identify, classify, and prioritize technical debt across the entire codebase.
Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy.
---
## Evidence Gathering
If you have insufficient evidence to assess the codebase, ask the user ONE question —
choose the single question most relevant to what you already know:
1. "Which part of the codebase takes the longest to modify for a typical feature?"
2. "Which module do developers avoid touching, and why?"
3. "Which parts of the system have the fewest tests and the most bugs?"
4. "Is there a module that only one person fully understands?"
After one answer, proceed. Do not ask more than one question.
If the user declines or says they don't know, proceed with available evidence and note
which areas could not be assessed.
---
## Analysis Process
Work through these four steps in order.
### Step 1: Full Decay Risk Scan
Scan for all six decay risks across the entire codebase. List every finding before scoring
any of them. This prevents anchoring on early findings and missing systemic patterns.
For each risk, look for:
**Cognitive Overload:** Are there widespread naming problems, deeply nested logic, or
excessively long functions spread across many modules?
**Change Propagation:** Which modules cause the most ripple effects when changed?
Are there modules that everyone must modify when adding a new feature?
**Knowledge Duplication:** How many times is the same concept implemented independently?
Is the domain vocabulary consistent across the codebase?
**Accidental Complexity:** Are there architectural layers or abstractions that add no value?
Is the infrastructure overhead proportional to the problem being solved?
**Dependency Disorder:** Are there dependency cycles? Does domain logic depend on infrastructure?
Are there modules with no clear layering position?
**Domain Model Distortion:** Is business logic in the right layer?
Do code names match business names? Are domain objects anemic?
### Step 2: Score Each Finding with Pain × Spread
After listing all findings, score each one:
**Pain score (13):** How much does this slow down development today?
- 3: Developers actively avoid touching this area; it causes bugs on most changes
*(e.g., "nobody wants to touch the billing module because it always breaks something")*
- 2: This area is noticeably slower to work in than the rest of the codebase
*(e.g., "adding a field takes 23x longer here than elsewhere")*
- 1: This is a quality issue but not currently causing active pain
*(e.g., "inconsistent naming, but we always know what we mean")*
**Spread score (13):** How many files, modules, or developers does this affect?
- 3: Affects 5+ modules or all developers on the team
*(e.g., "every new feature touches the God class in core/")*
- 2: Affects 24 modules or a subset of the team
*(e.g., "the auth and notification modules are tightly coupled")*
- 1: Isolated to one module or one developer's area
*(e.g., "legacy parser that only one person maintains")*
**Priority = Pain × Spread** (max 9)
| Priority | Classification | Action |
|----------|---------------|--------|
| 79 | Critical debt | Address in next sprint |
| 46 | Scheduled debt | Plan within quarter |
| 13 | Monitored debt | Log and watch |
### Step 3: Classify Debt Intent
After scoring, classify each finding as intentional or accidental:
**Intentional debt** — a conscious shortcut taken to meet a deadline, with the expectation
of paying it back. The team knows about it. It may be legitimate (a strategic prototype,
a known temporary workaround during a migration).
**Accidental debt** — degradation that accumulated without a deliberate decision: the team
did not choose it and may not even know it exists. This is the kind Ward Cunningham's
original definition warned against — not a tactical trade-off, but structural erosion.
Mark each finding with `[intentional]` or `[accidental]` in the Debt Summary Table.
Intentional debt with no visible payback plan — no linked ticket, no code comment, no
documented decision — should be treated as accidental for prioritization purposes.
Focus remediation energy on accidental debt first; intentional debt at least has an owner.
### Step 4: Group by Decay Risk
Report findings grouped by risk type, not by file or module.
Grouping by risk reveals systemic patterns:
- "Change Propagation is systemic" → architectural intervention needed
- "Cognitive Overload is isolated" → localized refactoring sufficient
---
## Output
Use the standard Report Template from `../_shared/common.md`. Mode: Tech Debt Assessment.
After Findings, append a Debt Summary Table:
```
## Debt Summary
| Risk | Findings | Avg Priority | Classification | Intent |
|------|----------|-------------|----------------|--------|
| Cognitive Overload | N | X.X | Monitored/Scheduled/Critical | intentional/accidental |
| Change Propagation | N | X.X | ... | ... |
| Knowledge Duplication | N | X.X | ... | ... |
| Accidental Complexity | N | X.X | ... | ... |
| Dependency Disorder | N | X.X | ... | ... |
| Domain Model Distortion | N | X.X | ... | ... |
**Recommended focus:** [risks with highest average priority]
```

View File

@ -0,0 +1,37 @@
---
name: brooks-health
description: >
Combined codebase health dashboard that scores a project across all four quality
dimensions — PR quality, architecture, tech debt, and test quality — in a single
pass, drawing on twelve classic engineering books.
Triggers when: user wants an overall quality assessment, asks "how healthy is this
codebase?", "run all the checks", "give me a big-picture quality report", "I need a
health score before the release", "what's the overall state of our code?", or wants
to onboard a new team with a quality overview.
Do NOT trigger for: server health checks, HTTP health endpoints, Kubernetes
liveness/readiness probes, database health, or application uptime. Also do not
trigger when the user specifically requests only one dimension — use the
corresponding focused skill instead (brooks-review / brooks-audit /
brooks-debt / brooks-test).
---
# Brooks-Lint — Health Dashboard
## Setup
1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
3. Read `../_shared/decay-risks.md` for production risk symptom definitions
4. Read `../_shared/test-decay-risks.md` for test risk symptom definitions
5. Read `health-guide.md` in this directory for the dashboard orchestration process
## Process
**If the user has not specified a project or directory:** apply Auto Scope Detection
from `../_shared/common.md` to determine the review scope before proceeding.
1. Run abbreviated scans across all four dimensions (Step 1 of the guide)
2. Compute per-dimension and composite Health Scores with weighting (Step 2 of the guide)
3. Output the Health Dashboard using the dashboard report template (Step 3 of the guide)
**Mode line in report:** `Health Dashboard`

View File

@ -0,0 +1,89 @@
# Health Dashboard Guide — Mode 5
**Purpose:** Produce a cross-dimensional health dashboard for the codebase.
Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy.
---
## Analysis Process
### Step 1: Run Lightweight Scan Across Four Dimensions
For each dimension, run an abbreviated scan using the decay-risks definitions
from `_shared/`. Do NOT read the individual mode guide files — use the abbreviated
checklists below. Cap each dimension at 3 findings; for Debt: cap at 2 per risk code and 3 across all risk codes.
**PR dimension (if changes exist):**
- Apply Auto Scope Detection (common.md)
- Scan for R2 (Change Propagation) and R1 (Cognitive Overload) in the diff
**Architecture dimension:**
- Gather codebase context: read top-level structure, entry points, import statements
- Draw a Mermaid dependency graph (follow standard graph rules from common.md)
- Scan for R5 (Dependency Disorder): circular deps, upward flows, fan-out > 5
- INCLUDE the Mermaid graph in output
**Debt dimension:**
- Scan for all six decay risks (R1-R6) across the codebase
- Skip Pain × Spread scoring (use severity tier only)
**Test dimension:**
- Build the Test Suite Map (unit/integration/E2E counts)
- Scan for T1 (Test Obscurity) and T2 (Test Brittleness) in test files
### Step 2: Compute Dashboard Scores
Each dimension gets its own Health Score (base 100, same deduction rules from common.md).
Composite score = weighted average of dimension scores:
| Dimension | Weight | Rationale |
|-----------|--------|-----------|
| PR (code quality) | 0.25 | Only applies if changes exist; skip if no diff |
| Architecture | 0.30 | Structural issues have highest blast radius |
| Debt | 0.25 | Systemic but slower-moving |
| Test | 0.20 | Supporting signal |
If PR dimension is skipped (no changes), redistribute its 0.25 weight proportionally
across the remaining three dimensions by dividing each remaining weight by
(1 0.25) = 0.75. Compute redistribution dynamically — do not hardcode the values.
**Redistributed weights (PR skipped):**
| Dimension | Base Weight | Redistributed Weight |
|-----------|------------|---------------------|
| Architecture | 0.30 | 0.30 / 0.75 = 0.40 |
| Debt | 0.25 | 0.25 / 0.75 = 0.33 |
| Test | 0.20 | 0.20 / 0.75 = 0.27 |
### Step 3: Output Dashboard
Use the dashboard report template below instead of the standard common.md template.
---
## Dashboard Report Template
````markdown
# Brooks-Lint Health Dashboard
**Mode:** Health Dashboard
**Scope:** [project name or directory]
**Composite Score:** XX/100
| Dimension | Score | Top Finding |
|-----------|-------|------------|
| Code Quality | XX/100 | [one-line summary or "Clean"] |
| Architecture | XX/100 | [one-line summary or "Clean"] |
| Tech Debt | XX/100 | [one-line summary or "Clean"] |
| Test Quality | XX/100 | [one-line summary or "Clean"] |
## Module Dependency Graph
[Mermaid graph from architecture scan]
## Top Findings (max 5 across all dimensions)
[Standard Iron Law format, sorted by severity]
## Recommendation
[One paragraph: what to fix first, which dimension needs the most attention,
suggest running the full individual skill for the worst dimension]
````

View File

@ -0,0 +1,37 @@
---
name: brooks-review
description: >
PR code review that surfaces decay risks, design smells, and maintainability
issues with concrete Symptom → Source → Consequence → Remedy findings, drawing
on twelve classic engineering books.
Triggers when: user asks to review code, check a PR, shares a diff or pastes
code asking "does this look right?" / "any issues here?" / "ready to merge?",
or asks for feedback on a function, class, or file.
Also triggers when user mentions: code smells / refactoring / clean architecture /
DDD / domain-driven design / SOLID principles / Hyrum's Law / deep modules /
tactical programming / conceptual integrity / Brooks's Law / Mythical Man-Month /
second system effect.
Do NOT trigger for: questions about how to write code from scratch, language syntax
questions, or framework/tool questions where no existing code is shared.
---
# Brooks-Lint — PR Review
## Setup
1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
3. Read `../_shared/decay-risks.md` for symptom definitions and source attributions
4. Read `pr-review-guide.md` in this directory for the analysis process
## Process
**If the user has not specified files or pasted code:** apply Auto Scope Detection
from `../_shared/common.md` to determine the review scope before proceeding.
1. Understand the review scope, then scan for each decay risk in the order specified (Steps 16 of the guide)
2. Run the Quick Test Check (Step 7 of the guide) — skip for docs-only or non-production changes
3. Apply the Iron Law to every finding
4. Output using the Report Template from common.md
**Mode line in report:** `PR Review`

View File

@ -0,0 +1,163 @@
# PR Review Guide — Mode 1
**Purpose:** Analyze a code diff or specific files for decay risks that are directly visible
in the changed code. Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy.
---
## Before You Start
**Auto-generated files:** If the diff contains generated files (protobuf stubs, OpenAPI clients,
ORM migrations, lock files, minified bundles), skip those files entirely. Generated code reflects
tool choices, not developer decisions. Note in the report which files were skipped and why.
**Scope calibration:** Adjust analysis depth based on PR size before starting.
| PR Size | Approach |
|---------|----------|
| < 50 lines | Focus on Steps 13 only; run Step 6a only if imports changed; run Step 6b if any class, method, or variable was renamed or introduced |
| 50300 lines | Full process, all steps |
| > 300 lines | Full process; note in the Scope line that review is sampled — cover the highest-risk areas rather than every file |
For PRs > 500 lines: flag in the Summary that a PR this size is itself a Change Propagation signal. A change that cannot be reviewed in one pass suggests tangled responsibilities.
---
## Analysis Process
Work through these seven steps in order. Do not skip steps.
### Step 1: Understand the scope
Read the diff or files and answer:
- What is the stated purpose of this change?
- Which files were modified?
- Flag immediately if the PR changes more than 10 unrelated files — that itself is a
🟡 Warning: Change Propagation (a PR that touches many unrelated things is a sign
that responsibilities are tangled).
### Step 2: Scan for Change Propagation
*Scan this first — it is the most visible risk in a diff.*
Look for:
- Does this change touch files in modules that have no conceptual connection to the stated purpose?
- Does any modified class change for more than one business reason in this diff?
- Does any method use more data from another class than from its own?
If the diff shows no cross-module changes beyond what the feature requires → skip, no finding.
### Step 3: Scan for Cognitive Overload
Look for:
- Are any new or modified functions longer than 20 lines?
- Is there nesting deeper than 3 levels in new or modified code?
- Are there more than 4 parameters in any new function signature?
- Are there magic numbers or unexplained constants in new code?
- Do new variable or function names require reading the implementation to understand?
- Are there train-wreck chains (3+ method calls chained)?
### Step 4: Scan for Knowledge Duplication
Look for:
- Does this change introduce logic that already exists elsewhere in the codebase?
- Does this change introduce a new name for a concept that already has a name?
- Does this change add a class to a hierarchy that has a parallel in another module?
### Step 5: Scan for Accidental Complexity
Look for:
- Does this change add an abstraction with only one concrete use?
- Does this change add a class that only wraps another class or delegates everything?
- Does this change add configuration options or extension points that serve no current requirement?
### Step 6a: Scan for Dependency Disorder
- Do any new imports create a dependency from a high-level module to a low-level one?
(e.g., domain service now imports a database driver or HTTP client)
- Do any new imports introduce a cycle between modules?
- Does any new interface force callers to depend on methods they do not use?
If no new imports and no structural changes → skip, no finding.
### Step 6b: Scan for Domain Model Distortion
- Do new class or variable names match the language the business uses for the same concept?
- Does any new class hold only data with no behavior (pure data bag), where behavior was expected?
- Does any new method put logic that belongs to the domain in a service or utility layer?
---
## Severity Calibration
Apply the Iron Law format from `../_shared/common.md`. Each risk in `../_shared/decay-risks.md` has its own Severity
Guide with numeric thresholds — use those as the primary reference. When a finding sits
on the boundary between two tiers, use this as a tiebreaker:
- 🔴 Critical — actively breaking velocity or creating production risk *today*
- 🟡 Warning — will if left unaddressed through the next few features
- 🟢 Suggestion — worth fixing when nearby, not urgent
When multiple findings exist, list Critical items first. If there are more than 5 findings,
add a one-line "Recommended fix order" at the end of the Findings section.
---
## Step 7: Quick Test Check
*Run this last. Three signals only — this is not a full Mode 4 review.*
If the diff contains only generated files, configuration, or documentation with no
production logic changes → skip Step 7 entirely.
**Signal 1: Do tests exist for the changed behavior?**
- Does the diff modify production code?
- Are corresponding test file changes included in the diff?
- If new public behavior was added with no new tests:
→ 🟡 Warning: Coverage Illusion — new behavior is untested
→ Source: Feathers — Working Effectively with Legacy Code, Ch. 1
- If the change is a pure refactor and existing tests cover the behavior → no finding.
**Signal 2: Quick Mock Abuse sniff**
Only check if the diff includes test file changes.
- Is mock setup code in new/modified tests obviously longer than the test logic?
- Are the primary assertions `expect(mock).toHaveBeenCalledWith(...)` with no behavior verification?
- Does the diff add any methods to production classes that are only called from test files?
If any of these are true:
→ 🟡 Warning: Mock Abuse — test complexity exceeds behavior complexity
→ Source: Osherove — The Art of Unit Testing, mock usage guidelines
**Signal 3: Quick Test Obscurity sniff**
Only check if the diff includes test file changes.
- Do new test names express scenario and expected outcome?
(Pattern: `methodName_scenario_expectedResult` or equivalent)
- Are there new tests with multiple assertions and no message strings on any of them?
If test names are vague or assertions lack messages:
→ 🟢 Suggestion: Test Obscurity — test intent is unclear from the test name or assertions
→ Source: Meszaros — xUnit Test Patterns, Assertion Roulette (p.224)
**Output rule:**
If all three signals are clean → write no Test findings. Proceed directly to the report.
If findings exist → add them to the Findings section using the standard Iron Law format.
Label the risk as the test decay risk name (e.g., "Coverage Illusion", "Mock Abuse",
"Test Obscurity").
> **Note:** Step 7 is a fast check, not a full test audit. When systemic test problems
> are found, note in the Summary: "Consider running `/brooks-lint:brooks-test` for a
> complete test quality diagnosis."
---
## Output
Use the standard Report Template from `../_shared/common.md`.
Mode: PR Review
Scope: list the files reviewed (excluding skipped generated files).

38
skills/thirdparty/brooks-sweep/SKILL.md vendored Normal file
View File

@ -0,0 +1,38 @@
---
name: brooks-sweep
description: >
Full-sweep mode: runs a unified analysis across all quality dimensions — code decay,
architecture, tech debt, and test quality — then applies fixes directly to the
codebase. Safe changes are auto-applied; risky changes are confirmed before
execution. Drawing on twelve classic engineering books.
Triggers when: user wants to "fix everything", "sweep the codebase", "auto-fix all
issues", "run all checks and fix them", "clean up the whole project", or asks for
a single command that both diagnoses and remediates quality problems.
Do NOT trigger for: read-only audits or health reports where the user only wants
findings without code changes; single-dimension reviews (use the focused skill
instead: brooks-review / brooks-audit / brooks-debt / brooks-test); server health
checks, HTTP /health endpoints, Kubernetes probes, or application uptime.
---
# Brooks-Lint — Full Sweep & Auto-Fix
## Setup
1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
3. Read `../_shared/decay-risks.md` for production risk symptom definitions
4. Read `../_shared/test-decay-risks.md` for test risk symptom definitions
5. Read `sweep-guide.md` in this directory for the unified scan and fix process
## Process
**If the user has not specified a project or directory:** apply Auto Scope Detection
from `../_shared/common.md` to determine the review scope before proceeding.
1. Show pre-flight consent notice and wait for the user's one-time approval (Step 0 of the guide)
2. Enumerate scope and initialize the `unresolvable` / `non_critical_rounds` / `fix_log` state (Step 1 of the guide)
3. Run the four dimensions in sequence — review, test, debt, audit — each scanning, classifying, applying Safe + Extended-Safe fixes, and verifying via the project test command (Steps 25 of the guide)
4. Iterate: re-scan modified files + same-module + static consumers; converge on a clean round, retire 3-retry failures to the `unresolvable` set, cap non-critical rounds at 3 (Step 6 of the guide)
5. Aggregate residual and unresolvable items and output the Full Sweep Report (Steps 78 of the guide)
**Mode line in report:** `Full Sweep`

View File

@ -0,0 +1,264 @@
# Brooks-Lint — Full Sweep Guide
Sequential autonomous pipeline: **review → test → debt → audit**. Fixes findings
in place, iterates until clean or capped, reports residuals. One interaction point:
Step 0 (pre-flight consent) — after approval the pipeline runs hands-free until Step 8.
Every finding follows the Iron Law: **Symptom → Source → Consequence → Remedy**.
---
### Step 0 — Pre-flight consent gate
**Goal:** State scope, cost, and irreversibility up front; get explicit consent
once so later steps never have to ask.
0a. Estimate the file count using `git ls-files | wc -l` if in a git repo, or
`find . -type f -not -path '*/.git/*' -not -path '*/node_modules/*' -not -path '*/.venv/*' -not -path '*/build/*' -not -path '*/dist/*' -not -path '*/vendor/*' -not -path '*/target/*' | wc -l` otherwise. Order-of-magnitude is enough.
0b. Show this notice verbatim with the estimate filled in. Do not paraphrase —
the user is agreeing to this exact scope.
```
⚠️ /brooks-sweep — Full Repository Sweep & Auto-Fix
Scope: Four analysis dimensions run in sequence — PR code decay (R1R6),
test quality (T1T6), tech debt, architecture. Edits are made in
place inside the detected project scope.
Estimated files in scope: ~N
Order: brooks-review → brooks-test → brooks-debt → brooks-audit.
Each dimension scans, queues, and fixes before the next starts.
Autonomy: Fully autonomous. Safe single-file fixes apply directly. Multi-file
fixes that have test coverage AND do not break a public interface
also apply directly. High-risk fixes (public API break, cross-module
structural change, or no test coverage) are NOT applied — they are
recorded in the residual report for human review.
Iteration: After each dimension pass, modified files + same-module + static
consumers are re-scanned. A finding that fails to fix 3 times is
retired to the unresolvable set and never re-queued. Non-critical
rounds cap at 3 iterations; critical findings iterate until
resolved or retired.
Git impact: The pipeline edits files. It does NOT commit, push, or amend.
If you have uncommitted work you want to preserve, commit or stash
first.
Proceed with full autonomous sweep? [Y/n]
```
0c. Parse the reply (first match wins, evaluate rules in order):
1. **Hard negation** (`no`, `n`, `abort`, `cancel`, `取消`, `不要`): abort with "Aborted before scan — no files modified."
2. **Consent** (`Y`, `yes`, `ok`, `sure`, `proceed`, `go`, `continue`, `好`, `好的`, `行`, `可以`): proceed to Step 1.
3. **Soft pause** (`wait`, `hold on`, `等一下`, `等我`, `let me`): acknowledge in one line ("Understood, waiting"), then wait for the user's next message and re-evaluate from rule 1.
4. **Question**: answer it, then re-show the notice once and wait for the next reply. If the next reply is not Consent (rule 2) — whether a second question, another pause, or anything else — abort with "Aborted — did not receive consent after clarification."
0d. After consent, do not ask further questions until Step 8.
---
### Step 1 — Scope enumeration and state init
1a. Apply Auto Scope Detection from `../_shared/common.md` if the user did not
specify files or a directory. Otherwise honor the user's explicit scope.
1b. Read `.brooks-lint.yaml` from the project root if present. Apply `disable`,
`severity`, `ignore`, `focus`, and `custom_risks` per common.md. Record the
applied config values and reuse them across all iteration rounds — do not
re-read the file in Step 6 even if files were modified.
1c. Initialize pipeline state (persists across all rounds):
- **`unresolvable`** (set): findings retired after 3 failed attempts — keyed by `(file, line_range, risk_code)`; `signature` breaks ties. Never re-queued.
- **`non_critical_rounds`** (int, 0): incremented each round producing Warning/Suggestion; reset on clean round.
- **`fix_log`** (list): each fix with file, line range, risk code, description, and outcome (`applied` / `reverted` / `retired`).
1d. Record the final scope file list in the Fix Report output buffer for Step 8.
---
### Step 2 — brooks-review pass (R1R6 code decay)
Scan every file in scope against all R-series risks defined in
`../_shared/decay-risks.md`.
2a. For each R-risk, apply its symptom checklist. Record each hit as a finding
with: risk code, file + approximate line range, Symptom, Source,
Consequence, Remedy, Severity (Critical / Warning / Suggestion), and
**Fix-Class** (see Step 2b).
2b. Assign Fix-Class per finding:
| Class | Criteria |
|-------|----------|
| **Safe** | Single-file AND fully local: rename a non-exported symbol, extract a constant, remove dead code, add a null guard at a leaf, add a test scaffold for an untested pure function. Any change that modifies or removes an exported symbol is NOT Safe even if in one file. |
| **Extended-Safe** | Multi-file but (a) a project test command exists and passes pre-fix, AND (b) the change does not rename, remove, or alter the signature of any publicly exported symbol, AND (c) touches ≤ 5 files in this pass. |
| **Residual** | Public API break, cross-service boundary change, no test coverage to fall back on, or remedy ambiguous. NOT applied — carried to the Step 8 residual report. |
2c. Skip any finding that matches an entry in the `unresolvable` set.
2d. Apply every Safe and Extended-Safe fix in this dimension, lowest risk
within each severity tier first. For each fix: Edit or Write, then append
one row to `fix_log` with outcome `applied`. If two fixes touch overlapping
line ranges in the same file, apply higher-severity first, re-read the file,
then apply the next.
2e. After all fixes in this dimension, run the project test/lint command if one
exists (`package.json` scripts, `pytest`, `cargo test`, `go test ./...`, etc.).
If tests fail: revert fixes from this dimension in reverse order one at a
time, re-running the test command after each revert, until tests pass.
Mark each reverted fix with outcome `reverted` in `fix_log` and promote the
finding to **Residual**. If no test command is found, note this once in the
report and continue.
2f. Record dimension summary: N scanned, M Safe applied, K Extended-Safe applied,
R reverted, P Residual.
---
### Step 3 — brooks-test pass (T1T6 test decay)
Scan test files (and untested production code) against T-series risks defined
in `../_shared/test-decay-risks.md`.
Follow the same sub-steps as Step 2 (classify → apply → verify → summarize),
using T-prefix risk codes. For production files with no test coverage at all,
record as T2 (Missing Tests). A test scaffold that adds a pure-function test is
**Safe**; adding tests that require new test infrastructure is **Residual**.
---
### Step 4 — brooks-debt pass (tech debt accumulation)
Re-classify R-findings through a debt lens — same symptoms at accumulation scale:
repeated duplication, layered workarounds, stale `TODO`/`FIXME` clusters, dead
flags. Score each with **Pain (13) × Spread (13)**; total 79 = Critical,
46 = Warning, 13 = Suggestion. Apply a severity bump for pattern-level
occurrences (isolated Suggestion → 4+ modules Warning).
Follow the same sub-steps as Step 2. Debt findings often span multiple files
and are more likely to land in Extended-Safe or Residual than Safe.
---
### Step 5 — brooks-audit pass (architecture integrity)
Scan the full scope for architecture-level issues. The dependency-direction
symptoms (inverted dependencies, circular imports, cross-domain coupling) are
defined in `../_shared/decay-risks.md` Risk 5 — use that checklist. Step 5
additionally covers architecture-only concerns that R5 does not: missing
abstraction layers, god modules, leaked infrastructure inside domain code,
and seam-boundary violations.
Most architecture findings are **Residual** by definition — they require human
judgment on module boundaries. A few are Extended-Safe (e.g. extract a shared
constant used in 3+ modules into a new module that nothing else imports yet).
Do not auto-refactor module layouts, rename packages, or change public exports.
Follow the same sub-steps as Step 2.
---
### Step 6 — Iteration loop
**Goal:** Re-scan what the fixes touched and converge. Stop on clean round,
cap, or no progress.
6a. Build the re-scan scope:
- every file modified in Steps 25 of the current round, PLUS
- every file in the same module as a modified file, PLUS
- every file that statically imports from a modified file.
Do not re-scan files whose dependencies were not touched. On monorepos
where a "module" may span hundreds of files, narrow the same-module bucket
to files that import from or are imported by a modified file (direct
dependency graph only).
6b. Re-run Steps 25 on the re-scan scope. For each new finding in this round:
- If it matches an entry in `unresolvable` → skip.
- Else if 🔴 Critical → queue and fix; Critical findings iterate until
resolved OR retired (3 failed attempts → `unresolvable`).
- Else 🟡 Warning / 🟢 Suggestion → queue and fix, subject to cap below.
6c. Classify the round after all fixes attempted:
- **Clean round** (no new findings outside `unresolvable`): pipeline
converged → proceed to Step 7.
- **Critical-only round**: do NOT increment `non_critical_rounds`; return
to 6a.
- **Mixed or non-critical round** (any Warning / Suggestion produced):
increment `non_critical_rounds` by 1. If it reaches the cap (default 3,
or `sweep.max_iterations` from `.brooks-lint.yaml`), proceed to Step 7
with remaining non-critical findings recorded as
`"Unresolved — iteration cap reached"`. Otherwise return to 6a.
6d. Fix-retry rule: if a single finding fails verification (Step 2e) 3 times
across any combination of rounds, retire it to `unresolvable` with reason
`"3-retry budget exhausted"` and stop attempting it.
---
### Step 7 — Residual aggregation
Collect everything that was NOT fixed in place, de-duplicated:
- All Residual-class findings from Steps 25 (first round + re-scan rounds)
- All `unresolvable` entries with their retirement reason
- All iteration-cap residuals from Step 6c
Sort Critical → Warning → Suggestion. Within each severity, list file path,
risk code, Symptom (one line), Remedy (one line), and the reason it was not
applied (`public API break` / `no test coverage` / `3-retry budget` /
`iteration cap`).
---
### Step 8 — Sweep report
Output the final report. Use the standard Report Template from
`../_shared/common.md` with these additions:
```
# Brooks-Lint — Full Sweep Report
Mode: Full Sweep | Scope: <files or directory>
Config: .brooks-lint.yaml applied (N risks disabled, M paths ignored) # omit if no config
## Dimension Summary
| Dimension | Scanned | Safe Applied | Extended Applied | Reverted | Residual |
|-----------|---------|--------------|------------------|----------|----------|
| Review (R1R6) | ... | ... | ... | ... | ... |
| Test (T1T6) | ... | ... | ... | ... | ... |
| Debt | ... | ... | ... | ... | ... |
| Audit | ... | ... | ... | ... | ... |
## Iteration History
Round 1: <classification clean / critical-only / mixed>, <N> new findings
Round 2: ...
Stopped at: clean round | iteration cap | no outstanding criticals
## Fix Log
| # | File | Lines | Risk | Outcome | Change |
|---|------|-------|------|----------|--------|
| 1 | ... | ... | R2 | applied | Extract repeated constant |
| 2 | ... | ... | T4 | reverted | Test regression; promoted to Residual |
...
## Health Score Delta
Before: <estimated score>/100 → After: <estimated score>/100
(Re-run /brooks-health for an exact recalculation.)
## Residual Items (<K> not applied)
<Iron Law entries, sorted Critical Suggestion, with "Not applied because: ..." line>
## Summary
- Total findings detected: <N>
- Fixed this sweep: <M>
- Residual (needs human review): <K>
- Unresolvable (3-retry exhausted): <U>
```
If there are zero residual items and zero unresolvable entries, end with:
**"Sweep complete — codebase is clean."**
**Mode line in report:** `Full Sweep`

36
skills/thirdparty/brooks-test/SKILL.md vendored Normal file
View File

@ -0,0 +1,36 @@
---
name: brooks-test
description: >
Test quality review drawing on twelve classic engineering books — with primary focus
on xUnit Test Patterns, The Art of Unit Testing, How Google Tests Software, and
Working Effectively with Legacy Code — that diagnoses structural problems in an
existing test suite: brittleness, mock abuse, coverage illusions, slow execution,
poor readability.
Triggers when: user asks about test quality, shares test files for review, or
expresses frustration: "tests keep breaking whenever I change anything", "our tests
take forever", "I can't understand what this test is doing", "tests pass but bugs
still reach production", "we have too many mocks".
Do NOT trigger for: writing new tests from scratch (use the regular test-writing
workflow) or testing framework/syntax questions — this skill reviews an existing
suite for structural quality problems, not individual test authoring.
---
# Brooks-Lint — Test Quality Review
## Setup
1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
3. Read `../_shared/test-decay-risks.md` for test-space symptom definitions and source attributions
4. Read `test-guide.md` in this directory for the test quality review framework
## Process
**If the user has not shared test files or pointed to a test directory:** apply Auto
Scope Detection from `../_shared/common.md` to determine the review scope before proceeding.
1. Build the test suite map (guide's "Before You Start" section)
2. Scan for each test decay risk in the order specified (Steps 14 of the guide)
3. Apply the Iron Law and output using the Report Template (Step 5 of the guide)
**Mode line in report:** `Test Quality Review`

View File

@ -0,0 +1,147 @@
# Test Quality Review Guide — Mode 4
**Purpose:** Diagnose the health of a test suite using six test-space decay risks.
Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy.
---
## Before You Start: Build the Test Suite Map
Before scanning for any risk, map the current test suite structure:
```
Unit tests: X files, ~N tests
Integration tests: X files, ~N tests
E2E tests: X files, ~N tests
Ratio: Unit X% : Integration X% : E2E X%
Coverage areas: [modules with tests] vs [modules without tests]
```
If you cannot access test files directly, ask the user **one question** — choose the
most relevant:
1. "Which module is hardest to test or has the least coverage?"
2. "When you make a change, how often do unrelated tests break?"
3. "Is there a part of the codebase your team avoids touching because it has no tests?"
After one answer, proceed. Do not ask more than one question.
---
## Analysis Process
Work through these five steps in order.
### Step 1: Scan for Test Obscurity
*Scan this first — the most visible risk and the one that determines whether the suite
is maintainable at all.*
Look for:
- Read 510 test names at random: can each one communicate subject + scenario + expected
outcome without opening the test body?
- Are there tests where a failure gives no clue which behavior broke (multiple assertions,
no message strings)?
- Does any test depend on external state (files, database rows, env variables, shared mutable
fixtures) that is invisible from within the test body?
- Is there a single massive setUp or beforeEach that every test inherits regardless of
what it actually needs?
If all test names are clear and setups are minimal → no finding.
### Step 2a: Scan for Test Brittleness
*Brittle tests break on refactors that do not change observable behavior — they test
implementation, not contracts.*
Look for:
- Ask (or check git history): did any recent refactor cause test failures with no
behavior change?
- Are there test methods where the name contains "and" or that assert on 3 or more
unrelated behaviors (Eager Test)?
- Do assertions specify mock call order or exact parameter values that are irrelevant
to the observable behavior?
- Are tests coupled to private methods or internal state directly?
If brittleness is systemic (most tests in the file break on a rename) → 🔴 Critical.
If isolated (12 brittle tests) → 🟢 Suggestion.
### Step 2b: Scan for Mock Abuse
*Mock Abuse produces tests that pass regardless of whether the real behavior is correct.
Scan this separately from brittleness — over-mocking is often the cause of brittleness,
but it is a distinct problem worth its own finding.*
**Sample 35 tests once for both steps 2a and 2b together** — read each test body and
check brittleness signals and mock-setup ratio in the same pass, then write separate
findings if both problems are present.
Look for:
- Is mock setup code longer than the assertion logic in the sampled tests?
- Are the primary assertions `expect(mock).toHaveBeenCalledWith(...)` rather than
assertions on outputs, state, or observable events?
- Are there methods in production classes that are only called from test files
(test-induced design damage)?
- Does any single test create more than 3 mock objects?
If mock setup-to-assertion ratio exceeds 3:1 → 🟡 Warning.
If production methods exist only for test access → 🔴 Critical (architecture is being
distorted by the test suite).
### Step 3: Scan for Test Duplication
Look for:
- Is the same setup block (same variables initialized the same way) repeated across
5 or more test files without a shared helper?
- Are there multiple tests that pass identical inputs and assert identical outputs
with no differentiation (Lazy Test)?
- Is the same business scenario covered at unit, integration, and E2E level with no
difference in what each layer is testing?
If duplication is systemic (10 or more instances) → Critical.
If localized (35 instances) → Warning.
### Step 4: Scan for Coverage Illusion and Architecture Mismatch
Look for Coverage Illusion:
- Pick the most recently modified core module. Are its error-handling branches and
null/boundary inputs covered by tests?
- Are there legacy areas (old functions, no test files nearby) that are actively
being changed?
- Do the tests assert on side effects (DB writes, events emitted, state transitions)
or only on return values?
**Characterization Test check:** If legacy code is being modified without existing tests,
the team needs Characterization Tests before making the change — not after. Look for
this pattern and flag it when absent.
A Characterization Test locks in current behavior (right or wrong) so future changes
do not silently regress it. Template:
```
test("characterize: [module].[method] given [input], returns [current output]") {
// Call the code under test with realistic inputs
// Assert on whatever it currently returns — even if you suspect the output is wrong
// Add a comment: "This captures current behavior, not necessarily correct behavior"
}
```
Source: Feathers — Working Effectively with Legacy Code, Ch. 13: Characterization Tests
Look for Architecture Mismatch:
- Compare the suite map from the start: is the ratio close to 70% unit / 20% integration / 10% E2E?
- Are high-risk modules tested at higher density than trivial utilities?
**Test suite performance:** A slow test suite is a first-class maintainability risk — it
breaks the fast-feedback loop and causes developers to skip running tests locally.
- If the full suite runtime is known and > 10 minutes → 🟡 Warning
- If the full suite runtime is > 30 minutes or unknown → 🔴 Critical (unknown suite time
means nobody is running it regularly)
- If tests that could be unit tests are integration tests, that is a Performance Mismatch:
each misclassified test adds seconds of avoidable wait time
Source: Meszaros — xUnit Test Patterns, Slow Tests (p. 253)
### Step 5: Apply Iron Law, Output Report
Apply the Iron Law format from `../_shared/common.md` to each finding.
Use the standard Report Template. Mode: Test Quality Review.
Include the Test Suite Map as a code block immediately before the `## Findings` heading, labeled "Test Suite Map".

View File

@ -0,0 +1,128 @@
---
name: codebase-migrate
description: Run large codebase migrations and multi-file refactors. Uses the Composio CLI to coordinate issue tracking, batched PRs, and CI verification while the agent executes the transforms locally across hundreds of files.
metadata:
short-description: Codebase migrations + multi-file refactors
---
# Codebase Migrate
Coordinate framework upgrades, API renames, config rewrites, and structural refactors across hundreds of files. Local edits are driven by the agent; the [Composio CLI](https://docs.composio.dev/docs/cli) handles the surrounding ceremony: tracking issues, per-batch PRs, and CI verification.
## When to Use
- Framework upgrade (React 17 → 19, Node 18 → 22, Django 4 → 5).
- API rename across a monorepo (e.g., `getUserById``users.byId`).
- Config/format migration (webpack → vite, eslint → biome, jest → vitest).
- Any "change 200 files the same way" task that needs to ship in reviewable slices.
## Prereqs
```bash
curl -fsSL https://composio.dev/install | bash
composio login
composio link github # for PRs + CI status
composio link linear # or jira — for migration tracking
```
Local tools the agent will use directly: `git`, `rg`, `jscodeshift`/`ts-morph`/`comby`/`ast-grep` (language-appropriate), and your test runner.
## Planning Phase
1. **Define the transform precisely.** Bad: "migrate to vitest." Good: "replace `jest.mock` with `vi.mock`, swap `jest.fn()` for `vi.fn()`, rename `jest.config.js``vitest.config.ts` using template X."
2. **Scope the blast radius:**
```bash
rg -l 'jest\.(mock|fn|spyOn)' | wc -l
rg -l 'from "jest"' | sort
```
3. **File a tracking issue:**
```bash
composio execute LINEAR_CREATE_ISSUE -d '{
"teamId":"TEAM_ID",
"title":"Migrate test runner: jest → vitest",
"description":"Batches of ~25 files. Checkpoint after each PR lands green."
}'
```
## Execute in Reviewable Batches
Loop: pick N files → transform → test → PR → wait for green → merge → next batch.
```bash
# Batch helper: first 25 untouched files matching the pattern
BATCH=$(rg -l 'jest\.mock' | grep -v done.list | head -25)
echo "$BATCH" > batch.list
```
The agent runs the codemod on `batch.list`, then:
```bash
git checkout -b migrate/vitest-batch-03
xargs < batch.list codemod-runner # e.g. jscodeshift / ts-morph / comby
npm test -- --changed
git add -A && git commit -m "migrate(test): jest → vitest (batch 3)"
git push -u origin migrate/vitest-batch-03
composio execute GITHUB_CREATE_A_PULL_REQUEST -d '{
"owner":"acme","repo":"app",
"head":"migrate/vitest-batch-03","base":"main",
"title":"migrate(test): jest → vitest (batch 3)",
"body":"Part of LIN-482. 25 files. Codemod: `transforms/jest-to-vitest.ts`."
}'
```
Then poll CI and merge when green:
```bash
composio execute GITHUB_LIST_WORKFLOW_RUNS_FOR_A_REPOSITORY \
-d '{"owner":"acme","repo":"app","branch":"migrate/vitest-batch-03"}'
```
## Workflow Script
`scripts/migrate-batch.ts`, run per batch via `composio run --file scripts/migrate-batch.ts -- --batch 3`:
```ts
const batch = process.argv[process.argv.indexOf("--batch") + 1];
const pr = await execute("GITHUB_CREATE_A_PULL_REQUEST", {
owner: "acme", repo: "app",
head: `migrate/vitest-batch-${batch}`, base: "main",
title: `migrate(test): jest → vitest (batch ${batch})`,
body: `Part of LIN-482. See transforms/jest-to-vitest.ts.`
});
await execute("LINEAR_CREATE_COMMENT", {
issueId: "LIN-482",
body: `Opened PR #${pr.number}: ${pr.html_url}`
});
```
## Safety Rails
- **One transform per PR.** Never mix a rename with a format change.
- **Keep a `done.list`** of files already migrated so the next batch skips them.
- **Run the full test suite on the last batch**, even if per-batch PRs ran `--changed`.
- **Codemod first, hand-edit second.** If the codemod misses 3 files, patch them manually and note it in the PR body.
- **Roll back per-batch**, not globally. Each PR should revert cleanly.
## Verification Loop
After each merge:
```bash
rg 'jest\.(mock|fn|spyOn)' | wc -l # should trend to 0
npm test # full suite
composio execute GITHUB_LIST_WORKFLOW_RUNS_FOR_A_REPOSITORY \
-d '{"owner":"acme","repo":"app","branch":"main","event":"push"}' \
| jq '.workflow_runs[0].conclusion'
```
## Troubleshooting
- **Codemod regex catches too much** → switch to AST-based tooling (`ast-grep`, `ts-morph`) for structural matches.
- **Tests pass locally, CI fails** → pin Node/Python version parity; check `.nvmrc` / `pyproject.toml`.
- **PR too big to review** → cut batch size in half; maintainers won't review 800-line diffs.
- **Conflicts between batches** → rebase the open batch before merging the current one; never force-push merged batches.
Full CLI reference: [docs.composio.dev/docs/cli](https://docs.composio.dev/docs/cli)

View File

@ -0,0 +1,266 @@
---
name: codebase-recon
description: This skill should be used when analyzing codebases, understanding architecture, or when "analyze", "investigate", "explore code", or "understand architecture" are mentioned.
metadata:
version: "1.0.0"
---
# Codebase Analysis
Evidence-based investigation → findings → confidence-tracked conclusions.
## Steps
1. Gather evidence from multiple sources (code, docs, tests, history)
2. Track confidence level as investigation progresses
3. Based on findings:
- If pattern analysis needed → load the `outfitter:patterns` skill
- If root cause investigation → load the `outfitter:find-root-causes` skill
- If ready to report → load the `outfitter:report-findings` skill
4. Deliver findings with confidence level and caveats
<when_to_use>
- Codebase exploration and understanding
- Architecture analysis and mapping
- Pattern extraction and recognition
- Technical research within code
- Performance or security analysis
NOT for: wild guessing, assumptions without evidence, conclusions before investigation
</when_to_use>
<confidence>
| Bar | Lvl | Name | Action |
|-----|-----|------|--------|
| `░░░░░` | 0 | Gathering | Collect initial evidence |
| `▓░░░░` | 1 | Surveying | Broad scan, surface patterns |
| `▓▓░░░` | 2 | Investigating | Deep dive, verify patterns |
| `▓▓▓░░` | 3 | Analyzing | Cross-reference, fill gaps |
| `▓▓▓▓░` | 4 | Synthesizing | Connect findings, high confidence |
| `▓▓▓▓▓` | 5 | Concluded | Deliver findings |
*Calibration: 0=019%, 1=2039%, 2=4059%, 3=6074%, 4=7589%, 5=90100%*
Start honest. Clear codebase + focused question → level 23. Vague or complex → level 01.
At level 4: "High confidence in findings. One more angle would reach full certainty. Continue or deliver now?"
Below level 5: include `△ Caveats` section.
</confidence>
<principles>
## Core Methodology
**Evidence over assumption** — investigate when you can, guess only when you must.
**Multi-source gathering** — code, docs, tests, history, web research, runtime behavior.
**Multiple angles** — examine from different perspectives before concluding.
**Document gaps** — flag uncertainty with △, track what's unknown.
**Show your work** — findings include supporting evidence, not just conclusions.
**Calibrate confidence** — distinguish fact from inference from assumption.
</principles>
<evidence_gathering>
## Source Priority
1. **Direct observation** — read code, run searches, examine files
2. **Documentation** — official docs, inline comments, ADRs
3. **Tests** — reveal intended behavior and edge cases
4. **History** — git log, commit messages, PR discussions
5. **External research** — library docs, Stack Overflow, RFCs
6. **Inference** — logical deduction from available evidence
7. **Assumption** — clearly flagged when other sources unavailable
## Investigation Patterns
**Start broad, then narrow:**
- File tree → identify relevant areas
- Search patterns → locate specific code
- Code structure → understand without full content
- Read targeted files → examine implementation
- Cross-reference → verify understanding
**Layer evidence:**
- What does the code do? (direct observation)
- Why was it written this way? (history, comments)
- How does it fit the system? (architecture, dependencies)
- What are the edge cases? (tests, error handling)
**Follow the trail:**
- Function calls → trace execution paths
- Imports/exports → map dependencies
- Test files → understand usage patterns
- Error messages → reveal assumptions
- Comments → capture historical context
</evidence_gathering>
<output_format>
## During Investigation
After each evidence-gathering step emit:
- **Confidence:** {BAR} {NAME}
- **Found:** { key discoveries }
- **Patterns:** { emerging themes }
- **Gaps:** { what's still unclear }
- **Next:** { investigation direction }
## At Delivery (Level 5)
### Findings
{ numbered list of discoveries with supporting evidence }
1. {FINDING} — evidence: {SOURCE}
2. {FINDING} — evidence: {SOURCE}
### Patterns
{ recurring themes or structures identified }
### Implications
{ what findings mean for the question at hand }
### Confidence Assessment
Overall: {BAR} {PERCENTAGE}%
High confidence areas:
- {AREA} — {REASON}
Lower confidence areas:
- {AREA} — {REASON}
### Supporting Evidence
- Code: { file paths and line ranges }
- Docs: { references }
- Tests: { relevant test files }
- History: { commit SHAs if relevant }
- External: { URLs if applicable }
## Below Level 5
### △ Caveats
**Assumptions:**
- {ASSUMPTION} — { why necessary, impact if wrong }
**Gaps:**
- {GAP} — { what's missing, how to fill }
**Unknowns:**
- {UNKNOWN} — { noted for future investigation }
</output_format>
<specialized_techniques>
Load skills for specialized analysis (see Steps section):
- **Pattern analysis**`outfitter:patterns`
- **Root cause investigation**`outfitter:find-root-causes`
- **Research synthesis**`outfitter:report-findings`
- **Architecture analysis** → see [architecture-analysis.md](references/architecture-analysis.md)
</specialized_techniques>
<workflow>
Loop: Gather → Analyze → Update Confidence → Next step
1. **Calibrate starting confidence** — what do we already know?
2. **Identify evidence sources** — where can we look?
3. **Gather systematically** — collect from multiple angles
4. **Cross-reference findings** — verify patterns hold
5. **Flag uncertainties** — mark gaps with △
6. **Synthesize conclusions** — connect evidence to insights
7. **Deliver with confidence level** — clear about certainty
At each step:
- Document what you found (evidence)
- Note what it means (interpretation)
- Track what's still unclear (gaps)
- Update confidence bar
</workflow>
<validation>
Before concluding (level 4+):
**Check evidence quality:**
- ✓ Multiple sources confirm pattern?
- ✓ Direct observation vs inference clearly marked?
- ✓ Assumptions explicitly flagged?
- ✓ Counter-examples considered?
**Check completeness:**
- ✓ Original question fully addressed?
- ✓ Edge cases explored?
- ✓ Alternative explanations ruled out?
- ✓ Known unknowns documented?
**Check deliverable:**
- ✓ Findings supported by evidence?
- ✓ Confidence calibrated honestly?
- ✓ Caveats section included if <100%?
- ✓ Next steps clear if incomplete?
</validation>
<rules>
ALWAYS:
- Investigate before concluding
- Cite evidence sources with file paths/URLs
- Use confidence bars to track certainty
- Flag assumptions and gaps with △
- Cross-reference from multiple angles
- Document investigation trail
- Distinguish fact from inference
- Include caveats below level 5
NEVER:
- Guess when you can investigate
- State assumptions as facts
- Conclude from single source
- Hide uncertainty or gaps
- Skip validation checks
- Deliver without confidence assessment
- Conflate evidence with interpretation
</rules>
<references>
Core methodology:
- [confidence.md](../pathfinding/references/confidence.md) — confidence calibration (shared with pathfinding)
Micro-skills (load as needed):
- `outfitter:patterns` — extracting and validating patterns
- `outfitter:find-root-causes` — systematic problem diagnosis
- `outfitter:report-findings` — multi-source research synthesis
Local references:
- [architecture-analysis.md](references/architecture-analysis.md) — system structure mapping
Related skills:
- `outfitter:pathfinding` — clarifying requirements before analysis
- `outfitter:debugging` — structured bug investigation
</references>

View File

@ -0,0 +1,220 @@
# Architecture Analysis
Techniques for analyzing system structure, dependencies, and component relationships.
## Dependency Mapping
### Forward Dependencies
What a component relies on:
1. **Direct imports** — explicit dependencies in code
2. **Indirect references** — called through interfaces
3. **Runtime dependencies** — configuration, environment
4. **Data dependencies** — shared state, databases
### Reverse Dependencies
What relies on this component:
1. **Direct dependents** — explicit imports from other modules
2. **Interface consumers** — components using this API
3. **Side effect consumers** — code relying on mutations
4. **Event subscribers** — listeners for this component's events
### Circular Dependencies
Red flags:
- A imports B, B imports A
- Longer cycles: A → B → C → A
- Implicit cycles through shared state
Resolution strategies:
- Extract shared code to separate module
- Introduce interface/abstraction layer
- Invert dependency direction
- Break into smaller components
## Layer Identification
### Detecting Layers
Look for:
- **Directional flow** — data/control flows one way
- **Abstraction levels** — concrete → abstract as you ascend
- **Responsibility clustering** — similar concerns grouped
- **Interface boundaries** — clear contracts between groups
### Common Layer Patterns
**Three-tier**:
- Presentation (UI, API endpoints)
- Business logic (domain, workflows)
- Data access (repositories, queries)
**Hexagonal/Clean**:
- Core domain (entities, business rules)
- Application layer (use cases, orchestration)
- Infrastructure (frameworks, external services)
- Interfaces (controllers, adapters)
**Microservices**:
- Service boundary (API gateway)
- Service logic (domain per service)
- Data layer (per-service database)
- Cross-cutting (auth, logging, monitoring)
### Layer Violations
Violations indicate architectural drift:
- Lower layer imports higher layer
- Business logic in presentation layer
- Data access code in domain entities
- Infrastructure concerns leaking into core
## Interface Analysis
### Contract Definition
Examine:
- **Input types** — what does it accept?
- **Output types** — what does it return?
- **Error modes** — what can fail, how?
- **Side effects** — mutations, I/O, state changes
- **Invariants** — what must always be true?
### API Quality
Strong interfaces show:
- **Cohesion** — methods belong together
- **Minimal surface** — small, focused API
- **Clear contracts** — types tell the story
- **Stability** — changes don't cascade
- **Composability** — works well with others
Weak interfaces show:
- **Kitchen sink** — unrelated methods bundled
- **Leaky abstractions** — implementation details exposed
- **Unstable** — frequent breaking changes
- **Rigid** — hard to extend or compose
## Component Relationships
### Relationship Types
**Composition**:
- Component owns sub-components
- Lifecycles coupled
- Strong cohesion
- Example: `Page` owns `Header`, `Footer`
**Aggregation**:
- Component references others
- Independent lifecycles
- Loose coupling
- Example: `ShoppingCart` references `Product`
**Dependency**:
- Uses another component's interface
- No ownership
- Can be swapped
- Example: `AuthService` uses `Database`
**Association**:
- Knows about but doesn't own
- Weak relationship
- Often bidirectional
- Example: `User``Post` (many-to-many)
### Coupling Analysis
**Low coupling** (good):
- Communicate through interfaces
- Few shared assumptions
- Changes localized
- Easy to test in isolation
**High coupling** (risky):
- Direct field access
- Shared mutable state
- Knowledge of implementation
- Changes ripple widely
## Architectural Pattern Recognition
### Layered Architecture
Indicators:
- Unidirectional dependencies (top → bottom)
- Each layer uses only layer below
- Clear separation of concerns
Trade-offs:
- ✓ Simple, well-understood
- ✓ Easy to enforce rules
- ✗ Can become rigid
- ✗ Performance overhead
### Event-Driven Architecture
Indicators:
- Pub/sub or message queues
- Decoupled components
- Asynchronous communication
- Event sourcing patterns
Trade-offs:
- ✓ Scalable, resilient
- ✓ Loose coupling
- ✗ Harder to reason about flow
- ✗ Eventual consistency challenges
### Microservices
Indicators:
- Service per bounded context
- Independent deployment
- API-based communication
- Decentralized data
Trade-offs:
- ✓ Independent scaling
- ✓ Technology diversity
- ✗ Distributed system complexity
- ✗ Operational overhead
## Analysis Workflow
### Top-Down
Start broad, narrow focus:
1. **System boundaries** — what's in scope?
2. **Major components** — high-level modules
3. **Component interactions** — how they communicate
4. **Internal structure** — zoom into each component
5. **Implementation** — code-level details
### Bottom-Up
Start specific, build understanding:
1. **Entry point** — main(), server start, UI root
2. **Call graph** — trace execution paths
3. **Cluster calls** — group related functionality
4. **Extract components** — identify logical boundaries
5. **Map relationships** — connect the pieces
### Targeted
Focus on specific concern:
1. **Define question** — what are you trying to understand?
2. **Identify relevant code** — where does this happen?
3. **Trace dependencies** — what does it touch?
4. **Analyze impact** — what would changing this affect?
5. **Document findings** — capture insights
## Documentation Extraction
From architecture analysis, capture:
- **Component diagram** — boxes and arrows
- **Dependency graph** — what imports what
- **Layer diagram** — abstraction levels
- **Sequence diagrams** — interaction flows
- **Decision records** — why this structure?