📦 deps(skills): sync thirdparty skills
parent b7eb57f5da
commit b4c88b32be
|
|
@ -0,0 +1,7 @@
|
|||
_shared
|
||||
brooks-audit
|
||||
brooks-debt
|
||||
brooks-health
|
||||
brooks-review
|
||||
brooks-sweep
|
||||
brooks-test
|
||||
|
|
@ -0,0 +1 @@
|
|||
codebase-migrate
|
||||
|
|
@ -0,0 +1 @@
|
|||
codebase-recon
|
||||
|
|
@ -0,0 +1,240 @@
|
|||
# Brooks-Lint — Shared Framework
|
||||
|
||||
Code and test quality diagnosis using principles from twelve classic software engineering books.
|
||||
Use `source-coverage.md` to keep those sources grounded in real evidence, exceptions, and tradeoffs.
|
||||
|
||||
## The Iron Law
|
||||
|
||||
```
|
||||
NEVER suggest fixes before completing risk diagnosis.
|
||||
EVERY finding must follow: Symptom → Source → Consequence → Remedy.
|
||||
```
|
||||
|
||||
Violating this law produces reviews that list rule violations without explaining why they
|
||||
matter. A finding without a consequence and a remedy is not a finding — it is noise.
|
||||
|
||||
> **On-demand sections (skip unless the condition applies):**
|
||||
> - "Remedy Mode" — only when user passes `--fix` or asks to fix findings
|
||||
> - "Post-Report Triage" — only in interactive sessions after the report is output
|
||||
> - "History Tracking" — only after the Health Score is computed
|
||||
|
||||
## Project Config
|
||||
|
||||
Before executing the review, attempt to read `.brooks-lint.yaml` from the project root.
|
||||
If the file exists, parse and apply its settings before proceeding.
|
||||
If the file does not exist, continue with defaults (all risks enabled, no ignores).
|
||||
|
||||
In a multi-mode session, re-read only if the user says the config has changed.
|
||||
|
||||
### Supported settings
|
||||
|
||||
**`disable`** — list of risk codes to skip entirely. Findings for disabled risks are
|
||||
silently omitted from the report and do not affect the Health Score.
|
||||
Valid codes: `R1` `R2` `R3` `R4` `R5` `R6` `T1` `T2` `T3` `T4` `T5` `T6`
|
||||
|
||||
**`severity`** — override the severity of a specific risk for this project.
|
||||
Valid values: `critical` `warning` `suggestion`
|
||||
Example: `R1: suggestion` means every R1 finding is downgraded to Suggestion regardless
|
||||
of what the guide says.
|
||||
|
||||
**`ignore`** — list of glob patterns. Files matching any pattern are excluded from
|
||||
analysis. Findings that arise solely from ignored files are omitted.
|
||||
Common entries: `**/*.generated.*`, `**/vendor/**`, `**/migrations/**`
|
||||
|
||||
**`focus`** — non-empty list of risk codes to evaluate; all others are skipped.
|
||||
Omit this key (or leave it empty) to evaluate all non-disabled risks.
|
||||
Cannot be combined with a non-empty `disable` list.
|
||||
|
||||
**Minimal example:**
|
||||
```yaml
|
||||
version: 1
|
||||
disable:
|
||||
- T5
|
||||
severity:
|
||||
R1: suggestion
|
||||
ignore:
|
||||
- "**/*.generated.*"
|
||||
```
|
||||
|
||||
If `.brooks-lint.yaml` contains a `custom_risks` map, read `custom-risks-guide.md`
|
||||
from the `_shared/` directory for loading and scanning instructions.
|
||||
|
||||
### Config Validation
|
||||
|
||||
Before applying, check for errors and mention each in the report:
|
||||
- Invalid risk code (not R1–R6, T1–T6, or a defined `Cx` code): skip it, note `"Config warning: X is not a valid risk code"`
|
||||
- Invalid severity value (not `critical`/`warning`/`suggestion`): skip it, note the error
|
||||
- Both `disable` and `focus` are non-empty: treat as a config error, ignore both, note it
|
||||
|
||||
If the YAML fails to parse entirely, skip config loading and proceed with defaults.
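The validation rules above can be sketched as a small function. This is a minimal sketch, not the tool's implementation: it assumes the YAML has already been parsed into a plain mapping, and the names `validate_config` and `custom_codes`, plus the wording of the severity and conflict messages, are hypothetical. It also assumes the `disable`/`focus` conflict is checked after invalid codes are dropped, which the text does not specify.

```python
STANDARD_CODES = {f"R{i}" for i in range(1, 7)} | {f"T{i}" for i in range(1, 7)}
VALID_SEVERITIES = {"critical", "warning", "suggestion"}


def validate_config(raw, custom_codes=frozenset()):
    """Return (clean_config, warnings) for an already-parsed config mapping."""
    valid_codes = STANDARD_CODES | set(custom_codes)
    warnings = []

    def keep_valid(key):
        # Keep recognized codes; record a warning for every other entry.
        kept = []
        for code in raw.get(key) or []:
            if code in valid_codes:
                kept.append(code)
            else:
                warnings.append(f"Config warning: {code} is not a valid risk code")
        return kept

    disable = keep_valid("disable")
    focus = keep_valid("focus")
    if disable and focus:
        # Both lists non-empty is a config error: ignore both and note it.
        warnings.append("Config error: 'disable' and 'focus' are both set; ignoring both")
        disable, focus = [], []

    severity = {}
    for code, level in (raw.get("severity") or {}).items():
        if code not in valid_codes:
            warnings.append(f"Config warning: {code} is not a valid risk code")
        elif level not in VALID_SEVERITIES:
            warnings.append(f"Config warning: {level} is not a valid severity for {code}")
        else:
            severity[code] = level

    clean = {
        "disable": disable,
        "focus": focus,
        "severity": severity,
        "ignore": list(raw.get("ignore") or []),
    }
    return clean, warnings
```

Returning the warnings alongside the cleaned config keeps the review running while still letting the report mention each problem.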
|
||||
|
||||
### Config Reporting
|
||||
|
||||
If a config file was found and applied, add this line immediately after the **Scope** line
|
||||
in the report:
|
||||
`Config: .brooks-lint.yaml applied (N risks disabled, M paths ignored)`
|
||||
|
||||
Include N and M even if zero. Omit this line if no config file was found.
|
||||
|
||||
---
|
||||
|
||||
## Auto Scope Detection
|
||||
|
||||
When no files or code are specified, detect scope automatically:
|
||||
|
||||
**PR Review:** `git diff --cached` → `git diff` → `git diff main...HEAD` → ask user.
|
||||
|
||||
**Architecture Audit / Tech Debt:** Entire project by default. `--since=<ref>`: run `git diff <ref>...HEAD --name-only`, analyze only modules containing changed files; note "Incremental audit — modules touched since <ref>".
|
||||
|
||||
**Test Quality:** All test files by default. If a diff exists, prioritize test files co-located with changed production files (`src/foo.ts` → `src/foo.test.ts`).
|
||||
|
||||
**Health Dashboard:** Entire project by default. If user provides a path, scope all dimension sub-scans to that path.
|
||||
|
||||
**Scope line:** Always state what was detected — e.g., `Scope: staged changes (3 files)` or `Scope: branch changes vs main (12 files)`.
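The PR Review fallback chain can be sketched as follows. This is an illustrative sketch only: `run_git`, `detect_pr_scope`, and the "unstaged changes" label are hypothetical names, and `--name-only` is added to each command here so the output is a file list. The runner is injectable so the logic can be exercised without a repository.

```python
import subprocess


def run_git(args):
    """Run a git command and return the non-empty lines it prints."""
    result = subprocess.run(["git", *args], capture_output=True, text=True, check=False)
    return [line for line in result.stdout.splitlines() if line.strip()]


# Fallback order from the PR Review rule; two labels follow the Scope line examples.
PR_SCOPE_CHAIN = [
    (["diff", "--cached", "--name-only"], "staged changes"),
    (["diff", "--name-only"], "unstaged changes"),  # label is an assumption
    (["diff", "main...HEAD", "--name-only"], "branch changes vs main"),
]


def detect_pr_scope(run=run_git):
    """Return (scope_line, files); scope_line is None when the user must be asked."""
    for args, label in PR_SCOPE_CHAIN:
        files = run(args)
        if files:
            noun = "file" if len(files) == 1 else "files"
            return f"Scope: {label} ({len(files)} {noun})", files
    return None, []
```

A `None` scope line corresponds to the last step of the chain, where nothing was detected and the user is asked.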
|
||||
|
||||
---
|
||||
|
||||
## The Six Decay Risks
|
||||
|
||||
Navigation index only — canonical definitions (symptoms, severity guides, sources, "What Not
|
||||
to Flag" guards) live in `decay-risks.md`. Do not duplicate or edit diagnostic questions here;
|
||||
update `decay-risks.md` directly. Book-level coverage, exceptions, and tradeoffs are in
|
||||
`source-coverage.md`.
|
||||
|
||||
| Risk | Diagnostic Question |
|
||||
|------|---------------------|
|
||||
| Cognitive Overload | How much mental effort to understand this? |
|
||||
| Change Propagation | How many unrelated things break on one change? |
|
||||
| Knowledge Duplication | Is the same decision expressed in multiple places? |
|
||||
| Accidental Complexity | Is the code more complex than the problem? |
|
||||
| Dependency Disorder | Do dependencies flow in a consistent direction? |
|
||||
| Domain Model Distortion | Does the code faithfully represent the domain? |
|
||||
|
||||
---
|
||||
|
||||
## Report Template
|
||||
|
||||
**Language rule:** Output the report in the same language the user is using. Translate the
|
||||
per-finding content and the one-sentence verdict to match the user's language. Keep the
|
||||
following in English: Iron Law field labels (Symptom / Source / Consequence / Remedy),
|
||||
book titles, principle and smell names (e.g. "Shotgun Surgery", "Divergent Change"),
|
||||
and fixed structural headers from the template below (`Findings`, `Summary`,
|
||||
`Module Dependency Graph`, `Critical`, `Warning`, `Suggestion`).
|
||||
|
||||
````
|
||||
# Brooks-Lint Review
|
||||
|
||||
**Mode:** [PR Review / Architecture Audit / Tech Debt Assessment / Test Quality Review]
|
||||
**Scope:** [file(s), directory, or description of what was reviewed]
|
||||
**Health Score:** XX/100
|
||||
|
||||
[One sentence overall verdict]
|
||||
|
||||
---
|
||||
|
||||
## Module Dependency Graph
|
||||
|
||||
<!-- Mode 2 (Architecture Audit) ONLY — omit this section for other modes -->
|
||||
<!-- classDef colors: see architecture-guide.md Step 1 Rule 6 -->
|
||||
|
||||
```mermaid
|
||||
graph TD
|
||||
...
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Findings
|
||||
|
||||
<!-- Sort all findings by severity: Critical first, then Warning, then Suggestion -->
|
||||
<!-- If no findings in a severity tier, omit that tier's heading -->
|
||||
|
||||
### 🔴 Critical
|
||||
|
||||
**[Risk Name] — [Short descriptive title]**
|
||||
Symptom: [exactly what was observed in the code]
|
||||
Source: [Book title — Principle or Smell name]
|
||||
Consequence: [what breaks or gets worse if this is not fixed]
|
||||
Remedy: [concrete, specific action]
|
||||
|
||||
### 🟡 Warning
|
||||
|
||||
**[Risk Name] — [Short descriptive title]**
|
||||
Symptom: ...
|
||||
Source: ...
|
||||
Consequence: ...
|
||||
Remedy: ...
|
||||
|
||||
### 🟢 Suggestion
|
||||
|
||||
**[Risk Name] — [Short descriptive title]**
|
||||
Symptom: ...
|
||||
Source: ...
|
||||
Consequence: ...
|
||||
Remedy: ...
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
[2–3 sentences: what is the most important action, and what is the overall trend]
|
||||
````
|
||||
|
||||
## Remedy Mode
|
||||
|
||||
When the user passes `--fix` or asks to "fix the findings", read
|
||||
`remedy-guide.md` from the `_shared/` directory before writing the report.
|
||||
|
||||
## Health Score Calculation
|
||||
|
||||
Base score: 100
|
||||
Deductions:
|
||||
- Each 🔴 Critical finding: −15
|
||||
- Each 🟡 Warning finding: −5
|
||||
- Each 🟢 Suggestion finding: −1
|
||||
Floor: 0 (score cannot go below 0)
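As a worked example of the deduction rules, here is a minimal sketch. The function name `health_score` and the shape of the `counts` mapping are assumptions for illustration.

```python
# Points deducted per finding, by severity tier.
DEDUCTIONS = {"critical": 15, "warning": 5, "suggestion": 1}


def health_score(counts):
    """Compute the score from a mapping of tier name to finding count."""
    total = sum(points * counts.get(tier, 0) for tier, points in DEDUCTIONS.items())
    return max(0, 100 - total)  # floor at 0


# 2 critical, 3 warning, 4 suggestion: 100 - 30 - 15 - 4 = 51
print(health_score({"critical": 2, "warning": 3, "suggestion": 4}))
```

Seven Critical findings alone would deduct 105 points, so the floor applies and the score is 0.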
|
||||
|
||||
## History Tracking
|
||||
|
||||
After generating the Health Score, attempt to append a record to `.brooks-lint-history.json`
|
||||
in the project root.
|
||||
|
||||
**Append logic:**
|
||||
1. Read the file (or start with empty array if it doesn't exist)
|
||||
2. Append: `{ date, mode, score, findings: { critical, warning, suggestion }, scope }`
|
||||
3. Write the file back
|
||||
|
||||
**Trend display:** If the history file exists and contains at least one prior record for
|
||||
the same mode, add a Trend line after the Health Score in the report:
|
||||
|
||||
**Trend:** 85 → 82 (−3) over last 3 runs
|
||||
|
||||
Show the most recent prior score and the delta. If delta is 0: "Stable at 82".
|
||||
If this is the first run for this mode: "First run — no trend data".
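The append and trend steps can be sketched like this. It is a sketch under stated assumptions: the helper names `append_record` and `trend_line` are hypothetical, the trend is computed after the current run has been appended, and the "over last 3 runs" suffix from the example is omitted because the text does not define how that count is derived.

```python
import json
from datetime import date
from pathlib import Path


def append_record(path, mode, score, findings, scope, today=None):
    """Append one run record to the history file and return the full list."""
    path = Path(path)
    history = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    history.append({
        "date": (today or date.today()).isoformat(),
        "mode": mode,
        "score": score,
        "findings": findings,
        "scope": scope,
    })
    path.write_text(json.dumps(history, indent=2), encoding="utf-8")
    return history


def trend_line(history, mode):
    """Build the Trend text from the two most recent records for `mode`."""
    scores = [record["score"] for record in history if record["mode"] == mode]
    if len(scores) < 2:
        return "First run \u2014 no trend data"  # \u2014 is the dash used in the spec text
    prior, current = scores[-2], scores[-1]
    delta = current - prior
    if delta == 0:
        return f"Stable at {current}"
    sign = "+" if delta > 0 else "\u2212"  # \u2212 is the minus sign used in the example
    return f"{prior} \u2192 {current} ({sign}{abs(delta)})"
```

Filtering by `mode` before comparing keeps a PR Review score from being compared against an Architecture Audit score.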
|
||||
|
||||
## Post-Report Triage (Optional)
|
||||
|
||||
**Guard:** Interactive sessions only — skip in CI/headless mode.
|
||||
|
||||
After reporting Warning or Suggestion findings, offer:
|
||||
> Would you like to triage these findings? (accept / dismiss / defer / skip)
|
||||
|
||||
For each finding, one at a time (lowest severity first): show the title, ask `[a]ccept / [d]ismiss / de[f]er / [s]kip`; wait for a reply before moving to the next.
|
||||
|
||||
**Dismiss:** ask one-line reason → append to `.brooks-lint.yaml` under `suppress:` → downgraded to info in future runs.
|
||||
|
||||
**Defer:** same as dismiss, add `expires: YYYY-MM-DD` (default 90 days) → resurfaces at original severity after expiry.
|
||||
|
||||
**Suppress matching at scan time:** for each `suppress:` entry, match `risk` code and file `pattern` against findings.
|
||||
- Both match → downgrade to info (not counted in Health Score, shown under collapsed "Suppressed" section).
|
||||
- `expires` is past → ignore entry, finding resurfaces. Note in Summary: "N suppressed findings have expired and are now active again."
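Scan-time suppress matching might look like the sketch below. The entry keys `risk`, `pattern`, and `expires` come from the text; the finding key `file`, the function name, and the use of `fnmatchcase` are assumptions. Note that in `fnmatch` patterns `*` also matches `/`, which may differ from the glob semantics the real tool uses.

```python
from datetime import date
from fnmatch import fnmatchcase


def apply_suppressions(findings, suppress, today):
    """Return (active, suppressed, expired_count) for a list of findings."""
    active, suppressed, expired_count = [], [], 0
    for finding in findings:
        status = "active"
        for entry in suppress:
            if entry["risk"] != finding["risk"]:
                continue
            if not fnmatchcase(finding["file"], entry["pattern"]):
                continue
            expires = entry.get("expires")
            if expires and date.fromisoformat(expires) < today:
                status = "expired"  # entry is ignored; a later live entry may still match
                continue
            status = "suppressed"
            break
        if status == "suppressed":
            suppressed.append(finding)  # shown as info, not counted in the score
        else:
            active.append(finding)
            if status == "expired":
                expired_count += 1
    return active, suppressed, expired_count
```

The expired count feeds the Summary note about suppressed findings that are active again.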
|
||||
|
||||
## Reference Files
|
||||
|
||||
Read on demand:
|
||||
|
||||
| File | When to Read |
|
||||
|------|-------------|
|
||||
| `source-coverage.md` | At the start of every review, before writing findings |
|
||||
| `decay-risks.md` | Before any production-code review or architecture/debt assessment |
|
||||
| `test-decay-risks.md` | Before any test review and before the PR Review "Quick Test Check" step |
|
||||
|
|
@ -0,0 +1,48 @@
|
|||
# Custom Risk Loading Guide
|
||||
|
||||
When `.brooks-lint.yaml` contains a `custom_risks` map, this guide governs how those
|
||||
risks are loaded and scanned. Custom risks use `Cx` codes (C1, C2, …) — no conflict with
|
||||
the standard R1–R6 and T1–T6 namespaces.
|
||||
|
||||
---
|
||||
|
||||
## Loading
|
||||
|
||||
1. For each entry in `custom_risks`, validate that it has:
|
||||
- `name` — non-empty string
|
||||
- `question` — the diagnostic question to ask
|
||||
- `symptoms` — non-empty list of symptom patterns
|
||||
- `severity` — map with at least one of: `critical`, `warning`, `suggestion`
|
||||
|
||||
2. Register each valid entry as a `Cx` code alongside R1–R6 / T1–T6. Once loaded,
|
||||
`Cx` codes become valid targets for `disable`, `focus`, and `severity` fields in
|
||||
the same config file.
|
||||
|
||||
3. Report any validation errors as config warnings (do not abort the review):
|
||||
- Missing required field: `"Config warning: C1 missing 'symptoms'"`
|
||||
- Invalid code format (must be `C` followed by digits): skip, note error
|
||||
- Code conflicts with R/T namespace: skip, note error
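The loading steps above can be sketched as a validator over the parsed `custom_risks` map. This is a hedged sketch: `load_custom_risks` is a hypothetical name, and apart from the `missing` message quoted above, the warning texts are assumptions.

```python
import re

CODE_FORMAT = re.compile(r"C\d+")          # `C` followed by digits
STANDARD_FORMAT = re.compile(r"[RT]\d+")   # reserved R/T namespace
REQUIRED_FIELDS = ("name", "question", "symptoms", "severity")
SEVERITY_TIERS = {"critical", "warning", "suggestion"}


def load_custom_risks(custom_risks):
    """Return (registered, warnings); invalid entries are skipped, never fatal."""
    registered, warnings = {}, []
    for code, entry in custom_risks.items():
        if STANDARD_FORMAT.fullmatch(code):
            warnings.append(f"Config warning: {code} conflicts with the R/T namespace")
            continue
        if not CODE_FORMAT.fullmatch(code):
            warnings.append(f"Config warning: {code} is not a valid custom risk code")
            continue
        # Empty strings, lists, and maps count as missing.
        missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
        if missing:
            warnings.extend(f"Config warning: {code} missing '{field}'" for field in missing)
            continue
        if not SEVERITY_TIERS & set(entry["severity"]):
            warnings.append(f"Config warning: {code} 'severity' has no recognized tier")
            continue
        registered[code] = entry
    return registered, warnings
```

The keys of `registered` are the `Cx` codes that become valid targets for `disable`, `focus`, and `severity`.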
|
||||
|
||||
---
|
||||
|
||||
## Scanning
|
||||
|
||||
During the analysis, treat each custom risk as an additional step after the standard
|
||||
process:
|
||||
|
||||
- Use `question` as the diagnostic question
|
||||
- Use `symptoms` as the symptom lookup list
|
||||
- Use the `severity` map for tier classification
|
||||
- Apply the Iron Law: `Source` field should be `"[Project-defined risk] — <risk name>"`
|
||||
- Include custom risk findings in the Health Score (same deduction rules as R/T codes)
|
||||
- In the report, custom findings appear after standard findings under a
|
||||
**### Project-Specific Risks** sub-heading
|
||||
|
||||
---
|
||||
|
||||
## Config Validation additions
|
||||
|
||||
The following codes are valid in `disable`, `focus`, and `severity`:
|
||||
- Standard: `R1`–`R6`, `T1`–`T6`
|
||||
- Custom: any `Cx` code defined in `custom_risks`
|
||||
- Any other code: skip it and emit `"Config warning: X is not a valid risk code"`
|
||||
|
|
@ -0,0 +1,294 @@
|
|||
# Decay Risk Reference
|
||||
|
||||
Six patterns that cause software to degrade. Apply the Iron Law to each finding.
|
||||
|
||||
---
|
||||
|
||||
## Risk 1: Cognitive Overload
|
||||
|
||||
**Diagnostic question:** How much mental effort does a human need to understand this?
|
||||
|
||||
Cognitive load beyond working memory causes mistakes and avoidance, and it blocks the refactoring that would fix it.
|
||||
|
||||
### Symptoms
|
||||
|
||||
- Function longer than 20 lines where multiple levels of abstraction are mixed together
|
||||
- Nesting depth greater than 3 levels
|
||||
- Parameter list with more than 4 parameters
|
||||
- Magic numbers or unexplained constants
|
||||
- Variable names that require reading the implementation to understand (e.g., `d`, `tmp2`, `flag`)
|
||||
- Boolean expressions with 3 or more conditions combined
|
||||
- Train-wreck chains: `a.getB().getC().doD()`
|
||||
- Code names that do not match what the business calls the same concept
|
||||
- Flag Arguments: a boolean parameter that makes the function do two fundamentally different
|
||||
things depending on its value — a sign the function has two responsibilities
|
||||
- Primitive Obsession: domain concepts represented as primitive types (`String email`,
|
||||
`int orderId`, `double money`) rather than purpose-built value types — forces callers to know
|
||||
which string is an email and which is a name
|
||||
- Shallow module: the interface or documentation of a component is complex relative to
|
||||
the functionality it provides
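To make one of these symptoms concrete, here is a small illustration of a Flag Argument and one possible remedy. The report-export functions are hypothetical examples, not code from any reviewed project.

```python
# Before: the boolean selects between two unrelated behaviors,
# so a call like export_report(rows, True) cannot be read without the body.
def export_report(rows, as_csv):
    if as_csv:
        return "\n".join(",".join(map(str, row)) for row in rows)
    return "\n".join(" | ".join(map(str, row)) for row in rows)


# After: one function per responsibility; the call site names the intent.
def export_csv(rows):
    """Render rows as comma-separated lines."""
    return "\n".join(",".join(map(str, row)) for row in rows)


def export_table(rows):
    """Render rows as pipe-separated lines for display."""
    return "\n".join(" | ".join(map(str, row)) for row in rows)
```

The split removes the hidden branch, so each function can change for its own reason.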
|
||||
|
||||
### Sources
|
||||
|
||||
| Symptom | Book | Principle / Smell |
|
||||
|---------|------|-------------------|
|
||||
| Long Method | Fowler — Refactoring | Long Method |
|
||||
| Long Parameter List | Fowler — Refactoring | Long Parameter List |
|
||||
| Message Chains | Fowler — Refactoring | Message Chains |
|
||||
| Flag Arguments | Fowler — Refactoring | Flag Arguments |
|
||||
| Primitive Obsession | Fowler — Refactoring | Primitive Obsession |
|
||||
| Function length and nesting | McConnell — Code Complete | Ch. 7: High-Quality Routines |
|
||||
| Variable naming | McConnell — Code Complete | Ch. 11: The Power of Variable Names |
|
||||
| Magic numbers | McConnell — Code Complete | Ch. 12: Fundamental Data Types |
|
||||
| Domain name mismatch | Evans — Domain-Driven Design | Ubiquitous Language |
|
||||
| Shallow Module | Ousterhout — A Philosophy of Software Design | Ch. 4: Modules Should Be Deep |
|
||||
|
||||
### Severity Guide
|
||||
|
||||
- 🔴 Critical: function > 50 lines, nesting > 5, or virtually no meaningful names
|
||||
- 🟡 Warning: function 20–50 lines, nesting 4–5, some unclear names
|
||||
- 🟢 Suggestion: minor naming issues, 1–2 magic numbers, isolated train-wreck chains
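The numeric part of this guide can be expressed as a threshold check. This is only a sketch: `cognitive_load_tier` is a hypothetical helper, the boundaries (more than 20 lines, nesting above 3) are read from the Symptoms list, and the naming criteria are left out because they need human judgment. The result is a candidate tier to confirm against context, not a verdict.

```python
def cognitive_load_tier(function_lines, nesting_depth):
    """Return a candidate severity tier from size metrics, or None."""
    if function_lines > 50 or nesting_depth > 5:
        return "critical"
    if function_lines > 20 or nesting_depth > 3:
        return "warning"
    return None  # no size-based finding; naming issues are judged separately


print(cognitive_load_tier(function_lines=30, nesting_depth=2))
```

A 30-line function with shallow nesting lands in the Warning band on length alone.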
|
||||
|
||||
### What Not to Flag
|
||||
|
||||
- Linear code with clear names and guard clauses is not automatically high cognitive load
|
||||
- Internal implementation detail hidden behind a deep, simple module boundary is not a shallow-module problem
|
||||
- Domain-specific terminology should not be flagged if it matches how experts actually speak
|
||||
|
||||
---
|
||||
|
||||
## Risk 2: Change Propagation
|
||||
|
||||
**Diagnostic question:** How many unrelated things break when you change one thing?
|
||||
|
||||
Each change ripples to unrelated modules, slowing velocity and multiplying regression risk.
|
||||
|
||||
### Symptoms
|
||||
|
||||
- Modifying one feature requires touching more than 3 files in unrelated modules
|
||||
- One class changes for multiple different business reasons (e.g., `UserService` changes for
|
||||
billing logic AND notification logic AND profile logic)
|
||||
- A method uses more data from another class than from its own class
|
||||
- Two classes know each other's internal state directly
|
||||
- Changing one module requires recompiling or retesting many unrelated modules
|
||||
- **Hyrum's Law**: with sufficient callers, every observable behavior — including
|
||||
implementation details, error message text, coincidental call ordering, and undocumented
|
||||
side effects — becomes an implicit contract that callers depend on, even though it was
|
||||
never guaranteed by the declared API
|
||||
- **Orthogonality violation**: changing one dimension of a feature forces edits in
|
||||
unrelated dimensions — adding a new payment type should not require touching logging,
|
||||
caching, or notification code, but in a non-orthogonal design it does
|
||||
- Information Leakage: a design decision (e.g., a file format, protocol detail, or data
|
||||
shape) is encoded in more than one module, so changing it requires coordinated edits
|
||||
in multiple places even though only one module "owns" the concept
|
||||
|
||||
### Sources
|
||||
|
||||
| Symptom | Book | Principle / Smell |
|
||||
|---------|------|-------------------|
|
||||
| Shotgun Surgery | Fowler — Refactoring | Shotgun Surgery |
|
||||
| Divergent Change | Fowler — Refactoring | Divergent Change |
|
||||
| Feature Envy | Fowler — Refactoring | Feature Envy |
|
||||
| Inappropriate Intimacy | Fowler — Refactoring | Inappropriate Intimacy |
|
||||
| Orthogonality violation | Hunt & Thomas — The Pragmatic Programmer | Ch. 2: Orthogonality |
|
||||
| DIP violation | Martin — Clean Architecture | Dependency Inversion Principle |
|
||||
| High change propagation radius | Brooks — The Mythical Man-Month | Ch. 2: Brooks's Law (communication overhead) |
|
||||
| Hyrum's Law | Winters et al. — Software Engineering at Google | Ch. 1: Hyrum's Law |
|
||||
| Information Leakage | Ousterhout — A Philosophy of Software Design | Ch. 5: Information Hiding and Leakage |
|
||||
|
||||
### Severity Guide
|
||||
|
||||
- 🔴 Critical: one change touches > 5 files, or there is a structural dependency inversion (domain depends on infrastructure)
|
||||
- 🟡 Warning: one change touches 3–5 files, mild coupling between modules
|
||||
- 🟢 Suggestion: minor coupling, easily isolatable
|
||||
|
||||
### What Not to Flag
|
||||
|
||||
- A composition root wiring concrete dependencies is not a DIP violation by itself
|
||||
- A stable public API with intentionally supported behavior is not automatically Hyrum's Law debt
|
||||
- Similar edits inside one bounded context may be normal coordinated change, not shotgun surgery
|
||||
|
||||
---
|
||||
|
||||
## Risk 3: Knowledge Duplication
|
||||
|
||||
**Diagnostic question:** Is the same decision expressed in more than one place?
|
||||
|
||||
Multiple copies drift apart silently. DRY is about decisions, not code lines.
|
||||
|
||||
### Symptoms
|
||||
|
||||
- Same logic copy-pasted across multiple files or functions
|
||||
- Same concept named differently in different parts of the codebase
|
||||
(e.g., `user`, `account`, `member`, `customer` all referring to the same domain entity)
|
||||
- Parallel class hierarchies that must change in sync
|
||||
(e.g., adding a new payment type requires adding a class in 3 different hierarchies)
|
||||
- Configuration values repeated as literals in multiple places
|
||||
- Two modules that implement the same algorithm independently
|
||||
|
||||
### Sources
|
||||
|
||||
| Symptom | Book | Principle / Smell |
|
||||
|---------|------|-------------------|
|
||||
| Code duplication | Fowler — Refactoring | Duplicate Code |
|
||||
| Parallel Inheritance | Fowler — Refactoring | Parallel Inheritance Hierarchies |
|
||||
| DRY violation | Hunt & Thomas — The Pragmatic Programmer | DRY: Don't Repeat Yourself |
|
||||
| Inconsistent naming | Evans — Domain-Driven Design | Ubiquitous Language |
|
||||
| Alternative Classes | Fowler — Refactoring | Alternative Classes with Different Interfaces |
|
||||
|
||||
### Severity Guide
|
||||
|
||||
- 🔴 Critical: core business logic duplicated across modules, or same domain concept named 3+ different ways
|
||||
- 🟡 Warning: utility code duplicated, naming inconsistent within a subsystem
|
||||
- 🟢 Suggestion: minor literal duplication, single naming inconsistency
|
||||
|
||||
### What Not to Flag
|
||||
|
||||
- Repetition across separate bounded contexts is not automatically duplicate knowledge
|
||||
- Temporary duplication during an active extraction or migration is not necessarily debt
|
||||
- Shared protocol constants repeated at explicit boundaries may be acceptable when local ownership is clearer
|
||||
|
||||
---
|
||||
|
||||
## Risk 4: Accidental Complexity
|
||||
|
||||
**Diagnostic question:** Is the code more complex than the problem it solves?
|
||||
|
||||
Accidental complexity accumulates addition by addition until developers spend more effort fighting scaffolding than solving the problem.
|
||||
|
||||
### Symptoms
|
||||
|
||||
- Abstractions built "for future use" with no current consumer
|
||||
(e.g., a plugin system for a use case that has only one known implementation)
|
||||
- Classes that barely justify their existence (wrap a single method call)
|
||||
- Classes that only delegate to another class without adding behavior (pure middle-men)
|
||||
- Second attempt at a system that is significantly more elaborate than the first,
|
||||
adding generality for requirements that do not yet exist
|
||||
- Switch statements that signal missing polymorphism
|
||||
- Configuration options that have never been changed from their defaults
|
||||
- Framework code larger than the application it powers
|
||||
- Code grown under sustained tactical shortcuts: each workaround seemed small, but
|
||||
accumulated shortcuts mean every new feature requires fighting the existing structure
|
||||
|
||||
### Sources
|
||||
|
||||
| Symptom | Book | Principle / Smell |
|
||||
|---------|------|-------------------|
|
||||
| Speculative Generality | Fowler — Refactoring | Speculative Generality |
|
||||
| Lazy Class | Fowler — Refactoring | Lazy Class |
|
||||
| Middle Man | Fowler — Refactoring | Middle Man |
|
||||
| Switch Statements | Fowler — Refactoring | Switch Statements |
|
||||
| Second System Effect | Brooks — The Mythical Man-Month | Ch. 5: The Second-System Effect |
|
||||
| YAGNI violations | McConnell — Code Complete | Ch. 5: Design in Construction |
|
||||
| Over-engineering | Hunt & Thomas — The Pragmatic Programmer | Topic 4: Good-Enough Software |
|
||||
| Tactical programming debt | Ousterhout — A Philosophy of Software Design | Ch. 3: Strategic vs. Tactical Programming |
|
||||
|
||||
### Severity Guide
|
||||
|
||||
- 🔴 Critical: an entire subsystem built around a speculative requirement, or framework overhead dominates domain logic
|
||||
- 🟡 Warning: several unnecessary abstractions or wrapper classes, unused configuration systems
|
||||
- 🟢 Suggestion: one or two lazy classes or middle-man patterns in non-critical paths
|
||||
|
||||
### What Not to Flag
|
||||
|
||||
- A switch over an external protocol, wire format, or closed enum is not automatically missing polymorphism
|
||||
- Thin wrappers that absorb vendor churn or hide instability may be justified
|
||||
- A larger second version is not second-system effect unless the added generality exceeds present needs
|
||||
|
||||
---
|
||||
|
||||
## Risk 5: Dependency Disorder
|
||||
|
||||
**Diagnostic question:** Do dependencies flow in a consistent, predictable direction?
|
||||
|
||||
When business logic depends on infrastructure, infrastructure changes cascade into domain changes. Cycles prevent isolation.
|
||||
|
||||
### Symptoms
|
||||
|
||||
- Circular dependencies between modules or packages
|
||||
- High-level business logic directly imports from low-level infrastructure
|
||||
(e.g., a domain service imports from a specific database driver)
|
||||
- Stable, widely-used components depend on unstable, frequently-changing ones
|
||||
- Abstract components depending on concrete implementations
|
||||
- Law of Demeter violations: `order.getCustomer().getAddress().getCity()`
|
||||
- Module fan-out greater than 5 (imports from more than 5 other modules)
|
||||
- A module implements an interface but only uses a subset of its methods, or must
|
||||
provide stub implementations for methods it does not need (ISP violation: fat interface
|
||||
forces unwanted dependencies on callers)
|
||||
- The system feels like "one mind did not design this" — different modules use
|
||||
incompatible architectural patterns with no clear rule for which to use where
|
||||
- Direct version-pinned dependencies on transitive packages (diamond dependency risk);
|
||||
upgrading one library requires coordinating multiple unrelated teams or repositories
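Two of these symptoms, circular dependencies and high fan-out, can be checked mechanically once a module graph is available. The sketch below assumes a hypothetical input shape (a mapping from module name to the modules it imports) and uses a depth-first search. How the graph is extracted from source is out of scope here.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of modules, or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path: the path closes a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


def high_fan_out(graph, limit=5):
    """Return modules that import from more than `limit` distinct modules."""
    return sorted(m for m, deps in graph.items() if len(set(deps)) > limit)
```

A high fan-out result is still only a hint, since orchestration layers and composition roots legitimately import widely.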
|
||||
|
||||
### Sources
|
||||
|
||||
| Symptom | Book | Principle / Smell |
|
||||
|---------|------|-------------------|
|
||||
| Dependency cycles | Martin — Clean Architecture | Acyclic Dependencies Principle (ADP) |
|
||||
| DIP violation | Martin — Clean Architecture | Dependency Inversion Principle (DIP) |
|
||||
| Instability direction | Martin — Clean Architecture | Stable Dependencies Principle (SDP) |
|
||||
| Abstraction mismatch | Martin — Clean Architecture | Stable Abstractions Principle (SAP) |
|
||||
| ISP violation | Martin — Clean Architecture | Interface Segregation Principle (ISP) |
|
||||
| Conceptual integrity | Brooks — The Mythical Man-Month | Ch. 4: Conceptual Integrity |
|
||||
| Law of Demeter | Hunt & Thomas — The Pragmatic Programmer | Ch. 5: Decoupling and the Law of Demeter |
|
||||
| SOLID violations | Martin — Clean Architecture | Single Responsibility, Open/Closed Principles |
|
||||
| Diamond dependency / upgrade blockage | Winters et al. — Software Engineering at Google | Ch. 21: Dependency Management |
|
||||
|
||||
### Severity Guide
|
||||
|
||||
- 🔴 Critical: dependency cycles present, or domain layer directly depends on infrastructure layer
|
||||
- 🟡 Warning: several SDP or DIP violations but no cycles; conceptual inconsistency across modules
|
||||
- 🟢 Suggestion: minor Demeter violations, slightly elevated fan-out in isolated modules
|
||||
|
||||
### What Not to Flag
|
||||
|
||||
- High fan-out in an orchestration layer or composition root is not automatically disorder
|
||||
- Adapter modules may depend on both domain and infrastructure when they explicitly translate across the boundary
|
||||
- A stable facade over many leaf dependencies can be healthy if dependency policy is clear
|
||||
|
||||
---
|
||||
|
||||
## Risk 6: Domain Model Distortion
|
||||
|
||||
**Diagnostic question:** Does the code faithfully represent the problem it is solving?
|
||||
|
||||
Code that mismatches business language forces mental translation. Over time it models schemas instead of the domain, with logic bleeding into service layers.
|
||||
|
||||
### Symptoms
|
||||
|
||||
- Business logic scattered across service layers while domain objects have only getters and setters
|
||||
(anemic domain model)
|
||||
- Variable, class, or method names that do not match what business stakeholders call the concept
|
||||
- A class whose only purpose is to hold data with no behavior (pure data bag)
|
||||
- A subclass that ignores or overrides most of its parent's behavior (refuses the inheritance)
|
||||
- Bounded context boundaries crossed without any translation or anti-corruption layer
|
||||
- Methods that are more interested in the data of another class than their own
|
||||
(domain logic in the wrong place)
|
||||
- A subclass overrides most parent methods with incompatible behavior or throws exceptions
|
||||
where the parent contract guarantees success (LSP violation: substitution breaks callers)
|
||||
- Value Objects treated as Entities: a concept defined entirely by its attributes (e.g., Money,
|
||||
Email, Address) is given a mutable ID and lifecycle instead of being replaced when changed
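As an illustration of the last symptom, a Value Object can be modeled as an immutable type that is compared by its attributes and replaced rather than mutated. The `Money` class below is a hypothetical sketch, not a prescribed design.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Value Object: no ID, no lifecycle; equality is defined by attributes."""
    amount: Decimal
    currency: str

    def add(self, other: "Money") -> "Money":
        if other.currency != self.currency:
            raise ValueError("cannot add amounts in different currencies")
        # Return a new value instead of changing this one.
        return Money(self.amount + other.amount, self.currency)


price = Money(Decimal("10.00"), "EUR")
total = price.add(Money(Decimal("2.50"), "EUR"))
```

Because the dataclass is frozen, assigning to `price.amount` raises an error, and two instances with the same amount and currency compare equal.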
|
||||
|
||||
### Sources
|
||||
|
||||
| Symptom | Book | Principle / Smell |
|
||||
|---------|------|-------------------|
|
||||
| Anemic Domain Model | Evans — Domain-Driven Design | Domain Model pattern |
|
||||
| Ubiquitous Language drift | Evans — Domain-Driven Design | Ubiquitous Language |
|
||||
| Bounded context violation | Evans — Domain-Driven Design | Bounded Context |
|
||||
| Data Class | Fowler — Refactoring | Data Class |
|
||||
| Refused Bequest | Fowler — Refactoring | Refused Bequest |
|
||||
| Feature Envy | Fowler — Refactoring | Feature Envy |
|
||||
| LSP violation | Martin — Clean Architecture | Liskov Substitution Principle (LSP) |
|
||||
|
||||
### Severity Guide
|
||||
|
||||
- 🔴 Critical: domain logic entirely in service layer, domain objects are pure data bags with no behavior
|
||||
- 🟡 Warning: partial anemia, some naming inconsistency between code and domain language
|
||||
- 🟢 Suggestion: minor naming drift in non-core areas, isolated cases of Feature Envy
|
||||
|
||||
### What Not to Flag
|
||||
|
||||
- CRUD-heavy workflows may legitimately use transaction scripts instead of rich domain objects
|
||||
- DTOs, persistence records, and API payload models are allowed to be data-only
|
||||
- Shared infrastructure language should not be mistaken for domain drift if the business model itself is simple
|
||||
|
|
@ -0,0 +1,37 @@
|
|||
# Remedy Guide — Actionable Fix Mode
|
||||
|
||||
When `--fix` is active, enhance every finding's Remedy field to be directly actionable:
|
||||
|
||||
## Remedy Enhancement Rules
|
||||
|
||||
For each finding, the Remedy must include:
|
||||
1. **Target**: exact file path and function/class name
|
||||
2. **Action**: specific refactoring operation (e.g., "Extract lines 45-67 into a new
|
||||
function `calculateShippingCost(items, config)`")
|
||||
3. **Rationale**: one sentence explaining why this specific fix (not just "refactor")
|
||||
|
||||
## Fixability Classification
|
||||
|
||||
Classify each finding after writing the enhanced Remedy:
|
||||
|
||||
| Tier | Criteria | Report label |
|
||||
|------|---------|-------------|
|
||||
| Quick fix | Single-file, mechanical: rename, extract constant, reorder imports | `[quick-fix]` |
|
||||
| Guided fix | Requires a design choice: where to split, what interface shape | `[guided]` |
|
||||
| Manual | Cross-module, needs domain knowledge or team discussion | `[manual]` |
|
||||
|
||||
Append the label to the finding title: `**R1 — Long function in OrderService [quick-fix]**`
|
||||
|
||||
## Output Addition
|
||||
|
||||
After the standard report, add a **Fix Summary** section:
|
||||
|
||||
| Finding | Tier | Target File | Action |
|
||||
|---------|------|------------|--------|
|
||||
| R1 — Long function | quick-fix | src/order.ts:45 | Extract `calculateTotal()` |
|
||||
| R5 — Circular dep | manual | src/models/ ↔ src/services/ | Introduce interface boundary |
|
||||
|
||||
## What NOT to do
|
||||
- Do NOT modify any files. Phase 1 is diagnosis + actionable plan only.
|
||||
- Do NOT generate diffs or code blocks. The Remedy text IS the deliverable.
|
||||
- Do NOT re-score. The Health Score reflects current state, not projected state.
|
||||
|
|
@ -0,0 +1,248 @@
|
|||
---
|
||||
books:
|
||||
- The Mythical Man-Month
|
||||
- Code Complete
|
||||
- Refactoring
|
||||
- Clean Architecture
|
||||
- The Pragmatic Programmer
|
||||
- Domain-Driven Design
|
||||
- A Philosophy of Software Design
|
||||
- Software Engineering at Google
|
||||
- xUnit Test Patterns
|
||||
- The Art of Unit Testing
|
||||
- Working Effectively with Legacy Code
|
||||
- How Google Tests Software
|
||||
---
|
||||
|
||||
# Source Coverage Matrix
|
||||
|
||||
Use this file after selecting a mode and before writing findings.
|
||||
It exists to prevent shallow "book-name citation" reviews.
|
||||
|
||||
## Review Discipline
|
||||
|
||||
- Cite a book only when the observed symptom actually matches that book's principle.
|
||||
- A threshold crossing is a hint, not a verdict. Check context, intent, and blast radius.
|
||||
- Look for justified tradeoffs before flagging a smell as debt.
|
||||
- Prefer concrete architectural or domain consequences over abstract style complaints.
|
||||
- If two books pull in different directions, state the tradeoff instead of pretending there is no tension.
|
||||
|
||||
---
|
||||
|
||||
## Frederick Brooks — *The Mythical Man-Month*
|
||||
|
||||
**Encoded today**
|
||||
- Change propagation as communication overhead
|
||||
- Second-System Effect
|
||||
- Conceptual Integrity
|
||||
|
||||
**Do not ignore**
|
||||
- Whether the design shows a single coherent idea or competing local optimizations
|
||||
- Whether cross-team coordination cost is becoming part of feature cost
|
||||
|
||||
**Do not over-flag**
|
||||
- Large systems are not automatically second systems
|
||||
- Multi-module designs are acceptable when they preserve conceptual integrity
|
||||
|
||||
---
|
||||
|
||||
## Steve McConnell — *Code Complete*
|
||||
|
||||
**Encoded today**
|
||||
- Routine length, nesting, naming, and magic numbers
|
||||
- Construction-phase YAGNI checks
|
||||
- Defensive programming and error-handling discipline (guard clauses, input validation,
|
||||
explicit error paths, assertions for invariants)
|
||||
|
||||
**Do not ignore**
|
||||
- Whether low-level readability choices compound into operational risk
|
||||
- Whether missing error handling makes failure modes invisible to maintainers
|
||||
|
||||
**Do not over-flag**
|
||||
- Small, explicit guard clauses are not cognitive overload
|
||||
- A long routine may be acceptable when it is linear, well-named, and single-purpose
|
||||
|
||||
---
|
||||
|
||||
## Martin Fowler — *Refactoring*
|
||||
|
||||
**Encoded today**
|
||||
- Long Method, Long Parameter List, Message Chains
|
||||
- Shotgun Surgery, Divergent Change, Feature Envy, Inappropriate Intimacy
|
||||
- Duplicate Code, Speculative Generality, Lazy Class, Middle Man, Data Class
|
||||
- Flag Arguments: boolean parameters that split a function into two behaviors
|
||||
- Primitive Obsession: domain concepts expressed as raw primitive types instead of value types
|
||||
|
||||
**Do not ignore**
|
||||
- Whether the code smell is local or systemic
|
||||
- Whether a refactoring target has a natural home in the model
|
||||
|
||||
**Do not over-flag**
|
||||
- Temporary duplication during an active extraction is not always debt
|
||||
- A data-focused structure is acceptable when it is intentionally a DTO or boundary record
|
||||
|
||||
---
|
||||
|
||||
## Robert C. Martin — *Clean Architecture*
|
||||
|
||||
**Encoded today**
|
||||
- DIP, ADP, SDP, SAP, and layering direction
|
||||
- ISP: fat interfaces that force callers to depend on methods they do not use
|
||||
- LSP: subclasses that break the behavioral contract of their parent type
|
||||
- SRP and OCP: classes with multiple reasons to change; modules closed to modification
|
||||
but open to extension via abstraction
|
||||
|
||||
**Do not ignore**
|
||||
- Policy vs detail boundaries
|
||||
- Whether dependency arrows preserve replaceability and testability
|
||||
|
||||
**Do not over-flag**
|
||||
- Composition roots may depend on concrete infrastructure by design
|
||||
- Thin adapter layers can import both directions when they are explicitly boundary glue
|
||||
|
||||
---
|
||||
|
||||
## Andrew Hunt & David Thomas — *The Pragmatic Programmer*
|
||||
|
||||
**Encoded today**
|
||||
- Orthogonality
|
||||
- DRY
|
||||
- Law of Demeter
|
||||
|
||||
**Do not ignore**
|
||||
- Whether knowledge duplication is really duplicated decision-making
|
||||
- Whether coupling is accidental or a deliberate local simplification
|
||||
|
||||
**Do not over-flag**
|
||||
- Similar code in different bounded contexts is not automatically a DRY violation
|
||||
- Direct object access inside a cohesive aggregate is not always a Demeter problem
|
||||
|
||||
---
|
||||
|
||||
## Eric Evans — *Domain-Driven Design*
|
||||
|
||||
**Encoded today**
|
||||
- Ubiquitous Language
|
||||
- Bounded Context
|
||||
- Anemic Domain Model
|
||||
- Entity vs Value Object: objects with identity and lifecycle vs. objects defined solely by
|
||||
their attributes (Money, Email, Address should be immutable value types, not mutable entities)
|
||||
- Aggregate Roots: who owns the invariant boundary; cross-aggregate access only through the root
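
The Entity vs Value Object item above can be illustrated with a minimal immutable value type. This is a sketch under stated assumptions: the `Money` class, its fields, and the three-letter currency check are hypothetical and not taken from any reviewed codebase.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Value object: defined only by its attributes, with no identity or lifecycle."""
    amount: Decimal
    currency: str

    def __post_init__(self) -> None:
        # Illustrative invariant; a real model may validate against a currency list.
        if len(self.currency) != 3:
            raise ValueError("currency must be a 3-letter code")

    def add(self, other: "Money") -> "Money":
        """Return a new Money instead of mutating either operand."""
        if other.currency != self.currency:
            raise ValueError("cannot add amounts in different currencies")
        return Money(self.amount + other.amount, self.currency)
```

Because the dataclass is frozen, two instances with equal attributes compare equal and attribute assignment raises an error, which is the behavior the value-type guidance asks for.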
|
||||
|
||||
**Do not ignore**
|
||||
- Aggregate boundaries, invariant ownership, and anti-corruption layers
|
||||
- Whether names match the business language used by experts
|
||||
|
||||
**Do not over-flag**
|
||||
- CRUD-heavy workflows may legitimately use transaction scripts
|
||||
- Thin entities are acceptable when the domain itself is simple
|
||||
|
||||
---
|
||||
|
||||
## John Ousterhout — *A Philosophy of Software Design*
|
||||
|
||||
**Encoded today**
|
||||
- Deep vs shallow modules
|
||||
- Strategic vs tactical programming
|
||||
- Information Leakage: a design decision encoded in more than one module, creating
|
||||
change coupling even when no explicit import exists between the modules
|
||||
|
||||
**Do not ignore**
|
||||
- Interface complexity relative to hidden complexity
|
||||
- Whether repeated tactical patches are raising long-term cognitive load
|
||||
- Whether a "helper" exposes internal design decisions that callers should not know
|
||||
|
||||
**Do not over-flag**
|
||||
- Internal implementation complexity is fine when the interface stays simple
|
||||
- A small wrapper is acceptable when it meaningfully absorbs volatility
|
||||
|
||||
---
|
||||
|
||||
## Titus Winters, Tom Manshreck, Hyrum Wright — *Software Engineering at Google*
|
||||
|
||||
**Encoded today**
|
||||
- Hyrum's Law
|
||||
- Dependency management and upgrade blockage
|
||||
- Code sustainability: whether code as written can be maintained, migrated, and upgraded
|
||||
over a multi-year horizon without heroic effort
|
||||
- Backward compatibility: whether API changes preserve existing callers or force
|
||||
coordinated upgrades across the organization
|
||||
|
||||
**Do not ignore**
|
||||
- De facto APIs created by observable behavior
|
||||
- The maintenance cost of exposing too much surface area
|
||||
- Whether the dependency graph will allow independent upgrades over time
|
||||
|
||||
**Do not over-flag**
|
||||
- A stable public API is not a liability if it is intentionally supported
|
||||
- Fan-out alone is not disorder when dependency policy is explicit and governed
|
||||
|
||||
---
|
||||
|
||||
## Gerard Meszaros — *xUnit Test Patterns*
|
||||
|
||||
**Encoded today**
|
||||
- Assertion Roulette, Mystery Guest, General Fixture
|
||||
- Eager Test, Lazy Test, Test Code Duplication, Behavior Verification
|
||||
- Erratic Test: tests that produce non-deterministic results due to shared state,
|
||||
time dependence, or ordering assumptions between tests
|
||||
|
||||
**Do not ignore**
|
||||
- Whether test failures are diagnosable
|
||||
- Whether the suite shape amplifies maintenance cost
|
||||
|
||||
**Do not over-flag**
|
||||
- Multiple assertions are acceptable when they express one behavior with one failure story
|
||||
- Shared fixtures are acceptable when every field is relevant to the scenario
|
||||
|
||||
---
|
||||
|
||||
## Roy Osherove — *The Art of Unit Testing*
|
||||
|
||||
**Encoded today**
|
||||
- Test naming discipline
|
||||
- Test isolation
|
||||
- Mock usage guidelines
|
||||
- Completeness of edge-path tests
|
||||
|
||||
**Do not ignore**
|
||||
- Whether tests verify behavior rather than wiring
|
||||
- Whether seams are used to simplify tests, or production code is being contorted for testability
|
||||
|
||||
**Do not over-flag**
|
||||
- A mock is acceptable when the dependency is nondeterministic and the assertion still verifies behavior
|
||||
- Naming conventions are guidance; clarity is the goal
|
||||
|
||||
---
|
||||
|
||||
## Michael Feathers — *Working Effectively with Legacy Code*
|
||||
|
||||
**Encoded today**
|
||||
- Legacy code as code without tests
|
||||
- Sensing and Separation
|
||||
- Seams
|
||||
- Characterization Tests
|
||||
|
||||
**Do not ignore**
|
||||
- Whether the team can change a risky area safely today
|
||||
- Whether the code offers any seam for isolating behavior under change
|
||||
|
||||
**Do not over-flag**
|
||||
- Untested code is not automatically legacy if it is stable and not under active change
|
||||
- Characterization tests are most important before modifying unclear existing behavior
|
||||
|
||||
---
|
||||
|
||||
## Google Engineering — *How Google Tests Software*
|
||||
|
||||
**Encoded today**
|
||||
- Change coverage vs line coverage
|
||||
- Pyramid shape and suite portfolio economics
|
||||
|
||||
**Do not ignore**
|
||||
- Whether the suite reflects business risk, not just percentages
|
||||
- Whether expensive tests dominate feedback loops
|
||||
|
||||
**Do not over-flag**
|
||||
- A non-70:20:10 ratio can be healthy when justified by platform constraints or product risk
|
||||
- High coverage is useful when paired with meaningful branch and change protection
|
||||
|
|
@ -0,0 +1,246 @@
|
|||
# Test Decay Risk Reference
|
||||
|
||||
Six patterns that cause test suites to degrade. Apply the Iron Law to each finding.
|
||||
|
||||
---
|
||||
|
||||
## Risk T1: Test Obscurity
|
||||
|
||||
**Diagnostic question:** How much effort does it take to understand what this test verifies?
|
||||
|
||||
Unclear test intent breeds distrust, missed failures, and duplicates — one step from an abandoned suite.
|
||||
|
||||
### Symptoms
|
||||
|
||||
- Assertion Roulette: multiple assertions with no message string — when one fails, it is
|
||||
impossible to determine which behavior broke without reading every assertion
|
||||
- Mystery Guest: test depends on external state (files, database rows, shared fixtures)
|
||||
that is not visible in the test body
|
||||
- Test names that do not express the scenario and expected outcome
|
||||
(e.g., `test1`, `shouldWork`, `testLogin`, `testUserService`)
|
||||
- General Fixture: an oversized setUp or beforeEach shared by unrelated tests, making
|
||||
each test's preconditions invisible
|
||||
- Test body requires reading production code to understand what is being verified
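
For contrast with the symptoms above, here is a minimal sketch of tests that avoid them. The `apply_discount` function is a hypothetical stand-in for production code; the naming loosely follows the method_scenario_expected convention cited in the Sources table.

```python
import unittest


def apply_discount(total: float, percent: float) -> float:
    """Hypothetical production function, defined here only for illustration."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(total * (1 - percent / 100), 2)


class ApplyDiscountTest(unittest.TestCase):
    def test_apply_discount_ten_percent_reduces_total(self):
        # Every input is visible in the test body, so there is no Mystery Guest.
        result = apply_discount(200.0, 10)
        # The message says which behavior broke if this assertion fails.
        self.assertEqual(result, 180.0, "10% off 200.00 should be 180.00")

    def test_apply_discount_percent_above_100_raises_value_error(self):
        with self.assertRaises(ValueError):
            apply_discount(200.0, 150)
```

Each test name states the scenario and the expected outcome, so a failure can be understood without opening the production code.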
|
||||
|
||||
### Sources
|
||||
|
||||
| Symptom | Book | Principle / Smell |
|
||||
|---------|------|-------------------|
|
||||
| Assertion Roulette | Meszaros — xUnit Test Patterns | Assertion Roulette (p.224) |
|
||||
| Mystery Guest | Meszaros — xUnit Test Patterns | Mystery Guest (p.411) |
|
||||
| General Fixture | Meszaros — xUnit Test Patterns | General Fixture (p.316) |
|
||||
| Test naming | Osherove — The Art of Unit Testing | method_scenario_expected naming convention |
|
||||
|
||||
### Severity Guide
|
||||
|
||||
- 🔴 Critical: no test name in the file describes the behavior being tested; all assertions lack messages
|
||||
- 🟡 Warning: multiple Mystery Guests; several ambiguous test names
|
||||
- 🟢 Suggestion: minor naming issues; isolated General Fixture
|
||||
|
||||
### What Not to Flag
|
||||
|
||||
- Multiple assertions are acceptable when they describe one coherent behavior and fail with a clear story
|
||||
- Shared setup is fine when every initialized value is relevant to nearly every test
|
||||
- Concise test names are acceptable if scenario and expected outcome are still obvious
|
||||
|
||||
---
|
||||
|
||||
## Risk T2: Test Brittleness
|
||||
|
||||
**Diagnostic question:** Do tests break when you refactor without changing behavior?
|
||||
|
||||
Brittle tests punish refactoring — eventually developers stop refactoring and the codebase stagnates to protect the suite.
|
||||
|
||||
### Symptoms
|
||||
|
||||
- Tests assert on private method results, internal state, or implementation details
|
||||
rather than observable behavior
|
||||
- Eager Test: one test method verifies multiple unrelated behaviors; any single change
|
||||
causes it to fail regardless of which behavior was touched
|
||||
- Over-specified: assertions enforce mock call order or exact parameter values that are
|
||||
irrelevant to the behavior being tested
|
||||
- Renaming or extracting a method causes 5 or more tests to fail even though no behavior changed
|
||||
- Erratic Test: a test produces different results across runs without any change to
|
||||
production code — caused by race conditions, time-dependent logic, random data, or
|
||||
shared mutable state between tests
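
One common way to remove the time-dependence behind an Erratic Test is to pass the clock in as a parameter. The sketch below assumes a hypothetical `is_expired` function; it shows the general technique, not a prescribed fix.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable


def utc_now() -> datetime:
    """Default clock used in production."""
    return datetime.now(timezone.utc)


def is_expired(expires_at: datetime, now: Callable[[], datetime] = utc_now) -> bool:
    """Hypothetical function under test; the clock is injectable."""
    return now() >= expires_at


def test_is_expired_at_exact_expiry_time_returns_true() -> None:
    fixed = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    # A fixed clock gives the same result on every run.
    assert is_expired(fixed, now=lambda: fixed)


def test_is_expired_one_second_before_expiry_returns_false() -> None:
    fixed = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    assert not is_expired(fixed, now=lambda: fixed - timedelta(seconds=1))
```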
|
||||
|
||||
### Sources
|
||||
|
||||
| Symptom | Book | Principle / Smell |
|
||||
|---------|------|-------------------|
|
||||
| Eager Test | Meszaros — xUnit Test Patterns | Eager Test (p.228) |
|
||||
| Erratic Test | Meszaros — xUnit Test Patterns | Erratic Test |
|
||||
| Implementation coupling | Osherove — The Art of Unit Testing | Test isolation principle |
|
||||
| Orthogonality violation | Hunt & Thomas — The Pragmatic Programmer | Ch. 2: Orthogonality |
|
||||
|
||||
### Severity Guide
|
||||
|
||||
- 🔴 Critical: refactoring with no behavior change causes test failures; > 5 tests coupled to a single implementation detail
|
||||
- 🟡 Warning: Eager Tests common across the suite; moderate implementation-detail assertions
|
||||
- 🟢 Suggestion: isolated over-specification in non-critical tests
|
||||
|
||||
### What Not to Flag
|
||||
|
||||
- Verifying an externally observable event or emitted command is not implementation coupling
|
||||
- One test with several assertions is acceptable when all assertions support one behavior claim
|
||||
- A fake or in-memory adapter is not brittleness if the test still asserts behavior, not wiring
|
||||
|
||||
---
|
||||
|
||||
## Risk T3: Test Duplication
|
||||
|
||||
**Diagnostic question:** Is the same test scenario expressed in more than one place?
|
||||
|
||||
Duplicated tests must change in multiple places and create false confidence without testing distinct behavior.
|
||||
|
||||
### Symptoms
|
||||
|
||||
- Test Code Duplication: same setup or assertion logic copy-pasted across multiple tests
|
||||
without extraction into a shared helper
|
||||
- Lazy Test: multiple tests verifying identical behavior with no differentiation in input,
|
||||
state, or expected output
|
||||
- Same boundary condition tested identically at unit, integration, and E2E level —
|
||||
three copies with no layer differentiation
|
||||
- Test helper functions or fixtures duplicated across test files instead of shared
|
||||
|
||||
### Sources
|
||||
|
||||
| Symptom | Book | Principle / Smell |
|
||||
|---------|------|-------------------|
|
||||
| Test Code Duplication | Meszaros — xUnit Test Patterns | Test Code Duplication (p.213) |
|
||||
| Lazy Test | Meszaros — xUnit Test Patterns | Lazy Test (p.232) |
|
||||
| DRY violation in tests | Hunt & Thomas — The Pragmatic Programmer | DRY: Don't Repeat Yourself |
|
||||
|
||||
### Severity Guide
|
||||
|
||||
- 🔴 Critical: core business scenario fully duplicated across all three test layers with no differentiation
|
||||
- 🟡 Warning: common scenario setup repeated in 5 or more tests without extraction
|
||||
- 🟢 Suggestion: minor helper duplication; isolated Lazy Tests
|
||||
|
||||
### What Not to Flag
|
||||
|
||||
- The same scenario may appear at unit and integration level when each layer verifies a distinct risk
|
||||
- Small local setup duplication can be clearer than an over-abstracted fixture maze
|
||||
- Similar assertions against different domain rules are not Lazy Tests if the business intent differs
|
||||
|
||||
---
|
||||
|
||||
## Risk T4: Mock Abuse
|
||||
|
||||
**Diagnostic question:** Is the test more complex than the behavior it tests?
|
||||
|
||||
Mock abuse produces tests that pass while verifying nothing — production code can be fully broken as long as the mocks are wired up.
|
||||
|
||||
### Symptoms
|
||||
|
||||
- Mock setup code is longer than the test logic itself
|
||||
- Primary assertion is `expect(mock).toHaveBeenCalledWith(...)` — the test verifies
|
||||
that a mock was called, not that any real behavior occurred
|
||||
- Test-only methods added to production classes for lifecycle management in tests
|
||||
- Single unit test uses more than 3 mocks
|
||||
- Incomplete Mock: mock object missing fields that downstream code will access,
|
||||
causing silent failures only visible in integration
|
||||
- Hard-Coded Test Data: test data has no resemblance to real data shapes or constraints
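
The difference between verifying a mock call and verifying behavior can be sketched as follows. `register` and `InMemoryUserRepo` are hypothetical names made up for this illustration; `Mock` is the standard `unittest.mock` class.

```python
from unittest.mock import Mock


class InMemoryUserRepo:
    """Hand-written fake standing in for a real repository (hypothetical)."""

    def __init__(self) -> None:
        self.saved: dict[str, str] = {}

    def save(self, email: str, name: str) -> None:
        self.saved[email] = name


def register(repo, email: str, name: str) -> None:
    """Hypothetical production function: normalize the email, then store the user."""
    repo.save(email.strip().lower(), name)


def test_register_calls_save() -> None:
    # Wiring check only: still passes if the normalization is removed.
    repo = Mock()
    register(repo, " Ada@Example.com ", "Ada")
    repo.save.assert_called_once()


def test_register_stores_user_under_normalized_email() -> None:
    # Behavior check: fails if the stored state is wrong.
    repo = InMemoryUserRepo()
    register(repo, " Ada@Example.com ", "Ada")
    assert repo.saved == {"ada@example.com": "Ada"}
```

The first test is the kind this risk targets; the second asserts on observable state through a fake, which the "What Not to Flag" notes treat as acceptable.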
|
||||
|
||||
### Sources
|
||||
|
||||
| Symptom | Book | Principle / Smell |
|
||||
|---------|------|-------------------|
|
||||
| Mock count > 3 | Osherove — The Art of Unit Testing | Mock usage guidelines |
|
||||
| Testing mock behavior | Meszaros — xUnit Test Patterns | Behavior Verification (p.544) |
|
||||
| Test-only production methods | Feathers — Working Effectively with Legacy Code | Ch. 3: Sensing and Separation |
|
||||
| Hard-Coded Test Data | Meszaros — xUnit Test Patterns | Hard-Coded Test Data (p.534) |
|
||||
| Incomplete Mock | Osherove — The Art of Unit Testing | Mock completeness requirement |
|
||||
|
||||
### Severity Guide
|
||||
|
||||
- 🔴 Critical: mock setup > 50% of test code; production class has methods only called from tests
|
||||
- 🟡 Warning: mocks consistently > 3 per test; primary assertions are mock call verifications
|
||||
- 🟢 Suggestion: isolated Incomplete Mocks; minor Hard-Coded Test Data
|
||||
|
||||
### What Not to Flag
|
||||
|
||||
- A small number of mocks around nondeterministic dependencies is acceptable when assertions still verify behavior
|
||||
- Fakes and spies used to observe state transitions are not mock abuse by default
|
||||
- One interaction assertion may be appropriate when the interaction itself is the behavior under test
|
||||
|
||||
---
|
||||
|
||||
## Risk T5: Coverage Illusion
|
||||
|
||||
**Diagnostic question:** Does the test suite actually protect against the failures that matter?
|
||||
|
||||
Coverage measures execution, not verification. 90% line coverage can still miss every critical failure mode — teams stop looking because the number says "covered."
|
||||
|
||||
### Symptoms
|
||||
|
||||
- High line coverage but error-handling branches, boundary conditions, and exception paths
|
||||
have no corresponding tests
|
||||
- Happy-path only: no sad paths, no null/empty/zero inputs, no concurrency edge cases
|
||||
- Legacy code areas are being actively modified with no tests present
|
||||
(Feathers: "legacy code is code without tests")
|
||||
- Coverage percentage treated as a sign-off criterion; critical change paths remain untested
|
||||
- Tests assert on return values but not on important side effects such as database writes,
|
||||
event publications, or state transitions
|
||||
|
||||
### Sources
|
||||
|
||||
| Symptom | Book | Principle / Smell |
|
||||
|---------|------|-------------------|
|
||||
| Legacy code = no tests | Feathers — Working Effectively with Legacy Code | Ch. 1: "Legacy code is code without tests" |
|
||||
| Change coverage vs line coverage | Google — How Google Tests Software | Ch. 11: Testing at Google Scale |
|
||||
| Happy-path only | Osherove — The Art of Unit Testing | Test completeness principle |
|
||||
|
||||
### Severity Guide
|
||||
|
||||
- 🔴 Critical: legacy code area actively being modified with no tests; error-handling paths entirely absent
|
||||
- 🟡 Warning: coverage > 80% but edge and exception paths are systematically absent
|
||||
- 🟢 Suggestion: a few non-critical paths missing sad-path tests
|
||||
|
||||
### What Not to Flag
|
||||
|
||||
- High line coverage is useful when paired with branch, boundary, and change-path coverage
|
||||
- A new module may have limited coverage early if it is still private and low-risk
|
||||
- Side-effect assertions may live in integration tests rather than unit tests without implying a gap
|
||||
|
||||
---
|
||||
|
||||
## Risk T6: Architecture Mismatch
|
||||
|
||||
**Diagnostic question:** Does the test suite structure reflect the system's actual risk profile?
|
||||
|
||||
Wrong suite shape is slow and expensive — not from bad tests, but from using the wrong type at the wrong layer.
|
||||
|
||||
### Symptoms
|
||||
|
||||
- Inverted test pyramid: E2E or integration test count exceeds unit test count,
|
||||
causing a slow and fragile suite
|
||||
- Legacy code with no seam points: no interfaces, dependency injection, or seams exist,
|
||||
making it impossible to test in isolation without modifying production code
|
||||
- Legacy areas being modified have no Characterization Tests to capture current behavior
|
||||
before changes are made
|
||||
- Full suite execution time exceeds 10 minutes (indicates an architectural problem,
|
||||
not a performance problem — too many slow tests)
|
||||
- High-risk and low-risk paths are tested at identical density;
|
||||
no risk-based prioritization in test distribution
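
The pyramid-shape symptom can be checked mechanically once test counts per layer are known. The helper below is a sketch: the function name and return shape are assumptions, and the result is a hint to investigate rather than a verdict, since deviations from 70:20:10 can be justified.

```python
def pyramid_shape(unit: int, integration: int, e2e: int) -> dict:
    """Summarize suite shape as whole-number percentages plus an inversion flag."""
    total = unit + integration + e2e
    if total == 0:
        raise ValueError("no tests counted")
    return {
        "unit_pct": round(100 * unit / total),
        "integration_pct": round(100 * integration / total),
        "e2e_pct": round(100 * e2e / total),
        # Inverted when either slower layer outnumbers the unit tests.
        "inverted": integration > unit or e2e > unit,
    }
```

For example, counts of 70, 20, and 10 yield the reference ratio with `inverted` set to `False`, while 10 unit tests against 30 integration tests would be flagged.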
|
||||
|
||||
### Sources
|
||||
|
||||
| Symptom | Book | Principle / Smell |
|
||||
|---------|------|-------------------|
|
||||
| Inverted pyramid | Google — How Google Tests Software | 70:20:10 unit:integration:E2E ratio |
|
||||
| No seam points | Feathers — Working Effectively with Legacy Code | Ch. 4: Seam Model |
|
||||
| Missing Characterization Tests | Feathers — Working Effectively with Legacy Code | Ch. 13: Characterization Tests |
|
||||
| Suite execution time | Meszaros — xUnit Test Patterns | Slow Tests (p. 253) |
|
||||
|
||||
### Severity Guide
|
||||
|
||||
- 🔴 Critical: legacy code being modified has no seams and no characterization tests; pyramid fully inverted
|
||||
- 🟡 Warning: suite execution > 10 minutes; integration/E2E count exceeds unit tests
|
||||
- 🟢 Suggestion: localized pyramid ratio deviation; a few legacy areas missing characterization tests
|
||||
|
||||
### What Not to Flag
|
||||
|
||||
- Deviating from 70:20:10 can be justified by platform constraints or product risk
|
||||
- A suite heavy on integration tests can still be healthy if feedback is fast and purposefully layered
|
||||
- A small number of critical-path E2E tests is desirable, not a smell
|
||||
|
|
@ -0,0 +1,42 @@
|
|||
---
|
||||
name: brooks-audit
|
||||
description: >
|
||||
Architecture audit that maps module dependencies, checks layering integrity, and
|
||||
flags structural decay across a codebase, drawing on twelve classic engineering books.
|
||||
Triggers when: user asks to audit architecture, review folder/module structure,
|
||||
check for circular imports, understand how the codebase is organized, or asks
|
||||
"does this follow clean architecture?", "why does everything depend on everything?",
|
||||
"are our layers correct?", "where should this code live?".
|
||||
Also triggers for onboarding requests: "explain this codebase to a new developer"
|
||||
or "give me a codebase tour" (use onboarding mode).
|
||||
Do NOT trigger for: PR-level code review (use brooks-review) or line-level refactoring
|
||||
questions — this skill analyzes structural/module-level concerns, not individual functions.
|
||||
---
|
||||
|
||||
# Brooks-Lint — Architecture Audit
|
||||
|
||||
## Setup
|
||||
|
||||
1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
|
||||
2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
|
||||
3. Read `../_shared/decay-risks.md` for symptom definitions and source attributions
|
||||
4. Read `architecture-guide.md` in this directory for the audit framework
|
||||
|
||||
## Process
|
||||
|
||||
**Onboarding mode:** If the user asks for an onboarding report, codebase tour, or
|
||||
"explain this codebase to a new developer", read `onboarding-guide.md` from this
|
||||
directory and follow it instead of `architecture-guide.md`. This mode explains rather
|
||||
than diagnoses — no Health Score, no Iron Law findings.
|
||||
|
||||
**If the user has not specified files or a directory to audit:** apply Auto Scope
|
||||
Detection from `../_shared/common.md` to determine the audit scope before proceeding.
|
||||
|
||||
1. Gather codebase context and draw the module dependency graph as Mermaid (Steps 0–1 of the guide)
|
||||
2. Scan for each decay risk in the order specified (Steps 2–4 of the guide)
|
||||
3. Assign node colors in the Mermaid diagram based on findings (red/yellow/green) — after Step 4
|
||||
4. Run the Testability Seam Assessment (Step 5 of the guide)
|
||||
5. Run the Conway's Law check (Step 6 of the guide)
|
||||
6. Output using the Report Template from common.md — Mermaid graph FIRST, then Findings
|
||||
|
||||
**Mode line in report:** `Architecture Audit`
|
||||
|
|
@ -0,0 +1,195 @@
|
|||
# Architecture Audit Guide — Mode 2
|
||||
|
||||
**Purpose:** Analyze the module and dependency structure of a system for decay risks that
|
||||
operate at the architectural level. Every finding must follow the Iron Law:
|
||||
Symptom → Source → Consequence → Remedy.
|
||||
|
||||
**Monorepo note:** Treat each deployable service or library as a top-level module. Draw
|
||||
dependencies between services, not between their internal packages. Apply the Conway's Law
|
||||
check at the service ownership level. Within a single service, apply standard module-level analysis.
|
||||
|
||||
---
|
||||
|
||||
## Analysis Process
|
||||
|
||||
Work through Steps 0 through 6 in order.
|
||||
|
||||
### Step 0: Gather Codebase Context
|
||||
|
||||
Before drawing anything, establish what you can see.
|
||||
|
||||
**If the user provided a full directory tree or pasted relevant file contents:** skip the
|
||||
proactive reading below and proceed to Step 1.
|
||||
|
||||
**Otherwise, proactively read the project using these tools:**
|
||||
|
||||
1. **Top-level structure** — glob top two levels to identify module boundaries:
|
||||
```
|
||||
Glob: **/* (depth 2, directories only)
|
||||
```
|
||||
2. **Entry points** — read the package manifest or main config file (e.g., `package.json`,
|
||||
`go.mod`, `pom.xml`, `Cargo.toml`, `pyproject.toml`) to confirm language, framework,
|
||||
and declared dependencies.
|
||||
3. **Dependency edges** — grep import statements to discover inter-module calls. Run once
|
||||
per language present; limit to the first 200 matches to avoid token overrun:
|
||||
```
|
||||
Grep: "^\s*(import|from|require\(|use )" across *.ts|*.py|*.go|*.rs|*.java
|
||||
```
|
||||
4. **Large modules** — for any top-level directory with > 10 files, read the file matching
|
||||
`index.*`, `main.*`, or `__init__.*` to understand its stated responsibility.
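
As a rough illustration of the dependency-edge step, the sketch below extracts module-to-module edges from Python sources only. It assumes that each top-level directory name is also the import name of that module, and it takes file contents as a dictionary so the logic can be shown without touching the file system; `module_edges` and its parameters are hypothetical.

```python
import re
from collections import defaultdict

# Matches "import pkg.mod" and "from pkg.mod import name" at the start of a line.
IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+([A-Za-z_][\w.]*)", re.MULTILINE)


def module_edges(sources: dict[str, str], max_matches: int = 200) -> dict[str, set[str]]:
    """Map each top-level module to the other top-level modules it imports."""
    top_level = {path.split("/")[0] for path in sources}
    edges: dict[str, set[str]] = defaultdict(set)
    seen = 0
    for path, text in sources.items():
        owner = path.split("/")[0]
        for match in IMPORT_RE.finditer(text):
            if seen >= max_matches:
                # Mirror the 200-match cap so large projects stay within budget.
                return dict(edges)
            seen += 1
            target = match.group(1).split(".")[0]
            # Keep only edges between distinct top-level modules of this project.
            if target in top_level and target != owner:
                edges[owner].add(target)
    return dict(edges)
```

Standard-library and third-party imports are dropped because they are not top-level directories in the project, and relative imports are ignored since they stay inside one module.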
|
||||
|
||||
**Stop when you can answer all three:**
|
||||
- What are the top-level modules (names and count)?
|
||||
- Which modules import from which other modules?
|
||||
- Which module has the highest fan-in or fan-out?
|
||||
|
||||
If the project has > 100 top-level files or > 4 levels of nesting, note which areas were
|
||||
sampled vs. inferred, and flag this in the report scope line.
|
||||
|
||||
### Step 1: Draw the Module Dependency Graph (Mermaid)
|
||||
|
||||
Before evaluating any risk, map the dependencies as a Mermaid diagram. Use this format:
|
||||
|
||||
````mermaid
|
||||
graph TD
|
||||
subgraph UI
|
||||
WebApp
|
||||
MobileApp
|
||||
end
|
||||
|
||||
subgraph Domain
|
||||
AuthService
|
||||
OrderService
|
||||
PaymentService
|
||||
end
|
||||
|
||||
subgraph Infrastructure
|
||||
Database
|
||||
MessageQueue
|
||||
end
|
||||
|
||||
WebApp --> AuthService
|
||||
WebApp --> OrderService
|
||||
MobileApp --> AuthService
|
||||
MobileApp --> OrderService
|
||||
OrderService --> PaymentService
|
||||
OrderService --> Database
|
||||
OrderService --> MessageQueue
|
||||
PaymentService --> Database
|
||||
AuthService -.->|circular| OrderService
|
||||
|
||||
classDef critical fill:#ff6b6b,stroke:#c92a2a,color:#fff
|
||||
classDef warning fill:#ffd43b,stroke:#e67700
|
||||
classDef clean fill:#51cf66,stroke:#2b8a3e,color:#fff
|
||||
|
||||
class PaymentService critical
|
||||
class OrderService warning
|
||||
class Database,MessageQueue,AuthService,WebApp,MobileApp clean
|
||||
````
|
||||
|
||||
Draw the graph structure first — nodes, subgraphs, and edges — without any `classDef` or
|
||||
`class` lines. You cannot assign colors until you have completed the risk scan in Steps 2–4.
|
||||
|
||||
**After completing Step 4**, return to this graph and add the `classDef` and `class` lines
|
||||
based on findings. The example above shows the final colored output.
|
||||
|
||||
Rules:
|
||||
1. **Nodes** — Use top-level directories or services as nodes, not individual files
|
||||
2. **Grouping** — One `subgraph` per architectural layer or top-level directory (e.g., UI, Domain, Infrastructure)
|
||||
3. **Edges** — Solid arrows (`-->`) point FROM the depending module TO the dependency; use dotted arrows with label (`-.->|circular|`) for circular dependencies. If no circular dependencies exist, use only solid arrows
|
||||
4. **Node limit** — Keep the graph to ~50 nodes maximum; collapse low-risk leaf modules into their parent if needed
|
||||
5. **Fan-out** — For any node with fan-out > 5, use a descriptive label: `HighFanOutModule["ModuleName (fan-out: 7)"]`
|
||||
6. **Colors** — Apply `classDef` colors AFTER completing Steps 2–4: `critical` (red `#ff6b6b`) for nodes with Critical findings, `warning` (yellow `#ffd43b`) for Warning findings, `clean` (green `#51cf66`) for nodes with no findings or only Suggestions. If no findings at all, classify all nodes as `clean`
|
||||
7. **Direction** — Default to `graph TD` (top-down); use `graph LR` only if the architecture is clearly a left-to-right pipeline
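
Given an edge map, the uncolored first pass of the graph can be generated mechanically. The sketch below covers only solid arrows, the fan-out label, and the direction default; subgraph grouping, dotted circular edges, and colors are left out on purpose. `to_mermaid` is a hypothetical helper, not part of the skill.

```python
def to_mermaid(edges: dict[str, set[str]], direction: str = "TD") -> str:
    """Render an edge map as Mermaid text with no classDef or class lines."""
    lines = [f"graph {direction}"]
    for node in sorted(edges):
        fan_out = len(edges[node])
        if fan_out > 5:
            # Rule 5: give high fan-out nodes a descriptive label.
            lines.append(f'    {node}["{node} (fan-out: {fan_out})"]')
    for source in sorted(edges):
        for target in sorted(edges[source]):
            # Rule 3: arrows point from the depending module to its dependency.
            lines.append(f"    {source} --> {target}")
    return "\n".join(lines)
```

Sorting nodes and targets keeps the output stable between runs, which makes successive audits easier to compare.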
|
||||
|
||||
### Step 2: Scan for Dependency Disorder
|
||||
|
||||
*The most architecturally consequential risk — scan this first.*
|
||||
|
||||
Look for:
|
||||
- Circular dependencies (any `-.->|circular|` edge in the map above)
|
||||
- Arrows flowing upward (high-level domain depending on low-level infrastructure)
|
||||
- Stable, widely-depended-on modules that import from frequently-changing modules
|
||||
- Modules with fan-out > 5
|
||||
- Absence of a clear layering rule (no consistent answer to "what depends on what?")
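
Two of the checks above, circular dependencies and fan-out greater than 5, lend themselves to a mechanical pass over the edge map. The following is a sketch using depth-first search; the function names are hypothetical and the recursion depth assumes a graph of the size this guide recommends (around 50 nodes).

```python
from typing import Optional


def find_cycle(edges: dict[str, set[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a node path, or None if the graph is acyclic."""
    in_progress: set[str] = set()
    done: set[str] = set()
    stack: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        in_progress.add(node)
        stack.append(node)
        for nxt in sorted(edges.get(node, ())):
            if nxt in in_progress:
                # Back edge: the cycle runs from nxt's stack position back to nxt.
                return stack[stack.index(nxt):] + [nxt]
            if nxt not in done:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        in_progress.discard(node)
        done.add(node)
        return None

    for start in sorted(edges):
        if start not in done:
            found = visit(start)
            if found:
                return found
    return None


def high_fan_out(edges: dict[str, set[str]], limit: int = 5) -> list[str]:
    """List modules whose number of outgoing dependencies exceeds the limit."""
    return sorted(node for node, deps in edges.items() if len(deps) > limit)
```

The remaining checks in this step (upward arrows, stability direction, layering rule) depend on knowing which layer each module belongs to and are not covered by this sketch.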
|
||||
|
||||
### Step 3: Scan for Domain Model Distortion
|
||||
|
||||
Ask:
|
||||
- Do module names match the business domain vocabulary?
|
||||
- Is there a layer called "services" that contains all the business logic while domain objects
|
||||
are pure data structures?
|
||||
- Are there modules that cross bounded context boundaries (e.g., billing logic in the user module)?
|
||||
- Is there an anti-corruption layer where external systems interface with the domain?
|
||||
|
||||
### Step 4: Scan for Remaining Four Risks
|
||||
|
||||
Check each in turn:
|
||||
|
||||
**Knowledge Duplication:**
|
||||
- Are there multiple modules implementing the same concept independently?
|
||||
- Does the same domain concept appear under different names in different modules?
|
||||
|
||||
**Accidental Complexity:**
|
||||
- Are there entire layers in the architecture that do not add value?
|
||||
- Are there modules whose responsibility cannot be stated in one sentence?
|
||||
|
||||
**Change Propagation:**
|
||||
- Which modules are "blast radius hotspots"? (A change here requires changes in many other modules)
|
||||
- Does the dependency map reveal why certain features are slow to develop?
|
||||
|
||||
**Cognitive Overload:**
|
||||
- Can each module's responsibility be stated in one sentence from its name alone?
|
||||
- Would a new developer know which module to add a new feature to?
|
||||
|
||||
### Step 5: Testability Seam Assessment
|
||||
|
||||
A *seam* is a place in the architecture where behavior can be altered without editing source
|
||||
code — typically an interface, a configuration point, or a dependency injection boundary.
|
||||
Seam density is a proxy for testability and evolvability.
|
||||
|
||||
Scan for:
|
||||
- **No seam at the infrastructure boundary**: can you replace a real database, file system,
|
||||
or HTTP client with a test double without editing the module under test? If not, the
|
||||
architecture forces integration tests where unit tests would suffice.
|
||||
- **Seam collapse**: a module that was once testable in isolation has had its seams removed
|
||||
(e.g., direct constructor instantiation replaced a dependency injection point, or a global
|
||||
singleton replaced an injected collaborator).
|
||||
- **Missing seam in legacy areas**: modules without an obvious injection point or interface
|
||||
boundary — any change requires touching the entire call stack to substitute behavior.
|
||||
|
||||
If all modules have clear seams at their infrastructure boundaries → no finding.
|
||||
|
||||
If seams are absent or collapsed: flag as 🟡 Warning with a Remedy pointing to the specific
|
||||
module and the injection point that needs to be restored or introduced.
|
||||
|
||||
Source: Feathers — Working Effectively with Legacy Code, Ch. 4: The Seam Model
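
As an illustration of a seam at the infrastructure boundary, here is a small sketch under stated assumptions: `OrderStore`, `OrderService`, and `InMemoryStore` are hypothetical names invented for this example, not taken from any real codebase. The constructor parameter is the seam, because it lets a test double replace the real store without editing the service.

```python
from __future__ import annotations

from typing import Protocol


class OrderStore(Protocol):
    """Hypothetical infrastructure boundary: anything that can persist an order total."""

    def save(self, order_id: str, total: float) -> None: ...


class OrderService:
    """The store is injected, so the constructor acts as the seam."""

    def __init__(self, store: OrderStore) -> None:
        self._store = store

    def place(self, order_id: str, prices: list[float]) -> float:
        total = round(sum(prices), 2)
        self._store.save(order_id, total)
        return total


class InMemoryStore:
    """Test double that stands in for a real database."""

    def __init__(self) -> None:
        self.saved: dict[str, float] = {}

    def save(self, order_id: str, total: float) -> None:
        self.saved[order_id] = total
```

Seam collapse, in these terms, would be `OrderService.__init__` constructing a concrete database client itself: the test double could then no longer be substituted without editing the class.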
|
||||
|
||||
### Step 6: Conway's Law Check
|
||||
|
||||
After the six-risk scan, assess the relationship between architecture and team structure:
|
||||
|
||||
- Does the module/service structure reflect the team structure?
|
||||
(Conway's Law: "Organizations design systems that mirror their communication structure")
|
||||
- If yes: is this intentional design or accidental coupling?
|
||||
- A mismatch that causes cross-team coordination overhead for every feature is 🔴 Critical.
|
||||
- A mismatch that is theoretical but not yet causing pain is 🟡 Warning.
|
||||
- If team structure is unknown, note this as context missing and skip the check.
|
||||
|
||||
**Calibration examples:**
|
||||
- 🔴 Critical: the Payments module is owned by Team A but contains auth logic owned by Team B —
|
||||
every Payments change requires a sync meeting with Team B
|
||||
- 🟡 Warning: two separate teams own the `utils/` and `helpers/` directories, which do the same
|
||||
things — theoretically painful but not yet causing release coordination issues
|
||||
- Not a finding: a single team owns a monorepo with multiple logical modules — Conway's Law
|
||||
misalignment requires *separate teams* to be meaningful
|
||||
|
||||
---
|
||||
|
||||
## Output
|
||||
|
||||
Use the standard Report Template from `../_shared/common.md`. Mode: Architecture Audit.
|
||||
|
||||
Place the Mermaid dependency graph FIRST under "Module Dependency Graph". Reference
|
||||
relevant node names in findings. Add `classDef` color assignments LAST, after all
|
||||
findings are identified.
|
||||
|
|
@ -0,0 +1,89 @@
|
|||
# Codebase Onboarding Guide
|
||||
|
||||
**Purpose:** Produce a newcomer-friendly tour of the codebase. This is NOT a diagnostic
|
||||
report — no Health Score, no Iron Law findings. Focus on explanation and orientation.
|
||||
|
||||
---
|
||||
|
||||
## Process
|
||||
|
||||
### Step 1: Map the Territory
|
||||
|
||||
- Read top-level structure (same as architecture-guide Step 0)
|
||||
- Output: a plain-language overview of what each top-level module does (one sentence each)
|
||||
- Group into layers: "Things users interact with", "Business logic", "Infrastructure"
|
||||
|
||||
### Step 2: Draw the Dependency Map
|
||||
|
||||
Draw the same Mermaid dependency graph as architecture audit Step 1, but color nodes by
|
||||
**recommended reading order** using a palette DISTINCT from the severity palette
|
||||
(which uses red/yellow/green). This avoids confusing "red = danger" with "red = read last":
|
||||
|
||||
- 🔵 Blue (`#339af0`): start here — entry points, core domain
|
||||
- 🟣 Purple (`#9775fa`): read next — supporting modules
|
||||
- ⚪ Gray (`#ced4da`): read last — infrastructure, generated code, utilities
|
||||
|
||||
Add numbered labels: `CoreModule["1. CoreModule"]`
|
||||
|
||||
```
|
||||
classDef start fill:#339af0,color:#fff
|
||||
classDef next fill:#9775fa,color:#fff
|
||||
classDef last fill:#ced4da
|
||||
```
|
||||
|
||||
### Step 3: Highlight Key Conventions
|
||||
|
||||
Identify and document patterns the codebase follows:
|
||||
- Naming conventions (file naming, class naming, variable naming)
|
||||
- Directory organization pattern (feature-based? layer-based? hybrid?)
|
||||
- Error handling pattern (exceptions? result types? error codes?)
|
||||
- Testing convention (co-located? separate directory? naming pattern?)
|
||||
- Dependency injection pattern (if any)
|
||||
|
||||
### Step 4: Mark Danger Zones
|
||||
|
||||
For each module with known complexity or coupling issues, add a brief warning:
|
||||
- "OrderService: high complexity, only modify with full test suite running"
|
||||
- "legacy/: no tests, use Characterization Tests before changing"
|
||||
|
||||
Do NOT use Iron Law format — use plain warnings. This is orientation, not diagnosis.
|
||||
|
||||
### Step 5: Build a Domain Glossary
|
||||
|
||||
Extract 10-15 key domain terms from code (class names, method names, constants) and map
|
||||
them to plain-language definitions. This applies Evans's Ubiquitous Language as documentation.
|
||||
|
||||
### Step 6: Suggest First Tasks
|
||||
|
||||
Based on the dependency map, suggest 2-3 low-risk areas where a new developer could make
|
||||
their first contribution: modules with good test coverage, clear boundaries, low coupling.
|
||||
|
||||
---
|
||||
|
||||
## Output Template
|
||||
|
||||
```
|
||||
# Codebase Tour: [Project Name]
|
||||
|
||||
## Overview
|
||||
[2-3 sentence summary of what the project does and its tech stack]
|
||||
|
||||
## Module Map
|
||||
[Mermaid graph with reading-order colors]
|
||||
|
||||
## Module Guide
|
||||
[One paragraph per top-level module: what it does, what it depends on, key files to read]
|
||||
|
||||
## Conventions
|
||||
[Bullet list of patterns this codebase follows]
|
||||
|
||||
## Danger Zones
|
||||
[Bullet list of areas to be careful with, or "None identified" if the codebase is clean]
|
||||
|
||||
## Domain Glossary
|
||||
| Term | Meaning |
|
||||
|------|---------|
|
||||
|
||||
## Suggested First Tasks
|
||||
[2-3 concrete suggestions for a new developer's first PR]
|
||||
```
|
||||
|
|
@ -0,0 +1,35 @@
|
|||
---
|
||||
name: brooks-debt
|
||||
description: >
|
||||
Tech debt assessment that identifies, classifies, and prioritizes maintainability
|
||||
problems — helping teams build a refactoring roadmap — drawing on twelve classic
|
||||
engineering books.
|
||||
Triggers when: user asks about tech debt, refactoring priorities, what to clean up
|
||||
first, or asks "why is this so hard to change?", "where's the most painful part?",
|
||||
"what should we fix first?", "how do I justify refactoring to management?",
|
||||
"why is our velocity dropping?".
|
||||
Do NOT trigger for: server health checks, HTTP /health endpoints, Kubernetes probes,
|
||||
database health, or application uptime — "health" in those contexts is infrastructure,
|
||||
not code quality. Also not for single-function refactoring questions.
|
||||
---
|
||||
|
||||
# Brooks-Lint — Tech Debt Assessment
|
||||
|
||||
## Setup
|
||||
|
||||
1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
|
||||
2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
|
||||
3. Read `../_shared/decay-risks.md` for symptom definitions and source attributions
|
||||
4. Read `debt-guide.md` in this directory for the debt classification framework
|
||||
|
||||
## Process
|
||||
|
||||
**If the user has not described the codebase or pointed to specific areas:** apply Auto
|
||||
Scope Detection from `../_shared/common.md` to determine the assessment scope before proceeding.
|
||||
|
||||
1. Scan for all six decay risks (Step 1 of the guide); list every finding before scoring
|
||||
2. Apply the Pain × Spread priority formula and classify debt intent (Steps 2–3 of the guide)
|
||||
3. Group findings by decay risk (Step 4 of the guide)
|
||||
4. Output using the Report Template from common.md, plus the Debt Summary Table
|
||||
|
||||
**Mode line in report:** `Tech Debt Assessment`
|
||||
|
|
@ -0,0 +1,125 @@
|
|||
# Tech Debt Assessment Guide — Mode 3
|
||||
|
||||
**Purpose:** Identify, classify, and prioritize technical debt across the entire codebase.
|
||||
Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy.
|
||||
|
||||
---
|
||||
|
||||
## Evidence Gathering
|
||||
|
||||
If you have insufficient evidence to assess the codebase, ask the user ONE question —
|
||||
choose the single question most relevant to what you already know:
|
||||
|
||||
1. "Which part of the codebase takes the longest to modify for a typical feature?"
|
||||
2. "Which module do developers avoid touching, and why?"
|
||||
3. "Which parts of the system have the fewest tests and the most bugs?"
|
||||
4. "Is there a module that only one person fully understands?"
|
||||
|
||||
After one answer, proceed. Do not ask more than one question.
|
||||
If the user declines or says they don't know, proceed with available evidence and note
|
||||
which areas could not be assessed.
|
||||
|
||||
---
|
||||
|
||||
## Analysis Process
|
||||
|
||||
Work through these four steps in order.
|
||||
|
||||
### Step 1: Full Decay Risk Scan
|
||||
|
||||
Scan for all six decay risks across the entire codebase. List every finding before scoring
|
||||
any of them. This prevents anchoring on early findings and missing systemic patterns.
|
||||
|
||||
For each risk, look for:
|
||||
|
||||
**Cognitive Overload:** Are there widespread naming problems, deeply nested logic, or
|
||||
excessively long functions spread across many modules?
|
||||
|
||||
**Change Propagation:** Which modules cause the most ripple effects when changed?
|
||||
Are there modules that everyone must modify when adding a new feature?
|
||||
|
||||
**Knowledge Duplication:** How many times is the same concept implemented independently?
|
||||
Is the domain vocabulary consistent across the codebase?
|
||||
|
||||
**Accidental Complexity:** Are there architectural layers or abstractions that add no value?
|
||||
Is the infrastructure overhead proportional to the problem being solved?
|
||||
|
||||
**Dependency Disorder:** Are there dependency cycles? Does domain logic depend on infrastructure?
|
||||
Are there modules with no clear layering position?
|
||||
|
||||
**Domain Model Distortion:** Is business logic in the right layer?
|
||||
Do code names match business names? Are domain objects anemic?
|
||||
|
||||
### Step 2: Score Each Finding with Pain × Spread
|
||||
|
||||
After listing all findings, score each one:
|
||||
|
||||
**Pain score (1–3):** How much does this slow down development today?
|
||||
- 3: Developers actively avoid touching this area; it causes bugs on most changes
|
||||
*(e.g., "nobody wants to touch the billing module because it always breaks something")*
|
||||
- 2: This area is noticeably slower to work in than the rest of the codebase
|
||||
*(e.g., "adding a field takes 2–3x longer here than elsewhere")*
|
||||
- 1: This is a quality issue but not currently causing active pain
|
||||
*(e.g., "inconsistent naming, but we always know what we mean")*
|
||||
|
||||
**Spread score (1–3):** How many files, modules, or developers does this affect?
|
||||
- 3: Affects 5+ modules or all developers on the team
|
||||
*(e.g., "every new feature touches the God class in core/")*
|
||||
- 2: Affects 2–4 modules or a subset of the team
|
||||
*(e.g., "the auth and notification modules are tightly coupled")*
|
||||
- 1: Isolated to one module or one developer's area
|
||||
*(e.g., "legacy parser that only one person maintains")*
|
||||
|
||||
**Priority = Pain × Spread** (max 9)
|
||||
|
||||
| Priority | Classification | Action |
|
||||
|----------|---------------|--------|
|
||||
| 7–9 | Critical debt | Address in next sprint |
|
||||
| 4–6 | Scheduled debt | Plan within quarter |
|
||||
| 1–3 | Monitored debt | Log and watch |
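
The scoring rule and the priority table can be sketched in a few lines. This is an illustrative sketch only: `DebtFinding` and `classify` are hypothetical names, and the band boundaries are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtFinding:
    title: str
    pain: int    # 1-3: how much this slows development today
    spread: int  # 1-3: how many files, modules, or developers are affected

    def __post_init__(self) -> None:
        for name in ("pain", "spread"):
            value = getattr(self, name)
            if value not in (1, 2, 3):
                raise ValueError(f"{name} must be 1, 2, or 3, got {value!r}")

    @property
    def priority(self) -> int:
        """Priority = Pain x Spread (max 9)."""
        return self.pain * self.spread


def classify(priority: int) -> str:
    """Map a priority to the classification bands from the table."""
    if priority >= 7:
        return "Critical debt"
    if priority >= 4:
        return "Scheduled debt"
    return "Monitored debt"
```

Because both scores are integers from 1 to 3, the possible products are 1, 2, 3, 4, 6, and 9, so under this table only a 3 × 3 finding reaches the 7–9 band.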
|
||||
|
||||
### Step 3: Classify Debt Intent
|
||||
|
||||
After scoring, classify each finding as intentional or accidental:
|
||||
|
||||
**Intentional debt** — a conscious shortcut taken to meet a deadline, with the expectation
|
||||
of paying it back. The team knows about it. It may be legitimate (a strategic prototype,
|
||||
a known temporary workaround during a migration).
|
||||
|
||||
**Accidental debt** — degradation that accumulated without a deliberate decision: the team
|
||||
did not choose it and may not even know it exists. This falls outside Ward Cunningham's
|
||||
original debt metaphor, which described deliberate borrowing: it is not a tactical trade-off, but structural erosion.
|
||||
|
||||
Mark each finding with `[intentional]` or `[accidental]` in the Debt Summary Table.
|
||||
Intentional debt with no visible payback plan — no linked ticket, no code comment, no
|
||||
documented decision — should be treated as accidental for prioritization purposes.
|
||||
Focus remediation energy on accidental debt first; intentional debt at least has an owner.
|
||||
|
||||
### Step 4: Group by Decay Risk
|
||||
|
||||
Report findings grouped by risk type, not by file or module.
|
||||
Grouping by risk reveals systemic patterns:
|
||||
- "Change Propagation is systemic" → architectural intervention needed
|
||||
- "Cognitive Overload is isolated" → localized refactoring sufficient
|
||||
|
||||
---
|
||||
|
||||
## Output
|
||||
|
||||
Use the standard Report Template from `../_shared/common.md`. Mode: Tech Debt Assessment.
|
||||
|
||||
After Findings, append a Debt Summary Table:
|
||||
|
||||
```
|
||||
## Debt Summary
|
||||
| Risk | Findings | Avg Priority | Classification | Intent |
|
||||
|------|----------|-------------|----------------|--------|
|
||||
| Cognitive Overload | N | X.X | Monitored/Scheduled/Critical | intentional/accidental |
|
||||
| Change Propagation | N | X.X | ... | ... |
|
||||
| Knowledge Duplication | N | X.X | ... | ... |
|
||||
| Accidental Complexity | N | X.X | ... | ... |
|
||||
| Dependency Disorder | N | X.X | ... | ... |
|
||||
| Domain Model Distortion | N | X.X | ... | ... |
|
||||
|
||||
**Recommended focus:** [risks with highest average priority]
|
||||
```
|
||||
|
|
@ -0,0 +1,37 @@
|
|||
---
|
||||
name: brooks-health
|
||||
description: >
|
||||
Combined codebase health dashboard that scores a project across all four quality
|
||||
dimensions — PR quality, architecture, tech debt, and test quality — in a single
|
||||
pass, drawing on twelve classic engineering books.
|
||||
Triggers when: user wants an overall quality assessment, asks "how healthy is this
|
||||
codebase?", "run all the checks", "give me a big-picture quality report", "I need a
|
||||
health score before the release", "what's the overall state of our code?", or wants
|
||||
to onboard a new team with a quality overview.
|
||||
Do NOT trigger for: server health checks, HTTP health endpoints, Kubernetes
|
||||
liveness/readiness probes, database health, or application uptime. Also do not
|
||||
trigger when the user specifically requests only one dimension — use the
|
||||
corresponding focused skill instead (brooks-review / brooks-audit /
|
||||
brooks-debt / brooks-test).
|
||||
---
|
||||
|
||||
# Brooks-Lint — Health Dashboard
|
||||
|
||||
## Setup
|
||||
|
||||
1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
|
||||
2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
|
||||
3. Read `../_shared/decay-risks.md` for production risk symptom definitions
|
||||
4. Read `../_shared/test-decay-risks.md` for test risk symptom definitions
|
||||
5. Read `health-guide.md` in this directory for the dashboard orchestration process
|
||||
|
||||
## Process
|
||||
|
||||
**If the user has not specified a project or directory:** apply Auto Scope Detection
|
||||
from `../_shared/common.md` to determine the review scope before proceeding.
|
||||
|
||||
1. Run abbreviated scans across all four dimensions (Step 1 of the guide)
|
||||
2. Compute per-dimension and composite Health Scores with weighting (Step 2 of the guide)
|
||||
3. Output the Health Dashboard using the dashboard report template (Step 3 of the guide)
|
||||
|
||||
**Mode line in report:** `Health Dashboard`
|
||||
|
|
@ -0,0 +1,89 @@
|
|||
# Health Dashboard Guide — Mode 5
|
||||
|
||||
**Purpose:** Produce a cross-dimensional health dashboard for the codebase.
|
||||
Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy.
|
||||
|
||||
---
|
||||
|
||||
## Analysis Process
|
||||
|
||||
### Step 1: Run Lightweight Scan Across Four Dimensions
|
||||
|
||||
For each dimension, run an abbreviated scan using the decay-risks definitions
|
||||
from `_shared/`. Do NOT read the individual mode guide files — use the abbreviated
|
||||
checklists below. Cap each dimension at 3 findings. For Debt, also cap findings at 2 per risk code, with the same limit of 3 across all risk codes.
|
||||
|
||||
**PR dimension (if changes exist):**
|
||||
- Apply Auto Scope Detection (common.md)
|
||||
- Scan for R2 (Change Propagation) and R1 (Cognitive Overload) in the diff
|
||||
|
||||
**Architecture dimension:**
|
||||
- Gather codebase context: read top-level structure, entry points, import statements
|
||||
- Draw a Mermaid dependency graph (follow standard graph rules from common.md)
|
||||
- Scan for R5 (Dependency Disorder): circular deps, upward flows, fan-out > 5
|
||||
- INCLUDE the Mermaid graph in output
|
||||
|
||||
**Debt dimension:**
|
||||
- Scan for all six decay risks (R1-R6) across the codebase
|
||||
- Skip Pain × Spread scoring (use severity tier only)
|
||||
|
||||
**Test dimension:**
|
||||
- Build the Test Suite Map (unit/integration/E2E counts)
|
||||
- Scan for T1 (Test Obscurity) and T2 (Test Brittleness) in test files
|
||||
|
||||
### Step 2: Compute Dashboard Scores
|
||||
|
||||
Each dimension gets its own Health Score (base 100, same deduction rules from common.md).
|
||||
Composite score = weighted average of dimension scores:
|
||||
|
||||
| Dimension | Weight | Rationale |
|
||||
|-----------|--------|-----------|
|
||||
| PR (code quality) | 0.25 | Only applies if changes exist; skip if no diff |
|
||||
| Architecture | 0.30 | Structural issues have highest blast radius |
|
||||
| Debt | 0.25 | Systemic but slower-moving |
|
||||
| Test | 0.20 | Supporting signal |
|
||||
|
||||
If PR dimension is skipped (no changes), redistribute its 0.25 weight proportionally
|
||||
across the remaining three dimensions by dividing each remaining weight by
|
||||
(1 − 0.25) = 0.75. Compute redistribution dynamically — do not hardcode the values.
|
||||
|
||||
**Redistributed weights (PR skipped):**
|
||||
|
||||
| Dimension | Base Weight | Redistributed Weight |
|
||||
|-----------|------------|---------------------|
|
||||
| Architecture | 0.30 | 0.30 / 0.75 = 0.40 |
|
||||
| Debt | 0.25 | 0.25 / 0.75 = 0.33 |
|
||||
| Test | 0.20 | 0.20 / 0.75 = 0.27 |
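
One way to compute the redistribution dynamically is to renormalize over whichever dimensions were actually scored. The sketch below assumes a hypothetical `scores` mapping keyed by dimension name; the base weights are the ones from the table in this step.

```python
from __future__ import annotations

from collections.abc import Mapping

# Base weights from the Step 2 table.
BASE_WEIGHTS: Mapping[str, float] = {
    "pr": 0.25,
    "architecture": 0.30,
    "debt": 0.25,
    "test": 0.20,
}


def composite_score(
    scores: Mapping[str, float],
    weights: Mapping[str, float] = BASE_WEIGHTS,
) -> float:
    """Weighted average over the dimensions present in `scores`.

    A dimension missing from `scores` (for example PR when there is no diff)
    is skipped, and the remaining weights are divided by their sum so that
    they add up to 1 again.
    """
    active = {dim: weights[dim] for dim in scores}
    total_weight = sum(active.values())
    if total_weight == 0:
        raise ValueError("at least one scored dimension is required")
    return sum(scores[dim] * weight / total_weight for dim, weight in active.items())
```

With the PR dimension omitted, `total_weight` comes out as 0.75, so the divisor from the table above falls out of the data rather than being hardcoded.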
|
||||
|
||||
### Step 3: Output Dashboard
|
||||
|
||||
Use the dashboard report template below instead of the standard common.md template.
|
||||
|
||||
---
|
||||
|
||||
## Dashboard Report Template
|
||||
|
||||
````markdown
|
||||
# Brooks-Lint Health Dashboard
|
||||
|
||||
**Mode:** Health Dashboard
|
||||
**Scope:** [project name or directory]
|
||||
**Composite Score:** XX/100
|
||||
|
||||
| Dimension | Score | Top Finding |
|
||||
|-----------|-------|------------|
|
||||
| Code Quality | XX/100 | [one-line summary or "Clean"] |
|
||||
| Architecture | XX/100 | [one-line summary or "Clean"] |
|
||||
| Tech Debt | XX/100 | [one-line summary or "Clean"] |
|
||||
| Test Quality | XX/100 | [one-line summary or "Clean"] |
|
||||
|
||||
## Module Dependency Graph
|
||||
[Mermaid graph from architecture scan]
|
||||
|
||||
## Top Findings (max 5 across all dimensions)
|
||||
[Standard Iron Law format, sorted by severity]
|
||||
|
||||
## Recommendation
|
||||
[One paragraph: what to fix first, which dimension needs the most attention,
|
||||
suggest running the full individual skill for the worst dimension]
|
||||
````
|
||||
|
|
@ -0,0 +1,37 @@
|
|||
---
|
||||
name: brooks-review
|
||||
description: >
|
||||
PR code review that surfaces decay risks, design smells, and maintainability
|
||||
issues with concrete Symptom → Source → Consequence → Remedy findings, drawing
|
||||
on twelve classic engineering books.
|
||||
Triggers when: user asks to review code, check a PR, shares a diff or pastes
|
||||
code asking "does this look right?" / "any issues here?" / "ready to merge?",
|
||||
or asks for feedback on a function, class, or file.
|
||||
Also triggers when user mentions: code smells / refactoring / clean architecture /
|
||||
DDD / domain-driven design / SOLID principles / Hyrum's Law / deep modules /
|
||||
tactical programming / conceptual integrity / Brooks's Law / Mythical Man-Month /
|
||||
second system effect.
|
||||
Do NOT trigger for: questions about how to write code from scratch, language syntax
|
||||
questions, or framework/tool questions where no existing code is shared.
|
||||
---
|
||||
|
||||
# Brooks-Lint — PR Review
|
||||
|
||||
## Setup
|
||||
|
||||
1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
|
||||
2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
|
||||
3. Read `../_shared/decay-risks.md` for symptom definitions and source attributions
|
||||
4. Read `pr-review-guide.md` in this directory for the analysis process
|
||||
|
||||
## Process
|
||||
|
||||
**If the user has not specified files or pasted code:** apply Auto Scope Detection
|
||||
from `../_shared/common.md` to determine the review scope before proceeding.
|
||||
|
||||
1. Understand the review scope, then scan for each decay risk in the order specified (Steps 1–6 of the guide)
|
||||
2. Run the Quick Test Check (Step 7 of the guide) — skip for docs-only or non-production changes
|
||||
3. Apply the Iron Law to every finding
|
||||
4. Output using the Report Template from common.md
|
||||
|
||||
**Mode line in report:** `PR Review`
|
||||
|
|
@ -0,0 +1,163 @@
|
|||
# PR Review Guide — Mode 1
|
||||
|
||||
**Purpose:** Analyze a code diff or specific files for decay risks that are directly visible
|
||||
in the changed code. Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy.
|
||||
|
||||
---
|
||||
|
||||
## Before You Start
|
||||
|
||||
**Auto-generated files:** If the diff contains generated files (protobuf stubs, OpenAPI clients,
|
||||
ORM migrations, lock files, minified bundles), skip those files entirely. Generated code reflects
|
||||
tool choices, not developer decisions. Note in the report which files were skipped and why.
|
||||
|
||||
**Scope calibration:** Adjust analysis depth based on PR size before starting.
|
||||
|
||||
| PR Size | Approach |
|
||||
|---------|----------|
|
||||
| < 50 lines | Focus on Steps 1–3 only; run Step 6a only if imports changed; run Step 6b if any class, method, or variable was renamed or introduced |
|
||||
| 50–300 lines | Full process, all steps |
|
||||
| > 300 lines | Full process; note in the Scope line that review is sampled — cover the highest-risk areas rather than every file |
|
||||
|
||||
For PRs > 500 lines: flag in the Summary that a PR this size is itself a Change Propagation signal. A change that cannot be reviewed in one pass suggests tangled responsibilities.
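
The calibration rules above reduce to a small lookup. The following is a hedged sketch: `calibrate_scope` and its return keys are hypothetical names, and counting "lines" as changed lines in the diff is an assumption, since the guide does not define the measure.

```python
from __future__ import annotations


def calibrate_scope(changed_lines: int) -> dict[str, object]:
    """Pick an analysis depth from PR size, following the table above."""
    if changed_lines < 50:
        approach = "focused"   # Steps 1-3, with Steps 6a/6b run conditionally
    elif changed_lines <= 300:
        approach = "full"      # full process, all steps
    else:
        approach = "sampled"   # full process, highest-risk areas first
    return {
        "approach": approach,
        # PRs over 500 lines are themselves flagged as a Change Propagation signal.
        "flag_size_in_summary": changed_lines > 500,
    }
```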
|
||||
|
||||
---
|
||||
|
||||
## Analysis Process
|
||||
|
||||
Work through these seven steps in order. Do not skip steps.
|
||||
|
||||
### Step 1: Understand the scope
|
||||
|
||||
Read the diff or files and answer:
|
||||
- What is the stated purpose of this change?
|
||||
- Which files were modified?
|
||||
- Flag immediately if the PR changes more than 10 unrelated files — that itself is a
|
||||
🟡 Warning: Change Propagation (a PR that touches many unrelated things is a sign
|
||||
that responsibilities are tangled).
|
||||
|
||||
### Step 2: Scan for Change Propagation
|
||||
|
||||
*Scan this first — it is the most visible risk in a diff.*
|
||||
|
||||
Look for:
|
||||
- Does this change touch files in modules that have no conceptual connection to the stated purpose?
|
||||
- Does any modified class change for more than one business reason in this diff?
|
||||
- Does any method use more data from another class than from its own?
|
||||
|
||||
If the diff shows no cross-module changes beyond what the feature requires → skip, no finding.
|
||||
|
||||
### Step 3: Scan for Cognitive Overload
|
||||
|
||||
Look for:
|
||||
- Are any new or modified functions longer than 20 lines?
|
||||
- Is there nesting deeper than 3 levels in new or modified code?
|
||||
- Are there more than 4 parameters in any new function signature?
|
||||
- Are there magic numbers or unexplained constants in new code?
|
||||
- Do new variable or function names require reading the implementation to understand?
|
||||
- Are there train-wreck chains (3+ method calls chained)?
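
For Python code, the three numeric checks in this list (function length, nesting depth, parameter count) can be approximated with the standard `ast` module. This is a rough sketch under stated assumptions: `overload_signals` is a hypothetical helper, it counts `self` as a parameter, ignores `*args` / `**kwargs`, and treats each `elif` as one extra nesting level, so its output is a prompt for review rather than a verdict.

```python
from __future__ import annotations

import ast

MAX_LINES, MAX_PARAMS, MAX_DEPTH = 20, 4, 3  # thresholds from the list above
NESTING = (ast.If, ast.For, ast.While, ast.With, ast.Try)


def _depth(node: ast.AST, current: int = 0) -> int:
    """Deepest nesting of control-flow blocks found below `node`."""
    deepest = current
    for child in ast.iter_child_nodes(node):
        bump = 1 if isinstance(child, NESTING) else 0
        deepest = max(deepest, _depth(child, current + bump))
    return deepest


def overload_signals(source: str) -> list[str]:
    """List functions in `source` that exceed any of the three thresholds."""
    signals: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        length = node.end_lineno - node.lineno + 1
        args = node.args
        params = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
        depth = _depth(node)
        if length > MAX_LINES:
            signals.append(f"{node.name}: {length} lines")
        if params > MAX_PARAMS:
            signals.append(f"{node.name}: {params} parameters")
        if depth > MAX_DEPTH:
            signals.append(f"{node.name}: nesting depth {depth}")
    return signals
```

Magic numbers, unclear names, and train-wreck chains are left out on purpose: they depend on context that a syntax tree alone does not capture.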
|
||||
|
||||
### Step 4: Scan for Knowledge Duplication
|
||||
|
||||
Look for:
|
||||
- Does this change introduce logic that already exists elsewhere in the codebase?
|
||||
- Does this change introduce a new name for a concept that already has a name?
|
||||
- Does this change add a class to a hierarchy that has a parallel in another module?
|
||||
|
||||
### Step 5: Scan for Accidental Complexity
|
||||
|
||||
Look for:
|
||||
- Does this change add an abstraction with only one concrete use?
|
||||
- Does this change add a class that only wraps another class or delegates everything?
|
||||
- Does this change add configuration options or extension points that serve no current requirement?
|
||||
|
||||
### Step 6a: Scan for Dependency Disorder
|
||||
|
||||
- Do any new imports create a dependency from a high-level module to a low-level one?
|
||||
(e.g., domain service now imports a database driver or HTTP client)
|
||||
- Do any new imports introduce a cycle between modules?
|
||||
- Does any new interface force callers to depend on methods they do not use?
|
||||
|
||||
If no new imports and no structural changes → skip, no finding.
|
||||
|
||||
### Step 6b: Scan for Domain Model Distortion
|
||||
|
||||
- Do new class or variable names match the language the business uses for the same concept?
|
||||
- Does any new class hold only data with no behavior (pure data bag), where behavior was expected?
|
||||
- Does any new method put logic that belongs to the domain in a service or utility layer?
|
||||
|
||||
---
|
||||
|
||||
## Severity Calibration
|
||||
|
||||
Apply the Iron Law format from `../_shared/common.md`. Each risk in `../_shared/decay-risks.md` has its own Severity
|
||||
Guide with numeric thresholds — use those as the primary reference. When a finding sits
|
||||
on the boundary between two tiers, use this as a tiebreaker:
|
||||
- 🔴 Critical — actively breaking velocity or creating production risk *today*
|
||||
- 🟡 Warning — will break velocity or create production risk if left unaddressed through the next few features
|
||||
- 🟢 Suggestion — worth fixing when nearby, not urgent
|
||||
|
||||
When multiple findings exist, list Critical items first. If there are more than 5 findings,
|
||||
add a one-line "Recommended fix order" at the end of the Findings section.
|
||||
|
||||
---
|
||||
|
||||
## Step 7: Quick Test Check
|
||||
|
||||
*Run this last. Three signals only — this is not a full Mode 4 review.*
|
||||
|
||||
If the diff contains only generated files, configuration, or documentation with no
|
||||
production logic changes → skip Step 7 entirely.
|
||||
|
||||
**Signal 1: Do tests exist for the changed behavior?**
|
||||
|
||||
- Does the diff modify production code?
|
||||
- Are corresponding test file changes included in the diff?
|
||||
- If new public behavior was added with no new tests:
|
||||
→ 🟡 Warning: Coverage Illusion — new behavior is untested
|
||||
→ Source: Feathers — Working Effectively with Legacy Code, Ch. 1
|
||||
- If the change is a pure refactor and existing tests cover the behavior → no finding.
|
||||
|
||||
**Signal 2: Quick Mock Abuse sniff**
|
||||
|
||||
Only check if the diff includes test file changes.
|
||||
|
||||
- Is mock setup code in new/modified tests obviously longer than the test logic?
|
||||
- Are the primary assertions `expect(mock).toHaveBeenCalledWith(...)` with no behavior verification?
|
||||
- Does the diff add any methods to production classes that are only called from test files?
|
||||
|
||||
If any of these are true:
|
||||
→ 🟡 Warning: Mock Abuse — test complexity exceeds behavior complexity
|
||||
→ Source: Osherove — The Art of Unit Testing, mock usage guidelines
|
||||
|
||||
**Signal 3: Quick Test Obscurity sniff**
|
||||
|
||||
Only check if the diff includes test file changes.
|
||||
|
||||
- Do new test names express scenario and expected outcome?
|
||||
(Pattern: `methodName_scenario_expectedResult` or equivalent)
|
||||
- Are there new tests with multiple assertions and no message strings on any of them?
|
||||
|
||||
If test names are vague or assertions lack messages:
|
||||
→ 🟢 Suggestion: Test Obscurity — test intent is unclear from the test name or assertions
|
||||
→ Source: Meszaros — xUnit Test Patterns, Assertion Roulette (p.224)
|
||||
|
||||
**Output rule:**
|
||||
|
||||
If all three signals are clean → write no Test findings. Proceed directly to the report.
|
||||
|
||||
If findings exist → add them to the Findings section using the standard Iron Law format.
|
||||
Label the risk as the test decay risk name (e.g., "Coverage Illusion", "Mock Abuse",
|
||||
"Test Obscurity").
|
||||
|
||||
> **Note:** Step 7 is a fast check, not a full test audit. When systemic test problems
|
||||
> are found, note in the Summary: "Consider running `/brooks-lint:brooks-test` for a
|
||||
> complete test quality diagnosis."
|
||||
|
||||
---
|
||||
|
||||
## Output
|
||||
|
||||
Use the standard Report Template from `../_shared/common.md`.
|
||||
Mode: PR Review
|
||||
Scope: list the files reviewed (excluding skipped generated files).
|
||||
|
|
@ -0,0 +1,38 @@
|
|||
---
|
||||
name: brooks-sweep
|
||||
description: >
|
||||
Full-sweep mode: runs a unified analysis across all quality dimensions — code decay,
|
||||
architecture, tech debt, and test quality — then applies fixes directly to the
|
||||
codebase. Safe changes are auto-applied; risky changes are confirmed before
|
||||
execution. Drawing on twelve classic engineering books.
|
||||
Triggers when: user wants to "fix everything", "sweep the codebase", "auto-fix all
|
||||
issues", "run all checks and fix them", "clean up the whole project", or asks for
|
||||
a single command that both diagnoses and remediates quality problems.
|
||||
Do NOT trigger for: read-only audits or health reports where the user only wants
|
||||
findings without code changes; single-dimension reviews (use the focused skill
|
||||
instead: brooks-review / brooks-audit / brooks-debt / brooks-test); server health
|
||||
checks, HTTP /health endpoints, Kubernetes probes, or application uptime.
|
||||
---
|
||||
|
||||
# Brooks-Lint — Full Sweep & Auto-Fix
|
||||
|
||||
## Setup
|
||||
|
||||
1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
|
||||
2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
|
||||
3. Read `../_shared/decay-risks.md` for production risk symptom definitions
|
||||
4. Read `../_shared/test-decay-risks.md` for test risk symptom definitions
|
||||
5. Read `sweep-guide.md` in this directory for the unified scan and fix process
|
||||
|
||||
## Process
|
||||
|
||||
**If the user has not specified a project or directory:** apply Auto Scope Detection
|
||||
from `../_shared/common.md` to determine the review scope before proceeding.
|
||||
|
||||
1. Show pre-flight consent notice and wait for the user's one-time approval (Step 0 of the guide)
|
||||
2. Enumerate scope and initialize the `unresolvable` / `non_critical_rounds` / `fix_log` state (Step 1 of the guide)
|
||||
3. Run the four dimensions in sequence — review, test, debt, audit — each scanning, classifying, applying Safe + Extended-Safe fixes, and verifying via the project test command (Steps 2–5 of the guide)
|
||||
4. Iterate: re-scan modified files + same-module + static consumers; converge on a clean round, retire 3-retry failures to the `unresolvable` set, cap non-critical rounds at 3 (Step 6 of the guide)
|
||||
5. Aggregate residual and unresolvable items and output the Full Sweep Report (Steps 7–8 of the guide)
|
||||
|
||||
**Mode line in report:** `Full Sweep`
|
||||
|
|
@ -0,0 +1,264 @@
|
|||
# Brooks-Lint — Full Sweep Guide
|
||||
|
||||
Sequential autonomous pipeline: **review → test → debt → audit**. Fixes findings
|
||||
in place, iterates until clean or capped, reports residuals. One interaction point:
|
||||
Step 0 (pre-flight consent) — after approval the pipeline runs hands-free until Step 8.
|
||||
|
||||
Every finding follows the Iron Law: **Symptom → Source → Consequence → Remedy**.
|
||||
|
||||
---
|
||||
|
||||
### Step 0 — Pre-flight consent gate
|
||||
|
||||
**Goal:** State scope, cost, and irreversibility up front; get explicit consent
|
||||
once so later steps never have to ask.
|
||||
|
||||
0a. Estimate the file count using `git ls-files | wc -l` if in a git repo, or
|
||||
`find . -type f -not -path '*/.git/*' -not -path '*/node_modules/*' -not -path '*/.venv/*' -not -path '*/build/*' -not -path '*/dist/*' -not -path '*/vendor/*' -not -path '*/target/*' | wc -l` otherwise. Order-of-magnitude is enough.
|
||||
|
||||
0b. Show this notice verbatim with the estimate filled in. Do not paraphrase —
|
||||
the user is agreeing to this exact scope.
|
||||
|
||||
```
|
||||
⚠️ /brooks-sweep — Full Repository Sweep & Auto-Fix
|
||||
|
||||
Scope: Four analysis dimensions run in sequence — PR code decay (R1–R6),
|
||||
test quality (T1–T6), tech debt, architecture. Edits are made in
|
||||
place inside the detected project scope.
|
||||
Estimated files in scope: ~N
|
||||
|
||||
Order: brooks-review → brooks-test → brooks-debt → brooks-audit.
|
||||
Each dimension scans, queues, and fixes before the next starts.
|
||||
|
||||
Autonomy: Fully autonomous. Safe single-file fixes apply directly. Multi-file
|
||||
fixes that have test coverage AND do not break a public interface
|
||||
also apply directly. High-risk fixes (public API break, cross-module
|
||||
structural change, or no test coverage) are NOT applied — they are
|
||||
recorded in the residual report for human review.
|
||||
|
||||
Iteration: After each dimension pass, modified files + same-module + static
|
||||
consumers are re-scanned. A finding that fails to fix 3 times is
|
||||
retired to the unresolvable set and never re-queued. Non-critical
|
||||
rounds cap at 3 iterations; critical findings iterate until
|
||||
resolved or retired.
|
||||
|
||||
Git impact: The pipeline edits files. It does NOT commit, push, or amend.
|
||||
If you have uncommitted work you want to preserve, commit or stash
|
||||
first.
|
||||
|
||||
Proceed with full autonomous sweep? [Y/n]
|
||||
```
|
||||
|
||||
0c. Parse the reply (first match wins, evaluate rules in order):
|
||||
1. **Hard negation** (`no`, `n`, `abort`, `cancel`, `取消`, `不要`): abort with "Aborted before scan — no files modified."
|
||||
2. **Consent** (`Y`, `yes`, `ok`, `sure`, `proceed`, `go`, `continue`, `好`, `好的`, `行`, `可以`): proceed to Step 1.
|
||||
3. **Soft pause** (`wait`, `hold on`, `等一下`, `等我`, `let me`): acknowledge in one line ("Understood, waiting"), then wait for the user's next message and re-evaluate from rule 1.
|
||||
4. **Question**: answer it, then re-show the notice once and wait for the next reply. If the next reply is not Consent (rule 2) — whether a second question, another pause, or anything else — abort with "Aborted — did not receive consent after clarification."
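
The ordered rules in 0c can be read as a small first-match-wins classifier. The sketch below is one possible reading, not part of the guide. The name `classifyReply`, whole-reply matching for rules 1 and 2, substring matching for rule 3, a trailing question mark as the rule 4 signal, and the `unknown` fallback are all assumptions.

```ts
type ReplyKind = "abort" | "consent" | "pause" | "question" | "unknown";

// Keyword lists copied from rules 1-3; "Y" is stored lowercased.
const NEGATIONS = ["no", "n", "abort", "cancel", "取消", "不要"];
const CONSENTS = ["y", "yes", "ok", "sure", "proceed", "go", "continue", "好", "好的", "行", "可以"];
const PAUSES = ["wait", "hold on", "等一下", "等我", "let me"];

function classifyReply(raw: string): ReplyKind {
  const trimmed = raw.trim().toLowerCase();
  // Assumption: a trailing question mark (ASCII or full-width) marks a question.
  const isQuestion = /[?？]$/.test(trimmed);
  // Drop trailing sentence punctuation so "No." still matches "no".
  const text = trimmed.replace(/[.!?。！？]+$/, "").trim();

  if (NEGATIONS.indexOf(text) !== -1) return "abort";                  // rule 1
  if (CONSENTS.indexOf(text) !== -1) return "consent";                 // rule 2
  if (PAUSES.some((p) => text.indexOf(p) !== -1)) return "pause";      // rule 3
  if (isQuestion) return "question";                                   // rule 4
  return "unknown"; // not covered by the guide; the caller decides what to do
}
```

Because rule 3 is checked before rule 4, a reply such as `wait, what?` classifies as a pause rather than a question.
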
|
||||
|
||||
0d. After consent, do not ask further questions until Step 8.
|
||||
|
||||
---
|
||||
|
||||
### Step 1 — Scope enumeration and state init
|
||||
|
||||
1a. Apply Auto Scope Detection from `../_shared/common.md` if the user did not
|
||||
specify files or a directory. Otherwise honor the user's explicit scope.
|
||||
|
||||
1b. Read `.brooks-lint.yaml` from the project root if present. Apply `disable`,
|
||||
`severity`, `ignore`, `focus`, and `custom_risks` per common.md. Record the
|
||||
applied config values and reuse them across all iteration rounds — do not
|
||||
re-read the file in Step 6 even if files were modified.
|
||||
|
||||
1c. Initialize pipeline state (persists across all rounds):
|
||||
|
||||
- **`unresolvable`** (set): findings retired after 3 failed attempts — keyed by `(file, line_range, risk_code)`; `signature` breaks ties. Never re-queued.
|
||||
- **`non_critical_rounds`** (int, 0): incremented each round producing Warning/Suggestion; reset on clean round.
|
||||
- **`fix_log`** (list): each fix with file, line range, risk code, description, and outcome (`applied` / `reverted` / `retired`).
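
One way to picture the three state fields is as a plain record. This is a sketch only: the type names, the camelCase field names, the string key format in `keyOf`, and the `retire` helper are illustrative assumptions, and the `signature` tie-breaker is left out because the guide does not define it here.

```ts
type Outcome = "applied" | "reverted" | "retired";

interface FindingKey {
  file: string;
  lineRange: [number, number]; // inclusive start and end line
  riskCode: string;            // e.g. "R2" or "T4"
}

interface FixLogEntry extends FindingKey {
  description: string;
  outcome: Outcome;
}

interface PipelineState {
  unresolvable: string[];    // serialized FindingKey values, treated as a set
  nonCriticalRounds: number; // starts at 0
  fixLog: FixLogEntry[];
}

// Serialize the (file, line_range, risk_code) tuple so it can be compared by value.
function keyOf(k: FindingKey): string {
  return k.file + ":" + k.lineRange[0] + "-" + k.lineRange[1] + ":" + k.riskCode;
}

function initState(): PipelineState {
  return { unresolvable: [], nonCriticalRounds: 0, fixLog: [] };
}

// Retiring is idempotent for the set, but every retirement is still logged.
function retire(state: PipelineState, finding: FindingKey, description: string): void {
  const key = keyOf(finding);
  if (state.unresolvable.indexOf(key) === -1) state.unresolvable.push(key);
  state.fixLog.push({
    file: finding.file,
    lineRange: finding.lineRange,
    riskCode: finding.riskCode,
    description: description,
    outcome: "retired",
  });
}

function isRetired(state: PipelineState, finding: FindingKey): boolean {
  return state.unresolvable.indexOf(keyOf(finding)) !== -1;
}
```
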
|
||||
|
||||
1d. Record the final scope file list in the Fix Report output buffer for Step 8.
|
||||
|
||||
---
|
||||
|
||||
### Step 2 — brooks-review pass (R1–R6 code decay)
|
||||
|
||||
Scan every file in scope against all R-series risks defined in
|
||||
`../_shared/decay-risks.md`.
|
||||
|
||||
2a. For each R-risk, apply its symptom checklist. Record each hit as a finding
|
||||
with: risk code, file + approximate line range, Symptom, Source,
|
||||
Consequence, Remedy, Severity (Critical / Warning / Suggestion), and
|
||||
**Fix-Class** (see Step 2b).
|
||||
|
||||
2b. Assign Fix-Class per finding:
|
||||
|
||||
| Class | Criteria |
|
||||
|-------|----------|
|
||||
| **Safe** | Single-file AND fully local: rename a non-exported symbol, extract a constant, remove dead code, add a null guard at a leaf, add a test scaffold for an untested pure function. Any change that modifies or removes an exported symbol is NOT Safe even if in one file. |
|
||||
| **Extended-Safe** | Multi-file but (a) a project test command exists and passes pre-fix, AND (b) the change does not rename, remove, or alter the signature of any publicly exported symbol, AND (c) touches ≤ 5 files in this pass. |
|
||||
| **Residual** | Public API break, cross-service boundary change, no test coverage to fall back on, or remedy ambiguous. NOT applied — carried to the Step 8 residual report. |
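
The table can be approximated as a decision function. The sketch below is an interpretation under stated assumptions: the `FixCandidate` fields are hypothetical inputs that the scanner would have to supply, and a single-file change that is not fully local is treated like a multi-file change, which the table does not spell out.

```ts
type FixClass = "Safe" | "Extended-Safe" | "Residual";

interface FixCandidate {
  filesTouched: number;
  fullyLocal: boolean;             // no effect outside the file (hypothetical flag)
  altersExportedSymbol: boolean;   // renames, removes, or changes an exported signature
  testsPassPreFix: boolean;        // a project test command exists and passed before the fix
  crossesServiceBoundary: boolean;
  remedyAmbiguous: boolean;
}

function classifyFix(c: FixCandidate): FixClass {
  // Residual triggers are checked first so they override everything else.
  if (c.altersExportedSymbol || c.crossesServiceBoundary || c.remedyAmbiguous) {
    return "Residual";
  }
  if (c.filesTouched === 1 && c.fullyLocal) return "Safe";
  // Extended-Safe needs a passing test command and at most 5 files.
  if (c.filesTouched <= 5 && c.testsPassPreFix) return "Extended-Safe";
  return "Residual"; // no test coverage to fall back on, or too many files
}
```
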
|
||||
|
||||
2c. Skip any finding that matches an entry in the `unresolvable` set.
|
||||
|
||||
2d. Apply every Safe and Extended-Safe fix in this dimension, lowest risk
|
||||
within each severity tier first. For each fix: Edit or Write, then append
|
||||
one row to `fix_log` with outcome `applied`. If two fixes touch overlapping
|
||||
line ranges in the same file, apply higher-severity first, re-read the file,
|
||||
then apply the next.
|
||||
|
||||
2e. After all fixes in this dimension, run the project test/lint command if one
|
||||
exists (`package.json` scripts, `pytest`, `cargo test`, `go test ./...`, etc.).
|
||||
If tests fail: revert fixes from this dimension in reverse order one at a
|
||||
time, re-running the test command after each revert, until tests pass.
|
||||
Mark each reverted fix with outcome `reverted` in `fix_log` and promote the
|
||||
finding to **Residual**. If no test command is found, note this once in the
|
||||
report and continue.
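
The revert procedure in 2e amounts to a loop that undoes fixes newest-first and re-checks the tests each time. A minimal sketch, assuming each applied fix carries its own `revert` callback and that `testsPass` wraps the project test command; both are hypothetical hooks, not something the guide defines.

```ts
interface AppliedFix {
  id: string;
  revert: () => void; // restores the file content from before this fix
}

// Returns the ids of the fixes that had to be reverted, newest first.
function revertUntilGreen(applied: AppliedFix[], testsPass: () => boolean): string[] {
  const reverted: string[] = [];
  for (let i = applied.length - 1; i >= 0; i--) {
    // Run the test command before each revert; stop as soon as it is green.
    if (testsPass()) break;
    const fix = applied[i];
    fix.revert();
    reverted.push(fix.id);
  }
  return reverted;
}
```

Each returned id would then be marked `reverted` in `fix_log` and its finding promoted to Residual. If every fix is reverted and the tests still fail, the failure predates the sweep; the sketch does not handle that case.
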
|
||||
|
||||
2f. Record dimension summary: N scanned, M Safe applied, K Extended-Safe applied,
|
||||
R reverted, P Residual.
|
||||
|
||||
---
|
||||
|
||||
### Step 3 — brooks-test pass (T1–T6 test decay)
|
||||
|
||||
Scan test files (and untested production code) against T-series risks defined
|
||||
in `../_shared/test-decay-risks.md`.
|
||||
|
||||
Follow the same sub-steps as Step 2 (classify → apply → verify → summarize),
|
||||
using T-prefix risk codes. For production files with no test coverage at all,
|
||||
record as T2 (Missing Tests). A test scaffold that adds a pure-function test is
|
||||
**Safe**; adding tests that require new test infrastructure is **Residual**.
|
||||
|
||||
---
|
||||
|
||||
### Step 4 — brooks-debt pass (tech debt accumulation)
|
||||
|
||||
Re-classify R-findings through a debt lens — same symptoms at accumulation scale:
|
||||
repeated duplication, layered workarounds, stale `TODO`/`FIXME` clusters, dead
|
||||
flags. Score each with **Pain (1–3) × Spread (1–3)**; product 7–9 = Critical,
|
||||
4–6 = Warning, 1–3 = Suggestion. Apply a severity bump for pattern-level
|
||||
occurrences (isolated Suggestion → 4+ modules Warning).
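
Reading the score as the product Pain × Spread, the only reachable values are 1, 2, 3, 4, 6, and 9, so in practice only a 3 × 3 finding scores Critical. The sketch below applies the bands and the one bump the text names (Suggestion to Warning at 4 or more modules). The function name and the `modulesAffected` parameter are assumptions; whether a Warning can also bump to Critical is not stated, so it is not implemented.

```ts
type Severity = "Critical" | "Warning" | "Suggestion";
type Rating = 1 | 2 | 3;

function debtSeverity(pain: Rating, spread: Rating, modulesAffected: number): Severity {
  const score = pain * spread; // one of 1, 2, 3, 4, 6, 9
  let severity: Severity = score >= 7 ? "Critical" : score >= 4 ? "Warning" : "Suggestion";
  // Pattern-level bump: a Suggestion seen in 4+ modules becomes a Warning.
  if (severity === "Suggestion" && modulesAffected >= 4) severity = "Warning";
  return severity;
}
```
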
|
||||
|
||||
Follow the same sub-steps as Step 2. Debt findings often span multiple files
|
||||
and are more likely to land in Extended-Safe or Residual than Safe.
|
||||
|
||||
---
|
||||
|
||||
### Step 5 — brooks-audit pass (architecture integrity)
|
||||
|
||||
Scan the full scope for architecture-level issues. The dependency-direction
|
||||
symptoms (inverted dependencies, circular imports, cross-domain coupling) are
|
||||
defined in `../_shared/decay-risks.md` Risk 5 — use that checklist. Step 5
|
||||
additionally covers architecture-only concerns that R5 does not: missing
|
||||
abstraction layers, god modules, leaked infrastructure inside domain code,
|
||||
and seam-boundary violations.
|
||||
|
||||
Most architecture findings are **Residual** by definition — they require human
|
||||
judgment on module boundaries. A few are Extended-Safe (e.g. extract a shared
|
||||
constant used in 3+ modules into a new module that nothing else imports yet).
|
||||
Do not auto-refactor module layouts, rename packages, or change public exports.
|
||||
|
||||
Follow the same sub-steps as Step 2.
|
||||
|
||||
---
|
||||
|
||||
### Step 6 — Iteration loop
|
||||
|
||||
**Goal:** Re-scan what the fixes touched and converge. Stop on clean round,
|
||||
cap, or no progress.
|
||||
|
||||
6a. Build the re-scan scope:
|
||||
- every file modified in Steps 2–5 of the current round, PLUS
|
||||
- every file in the same module as a modified file, PLUS
|
||||
- every file that statically imports from a modified file.
|
||||
|
||||
Do not re-scan files whose dependencies were not touched. On monorepos
|
||||
where a "module" may span hundreds of files, narrow the same-module bucket
|
||||
to files that import from or are imported by a modified file (direct
|
||||
dependency graph only).
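
The three buckets in 6a can be sketched as a filter over a file index. The `FileInfo` shape (path, module name, list of imported paths) is a hypothetical input; the guide does not say how the import graph is obtained. The monorepo narrowing described above is not implemented here.

```ts
interface FileInfo {
  path: string;
  module: string;    // module or package the file belongs to
  imports: string[]; // paths this file statically imports
}

function buildRescanScope(files: FileInfo[], modified: string[]): string[] {
  const isModified = (p: string): boolean => modified.indexOf(p) !== -1;
  // Modules that contain at least one modified file.
  const touchedModules = files.filter((f) => isModified(f.path)).map((f) => f.module);

  const scope = files.filter((f) =>
    isModified(f.path) ||                       // bucket 1: modified files
    touchedModules.indexOf(f.module) !== -1 ||  // bucket 2: same module
    f.imports.some((p) => isModified(p))        // bucket 3: static consumers
  );
  return scope.map((f) => f.path).sort();
}
```
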
|
||||
|
||||
6b. Re-run Steps 2–5 on the re-scan scope. For each new finding in this round:
|
||||
- If it matches an entry in `unresolvable` → skip.
|
||||
- Else if 🔴 Critical → queue and fix; Critical findings iterate until
|
||||
resolved OR retired (3 failed attempts → `unresolvable`).
|
||||
- Else 🟡 Warning / 🟢 Suggestion → queue and fix, subject to cap below.
|
||||
|
||||
6c. Classify the round after all fixes attempted:
|
||||
- **Clean round** (no new findings outside `unresolvable`): pipeline
|
||||
converged → proceed to Step 7.
|
||||
- **Critical-only round**: do NOT increment `non_critical_rounds`; return
|
||||
to 6a.
|
||||
- **Mixed or non-critical round** (any Warning / Suggestion produced):
|
||||
increment `non_critical_rounds` by 1. If it reaches the cap (default 3,
|
||||
or `sweep.max_iterations` from `.brooks-lint.yaml`), proceed to Step 7
|
||||
with remaining non-critical findings recorded as
|
||||
`"Unresolved — iteration cap reached"`. Otherwise return to 6a.
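
The round classification in 6c reduces to a small state transition. A sketch under assumptions: the input is the list of severities of new findings with `unresolvable` entries already filtered out, the cap defaults to 3, and the names `classifyRound` and `RoundResult` are illustrative.

```ts
type Severity = "Critical" | "Warning" | "Suggestion";
type RoundKind = "clean" | "critical-only" | "mixed";

interface RoundResult {
  kind: RoundKind;
  nonCriticalRounds: number;  // updated counter
  next: "step7" | "rescan";   // proceed to Step 7, or return to 6a
}

function classifyRound(newFindings: Severity[], nonCriticalRounds: number, cap: number = 3): RoundResult {
  if (newFindings.length === 0) {
    // Clean round: converged, counter resets.
    return { kind: "clean", nonCriticalRounds: 0, next: "step7" };
  }
  const hasNonCritical = newFindings.some((s) => s !== "Critical");
  if (!hasNonCritical) {
    // Critical-only rounds never consume the non-critical budget.
    return { kind: "critical-only", nonCriticalRounds: nonCriticalRounds, next: "rescan" };
  }
  const count = nonCriticalRounds + 1;
  return { kind: "mixed", nonCriticalRounds: count, next: count >= cap ? "step7" : "rescan" };
}
```
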
|
||||
|
||||
6d. Fix-retry rule: if a single finding fails verification (Step 2e) 3 times
|
||||
across any combination of rounds, retire it to `unresolvable` with reason
|
||||
`"3-retry budget exhausted"` and stop attempting it.
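
The retry budget in 6d is a per-finding failure counter that persists across rounds. A minimal sketch, assuming findings are identified by an opaque string key; the `RetryState` shape and `recordFailure` name are hypothetical.

```ts
const RETRY_BUDGET = 3;

interface RetryState {
  failures: { [key: string]: number };     // failed verifications per finding
  unresolvable: { [key: string]: string }; // finding key -> retirement reason
}

// Records one failed verification. Returns true once the finding is retired.
function recordFailure(state: RetryState, key: string): boolean {
  if (state.unresolvable[key] !== undefined) return true; // already retired
  const count = (state.failures[key] || 0) + 1;
  state.failures[key] = count;
  if (count >= RETRY_BUDGET) {
    state.unresolvable[key] = "3-retry budget exhausted";
    return true;
  }
  return false;
}
```
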
|
||||
|
||||
---
|
||||
|
||||
### Step 7 — Residual aggregation
|
||||
|
||||
Collect everything that was NOT fixed in place, de-duplicated:
|
||||
|
||||
- All Residual-class findings from Steps 2–5 (first round + re-scan rounds)
|
||||
- All `unresolvable` entries with their retirement reason
|
||||
- All iteration-cap residuals from Step 6c
|
||||
|
||||
Sort Critical → Warning → Suggestion. Within each severity, list file path,
|
||||
risk code, Symptom (one line), Remedy (one line), and the reason it was not
|
||||
applied (`public API break` / `no test coverage` / `3-retry budget` /
|
||||
`iteration cap`).
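
The aggregation is a de-duplicate-then-sort pass. In the sketch below the de-duplication key (file, risk code, symptom) and the secondary sort by file path are assumptions; the guide only fixes the severity order and the listed fields.

```ts
type Severity = "Critical" | "Warning" | "Suggestion";
type Reason = "public API break" | "no test coverage" | "3-retry budget" | "iteration cap";

interface Residual {
  file: string;
  riskCode: string;
  severity: Severity;
  symptom: string; // one line
  remedy: string;  // one line
  reason: Reason;  // why the fix was not applied
}

const SEVERITY_RANK: Record<Severity, number> = { Critical: 0, Warning: 1, Suggestion: 2 };

function aggregateResiduals(items: Residual[]): Residual[] {
  const seen: { [key: string]: boolean } = {};
  const unique = items.filter((r) => {
    const key = r.file + "|" + r.riskCode + "|" + r.symptom;
    if (seen[key]) return false; // drop repeats from re-scan rounds
    seen[key] = true;
    return true;
  });
  // Critical first, then Warning, then Suggestion; ties ordered by file path.
  return unique.sort((a, b) =>
    SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity] ||
    (a.file < b.file ? -1 : a.file > b.file ? 1 : 0)
  );
}
```
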
|
||||
|
||||
---
|
||||
|
||||
### Step 8 — Sweep report
|
||||
|
||||
Output the final report. Use the standard Report Template from
|
||||
`../_shared/common.md` with these additions:
|
||||
|
||||
```
|
||||
# Brooks-Lint — Full Sweep Report
|
||||
Mode: Full Sweep | Scope: <files or directory>
|
||||
Config: .brooks-lint.yaml applied (N risks disabled, M paths ignored) # omit if no config
|
||||
|
||||
## Dimension Summary
|
||||
| Dimension | Scanned | Safe Applied | Extended Applied | Reverted | Residual |
|
||||
|-----------|---------|--------------|------------------|----------|----------|
|
||||
| Review (R1–R6) | ... | ... | ... | ... | ... |
|
||||
| Test (T1–T6) | ... | ... | ... | ... | ... |
|
||||
| Debt | ... | ... | ... | ... | ... |
|
||||
| Audit | ... | ... | ... | ... | ... |
|
||||
|
||||
## Iteration History
|
||||
Round 1: <classification — clean / critical-only / mixed>, <N> new findings
|
||||
Round 2: ...
|
||||
Stopped at: clean round | iteration cap | no outstanding criticals
|
||||
|
||||
## Fix Log
|
||||
| # | File | Lines | Risk | Outcome | Change |
|
||||
|---|------|-------|------|----------|--------|
|
||||
| 1 | ... | ... | R2 | applied | Extract repeated constant |
|
||||
| 2 | ... | ... | T4 | reverted | Test regression; promoted to Residual |
|
||||
...
|
||||
|
||||
## Health Score Delta
|
||||
Before: <estimated score>/100 → After: <estimated score>/100
|
||||
(Re-run /brooks-health for an exact recalculation.)
|
||||
|
||||
## Residual Items (<K> not applied)
|
||||
<Iron Law entries, sorted Critical → Suggestion, with "Not applied because: ..." line>
|
||||
|
||||
## Summary
|
||||
- Total findings detected: <N>
|
||||
- Fixed this sweep: <M>
|
||||
- Residual (needs human review): <K>
|
||||
- Unresolvable (3-retry exhausted): <U>
|
||||
```
|
||||
|
||||
If there are zero residual items and zero unresolvable entries, end with:
|
||||
**"Sweep complete — codebase is clean."**
|
||||
|
||||
**Mode line in report:** `Full Sweep`
|
||||
|
|
@ -0,0 +1,36 @@
|
|||
---
|
||||
name: brooks-test
|
||||
description: >
|
||||
Test quality review drawing on twelve classic engineering books — with primary focus
|
||||
on xUnit Test Patterns, The Art of Unit Testing, How Google Tests Software, and
|
||||
Working Effectively with Legacy Code — that diagnoses structural problems in an
|
||||
existing test suite: brittleness, mock abuse, coverage illusions, slow execution,
|
||||
poor readability.
|
||||
Triggers when: user asks about test quality, shares test files for review, or
|
||||
expresses frustration: "tests keep breaking whenever I change anything", "our tests
|
||||
take forever", "I can't understand what this test is doing", "tests pass but bugs
|
||||
still reach production", "we have too many mocks".
|
||||
Do NOT trigger for: writing new tests from scratch (use the regular test-writing
|
||||
workflow) or testing framework/syntax questions — this skill reviews an existing
|
||||
suite for structural quality problems, not individual test authoring.
|
||||
---
|
||||
|
||||
# Brooks-Lint — Test Quality Review
|
||||
|
||||
## Setup
|
||||
|
||||
1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
|
||||
2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
|
||||
3. Read `../_shared/test-decay-risks.md` for test-space symptom definitions and source attributions
|
||||
4. Read `test-guide.md` in this directory for the test quality review framework
|
||||
|
||||
## Process
|
||||
|
||||
**If the user has not shared test files or pointed to a test directory:** apply Auto
|
||||
Scope Detection from `../_shared/common.md` to determine the review scope before proceeding.
|
||||
|
||||
1. Build the test suite map (guide's "Before You Start" section)
|
||||
2. Scan for each test decay risk in the order specified (Steps 1–4 of the guide)
|
||||
3. Apply the Iron Law and output using the Report Template (Step 5 of the guide)
|
||||
|
||||
**Mode line in report:** `Test Quality Review`
|
||||
|
|
@ -0,0 +1,147 @@
|
|||
# Test Quality Review Guide — Mode 4
|
||||
|
||||
**Purpose:** Diagnose the health of a test suite using six test-space decay risks.
|
||||
Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy.
|
||||
|
||||
---
|
||||
|
||||
## Before You Start: Build the Test Suite Map
|
||||
|
||||
Before scanning for any risk, map the current test suite structure:
|
||||
|
||||
```
|
||||
Unit tests: X files, ~N tests
|
||||
Integration tests: X files, ~N tests
|
||||
E2E tests: X files, ~N tests
|
||||
Ratio: Unit X% : Integration X% : E2E X%
|
||||
Coverage areas: [modules with tests] vs [modules without tests]
|
||||
```
|
||||
|
||||
If you cannot access test files directly, ask the user **one question** — choose the
|
||||
most relevant:
|
||||
1. "Which module is hardest to test or has the least coverage?"
|
||||
2. "When you make a change, how often do unrelated tests break?"
|
||||
3. "Is there a part of the codebase your team avoids touching because it has no tests?"
|
||||
|
||||
After one answer, proceed. Do not ask more than one question.
|
||||
|
||||
---
|
||||
|
||||
## Analysis Process
|
||||
|
||||
Work through these five steps in order.
|
||||
|
||||
### Step 1: Scan for Test Obscurity
|
||||
|
||||
*Scan this first — the most visible risk and the one that determines whether the suite
|
||||
is maintainable at all.*
|
||||
|
||||
Look for:
|
||||
- Read 5–10 test names at random: can each one communicate subject + scenario + expected
|
||||
outcome without opening the test body?
|
||||
- Are there tests where a failure gives no clue which behavior broke (multiple assertions,
|
||||
no message strings)?
|
||||
- Does any test depend on external state (files, database rows, env variables, shared mutable
|
||||
fixtures) that is invisible from within the test body?
|
||||
- Is there a single massive setUp or beforeEach that every test inherits regardless of
|
||||
what it actually needs?
|
||||
|
||||
If all test names are clear and setups are minimal → no finding.
|
||||
|
||||
### Step 2a: Scan for Test Brittleness
|
||||
|
||||
*Brittle tests break on refactors that do not change observable behavior — they test
|
||||
implementation, not contracts.*
|
||||
|
||||
Look for:
|
||||
- Ask (or check git history): did any recent refactor cause test failures with no
|
||||
behavior change?
|
||||
- Are there test methods where the name contains "and" or that assert on 3 or more
|
||||
unrelated behaviors (Eager Test)?
|
||||
- Do assertions specify mock call order or exact parameter values that are irrelevant
|
||||
to the observable behavior?
|
||||
- Are tests coupled to private methods or internal state directly?
|
||||
|
||||
If brittleness is systemic (most tests in the file break on a rename) → 🔴 Critical.
|
||||
If isolated (1–2 brittle tests) → 🟢 Suggestion.
|
||||
|
||||
### Step 2b: Scan for Mock Abuse
|
||||
|
||||
*Mock Abuse produces tests that pass regardless of whether the real behavior is correct.
|
||||
Scan this separately from brittleness — over-mocking is often the cause of brittleness,
|
||||
but it is a distinct problem worth its own finding.*
|
||||
|
||||
**Sample 3–5 tests once for both steps 2a and 2b together** — read each test body and
|
||||
check brittleness signals and mock-setup ratio in the same pass, then write separate
|
||||
findings if both problems are present.
|
||||
|
||||
Look for:
|
||||
- Is mock setup code longer than the assertion logic in the sampled tests?
|
||||
- Are the primary assertions `expect(mock).toHaveBeenCalledWith(...)` rather than
|
||||
assertions on outputs, state, or observable events?
|
||||
- Are there methods in production classes that are only called from test files
|
||||
(test-induced design damage)?
|
||||
- Does any single test create more than 3 mock objects?
|
||||
|
||||
If mock setup-to-assertion ratio exceeds 3:1 → 🟡 Warning.
|
||||
If production methods exist only for test access → 🔴 Critical (architecture is being
|
||||
distorted by the test suite).
|
||||
|
||||
### Step 3: Scan for Test Duplication
|
||||
|
||||
Look for:
|
||||
- Is the same setup block (same variables initialized the same way) repeated across
|
||||
5 or more test files without a shared helper?
|
||||
- Are there multiple tests that pass identical inputs and assert identical outputs
|
||||
with no differentiation (Lazy Test)?
|
||||
- Is the same business scenario covered at unit, integration, and E2E level with no
|
||||
difference in what each layer is testing?
|
||||
|
||||
If duplication is systemic (10 or more instances) → 🔴 Critical.
|
||||
If localized (3–5 instances) → 🟡 Warning.
|
||||
|
||||
### Step 4: Scan for Coverage Illusion and Architecture Mismatch
|
||||
|
||||
Look for Coverage Illusion:
|
||||
- Pick the most recently modified core module. Are its error-handling branches and
|
||||
null/boundary inputs covered by tests?
|
||||
- Are there legacy areas (old functions, no test files nearby) that are actively
|
||||
being changed?
|
||||
- Do the tests assert on side effects (DB writes, events emitted, state transitions)
|
||||
or only on return values?
|
||||
|
||||
**Characterization Test check:** If legacy code is being modified without existing tests,
|
||||
the team needs Characterization Tests before making the change — not after. Look for
|
||||
this pattern and flag it when absent.
|
||||
|
||||
A Characterization Test locks in current behavior (right or wrong) so future changes
|
||||
do not silently regress it. Template:
|
||||
```
|
||||
test("characterize: [module].[method] given [input], returns [current output]") {
|
||||
// Call the code under test with realistic inputs
|
||||
// Assert on whatever it currently returns — even if you suspect the output is wrong
|
||||
// Add a comment: "This captures current behavior, not necessarily correct behavior"
|
||||
}
|
||||
```
|
||||
Source: Feathers — Working Effectively with Legacy Code, Ch. 13: Characterization Tests
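
A concrete instance of the template, written without a test framework so it stays self-contained. `legacyDiscount`, its inputs, and the `characterize` helper are hypothetical; in a real suite the assertion would use the project's own test runner.

```ts
// Hypothetical legacy function. Truncating instead of rounding looks suspicious,
// but callers may already depend on it.
function legacyDiscount(priceCents: number, percent: number): number {
  return Math.floor(priceCents * (1 - percent / 100));
}

// Fails loudly when observed behavior drifts from the recorded behavior.
function characterize(label: string, actual: number, recorded: number): void {
  if (actual !== recorded) {
    throw new Error("characterize: " + label + " changed from " + recorded + " to " + actual);
  }
}

// This captures current behavior, not necessarily correct behavior.
// 999 * 0.85 = 849.15, which the function truncates to 849.
characterize("pricing.legacyDiscount given 999 cents and 15%", legacyDiscount(999, 15), 849);
characterize("pricing.legacyDiscount given 999 cents and 100%", legacyDiscount(999, 100), 0);
```
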
|
||||
|
||||
Look for Architecture Mismatch:
|
||||
- Compare the suite map from the start: is the ratio close to 70% unit / 20% integration / 10% E2E?
|
||||
- Are high-risk modules tested at higher density than trivial utilities?
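
The ratio check can be computed directly from the suite map counts. In this sketch the 10-point tolerance for "close" is an assumption, since the guide does not quantify it, and the names `suiteRatio` and `isCloseToTarget` are illustrative.

```ts
interface SuiteCounts {
  unit: number;
  integration: number;
  e2e: number;
}

const PYRAMID_TARGET: SuiteCounts = { unit: 70, integration: 20, e2e: 10 };

// Converts raw test counts into whole-number percentages.
function suiteRatio(counts: SuiteCounts): SuiteCounts {
  const total = counts.unit + counts.integration + counts.e2e;
  if (total === 0) return { unit: 0, integration: 0, e2e: 0 };
  const pct = (n: number): number => Math.round((n / total) * 100);
  return { unit: pct(counts.unit), integration: pct(counts.integration), e2e: pct(counts.e2e) };
}

// True when every layer is within `tolerance` percentage points of the target.
function isCloseToTarget(counts: SuiteCounts, tolerance: number = 10): boolean {
  const r = suiteRatio(counts);
  return Math.abs(r.unit - PYRAMID_TARGET.unit) <= tolerance &&
    Math.abs(r.integration - PYRAMID_TARGET.integration) <= tolerance &&
    Math.abs(r.e2e - PYRAMID_TARGET.e2e) <= tolerance;
}
```
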
|
||||
|
||||
**Test suite performance:** A slow test suite is a first-class maintainability risk — it
|
||||
breaks the fast-feedback loop and causes developers to skip running tests locally.
|
||||
- If the full suite runtime is known and > 10 minutes → 🟡 Warning
|
||||
- If the full suite runtime is > 30 minutes or unknown → 🔴 Critical (unknown suite time
|
||||
means nobody is running it regularly)
|
||||
- If tests that could be unit tests are integration tests, that is a Performance Mismatch:
|
||||
each misclassified test adds seconds of avoidable wait time
|
||||
|
||||
Source: Meszaros — xUnit Test Patterns, Slow Tests (p. 253)
|
||||
|
||||
### Step 5: Apply Iron Law, Output Report
|
||||
|
||||
Apply the Iron Law format from `../_shared/common.md` to each finding.
|
||||
|
||||
Use the standard Report Template. Mode: Test Quality Review.
|
||||
Include the Test Suite Map as a code block immediately before the `## Findings` heading, labeled "Test Suite Map".
|
||||
|
|
@ -0,0 +1,128 @@
|
|||
---
|
||||
name: codebase-migrate
|
||||
description: Run large codebase migrations and multi-file refactors. Uses the Composio CLI to coordinate issue tracking, batched PRs, and CI verification while the agent executes the transforms locally across hundreds of files.
|
||||
metadata:
|
||||
short-description: Codebase migrations + multi-file refactors
|
||||
---
|
||||
|
||||
# Codebase Migrate
|
||||
|
||||
Coordinate framework upgrades, API renames, config rewrites, and structural refactors across hundreds of files. Local edits are driven by the agent; the [Composio CLI](https://docs.composio.dev/docs/cli) handles the surrounding ceremony: tracking issues, per-batch PRs, and CI verification.
|
||||
|
||||
## When to Use
|
||||
|
||||
- Framework upgrade (React 17 → 19, Node 18 → 22, Django 4 → 5).
|
||||
- API rename across a monorepo (e.g., `getUserById` → `users.byId`).
|
||||
- Config/format migration (webpack → vite, eslint → biome, jest → vitest).
|
||||
- Any "change 200 files the same way" task that needs to ship in reviewable slices.
|
||||
|
||||
## Prereqs
|
||||
|
||||
```bash
|
||||
curl -fsSL https://composio.dev/install | bash
|
||||
composio login
|
||||
composio link github # for PRs + CI status
|
||||
composio link linear # or jira — for migration tracking
|
||||
```
|
||||
|
||||
Local tools the agent will use directly: `git`, `rg`, `jscodeshift`/`ts-morph`/`comby`/`ast-grep` (language-appropriate), and your test runner.
|
||||
|
||||
## Planning Phase
|
||||
|
||||
1. **Define the transform precisely.** Bad: "migrate to vitest." Good: "replace `jest.mock` with `vi.mock`, swap `jest.fn()` for `vi.fn()`, rename `jest.config.js` → `vitest.config.ts` using template X."
|
||||
2. **Scope the blast radius:**
|
||||
```bash
|
||||
rg -l 'jest\.(mock|fn|spyOn)' | wc -l
|
||||
rg -l 'from "jest"' | sort
|
||||
```
|
||||
3. **File a tracking issue:**
|
||||
```bash
|
||||
composio execute LINEAR_CREATE_ISSUE -d '{
|
||||
"teamId":"TEAM_ID",
|
||||
"title":"Migrate test runner: jest → vitest",
|
||||
"description":"Batches of ~25 files. Checkpoint after each PR lands green."
|
||||
}'
|
||||
```
|
||||
|
||||
## Execute in Reviewable Batches
|
||||
|
||||
Loop: pick N files → transform → test → PR → wait for green → merge → next batch.
|
||||
|
||||
```bash
|
||||
# Batch helper: first 25 untouched files matching the pattern
|
||||
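touch done.list  # the skip list must exist, even if empty
BATCH=$(rg -l 'jest\.mock' | grep -vxFf done.list | head -25)  # -vxFf: drop paths listed in done.list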
|
||||
echo "$BATCH" > batch.list
|
||||
```
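
The batch helper is meant to skip files already recorded in `done.list`. Written as a pure function, the intended selection looks roughly like this; `nextBatch` and its parameters are illustrative names, and paths are compared by exact string match.

```ts
// matching: files that still contain the old pattern
// done:     paths already migrated (the contents of done.list)
function nextBatch(matching: string[], done: string[], size: number = 25): string[] {
  return matching
    .filter((path) => done.indexOf(path) === -1) // exact-path skip, not substring
    .slice(0, size);
}
```
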
|
||||
|
||||
The agent runs the codemod on `batch.list`, then:
|
||||
|
||||
```bash
|
||||
git checkout -b migrate/vitest-batch-03
|
||||
xargs < batch.list codemod-runner # e.g. jscodeshift / ts-morph / comby
|
||||
npm test -- --changed
|
||||
git add -A && git commit -m "migrate(test): jest → vitest (batch 3)"
|
||||
git push -u origin migrate/vitest-batch-03
|
||||
|
||||
composio execute GITHUB_CREATE_A_PULL_REQUEST -d '{
|
||||
"owner":"acme","repo":"app",
|
||||
"head":"migrate/vitest-batch-03","base":"main",
|
||||
"title":"migrate(test): jest → vitest (batch 3)",
|
||||
"body":"Part of LIN-482. 25 files. Codemod: `transforms/jest-to-vitest.ts`."
|
||||
}'
|
||||
```
|
||||
|
||||
Then poll CI and merge when green:
|
||||
|
||||
```bash
|
||||
composio execute GITHUB_LIST_WORKFLOW_RUNS_FOR_A_REPOSITORY \
|
||||
-d '{"owner":"acme","repo":"app","branch":"migrate/vitest-batch-03"}'
|
||||
```
|
||||
|
||||
## Workflow Script
|
||||
|
||||
`scripts/migrate-batch.ts`, run per batch via `composio run --file scripts/migrate-batch.ts -- --batch 3`:
|
||||
|
||||
```ts
|
||||
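// Batch id from the CLI; it must match the pushed branch suffix (e.g. 03 for migrate/vitest-batch-03).
const batchFlag = process.argv.indexOf("--batch");
const batch = batchFlag === -1 ? undefined : process.argv[batchFlag + 1];
if (!batch) throw new Error("Usage: migrate-batch.ts --batch <n>"); // a missing flag would otherwise read argv[0]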
|
||||
|
||||
const pr = await execute("GITHUB_CREATE_A_PULL_REQUEST", {
|
||||
owner: "acme", repo: "app",
|
||||
head: `migrate/vitest-batch-${batch}`, base: "main",
|
||||
title: `migrate(test): jest → vitest (batch ${batch})`,
|
||||
body: `Part of LIN-482. See transforms/jest-to-vitest.ts.`
|
||||
});
|
||||
|
||||
await execute("LINEAR_CREATE_COMMENT", {
|
||||
issueId: "LIN-482",
|
||||
body: `Opened PR #${pr.number}: ${pr.html_url}`
|
||||
});
|
||||
```
|
||||
|
||||
## Safety Rails
|
||||
|
||||
- **One transform per PR.** Never mix a rename with a format change.
|
||||
- **Keep a `done.list`** of files already migrated so the next batch skips them.
|
||||
- **Run the full test suite on the last batch**, even if per-batch PRs ran `--changed`.
|
||||
- **Codemod first, hand-edit second.** If the codemod misses 3 files, patch them manually and note it in the PR body.
|
||||
- **Roll back per-batch**, not globally. Each PR should revert cleanly.
|
||||
|
||||
## Verification Loop
|
||||
|
||||
After each merge:
|
||||
|
||||
```bash
|
||||
rg 'jest\.(mock|fn|spyOn)' | wc -l # should trend to 0
|
||||
npm test # full suite
|
||||
composio execute GITHUB_LIST_WORKFLOW_RUNS_FOR_A_REPOSITORY \
|
||||
-d '{"owner":"acme","repo":"app","branch":"main","event":"push"}' \
|
||||
| jq '.workflow_runs[0].conclusion'
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
- **Codemod regex catches too much** → switch to AST-based tooling (`ast-grep`, `ts-morph`) for structural matches.
|
||||
- **Tests pass locally, CI fails** → pin Node/Python version parity; check `.nvmrc` / `pyproject.toml`.
|
||||
- **PR too big to review** → cut batch size in half; maintainers won't review 800-line diffs.
|
||||
- **Conflicts between batches** → after one batch merges, rebase each still-open batch branch onto `main` before merging it; never force-push a branch that has already merged.
|
||||
|
||||
Full CLI reference: [docs.composio.dev/docs/cli](https://docs.composio.dev/docs/cli)
|
||||
|
|
@ -0,0 +1,266 @@
|
|||
---
|
||||
name: codebase-recon
|
||||
description: This skill should be used when analyzing codebases, understanding architecture, or when "analyze", "investigate", "explore code", or "understand architecture" are mentioned.
|
||||
metadata:
|
||||
version: "1.0.0"
|
||||
---
|
||||
|
||||
# Codebase Analysis
|
||||
|
||||
Evidence-based investigation → findings → confidence-tracked conclusions.
|
||||
|
||||
## Steps
|
||||
|
||||
1. Gather evidence from multiple sources (code, docs, tests, history)
|
||||
2. Track confidence level as investigation progresses
|
||||
3. Based on findings:
|
||||
- If pattern analysis needed → load the `outfitter:patterns` skill
|
||||
- If root cause investigation → load the `outfitter:find-root-causes` skill
|
||||
- If ready to report → load the `outfitter:report-findings` skill
|
||||
4. Deliver findings with confidence level and caveats
|
||||
|
||||
<when_to_use>
|
||||
|
||||
- Codebase exploration and understanding
|
||||
- Architecture analysis and mapping
|
||||
- Pattern extraction and recognition
|
||||
- Technical research within code
|
||||
- Performance or security analysis
|
||||
|
||||
NOT for: wild guessing, assumptions without evidence, conclusions before investigation
|
||||
|
||||
</when_to_use>
|
||||
|
||||
<confidence>
|
||||
|
||||
| Bar | Lvl | Name | Action |
|
||||
|-----|-----|------|--------|
|
||||
| `░░░░░` | 0 | Gathering | Collect initial evidence |
|
||||
| `▓░░░░` | 1 | Surveying | Broad scan, surface patterns |
|
||||
| `▓▓░░░` | 2 | Investigating | Deep dive, verify patterns |
|
||||
| `▓▓▓░░` | 3 | Analyzing | Cross-reference, fill gaps |
|
||||
| `▓▓▓▓░` | 4 | Synthesizing | Connect findings, high confidence |
|
||||
| `▓▓▓▓▓` | 5 | Concluded | Deliver findings |
|
||||
|
||||
*Calibration: 0=0–19%, 1=20–39%, 2=40–59%, 3=60–74%, 4=75–89%, 5=90–100%*
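
The calibration bands above can be written as a small lookup. This is a minimal sketch, not part of the skill: the names `LEVELS`, `confidence_level`, and `confidence_bar` are hypothetical, and the sketch assumes fractional percentages fall into the band whose lower bound they have reached.

```python
# Hypothetical helpers (not part of the skill): map a confidence percentage
# to the level, name, and bar shown in the table above.
LEVELS = [
    (90, 5, "Concluded"),
    (75, 4, "Synthesizing"),
    (60, 3, "Analyzing"),
    (40, 2, "Investigating"),
    (20, 1, "Surveying"),
    (0, 0, "Gathering"),
]


def confidence_level(percent: float) -> tuple[int, str]:
    """Return (level, name) for a confidence percentage in 0..100."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # LEVELS is sorted from the highest lower bound down, so the first match wins.
    for floor, level, name in LEVELS:
        if percent >= floor:
            return level, name
    raise AssertionError("unreachable: 0 is the lowest floor")


def confidence_bar(level: int) -> str:
    """Render the five-cell bar: filled cells first, then empty cells."""
    return "▓" * level + "░" * (5 - level)
```

For example, `confidence_level(74)` falls in the 60–74% band, so it maps to level 3 (`Analyzing`).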
|
||||
|
||||
Start honest. Clear codebase + focused question → level 2–3. Vague or complex → level 0–1.
|
||||
|
||||
At level 4: "High confidence in findings. One more angle would reach full certainty. Continue or deliver now?"
|
||||
|
||||
Below level 5: include `△ Caveats` section.
|
||||
|
||||
</confidence>
|
||||
|
||||
<principles>
|
||||
|
||||
## Core Methodology
|
||||
|
||||
**Evidence over assumption** — investigate when you can, guess only when you must.
|
||||
|
||||
**Multi-source gathering** — code, docs, tests, history, web research, runtime behavior.
|
||||
|
||||
**Multiple angles** — examine from different perspectives before concluding.
|
||||
|
||||
**Document gaps** — flag uncertainty with △, track what's unknown.
|
||||
|
||||
**Show your work** — findings include supporting evidence, not just conclusions.
|
||||
|
||||
**Calibrate confidence** — distinguish fact from inference from assumption.
|
||||
|
||||
</principles>
|
||||
|
||||
<evidence_gathering>
|
||||
|
||||
## Source Priority
|
||||
|
||||
1. **Direct observation** — read code, run searches, examine files
|
||||
2. **Documentation** — official docs, inline comments, ADRs
|
||||
3. **Tests** — reveal intended behavior and edge cases
|
||||
4. **History** — git log, commit messages, PR discussions
|
||||
5. **External research** — library docs, Stack Overflow, RFCs
|
||||
6. **Inference** — logical deduction from available evidence
|
||||
7. **Assumption** — clearly flagged when other sources unavailable
|
||||
|
||||
## Investigation Patterns
|
||||
|
||||
**Start broad, then narrow:**
|
||||
- File tree → identify relevant areas
|
||||
- Search patterns → locate specific code
|
||||
- Code structure → understand without full content
|
||||
- Read targeted files → examine implementation
|
||||
- Cross-reference → verify understanding
|
||||
|
||||
**Layer evidence:**
|
||||
- What does the code do? (direct observation)
|
||||
- Why was it written this way? (history, comments)
|
||||
- How does it fit the system? (architecture, dependencies)
|
||||
- What are the edge cases? (tests, error handling)
|
||||
|
||||
**Follow the trail:**
|
||||
- Function calls → trace execution paths
|
||||
- Imports/exports → map dependencies
|
||||
- Test files → understand usage patterns
|
||||
- Error messages → reveal assumptions
|
||||
- Comments → capture historical context
|
||||
|
||||
</evidence_gathering>
|
||||
|
||||
<output_format>
|
||||
|
||||
## During Investigation
|
||||
|
||||
After each evidence-gathering step emit:
|
||||
|
||||
- **Confidence:** {BAR} {NAME}
|
||||
- **Found:** { key discoveries }
|
||||
- **Patterns:** { emerging themes }
|
||||
- **Gaps:** { what's still unclear }
|
||||
- **Next:** { investigation direction }
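
If status updates are produced by a script rather than typed by hand, the five fields above could be rendered along these lines. This is a sketch under assumptions: the `StatusUpdate` class and its field names are hypothetical and only mirror the bullet labels in the template.

```python
from dataclasses import dataclass


@dataclass
class StatusUpdate:
    """Hypothetical container for one investigation status update."""

    level: int      # confidence level, 0..5
    name: str       # level name, e.g. "Investigating"
    found: str
    patterns: str
    gaps: str
    next_step: str

    def render(self) -> str:
        """Format the update as the five-bullet markdown block."""
        bar = "▓" * self.level + "░" * (5 - self.level)
        return "\n".join(
            [
                f"- **Confidence:** {bar} {self.name}",
                f"- **Found:** {self.found}",
                f"- **Patterns:** {self.patterns}",
                f"- **Gaps:** {self.gaps}",
                f"- **Next:** {self.next_step}",
            ]
        )
```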
|
||||
|
||||
## At Delivery (Level 5)
|
||||
|
||||
### Findings
|
||||
|
||||
{ numbered list of discoveries with supporting evidence }
|
||||
|
||||
1. {FINDING} — evidence: {SOURCE}
|
||||
2. {FINDING} — evidence: {SOURCE}
|
||||
|
||||
### Patterns
|
||||
|
||||
{ recurring themes or structures identified }
|
||||
|
||||
### Implications
|
||||
|
||||
{ what findings mean for the question at hand }
|
||||
|
||||
### Confidence Assessment
|
||||
|
||||
Overall: {BAR} {PERCENTAGE}%
|
||||
|
||||
High confidence areas:
|
||||
- {AREA} — {REASON}
|
||||
|
||||
Lower confidence areas:
|
||||
- {AREA} — {REASON}
|
||||
|
||||
### Supporting Evidence
|
||||
|
||||
- Code: { file paths and line ranges }
|
||||
- Docs: { references }
|
||||
- Tests: { relevant test files }
|
||||
- History: { commit SHAs if relevant }
|
||||
- External: { URLs if applicable }
|
||||
|
||||
## Below Level 5
|
||||
|
||||
### △ Caveats
|
||||
|
||||
**Assumptions:**
|
||||
- {ASSUMPTION} — { why necessary, impact if wrong }
|
||||
|
||||
**Gaps:**
|
||||
- {GAP} — { what's missing, how to fill }
|
||||
|
||||
**Unknowns:**
|
||||
- {UNKNOWN} — { noted for future investigation }
|
||||
|
||||
</output_format>
|
||||
|
||||
<specialized_techniques>
|
||||
|
||||
Load skills for specialized analysis (see Steps section):
|
||||
|
||||
- **Pattern analysis** → `outfitter:patterns`
|
||||
- **Root cause investigation** → `outfitter:find-root-causes`
|
||||
- **Research synthesis** → `outfitter:report-findings`
|
||||
- **Architecture analysis** → see [architecture-analysis.md](references/architecture-analysis.md)
|
||||
|
||||
</specialized_techniques>
|
||||
|
||||
<workflow>
|
||||
|
||||
Loop: Gather → Analyze → Update Confidence → Next step
|
||||
|
||||
1. **Calibrate starting confidence** — what do we already know?
|
||||
2. **Identify evidence sources** — where can we look?
|
||||
3. **Gather systematically** — collect from multiple angles
|
||||
4. **Cross-reference findings** — verify patterns hold
|
||||
5. **Flag uncertainties** — mark gaps with △
|
||||
6. **Synthesize conclusions** — connect evidence to insights
|
||||
7. **Deliver with confidence level** — clear about certainty
|
||||
|
||||
At each step:
|
||||
- Document what you found (evidence)
|
||||
- Note what it means (interpretation)
|
||||
- Track what's still unclear (gaps)
|
||||
- Update confidence bar
|
||||
|
||||
</workflow>
|
||||
|
||||
<validation>
|
||||
|
||||
Before concluding (level 4+):
|
||||
|
||||
**Check evidence quality:**
|
||||
- ✓ Multiple sources confirm pattern?
|
||||
- ✓ Direct observation vs inference clearly marked?
|
||||
- ✓ Assumptions explicitly flagged?
|
||||
- ✓ Counter-examples considered?
|
||||
|
||||
**Check completeness:**
|
||||
- ✓ Original question fully addressed?
|
||||
- ✓ Edge cases explored?
|
||||
- ✓ Alternative explanations ruled out?
|
||||
- ✓ Known unknowns documented?
|
||||
|
||||
**Check deliverable:**
|
||||
- ✓ Findings supported by evidence?
|
||||
- ✓ Confidence calibrated honestly?
|
||||
- ✓ Caveats section included if below level 5?
|
||||
- ✓ Next steps clear if incomplete?
|
||||
|
||||
</validation>
|
||||
|
||||
<rules>
|
||||
|
||||
ALWAYS:
|
||||
- Investigate before concluding
|
||||
- Cite evidence sources with file paths/URLs
|
||||
- Use confidence bars to track certainty
|
||||
- Flag assumptions and gaps with △
|
||||
- Cross-reference from multiple angles
|
||||
- Document investigation trail
|
||||
- Distinguish fact from inference
|
||||
- Include caveats below level 5
|
||||
|
||||
NEVER:
|
||||
- Guess when you can investigate
|
||||
- State assumptions as facts
|
||||
- Conclude from single source
|
||||
- Hide uncertainty or gaps
|
||||
- Skip validation checks
|
||||
- Deliver without confidence assessment
|
||||
- Conflate evidence with interpretation
|
||||
|
||||
</rules>
|
||||
|
||||
<references>
|
||||
|
||||
Core methodology:
|
||||
- [confidence.md](../pathfinding/references/confidence.md) — confidence calibration (shared with pathfinding)
|
||||
|
||||
Micro-skills (load as needed):
|
||||
- `outfitter:patterns` — extracting and validating patterns
|
||||
- `outfitter:find-root-causes` — systematic problem diagnosis
|
||||
- `outfitter:report-findings` — multi-source research synthesis
|
||||
|
||||
Local references:
|
||||
- [architecture-analysis.md](references/architecture-analysis.md) — system structure mapping
|
||||
|
||||
Related skills:
|
||||
- `outfitter:pathfinding` — clarifying requirements before analysis
|
||||
- `outfitter:debugging` — structured bug investigation
|
||||
|
||||
</references>
|
||||
|
|
@ -0,0 +1,220 @@
|
|||
# Architecture Analysis
|
||||
|
||||
Techniques for analyzing system structure, dependencies, and component relationships.
|
||||
|
||||
## Dependency Mapping
|
||||
|
||||
### Forward Dependencies
|
||||
|
||||
What a component relies on:
|
||||
1. **Direct imports** — explicit dependencies in code
|
||||
2. **Indirect references** — called through interfaces
|
||||
3. **Runtime dependencies** — configuration, environment
|
||||
4. **Data dependencies** — shared state, databases
|
||||
|
||||
### Reverse Dependencies
|
||||
|
||||
What relies on this component:
|
||||
1. **Direct dependents** — explicit imports from other modules
|
||||
2. **Interface consumers** — components using this API
|
||||
3. **Side effect consumers** — code relying on mutations
|
||||
4. **Event subscribers** — listeners for this component's events
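
For the direct-import case, a reverse map can be derived mechanically from a forward map. This is a minimal sketch: the function name `reverse_dependencies` and the shape of the input (module name mapped to the set of modules it imports) are assumptions made for illustration. It covers only item 1 of each list; interface consumers, side effect consumers, and event subscribers do not show up in import statements and need other evidence.

```python
def reverse_dependencies(forward: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert a 'module -> modules it imports' map into 'module -> its dependents'."""
    reverse: dict[str, set[str]] = {}
    for module, imports in forward.items():
        # Keep modules that nobody imports, so they appear with an empty set.
        reverse.setdefault(module, set())
        for dep in imports:
            reverse.setdefault(dep, set()).add(module)
    return reverse
```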
|
||||
|
||||
### Circular Dependencies
|
||||
|
||||
Red flags:
|
||||
- A imports B, B imports A
|
||||
- Longer cycles: A → B → C → A
|
||||
- Implicit cycles through shared state
|
||||
|
||||
Resolution strategies:
|
||||
- Extract shared code to separate module
|
||||
- Introduce interface/abstraction layer
|
||||
- Invert dependency direction
|
||||
- Break into smaller components
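
Explicit import cycles (the first two red flags) can be found with a depth-first search over the import graph. The sketch below assumes the graph has already been extracted as a mapping from module name to the modules it imports; `find_cycle` is a hypothetical name. Implicit cycles through shared state are not visible in this graph and still need manual review.

```python
from typing import Optional


def find_cycle(graph: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one import cycle as a path (first node repeated at the end), or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {node: WHITE for node in graph}
    stack: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # Back edge: dep is already on the current path, so we found a cycle.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

The search is recursive, so a very deep import chain could hit Python's recursion limit; an explicit stack would avoid that at the cost of a longer function.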
|
||||
|
||||
## Layer Identification
|
||||
|
||||
### Detecting Layers
|
||||
|
||||
Look for:
|
||||
- **Directional flow** — data/control flows one way
|
||||
- **Abstraction levels** — concrete → abstract as you ascend
|
||||
- **Responsibility clustering** — similar concerns grouped
|
||||
- **Interface boundaries** — clear contracts between groups
|
||||
|
||||
### Common Layer Patterns
|
||||
|
||||
**Three-tier**:
|
||||
- Presentation (UI, API endpoints)
|
||||
- Business logic (domain, workflows)
|
||||
- Data access (repositories, queries)
|
||||
|
||||
**Hexagonal/Clean**:
|
||||
- Core domain (entities, business rules)
|
||||
- Application layer (use cases, orchestration)
|
||||
- Infrastructure (frameworks, external services)
|
||||
- Interfaces (controllers, adapters)
|
||||
|
||||
**Microservices**:
|
||||
- Service boundary (API gateway)
|
||||
- Service logic (domain per service)
|
||||
- Data layer (per-service database)
|
||||
- Cross-cutting (auth, logging, monitoring)
|
||||
|
||||
### Layer Violations
|
||||
|
||||
Violations indicate architectural drift:
|
||||
- Lower layer imports higher layer
|
||||
- Business logic in presentation layer
|
||||
- Data access code in domain entities
|
||||
- Infrastructure concerns leaking into core
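
The first violation, a lower layer importing a higher one, can be checked mechanically once each module is assigned to a layer. This is a sketch under assumptions: the three-tier ranking in `LAYER_RANK`, the function name, and the input shapes are all hypothetical. The other three violations concern what the code does rather than what it imports, so they still require reading the code.

```python
# Assumed three-tier ordering for illustration: higher rank = higher layer.
LAYER_RANK = {"data": 0, "business": 1, "presentation": 2}


def layer_violations(
    imports: list[tuple[str, str]],
    layer_of: dict[str, str],
) -> list[tuple[str, str]]:
    """Return (importer, imported) pairs where a lower layer imports a higher one."""
    violations = []
    for importer, imported in imports:
        src = LAYER_RANK[layer_of[importer]]
        dst = LAYER_RANK[layer_of[imported]]
        if src < dst:  # importer sits below the module it imports
            violations.append((importer, imported))
    return violations
```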
|
||||
|
||||
## Interface Analysis
|
||||
|
||||
### Contract Definition
|
||||
|
||||
Examine:
|
||||
- **Input types** — what does it accept?
|
||||
- **Output types** — what does it return?
|
||||
- **Error modes** — what can fail, how?
|
||||
- **Side effects** — mutations, I/O, state changes
|
||||
- **Invariants** — what must always be true?
|
||||
|
||||
### API Quality
|
||||
|
||||
Strong interfaces show:
|
||||
- **Cohesion** — methods belong together
|
||||
- **Minimal surface** — small, focused API
|
||||
- **Clear contracts** — types tell the story
|
||||
- **Stability** — changes don't cascade
|
||||
- **Composability** — works well with others
|
||||
|
||||
Weak interfaces show:
|
||||
- **Kitchen sink** — unrelated methods bundled
|
||||
- **Leaky abstractions** — implementation details exposed
|
||||
- **Unstable** — frequent breaking changes
|
||||
- **Rigid** — hard to extend or compose
|
||||
|
||||
## Component Relationships
|
||||
|
||||
### Relationship Types
|
||||
|
||||
**Composition**:
|
||||
- Component owns sub-components
|
||||
- Lifecycles coupled
|
||||
- Strong cohesion
|
||||
- Example: `Page` owns `Header`, `Footer`
|
||||
|
||||
**Aggregation**:
|
||||
- Component references others
|
||||
- Independent lifecycles
|
||||
- Loose coupling
|
||||
- Example: `ShoppingCart` references `Product`
|
||||
|
||||
**Dependency**:
|
||||
- Uses another component's interface
|
||||
- No ownership
|
||||
- Can be swapped
|
||||
- Example: `AuthService` uses `Database`
|
||||
|
||||
**Association**:
|
||||
- Knows about but doesn't own
|
||||
- Weak relationship
|
||||
- Often bidirectional
|
||||
- Example: `User` ↔ `Post` (many-to-many)
|
||||
|
||||
### Coupling Analysis
|
||||
|
||||
**Low coupling** (good):
|
||||
- Communicate through interfaces
|
||||
- Few shared assumptions
|
||||
- Changes localized
|
||||
- Easy to test in isolation
|
||||
|
||||
**High coupling** (risky):
|
||||
- Direct field access
|
||||
- Shared mutable state
|
||||
- Knowledge of implementation
|
||||
- Changes ripple widely
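
One rough way to put numbers on import-level coupling is to count incoming and outgoing dependencies per module. The sketch below borrows the instability ratio `fan_out / (fan_in + fan_out)` from Robert C. Martin's package metrics; that metric is brought in here for illustration and is not something this reference prescribes. The function name and input shape are assumptions. It does not detect direct field access or shared mutable state.

```python
def coupling_metrics(forward: dict[str, set[str]]) -> dict[str, dict[str, float]]:
    """Compute fan-in, fan-out, and instability for each module in an import map."""
    modules = set(forward) | {dep for deps in forward.values() for dep in deps}
    fan_in = {module: 0 for module in modules}
    for module, deps in forward.items():
        for dep in deps:
            if dep != module:  # ignore self-imports
                fan_in[dep] += 1

    metrics = {}
    for module in modules:
        fan_out = len(forward.get(module, set()) - {module})
        total = fan_in[module] + fan_out
        metrics[module] = {
            "fan_in": fan_in[module],
            "fan_out": fan_out,
            # 0.0 = depended on but depends on nothing; 1.0 = depends on others only.
            "instability": fan_out / total if total else 0.0,
        }
    return metrics
```

A module with high fan-in and low instability is one where changes are likely to ripple widely, which makes it a good candidate for a closer look.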
|
||||
|
||||
## Architectural Pattern Recognition
|
||||
|
||||
### Layered Architecture
|
||||
|
||||
Indicators:
|
||||
- Unidirectional dependencies (top → bottom)
|
||||
- Each layer uses only the layer directly below it
|
||||
- Clear separation of concerns
|
||||
|
||||
Trade-offs:
|
||||
- ✓ Simple, well-understood
|
||||
- ✓ Easy to enforce rules
|
||||
- ✗ Can become rigid
|
||||
- ✗ Performance overhead
|
||||
|
||||
### Event-Driven Architecture
|
||||
|
||||
Indicators:
|
||||
- Pub/sub or message queues
|
||||
- Decoupled components
|
||||
- Asynchronous communication
|
||||
- Event sourcing patterns
|
||||
|
||||
Trade-offs:
|
||||
- ✓ Scalable, resilient
|
||||
- ✓ Loose coupling
|
||||
- ✗ Harder to reason about flow
|
||||
- ✗ Eventual consistency challenges
|
||||
|
||||
### Microservices
|
||||
|
||||
Indicators:
|
||||
- Service per bounded context
|
||||
- Independent deployment
|
||||
- API-based communication
|
||||
- Decentralized data
|
||||
|
||||
Trade-offs:
|
||||
- ✓ Independent scaling
|
||||
- ✓ Technology diversity
|
||||
- ✗ Distributed system complexity
|
||||
- ✗ Operational overhead
|
||||
|
||||
## Analysis Workflow
|
||||
|
||||
### Top-Down
|
||||
|
||||
Start broad, narrow focus:
|
||||
1. **System boundaries** — what's in scope?
|
||||
2. **Major components** — high-level modules
|
||||
3. **Component interactions** — how they communicate
|
||||
4. **Internal structure** — zoom into each component
|
||||
5. **Implementation** — code-level details
|
||||
|
||||
### Bottom-Up
|
||||
|
||||
Start specific, build understanding:
|
||||
1. **Entry point** — main(), server start, UI root
|
||||
2. **Call graph** — trace execution paths
|
||||
3. **Cluster calls** — group related functionality
|
||||
4. **Extract components** — identify logical boundaries
|
||||
5. **Map relationships** — connect the pieces
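
For Python sources, step 2 can be approximated with the standard-library `ast` module. This is a deliberately narrow sketch: the function name `call_graph` is hypothetical, and it records only calls to plain names inside top-level functions. Method calls, attribute access, and dynamic dispatch are ignored.

```python
import ast


def call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function in the source to the plain names it calls."""
    tree = ast.parse(source)
    graph: dict[str, set[str]] = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            calls = set()
            for child in ast.walk(node):
                # Only direct calls like foo(); obj.method() has an ast.Attribute func.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    calls.add(child.func.id)
            graph[node.name] = calls
    return graph
```

Starting from the entry point's entry in the returned map and following the called names gives a first approximation of the execution paths to cluster in step 3.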
|
||||
|
||||
### Targeted
|
||||
|
||||
Focus on specific concern:
|
||||
1. **Define question** — what are you trying to understand?
|
||||
2. **Identify relevant code** — where does this happen?
|
||||
3. **Trace dependencies** — what does it touch?
|
||||
4. **Analyze impact** — what would changing this affect?
|
||||
5. **Document findings** — capture insights
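
Step 4 can be approximated by walking the import graph backwards from the component in question. This is a minimal sketch, assuming the forward import map is already available; `impacted_by` is a hypothetical name. It reports only modules reachable through imports, so consumers coupled through events or shared state will be missed.

```python
from collections import deque


def impacted_by(target: str, forward: dict[str, set[str]]) -> set[str]:
    """Return every module that depends on target, directly or transitively."""
    # Build the reverse map: module -> modules that import it.
    dependents: dict[str, set[str]] = {}
    for module, deps in forward.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    # Breadth-first walk from the target through its dependents.
    seen: set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen and parent != target:
                seen.add(parent)
                queue.append(parent)
    return seen
```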
|
||||
|
||||
## Documentation Extraction
|
||||
|
||||
From architecture analysis, capture:
|
||||
- **Component diagram** — boxes and arrows
|
||||
- **Dependency graph** — what imports what
|
||||
- **Layer diagram** — abstraction levels
|
||||
- **Sequence diagrams** — interaction flows
|
||||
- **Decision records** — why this structure?
|
||||