From b4c88b32be47afe4651ae53b3f502117b3a02fdd Mon Sep 17 00:00:00 2001 From: "ci[bot]" Date: Fri, 29 May 2026 08:33:55 +0000 Subject: [PATCH] :package: deps(skills): sync thirdparty skills --- skills/thirdparty/.sources/brooks-lint.list | 7 + .../thirdparty/.sources/codebase-migrate.list | 1 + .../thirdparty/.sources/codebase-recon.list | 1 + skills/thirdparty/_shared/common.md | 240 ++++++++++++++ .../thirdparty/_shared/custom-risks-guide.md | 48 +++ skills/thirdparty/_shared/decay-risks.md | 294 ++++++++++++++++++ skills/thirdparty/_shared/remedy-guide.md | 37 +++ skills/thirdparty/_shared/source-coverage.md | 248 +++++++++++++++ skills/thirdparty/_shared/test-decay-risks.md | 246 +++++++++++++++ skills/thirdparty/brooks-audit/SKILL.md | 42 +++ .../brooks-audit/architecture-guide.md | 195 ++++++++++++ .../brooks-audit/onboarding-guide.md | 89 ++++++ skills/thirdparty/brooks-debt/SKILL.md | 35 +++ skills/thirdparty/brooks-debt/debt-guide.md | 125 ++++++++ skills/thirdparty/brooks-health/SKILL.md | 37 +++ .../thirdparty/brooks-health/health-guide.md | 89 ++++++ skills/thirdparty/brooks-review/SKILL.md | 37 +++ .../brooks-review/pr-review-guide.md | 163 ++++++++++ skills/thirdparty/brooks-sweep/SKILL.md | 38 +++ skills/thirdparty/brooks-sweep/sweep-guide.md | 264 ++++++++++++++++ skills/thirdparty/brooks-test/SKILL.md | 36 +++ skills/thirdparty/brooks-test/test-guide.md | 147 +++++++++ skills/thirdparty/codebase-migrate/SKILL.md | 128 ++++++++ skills/thirdparty/codebase-recon/SKILL.md | 266 ++++++++++++++++ .../references/architecture-analysis.md | 220 +++++++++++++ 25 files changed, 3033 insertions(+) create mode 100644 skills/thirdparty/.sources/brooks-lint.list create mode 100644 skills/thirdparty/.sources/codebase-migrate.list create mode 100644 skills/thirdparty/.sources/codebase-recon.list create mode 100644 skills/thirdparty/_shared/common.md create mode 100644 skills/thirdparty/_shared/custom-risks-guide.md create mode 100644 skills/thirdparty/_shared/decay-risks.md create mode 100644 skills/thirdparty/_shared/remedy-guide.md create mode 100644 skills/thirdparty/_shared/source-coverage.md create mode 100644 skills/thirdparty/_shared/test-decay-risks.md create mode 100644 skills/thirdparty/brooks-audit/SKILL.md create mode 100644 skills/thirdparty/brooks-audit/architecture-guide.md create mode 100644 skills/thirdparty/brooks-audit/onboarding-guide.md create mode 100644 skills/thirdparty/brooks-debt/SKILL.md create mode 100644 skills/thirdparty/brooks-debt/debt-guide.md create mode 100644 skills/thirdparty/brooks-health/SKILL.md create mode 100644 skills/thirdparty/brooks-health/health-guide.md create mode 100644 skills/thirdparty/brooks-review/SKILL.md create mode 100644 skills/thirdparty/brooks-review/pr-review-guide.md create mode 100644 skills/thirdparty/brooks-sweep/SKILL.md create mode 100644 skills/thirdparty/brooks-sweep/sweep-guide.md create mode 100644 skills/thirdparty/brooks-test/SKILL.md create mode 100644 skills/thirdparty/brooks-test/test-guide.md create mode 100644 skills/thirdparty/codebase-migrate/SKILL.md create mode 100644 skills/thirdparty/codebase-recon/SKILL.md create mode 100644 skills/thirdparty/codebase-recon/references/architecture-analysis.md diff --git a/skills/thirdparty/.sources/brooks-lint.list b/skills/thirdparty/.sources/brooks-lint.list new file mode 100644 index 00000000..8c6ac6e2 --- /dev/null +++ b/skills/thirdparty/.sources/brooks-lint.list @@ -0,0 +1,7 @@ +_shared +brooks-audit +brooks-debt +brooks-health +brooks-review +brooks-sweep +brooks-test diff --git a/skills/thirdparty/.sources/codebase-migrate.list b/skills/thirdparty/.sources/codebase-migrate.list new file mode 100644 index 00000000..603789e3 --- /dev/null +++ b/skills/thirdparty/.sources/codebase-migrate.list @@ -0,0 +1 @@ +codebase-migrate diff --git a/skills/thirdparty/.sources/codebase-recon.list b/skills/thirdparty/.sources/codebase-recon.list new file mode 100644 index 00000000..efd4ed2b --- /dev/null +++ b/skills/thirdparty/.sources/codebase-recon.list @@ -0,0 +1 @@ +codebase-recon diff --git a/skills/thirdparty/_shared/common.md b/skills/thirdparty/_shared/common.md new file mode 100644 index 00000000..4c7256e3 --- /dev/null +++ b/skills/thirdparty/_shared/common.md @@ -0,0 +1,240 @@ +# Brooks-Lint — Shared Framework + +Code and test quality diagnosis using principles from twelve classic software engineering books. +Use `source-coverage.md` to keep those sources grounded in real evidence, exceptions, and tradeoffs. + +## The Iron Law + +``` +NEVER suggest fixes before completing risk diagnosis. +EVERY finding must follow: Symptom → Source → Consequence → Remedy. +``` + +Violating this law produces reviews that list rule violations without explaining why they +matter. A finding without a consequence and a remedy is not a finding — it is noise. + +> **On-demand sections (skip unless the condition applies):** +> - "Remedy Mode" — only when user passes `--fix` or asks to fix findings +> - "Post-Report Triage" — only in interactive sessions after the report is output +> - "History Tracking" — only after the Health Score is computed + +## Project Config + +Before executing the review, attempt to read `.brooks-lint.yaml` from the project root. +If the file exists, parse and apply its settings before proceeding. +If the file does not exist, continue with defaults (all risks enabled, no ignores). + +In a multi-mode session, re-read only if the user says the config has changed. + +### Supported settings + +**`disable`** — list of risk codes to skip entirely. Findings for disabled risks are +silently omitted from the report and do not affect the Health Score. +Valid codes: `R1` `R2` `R3` `R4` `R5` `R6` `T1` `T2` `T3` `T4` `T5` `T6` + +**`severity`** — override the severity of a specific risk for this project. +Valid values: `critical` `warning` `suggestion` +Example: `R1: suggestion` means every R1 finding is downgraded to Suggestion regardless +of what the guide says. + +**`ignore`** — list of glob patterns. Files matching any pattern are excluded from +analysis. Findings that arise solely from ignored files are omitted. +Common entries: `**/*.generated.*`, `**/vendor/**`, `**/migrations/**` + +**`focus`** — non-empty list of risk codes to evaluate; all others are skipped. +Omit this key (or leave it empty) to evaluate all non-disabled risks. +Cannot be combined with a non-empty `disable` list. + +**Minimal example:** +```yaml +version: 1 +disable: + - T5 +severity: + R1: suggestion +ignore: + - "**/*.generated.*" +``` + +If `.brooks-lint.yaml` contains a `custom_risks` map, read `custom-risks-guide.md` +from the `_shared/` directory for loading and scanning instructions. + +### Config Validation + +Before applying, check for errors and mention each in the report: +- Invalid risk code (not R1–R6, T1–T6, or a defined `Cx` code): skip it, note `"Config warning: X is not a valid risk code"` +- Invalid severity value (not `critical`/`warning`/`suggestion`): skip it, note the error +- Both `disable` and `focus` are non-empty: treat as a config error, ignore both, note it + +If the YAML fails to parse entirely, skip config loading and proceed with defaults. + +### Config Reporting + +If a config file was found and applied, add this line immediately after the **Scope** line +in the report: +`Config: .brooks-lint.yaml applied (N risks disabled, M paths ignored)` + +Include N and M even if zero. Omit this line if no config file was found. + +--- + +## Auto Scope Detection + +When no files or code are specified, detect scope automatically: + +**PR Review:** `git diff --cached` → `git diff` → `git diff main...HEAD` → ask user. + +**Architecture Audit / Tech Debt:** Entire project by default. `--since=`: run `git diff ...HEAD --name-only`, analyze only modules containing changed files; note "Incremental audit — modules touched since ". + +**Test Quality:** All test files by default. If a diff exists, prioritize test files co-located with changed production files (`src/foo.ts` → `src/foo.test.ts`). + +**Health Dashboard:** Entire project by default. If user provides a path, scope all dimension sub-scans to that path. + +**Scope line:** Always state what was detected — e.g., `Scope: staged changes (3 files)` or `Scope: branch changes vs main (12 files)`. + +--- + +## The Six Decay Risks + +Navigation index only — canonical definitions (symptoms, severity guides, sources, "What Not +to Flag" guards) live in `decay-risks.md`. Do not duplicate or edit diagnostic questions here; +update `decay-risks.md` directly. Book-level coverage, exceptions, and tradeoffs are in +`source-coverage.md`. + +| Risk | Diagnostic Question | +|------|---------------------| +| Cognitive Overload | How much mental effort to understand this? | +| Change Propagation | How many unrelated things break on one change? | +| Knowledge Duplication | Is the same decision expressed in multiple places? | +| Accidental Complexity | Is the code more complex than the problem? | +| Dependency Disorder | Do dependencies flow in a consistent direction? | +| Domain Model Distortion | Does the code faithfully represent the domain? | + +--- + +## Report Template + +**Language rule:** Output the report in the same language the user is using. Translate the +per-finding content and the one-sentence verdict to match the user's language. Keep the +following in English: Iron Law field labels (Symptom / Source / Consequence / Remedy), +book titles, principle and smell names (e.g. "Shotgun Surgery", "Divergent Change"), +and fixed structural headers from the template below (`Findings`, `Summary`, +`Module Dependency Graph`, `Critical`, `Warning`, `Suggestion`). + +```` +# Brooks-Lint Review + +**Mode:** [PR Review / Architecture Audit / Tech Debt Assessment / Test Quality Review] +**Scope:** [file(s), directory, or description of what was reviewed] +**Health Score:** XX/100 + +[One sentence overall verdict] + +--- + +## Module Dependency Graph + + + + +```mermaid +graph TD + ... +``` + +--- + +## Findings + + + + +### 🔴 Critical + +**[Risk Name] — [Short descriptive title]** +Symptom: [exactly what was observed in the code] +Source: [Book title — Principle or Smell name] +Consequence: [what breaks or gets worse if this is not fixed] +Remedy: [concrete, specific action] + +### 🟡 Warning + +**[Risk Name] — [Short descriptive title]** +Symptom: ... +Source: ... +Consequence: ... +Remedy: ... + +### 🟢 Suggestion + +**[Risk Name] — [Short descriptive title]** +Symptom: ... +Source: ... +Consequence: ... +Remedy: ... + +--- + +## Summary + +[2–3 sentences: what is the most important action, and what is the overall trend] +```` + +## Remedy Mode + +When the user passes `--fix` or asks to "fix the findings", read +`remedy-guide.md` from the `_shared/` directory before writing the report. + +## Health Score Calculation + +Base score: 100 +Deductions: +- Each 🔴 Critical finding: −15 +- Each 🟡 Warning finding: −5 +- Each 🟢 Suggestion finding: −1 +Floor: 0 (score cannot go below 0) + +## History Tracking + +After generating the Health Score, attempt to append a record to `.brooks-lint-history.json` +in the project root. + +**Append logic:** +1. Read the file (or start with empty array if it doesn't exist) +2. Append: `{ date, mode, score, findings: { critical, warning, suggestion }, scope }` +3. Write the file back + +**Trend display:** If the history file exists and contains at least one prior record for +the same mode, add a Trend line after the Health Score in the report: + + **Trend:** 85 → 82 (−3) over last 3 runs + +Show the most recent prior score and the delta. If delta is 0: "Stable at 82". +If this is the first run for this mode: "First run — no trend data". + +## Post-Report Triage (Optional) + +**Guard:** Interactive sessions only — skip in CI/headless mode. + +After reporting Warning or Suggestion findings, offer: +> Would you like to triage these findings? (accept / dismiss / defer / skip) + +For each finding one at a time (lowest severity first): show title, ask `[a]ccept / [d]ismiss / [f]defer / [s]kip`; wait for reply before moving to the next. + +**Dismiss:** ask one-line reason → append to `.brooks-lint.yaml` under `suppress:` → downgraded to info in future runs. + +**Defer:** same as dismiss, add `expires: YYYY-MM-DD` (default 90 days) → resurfaces at original severity after expiry. + +**Suppress matching at scan time:** for each `suppress:` entry, match `risk` code and file `pattern` against findings. +- Both match → downgrade to info (not counted in Health Score, shown under collapsed "Suppressed" section). +- `expires` is past → ignore entry, finding resurfaces. Note in Summary: "N suppressed findings have expired and are now active again." + +## Reference Files + +Read on demand: + +| File | When to Read | +|------|-------------| +| `source-coverage.md` | At the start of every review, before writing findings | +| `decay-risks.md` | Before any production-code review or architecture/debt assessment | +| `test-decay-risks.md` | Before any test review and before the PR Review "Quick Test Check" step | diff --git a/skills/thirdparty/_shared/custom-risks-guide.md b/skills/thirdparty/_shared/custom-risks-guide.md new file mode 100644 index 00000000..6121c2b5 --- /dev/null +++ b/skills/thirdparty/_shared/custom-risks-guide.md @@ -0,0 +1,48 @@ +# Custom Risk Loading Guide + +When `.brooks-lint.yaml` contains a `custom_risks` map, this guide governs how those +risks are loaded and scanned. Custom risks use `Cx` codes (C1, C2, …) — no conflict with +the standard R1–R6 and T1–T6 namespaces. + +--- + +## Loading + +1. For each entry in `custom_risks`, validate that it has: + - `name` — non-empty string + - `question` — the diagnostic question to ask + - `symptoms` — non-empty list of symptom patterns + - `severity` — map with at least one of: `critical`, `warning`, `suggestion` + +2. Register each valid entry as a `Cx` code alongside R1–R6 / T1–T6. Once loaded, + `Cx` codes become valid targets for `disable`, `focus`, and `severity` fields in + the same config file. + +3. Report any validation errors as config warnings (do not abort the review): + - Missing required field: `"Config warning: C1 missing 'symptoms'"` + - Invalid code format (must be `C` followed by digits): skip, note error + - Code conflicts with R/T namespace: skip, note error + +--- + +## Scanning + +During the analysis, treat each custom risk as an additional step after the standard +process: + +- Use `question` as the diagnostic question +- Use `symptoms` as the symptom lookup list +- Use the `severity` map for tier classification +- Apply the Iron Law: `Source` field should be `"[Project-defined risk] — "` +- Include custom risk findings in the Health Score (same deduction rules as R/T codes) +- In the report, custom findings appear after standard findings under a + **### Project-Specific Risks** sub-heading + +--- + +## Config Validation additions + +The following codes are valid in `disable`, `focus`, and `severity`: +- Standard: `R1`–`R6`, `T1`–`T6` +- Custom: any `Cx` code defined in `custom_risks` +- Any other code: skip it and emit `"Config warning: X is not a valid risk code"` diff --git a/skills/thirdparty/_shared/decay-risks.md b/skills/thirdparty/_shared/decay-risks.md new file mode 100644 index 00000000..21b4d011 --- /dev/null +++ b/skills/thirdparty/_shared/decay-risks.md @@ -0,0 +1,294 @@ +# Decay Risk Reference + +Six patterns that cause software to degrade. Apply the Iron Law to each finding. + +--- + +## Risk 1: Cognitive Overload + +**Diagnostic question:** How much mental effort does a human need to understand this? + +Cognitive load beyond working memory causes mistakes, avoidance, and blocks the refactoring that would fix it. + +### Symptoms + +- Function longer than 20 lines where multiple levels of abstraction are mixed together +- Nesting depth greater than 3 levels +- Parameter list with more than 4 parameters +- Magic numbers or unexplained constants +- Variable names that require reading the implementation to understand (e.g., `d`, `tmp2`, `flag`) +- Boolean expressions with 3 or more conditions combined +- Train-wreck chains: `a.getB().getC().doD()` +- Code names that do not match what the business calls the same concept +- Flag Arguments: a boolean parameter that makes the function do two fundamentally different + things depending on its value — a sign the function has two responsibilities +- Primitive Obsession: domain concepts represented as primitive types (`String email`, + `int orderId`, `double money`) rather than purpose-built value types — forces callers to know + which string is an email and which is a name +- Shallow module: the interface or documentation of a component is more complex relative to + the functionality it provides + +### Sources + +| Symptom | Book | Principle / Smell | +|---------|------|-------------------| +| Long Method | Fowler — Refactoring | Long Method | +| Long Parameter List | Fowler — Refactoring | Long Parameter List | +| Message Chains | Fowler — Refactoring | Message Chains | +| Flag Arguments | Fowler — Refactoring | Flag Arguments | +| Primitive Obsession | Fowler — Refactoring | Primitive Obsession | +| Function length and nesting | McConnell — Code Complete | Ch. 7: High-Quality Routines | +| Variable naming | McConnell — Code Complete | Ch. 11: The Power of Variable Names | +| Magic numbers | McConnell — Code Complete | Ch. 12: Fundamental Data Types | +| Domain name mismatch | Evans — Domain-Driven Design | Ubiquitous Language | +| Shallow Module | Ousterhout — A Philosophy of Software Design | Ch. 4: Modules Should Be Deep | + +### Severity Guide + +- 🔴 Critical: function > 50 lines, nesting > 5, or virtually no meaningful names +- 🟡 Warning: function 20–50 lines, nesting 4–5, some unclear names +- 🟢 Suggestion: minor naming issues, 1–2 magic numbers, isolated train-wreck chains + +### What Not to Flag + +- Linear code with clear names and guard clauses is not automatically high cognitive load +- Internal implementation detail hidden behind a deep, simple module boundary is not a shallow-module problem +- Domain-specific terminology should not be flagged if it matches how experts actually speak + +--- + +## Risk 2: Change Propagation + +**Diagnostic question:** How many unrelated things break when you change one thing? + +Each change ripples to unrelated modules, slowing velocity and multiplying regression risk. + +### Symptoms + +- Modifying one feature requires touching more than 3 files in unrelated modules +- One class changes for multiple different business reasons (e.g., `UserService` changes for + billing logic AND notification logic AND profile logic) +- A method uses more data from another class than from its own class +- Two classes know each other's internal state directly +- Changing one module requires recompiling or retesting many unrelated modules +- **Hyrum's Law**: with sufficient callers, every observable behavior — including + implementation details, error message text, coincidental call ordering, and undocumented + side effects — becomes an implicit contract that callers depend on, even though it was + never guaranteed by the declared API +- **Orthogonality violation**: changing one dimension of a feature forces edits in + unrelated dimensions — adding a new payment type should not require touching logging, + caching, or notification code, but in a non-orthogonal design it does +- Information Leakage: a design decision (e.g., a file format, protocol detail, or data + shape) is encoded in more than one module, so changing it requires coordinated edits + in multiple places even though only one module "owns" the concept + +### Sources + +| Symptom | Book | Principle / Smell | +|---------|------|-------------------| +| Shotgun Surgery | Fowler — Refactoring | Shotgun Surgery | +| Divergent Change | Fowler — Refactoring | Divergent Change | +| Feature Envy | Fowler — Refactoring | Feature Envy | +| Inappropriate Intimacy | Fowler — Refactoring | Inappropriate Intimacy | +| Orthogonality violation | Hunt & Thomas — The Pragmatic Programmer | Ch. 2: Orthogonality | +| DIP violation | Martin — Clean Architecture | Dependency Inversion Principle | +| High change propagation radius | Brooks — The Mythical Man-Month | Ch. 2: Brooks's Law (communication overhead) | +| Hyrum's Law | Winters et al. — Software Engineering at Google | Ch. 1: Hyrum's Law | +| Information Leakage | Ousterhout — A Philosophy of Software Design | Ch. 5: Information Hiding and Leakage | + +### Severity Guide + +- 🔴 Critical: one change touches > 5 files, or there is a structural dependency inversion (domain depends on infrastructure) +- 🟡 Warning: one change touches 3–5 files, mild coupling between modules +- 🟢 Suggestion: minor coupling, easily isolatable + +### What Not to Flag + +- A composition root wiring concrete dependencies is not a DIP violation by itself +- A stable public API with intentionally supported behavior is not automatically Hyrum's Law debt +- Similar edits inside one bounded context may be normal coordinated change, not shotgun surgery + +--- + +## Risk 3: Knowledge Duplication + +**Diagnostic question:** Is the same decision expressed in more than one place? + +Multiple copies drift apart silently. DRY is about decisions, not code lines. + +### Symptoms + +- Same logic copy-pasted across multiple files or functions +- Same concept named differently in different parts of the codebase + (e.g., `user`, `account`, `member`, `customer` all referring to the same domain entity) +- Parallel class hierarchies that must change in sync + (e.g., adding a new payment type requires adding a class in 3 different hierarchies) +- Configuration values repeated as literals in multiple places +- Two modules that implement the same algorithm independently + +### Sources + +| Symptom | Book | Principle / Smell | +|---------|------|-------------------| +| Code duplication | Fowler — Refactoring | Duplicate Code | +| Parallel Inheritance | Fowler — Refactoring | Parallel Inheritance Hierarchies | +| DRY violation | Hunt & Thomas — The Pragmatic Programmer | DRY: Don't Repeat Yourself | +| Inconsistent naming | Evans — Domain-Driven Design | Ubiquitous Language | +| Alternative Classes | Fowler — Refactoring | Alternative Classes with Different Interfaces | + +### Severity Guide + +- 🔴 Critical: core business logic duplicated across modules, or same domain concept named 3+ different ways +- 🟡 Warning: utility code duplicated, naming inconsistent within a subsystem +- 🟢 Suggestion: minor literal duplication, single naming inconsistency + +### What Not to Flag + +- Repetition across separate bounded contexts is not automatically duplicate knowledge +- Temporary duplication during an active extraction or migration is not necessarily debt +- Shared protocol constants repeated at explicit boundaries may be acceptable when local ownership is clearer + +--- + +## Risk 4: Accidental Complexity + +**Diagnostic question:** Is the code more complex than the problem it solves? + +Accidental complexity accumulates addition by addition until developers fight scaffolding more than solving the problem. + +### Symptoms + +- Abstractions built "for future use" with no current consumer + (e.g., a plugin system for a use case that has only one known implementation) +- Classes that barely justify their existence (wrap a single method call) +- Classes that only delegate to another class without adding behavior (pure middle-men) +- Second attempt at a system that is significantly more elaborate than the first, + adding generality for requirements that do not yet exist +- Switch statements that signal missing polymorphism +- Configuration options that have never been changed from their defaults +- Framework code larger than the application it powers +- Code grown under sustained tactical shortcuts: each workaround seemed small, but + accumulated shortcuts mean every new feature requires fighting the existing structure + +### Sources + +| Symptom | Book | Principle / Smell | +|---------|------|-------------------| +| Speculative Generality | Fowler — Refactoring | Speculative Generality | +| Lazy Class | Fowler — Refactoring | Lazy Class | +| Middle Man | Fowler — Refactoring | Middle Man | +| Switch Statements | Fowler — Refactoring | Switch Statements | +| Second System Effect | Brooks — The Mythical Man-Month | Ch. 5: The Second-System Effect | +| YAGNI violations | McConnell — Code Complete | Ch. 5: Design in Construction | +| Over-engineering | Hunt & Thomas — The Pragmatic Programmer | Topic 4: Good-Enough Software | +| Tactical programming debt | Ousterhout — A Philosophy of Software Design | Ch. 3: Strategic vs. Tactical Programming | + +### Severity Guide + +- 🔴 Critical: an entire subsystem built around a speculative requirement, or framework overhead dominates domain logic +- 🟡 Warning: several unnecessary abstractions or wrapper classes, unused configuration systems +- 🟢 Suggestion: one or two lazy classes or middle-man patterns in non-critical paths + +### What Not to Flag + +- A switch over an external protocol, wire format, or closed enum is not automatically missing polymorphism +- Thin wrappers that absorb vendor churn or hide instability may be justified +- A larger second version is not second-system effect unless the added generality exceeds present needs + +--- + +## Risk 5: Dependency Disorder + +**Diagnostic question:** Do dependencies flow in a consistent, predictable direction? + +When business logic depends on infrastructure, infrastructure changes cascade into domain changes. Cycles prevent isolation. + +### Symptoms + +- Circular dependencies between modules or packages +- High-level business logic directly imports from low-level infrastructure + (e.g., a domain service imports from a specific database driver) +- Stable, widely-used components depend on unstable, frequently-changing ones +- Abstract components depending on concrete implementations +- Law of Demeter violations: `order.getCustomer().getAddress().getCity()` +- Module fan-out greater than 5 (imports from more than 5 other modules) +- A module implements an interface but only uses a subset of its methods, or must + provide stub implementations for methods it does not need (ISP violation: fat interface + forces unwanted dependencies on callers) +- The system feels like "one mind did not design this" — different modules use + incompatible architectural patterns with no clear rule for which to use where +- Direct version-pinned dependencies on transitive packages (diamond dependency risk); + upgrading one library requires coordinating multiple unrelated teams or repositories + +### Sources + +| Symptom | Book | Principle / Smell | +|---------|------|-------------------| +| Dependency cycles | Martin — Clean Architecture | Acyclic Dependencies Principle (ADP) | +| DIP violation | Martin — Clean Architecture | Dependency Inversion Principle (DIP) | +| Instability direction | Martin — Clean Architecture | Stable Dependencies Principle (SDP) | +| Abstraction mismatch | Martin — Clean Architecture | Stable Abstractions Principle (SAP) | +| ISP violation | Martin — Clean Architecture | Interface Segregation Principle (ISP) | +| Conceptual integrity | Brooks — The Mythical Man-Month | Ch. 4: Conceptual Integrity | +| Law of Demeter | Hunt & Thomas — The Pragmatic Programmer | Ch. 5: Decoupling and the Law of Demeter | +| SOLID violations | Martin — Clean Architecture | Single Responsibility, Open/Closed Principles | +| Diamond dependency / upgrade blockage | Winters et al. — Software Engineering at Google | Ch. 21: Dependency Management | + +### Severity Guide + +- 🔴 Critical: dependency cycles present, or domain layer directly depends on infrastructure layer +- 🟡 Warning: several SDP or DIP violations but no cycles; conceptual inconsistency across modules +- 🟢 Suggestion: minor Demeter violations, slightly elevated fan-out in isolated modules + +### What Not to Flag + +- High fan-out in an orchestration layer or composition root is not automatically disorder +- Adapter modules may depend on both domain and infrastructure when they explicitly translate across the boundary +- A stable facade over many leaf dependencies can be healthy if dependency policy is clear + +--- + +## Risk 6: Domain Model Distortion + +**Diagnostic question:** Does the code faithfully represent the problem it is solving? + +Code that mismatches business language forces mental translation. Over time it models schemas instead of the domain, with logic bleeding into service layers. + +### Symptoms + +- Business logic scattered across service layers while domain objects have only getters and setters + (anemic domain model) +- Code variable, class, or method names that do not match what business stakeholders call the concept +- A class whose only purpose is to hold data with no behavior (pure data bag) +- A subclass that ignores or overrides most of its parent's behavior (refuses the inheritance) +- Bounded context boundaries crossed without any translation or anti-corruption layer +- Methods that are more interested in the data of another class than their own + (domain logic in the wrong place) +- A subclass overrides most parent methods with incompatible behavior or throws exceptions + where the parent contract guarantees success (LSP violation: substitution breaks callers) +- Value Objects treated as Entities: a concept defined entirely by its attributes (e.g., Money, + Email, Address) is given a mutable ID and lifecycle instead of being replaced when changed + +### Sources + +| Symptom | Book | Principle / Smell | +|---------|------|-------------------| +| Anemic Domain Model | Evans — Domain-Driven Design | Domain Model pattern | +| Ubiquitous Language drift | Evans — Domain-Driven Design | Ubiquitous Language | +| Bounded context violation | Evans — Domain-Driven Design | Bounded Context | +| Data Class | Fowler — Refactoring | Data Class | +| Refused Bequest | Fowler — Refactoring | Refused Bequest | +| Feature Envy | Fowler — Refactoring | Feature Envy | +| LSP violation | Martin — Clean Architecture | Liskov Substitution Principle (LSP) | + +### Severity Guide + +- 🔴 Critical: domain logic entirely in service layer, domain objects are pure data bags with no behavior +- 🟡 Warning: partial anemia, some naming inconsistency between code and domain language +- 🟢 Suggestion: minor naming drift in non-core areas, isolated cases of Feature Envy + +### What Not to Flag + +- CRUD-heavy workflows may legitimately use transaction scripts instead of rich domain objects +- DTOs, persistence records, and API payload models are allowed to be data-only +- Shared infrastructure language should not be mistaken for domain drift if the business model itself is simple diff --git a/skills/thirdparty/_shared/remedy-guide.md b/skills/thirdparty/_shared/remedy-guide.md new file mode 100644 index 00000000..2bde96e5 --- /dev/null +++ b/skills/thirdparty/_shared/remedy-guide.md @@ -0,0 +1,37 @@ +# Remedy Guide — Actionable Fix Mode + +When `--fix` is active, enhance every finding's Remedy field to be directly actionable: + +## Remedy Enhancement Rules + +For each finding, the Remedy must include: +1. **Target**: exact file path and function/class name +2. **Action**: specific refactoring operation (e.g., "Extract lines 45-67 into a new + function `calculateShippingCost(items, config)`") +3. **Rationale**: one sentence explaining why this specific fix (not just "refactor") + +## Fixability Classification + +Classify each finding after writing the enhanced Remedy: + +| Tier | Criteria | Report label | +|------|---------|-------------| +| Quick fix | Single-file, mechanical: rename, extract constant, reorder imports | `[quick-fix]` | +| Guided fix | Requires a design choice: where to split, what interface shape | `[guided]` | +| Manual | Cross-module, needs domain knowledge or team discussion | `[manual]` | + +Append the label to the finding title: `**R1 — Long function in OrderService [quick-fix]**` + +## Output Addition + +After the standard report, add a **Fix Summary** section: + +| Finding | Tier | Target File | Action | +|---------|------|------------|--------| +| R1 — Long function | quick-fix | src/order.ts:45 | Extract `calculateTotal()` | +| R5 — Circular dep | manual | src/models/ ↔ src/services/ | Introduce interface boundary | + +## What NOT to do +- Do NOT modify any files. Phase 1 is diagnosis + actionable plan only. +- Do NOT generate diffs or code blocks. The Remedy text IS the deliverable. +- Do NOT re-score. The Health Score reflects current state, not projected state. diff --git a/skills/thirdparty/_shared/source-coverage.md b/skills/thirdparty/_shared/source-coverage.md new file mode 100644 index 00000000..3bec8418 --- /dev/null +++ b/skills/thirdparty/_shared/source-coverage.md @@ -0,0 +1,248 @@ +--- +books: + - The Mythical Man-Month + - Code Complete + - Refactoring + - Clean Architecture + - The Pragmatic Programmer + - Domain-Driven Design + - A Philosophy of Software Design + - Software Engineering at Google + - xUnit Test Patterns + - The Art of Unit Testing + - Working Effectively with Legacy Code + - How Google Tests Software +--- + +# Source Coverage Matrix + +Use this file after selecting a mode and before writing findings. +It exists to prevent shallow "book-name citation" reviews. + +## Review Discipline + +- Cite a book only when the observed symptom actually matches that book's principle. +- A threshold crossing is a hint, not a verdict. Check context, intent, and blast radius. +- Look for justified tradeoffs before flagging a smell as debt. +- Prefer concrete architectural or domain consequences over abstract style complaints. +- If two books pull in different directions, state the tradeoff instead of pretending there is no tension. + +--- + +## Frederick Brooks — *The Mythical Man-Month* + +**Encoded today** +- Change propagation as communication overhead +- Second-System Effect +- Conceptual Integrity + +**Do not ignore** +- Whether the design shows a single coherent idea or competing local optimizations +- Whether cross-team coordination cost is becoming part of feature cost + +**Do not over-flag** +- Large systems are not automatically second systems +- Multi-module designs are acceptable when they preserve conceptual integrity + +--- + +## Steve McConnell — *Code Complete* + +**Encoded today** +- Routine length, nesting, naming, and magic numbers +- Construction-phase YAGNI checks +- Defensive programming and error-handling discipline (guard clauses, input validation, + explicit error paths, assertions for invariants) + +**Do not ignore** +- Whether low-level readability choices compound into operational risk +- Whether missing error handling makes failure modes invisible to maintainers + +**Do not over-flag** +- Small, explicit guard clauses are not cognitive overload +- A long routine may be acceptable when it is linear, well-named, and single-purpose + +--- + +## Martin Fowler — *Refactoring* + +**Encoded today** +- Long Method, Long Parameter List, Message Chains +- Shotgun Surgery, Divergent Change, Feature Envy, Inappropriate Intimacy +- Duplicate Code, Speculative Generality, Lazy Class, Middle Man, Data Class +- Flag Arguments: boolean parameters that split a function into two behaviors +- Primitive Obsession: domain concepts expressed as raw primitive types instead of value types + +**Do not ignore** +- Whether the code smell is local or systemic +- Whether a refactoring target has a natural home in the model + +**Do not over-flag** +- Temporary duplication during an active extraction is not always debt +- A data-focused structure is acceptable when it is intentionally a DTO or boundary record + +--- + +## Robert C. Martin — *Clean Architecture* + +**Encoded today** +- DIP, ADP, SDP, SAP, and layering direction +- ISP: fat interfaces that force callers to depend on methods they do not use +- LSP: subclasses that break the behavioral contract of their parent type +- SRP and OCP: classes with multiple reasons to change; modules closed to modification + but open to extension via abstraction + +**Do not ignore** +- Policy vs detail boundaries +- Whether dependency arrows preserve replaceability and testability + +**Do not over-flag** +- Composition roots may depend on concrete infrastructure by design +- Thin adapter layers can import both directions when they are explicitly boundary glue + +--- + +## Andrew Hunt & David Thomas — *The Pragmatic Programmer* + +**Encoded today** +- Orthogonality +- DRY +- Law of Demeter + +**Do not ignore** +- Whether knowledge duplication is really duplicated decision-making +- Whether coupling is accidental or a deliberate local simplification + +**Do not over-flag** +- Similar code in different bounded contexts is not automatically a DRY violation +- Direct object access inside a cohesive aggregate is not always a Demeter problem + +--- + +## Eric Evans — *Domain-Driven Design* + +**Encoded today** +- Ubiquitous Language +- Bounded Context +- Anemic Domain Model +- Entity vs Value Object: objects with identity and lifecycle vs. objects defined solely by + their attributes (Money, Email, Address should be immutable value types, not mutable entities) +- Aggregate Roots: who owns the invariant boundary; cross-aggregate access only through the root + +**Do not ignore** +- Aggregate boundaries, invariant ownership, and anti-corruption layers +- Whether names match the business language used by experts + +**Do not over-flag** +- CRUD-heavy workflows may legitimately use transaction scripts +- Thin entities are acceptable when the domain itself is simple + +--- + +## John Ousterhout — *A Philosophy of Software Design* + +**Encoded today** +- Deep vs shallow modules +- Strategic vs tactical programming +- Information Leakage: a design decision encoded in more than one module, creating + change coupling even when no explicit import exists between the modules + +**Do not ignore** +- Interface complexity relative to hidden complexity +- Whether repeated tactical patches are raising long-term cognitive load +- Whether a "helper" exposes internal design decisions that callers should not know + +**Do not over-flag** +- Internal implementation complexity is fine when the interface stays simple +- A small wrapper is acceptable when it meaningfully absorbs volatility + +--- + +## Titus Winters, Tom Manshreck, Hyrum Wright — *Software Engineering at Google* + +**Encoded today** +- Hyrum's Law +- Dependency management and upgrade blockage +- Code sustainability: whether code as written can be maintained, migrated, and upgraded + over a multi-year horizon without heroic effort +- Backward compatibility: whether API changes preserve existing callers or force + coordinated upgrades across the organization + +**Do not ignore** +- De facto APIs created by observable behavior +- The maintenance cost of exposing too much surface area +- Whether the dependency graph will allow independent upgrades over time + +**Do not over-flag** +- A stable public API is not a liability if it is intentionally supported +- Fan-out alone is not disorder when dependency policy is explicit and governed + +--- + +## Gerard Meszaros — *xUnit Test Patterns* + +**Encoded today** +- Assertion Roulette, Mystery Guest, General Fixture +- Eager Test, Lazy Test, Test Code Duplication, Behavior Verification +- Erratic Test: tests that produce non-deterministic results due to shared state, + time dependence, or ordering assumptions between tests + +**Do not ignore** +- Whether test failures are diagnosable +- Whether the suite shape amplifies maintenance cost + +**Do not over-flag** +- Multiple assertions are acceptable when they express one behavior with one failure story +- Shared fixtures are acceptable when every field is relevant to the scenario + +--- + +## Roy Osherove — *The Art of Unit Testing* + +**Encoded today** +- Test naming discipline +- Test isolation +- Mock usage guidelines +- Completeness of edge-path tests + +**Do not ignore** +- Whether tests verify behavior rather than wiring +- Whether seams are used to simplify tests, or production code is being contorted for testability + +**Do not over-flag** +- A mock is acceptable when the dependency is nondeterministic and the assertion still verifies behavior +- Naming conventions are guidance; clarity is the goal + +--- + +## Michael Feathers — *Working Effectively with Legacy Code* + +**Encoded today** +- Legacy code as code without tests +- Sensing and Separation +- Seams +- Characterization Tests + +**Do not ignore** +- Whether the team can change a risky area safely today +- Whether the code offers any seam for isolating behavior under change + +**Do not over-flag** +- Untested code is not automatically legacy if it is stable and not under active change +- Characterization tests are most important before modifying unclear existing behavior + +--- + +## Google Engineering — *How Google Tests Software* + +**Encoded today** +- Change coverage vs line coverage +- Pyramid shape and suite portfolio economics + +**Do not ignore** +- Whether the suite reflects business risk, not just percentages +- Whether expensive tests dominate feedback loops + +**Do not over-flag** +- A non-70:20:10 ratio can be healthy when justified by platform constraints or product risk +- High coverage is useful when paired with meaningful branch and change protection diff --git a/skills/thirdparty/_shared/test-decay-risks.md b/skills/thirdparty/_shared/test-decay-risks.md new file mode 100644 index 00000000..ba551a98 --- /dev/null +++ b/skills/thirdparty/_shared/test-decay-risks.md @@ -0,0 +1,246 @@ +# Test Decay Risk Reference + +Six patterns that cause test suites to degrade. Apply the Iron Law to each finding. + +--- + +## Risk T1: Test Obscurity + +**Diagnostic question:** How much effort does it take to understand what this test verifies? + +Unclear test intent breeds distrust, missed failures, and duplicates — one step from an abandoned suite. + +### Symptoms + +- Assertion Roulette: multiple assertions with no message string — when one fails, it is + impossible to determine which behavior broke without reading every assertion +- Mystery Guest: test depends on external state (files, database rows, shared fixtures) + that is not visible in the test body +- Test names that do not express the scenario and expected outcome + (e.g., `test1`, `shouldWork`, `testLogin`, `testUserService`) +- General Fixture: an oversized setUp or beforeEach shared by unrelated tests, making + each test's preconditions invisible +- Test body requires reading production code to understand what is being verified + +### Sources + +| Symptom | Book | Principle / Smell | +|---------|------|-------------------| +| Assertion Roulette | Meszaros — xUnit Test Patterns | Assertion Roulette (p.224) | +| Mystery Guest | Meszaros — xUnit Test Patterns | Mystery Guest (p.411) | +| General Fixture | Meszaros — xUnit Test Patterns | General Fixture (p.316) | +| Test naming | Osherove — The Art of Unit Testing | method_scenario_expected naming convention | + +### Severity Guide + +- 🔴 Critical: no test name in the file describes the behavior being tested; all assertions lack messages +- 🟡 Warning: multiple Mystery Guests; several ambiguous test names +- 🟢 Suggestion: minor naming issues; isolated General Fixture + +### What Not to Flag + +- Multiple assertions are acceptable when they describe one coherent behavior and fail with a clear story +- Shared setup is fine when every initialized value is relevant to nearly every test +- Concise test names are acceptable if scenario and expected outcome are still obvious + +--- + +## Risk T2: Test Brittleness + +**Diagnostic question:** Do tests break when you refactor without changing behavior? + +Brittle tests punish refactoring — eventually developers stop refactoring and the codebase stagnates to protect the suite. + +### Symptoms + +- Tests assert on private method results, internal state, or implementation details + rather than observable behavior +- Eager Test: one test method verifies multiple unrelated behaviors; any single change + causes it to fail regardless of which behavior was touched +- Over-specified: assertions enforce mock call order or exact parameter values that are + irrelevant to the behavior being tested +- Renaming or extracting a method causes 5 or more tests to fail even though no behavior changed +- Erratic Test: a test produces different results across runs without any change to + production code — caused by race conditions, time-dependent logic, random data, or + shared mutable state between tests + +### Sources + +| Symptom | Book | Principle / Smell | +|---------|------|-------------------| +| Eager Test | Meszaros — xUnit Test Patterns | Eager Test (p.228) | +| Erratic Test | Meszaros — xUnit Test Patterns | Erratic Test | +| Implementation coupling | Osherove — The Art of Unit Testing | Test isolation principle | +| Orthogonality violation | Hunt & Thomas — The Pragmatic Programmer | Ch. 2: Orthogonality | + +### Severity Guide + +- 🔴 Critical: refactoring with no behavior change causes test failures; > 5 tests coupled to a single implementation detail +- 🟡 Warning: Eager Tests common across the suite; moderate implementation-detail assertions +- 🟢 Suggestion: isolated over-specification in non-critical tests + +### What Not to Flag + +- Verifying an externally observable event or emitted command is not implementation coupling +- One test with several assertions is acceptable when all assertions support one behavior claim +- A fake or in-memory adapter is not brittleness if the test still asserts behavior, not wiring + +--- + +## Risk T3: Test Duplication + +**Diagnostic question:** Is the same test scenario expressed in more than one place? + +Duplicated tests must change in multiple places and create false confidence without testing distinct behavior. + +### Symptoms + +- Test Code Duplication: same setup or assertion logic copy-pasted across multiple tests + without extraction into a shared helper +- Lazy Test: multiple tests verifying identical behavior with no differentiation in input, + state, or expected output +- Same boundary condition tested identically at unit, integration, and E2E level — + three copies with no layer differentiation +- Test helper functions or fixtures duplicated across test files instead of shared + +### Sources + +| Symptom | Book | Principle / Smell | +|---------|------|-------------------| +| Test Code Duplication | Meszaros — xUnit Test Patterns | Test Code Duplication (p.213) | +| Lazy Test | Meszaros — xUnit Test Patterns | Lazy Test (p.232) | +| DRY violation in tests | Hunt & Thomas — The Pragmatic Programmer | DRY: Don't Repeat Yourself | + +### Severity Guide + +- 🔴 Critical: core business scenario fully duplicated across all three test layers with no differentiation +- 🟡 Warning: common scenario setup repeated in 5 or more tests without extraction +- 🟢 Suggestion: minor helper duplication; isolated Lazy Tests + +### What Not to Flag + +- The same scenario may appear at unit and integration level when each layer verifies a distinct risk +- Small local setup duplication can be clearer than an over-abstracted fixture maze +- Similar assertions against different domain rules are not Lazy Tests if the business intent differs + +--- + +## Risk T4: Mock Abuse + +**Diagnostic question:** Is the test more complex than the behavior it tests? + +Mock abuse produces tests that pass while verifying nothing — production code can be fully broken as long as the mocks are wired up. + +### Symptoms + +- Mock setup code is longer than the test logic itself +- Primary assertion is `expect(mock).toHaveBeenCalledWith(...)` — the test verifies + that a mock was called, not that any real behavior occurred +- Test-only methods added to production classes for lifecycle management in tests +- Single unit test uses more than 3 mocks +- Incomplete Mock: mock object missing fields that downstream code will access, + causing silent failures only visible in integration +- Hard-Coded Test Data: test data has no resemblance to real data shapes or constraints + +### Sources + +| Symptom | Book | Principle / Smell | +|---------|------|-------------------| +| Mock count > 3 | Osherove — The Art of Unit Testing | Mock usage guidelines | +| Testing mock behavior | Meszaros — xUnit Test Patterns | Behavior Verification (p.544) | +| Test-only production methods | Feathers — Working Effectively with Legacy Code | Ch. 3: Sensing and Separation | +| Hard-Coded Test Data | Meszaros — xUnit Test Patterns | Hard-Coded Test Data (p.534) | +| Incomplete Mock | Osherove — The Art of Unit Testing | Mock completeness requirement | + +### Severity Guide + +- 🔴 Critical: mock setup > 50% of test code; production class has methods only called from tests +- 🟡 Warning: mocks consistently > 3 per test; primary assertions are mock call verifications +- 🟢 Suggestion: isolated Incomplete Mocks; minor Hard-Coded Test Data + +### What Not to Flag + +- A small number of mocks around nondeterministic dependencies is acceptable when assertions still verify behavior +- Fakes and spies used to observe state transitions are not mock abuse by default +- One interaction assertion may be appropriate when the interaction itself is the behavior under test + +--- + +## Risk T5: Coverage Illusion + +**Diagnostic question:** Does the test suite actually protect against the failures that matter? + +Coverage measures execution, not verification. 90% line coverage can still miss every critical failure mode — teams stop looking because the number says "covered." + +### Symptoms + +- High line coverage but error-handling branches, boundary conditions, and exception paths + have no corresponding tests +- Happy-path only: no sad paths, no null/empty/zero inputs, no concurrency edge cases +- Legacy code areas are being actively modified with no tests present + (Feathers: "legacy code is code without tests") +- Coverage percentage treated as a sign-off criterion; critical change paths remain untested +- Tests assert on return values but not on important side effects such as database writes, + event publications, or state transitions + +### Sources + +| Symptom | Book | Principle / Smell | +|---------|------|-------------------| +| Legacy code = no tests | Feathers — Working Effectively with Legacy Code | Ch. 1: "Legacy code is code without tests" | +| Change coverage vs line coverage | Google — How Google Tests Software | Ch. 11: Testing at Google Scale | +| Happy-path only | Osherove — The Art of Unit Testing | Test completeness principle | + +### Severity Guide + +- 🔴 Critical: legacy code area actively being modified with no tests; error-handling paths entirely absent +- 🟡 Warning: coverage > 80% but edge and exception paths are systematically absent +- 🟢 Suggestion: a few non-critical paths missing sad-path tests + +### What Not to Flag + +- High line coverage is useful when paired with branch, boundary, and change-path coverage +- A new module may have limited coverage early if it is still private and low-risk +- Side-effect assertions may live in integration tests rather than unit tests without implying a gap + +--- + +## Risk T6: Architecture Mismatch + +**Diagnostic question:** Does the test suite structure reflect the system's actual risk profile? + +Wrong suite shape is slow and expensive — not from bad tests, but from using the wrong type at the wrong layer. + +### Symptoms + +- Inverted test pyramid: E2E or integration test count exceeds unit test count, + causing a slow and fragile suite +- Legacy code with no seam points: no interfaces, dependency injection, or seams exist, + making it impossible to test in isolation without modifying production code +- Legacy areas being modified have no Characterization Tests to capture current behavior + before changes are made +- Full suite execution time exceeds 10 minutes (indicates architectural problem, + not a performance problem — too many slow tests) +- High-risk and low-risk paths are tested at identical density; + no risk-based prioritization in test distribution + +### Sources + +| Symptom | Book | Principle / Smell | +|---------|------|-------------------| +| Inverted pyramid | Google — How Google Tests Software | 70:20:10 unit:integration:E2E ratio | +| No seam points | Feathers — Working Effectively with Legacy Code | Ch. 4: Seam Model | +| Missing Characterization Tests | Feathers — Working Effectively with Legacy Code | Ch. 13: Characterization Tests | +| Suite execution time | Meszaros — xUnit Test Patterns | Slow Tests (p. 253) | + +### Severity Guide + +- 🔴 Critical: legacy code being modified has no seams and no characterization tests; pyramid fully inverted +- 🟡 Warning: suite execution > 10 minutes; integration/E2E count exceeds unit tests +- 🟢 Suggestion: localized pyramid ratio deviation; a few legacy areas missing characterization tests + +### What Not to Flag + +- Deviating from 70:20:10 can be justified by platform constraints or product risk +- A suite heavy on integration tests can still be healthy if feedback is fast and purposefully layered +- A small number of critical-path E2E tests is desirable, not a smell diff --git a/skills/thirdparty/brooks-audit/SKILL.md b/skills/thirdparty/brooks-audit/SKILL.md new file mode 100644 index 00000000..74264002 --- /dev/null +++ b/skills/thirdparty/brooks-audit/SKILL.md @@ -0,0 +1,42 @@ +--- +name: brooks-audit +description: > + Architecture audit that maps module dependencies, checks layering integrity, and + flags structural decay across a codebase, drawing on twelve classic engineering books. + Triggers when: user asks to audit architecture, review folder/module structure, + check for circular imports, understand how the codebase is organized, or asks + "does this follow clean architecture?", "why does everything depend on everything?", + "are our layers correct?", "where should this code live?". + Also triggers for onboarding requests: "explain this codebase to a new developer" + or "give me a codebase tour" (use onboarding mode). + Do NOT trigger for: PR-level code review (use brooks-review) or line-level refactoring + questions — this skill analyzes structural/module-level concerns, not individual functions. +--- + +# Brooks-Lint — Architecture Audit + +## Setup + +1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules +2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs +3. Read `../_shared/decay-risks.md` for symptom definitions and source attributions +4. Read `architecture-guide.md` in this directory for the audit framework + +## Process + +**Onboarding mode:** If the user asks for an onboarding report, codebase tour, or +"explain this codebase to a new developer", read `onboarding-guide.md` from this +directory and follow it instead of `architecture-guide.md`. This mode explains rather +than diagnoses — no Health Score, no Iron Law findings. + +**If the user has not specified files or a directory to audit:** apply Auto Scope +Detection from `../_shared/common.md` to determine the audit scope before proceeding. + +1. Gather codebase context and draw the module dependency graph as Mermaid (Steps 0–1 of the guide) +2. Scan for each decay risk in the order specified (Steps 2–4 of the guide) +3. Assign node colors in the Mermaid diagram based on findings (red/yellow/green) — after Step 4 +4. Run the Testability Seam Assessment (Step 5 of the guide) +5. Run the Conway's Law check (Step 6 of the guide) +6. Output using the Report Template from common.md — Mermaid graph FIRST, then Findings + +**Mode line in report:** `Architecture Audit` diff --git a/skills/thirdparty/brooks-audit/architecture-guide.md b/skills/thirdparty/brooks-audit/architecture-guide.md new file mode 100644 index 00000000..09cd9ff8 --- /dev/null +++ b/skills/thirdparty/brooks-audit/architecture-guide.md @@ -0,0 +1,195 @@ +# Architecture Audit Guide — Mode 2 + +**Purpose:** Analyze the module and dependency structure of a system for decay risks that +operate at the architectural level. Every finding must follow the Iron Law: +Symptom → Source → Consequence → Remedy. + +**Monorepo note:** Treat each deployable service or library as a top-level module. Draw +dependencies between services, not between their internal packages. Apply the Conway's Law +check at the service ownership level. Within a single service, apply standard module-level analysis. + +--- + +## Analysis Process + +Work through these six steps in order. + +### Step 0: Gather Codebase Context + +Before drawing anything, establish what you can see. + +**If the user provided a full directory tree or pasted relevant file contents:** skip the +proactive reading below and proceed to Step 1. + +**Otherwise, proactively read the project using these tools:** + +1. **Top-level structure** — glob top two levels to identify module boundaries: + ``` + Glob: **/*(depth 2, directories only) + ``` +2. **Entry points** — read the package manifest or main config file (e.g., `package.json`, + `go.mod`, `pom.xml`, `Cargo.toml`, `pyproject.toml`) to confirm language, framework, + and declared dependencies. +3. **Dependency edges** — grep import statements to discover inter-module calls. Run once + per language present; limit to the first 200 matches to avoid token overrun: + ``` + Grep: "^\s*(import|from|require\(|use )" across *.ts|*.py|*.go|*.rs|*.java + ``` +4. **Large modules** — for any top-level directory with > 10 files, read the file matching + `index.*`, `main.*`, or `__init__.*` to understand its stated responsibility. + +**Stop when you can answer all three:** +- What are the top-level modules (names and count)? +- Which modules import from which other modules? +- Which module has the highest fan-in or fan-out? + +If the project has > 100 top-level files or > 4 levels of nesting, note which areas were +sampled vs. inferred, and flag this in the report scope line. + +### Step 1: Draw the Module Dependency Graph (Mermaid) + +Before evaluating any risk, map the dependencies as a Mermaid diagram. Use this format: + +````mermaid +graph TD + subgraph UI + WebApp + MobileApp + end + + subgraph Domain + AuthService + OrderService + PaymentService + end + + subgraph Infrastructure + Database + MessageQueue + end + + WebApp --> AuthService + WebApp --> OrderService + MobileApp --> AuthService + MobileApp --> OrderService + OrderService --> PaymentService + OrderService --> Database + OrderService --> MessageQueue + PaymentService --> Database + AuthService -.->|circular| OrderService + + classDef critical fill:#ff6b6b,stroke:#c92a2a,color:#fff + classDef warning fill:#ffd43b,stroke:#e67700 + classDef clean fill:#51cf66,stroke:#2b8a3e,color:#fff + + class PaymentService critical + class OrderService warning + class Database,MessageQueue,AuthService,WebApp,MobileApp clean +```` + +Draw the graph structure first — nodes, subgraphs, and edges — without any `classDef` or +`class` lines. You cannot assign colors until you have completed the risk scan in Steps 2–4. + +**After completing Step 4**, return to this graph and add the `classDef` and `class` lines +based on findings. The example above shows the final colored output. + +Rules: +1. **Nodes** — Use top-level directories or services as nodes, not individual files +2. **Grouping** — One `subgraph` per architectural layer or top-level directory (e.g., UI, Domain, Infrastructure) +3. **Edges** — Solid arrows (`-->`) point FROM the depending module TO the dependency; use dotted arrows with label (`-.->|circular|`) for circular dependencies. If no circular dependencies exist, use only solid arrows +4. **Node limit** — Keep the graph to ~50 nodes maximum; collapse low-risk leaf modules into their parent if needed +5. **Fan-out** — For any node with fan-out > 5, use a descriptive label: `HighFanOutModule["ModuleName (fan-out: 7)"]` +6. **Colors** — Apply `classDef` colors AFTER completing Steps 2-4: `critical` (red `#ff6b6b`) for nodes with Critical findings, `warning` (yellow `#ffd43b`) for Warning findings, `clean` (green `#51cf66`) for nodes with no findings or only Suggestions. If no findings at all, classify all nodes as `clean` +7. **Direction** — Default to `graph TD` (top-down); use `graph LR` only if the architecture is clearly a left-to-right pipeline + +### Step 2: Scan for Dependency Disorder + +*The most architecturally consequential risk — scan this first.* + +Look for: +- Circular dependencies (any `-.->|circular|` edge in the map above) +- Arrows flowing upward (high-level domain depending on low-level infrastructure) +- Stable, widely-depended-on modules that import from frequently-changing modules +- Modules with fan-out > 5 +- Absence of a clear layering rule (no consistent answer to "what depends on what?") + +### Step 3: Scan for Domain Model Distortion + +Look for: +- Do module names match the business domain vocabulary? +- Is there a layer called "services" that contains all the business logic while domain objects + are pure data structures? +- Are there modules that cross bounded context boundaries (e.g., billing logic in the user module)? +- Is there an anti-corruption layer where external systems interface with the domain? + +### Step 4: Scan for Remaining Four Risks + +Check each in turn: + +**Knowledge Duplication:** +- Are there multiple modules implementing the same concept independently? +- Does the same domain concept appear under different names in different modules? + +**Accidental Complexity:** +- Are there entire layers in the architecture that do not add value? +- Are there modules whose responsibility cannot be stated in one sentence? + +**Change Propagation:** +- Which modules are "blast radius hotspots"? (A change here requires changes in many other modules) +- Does the dependency map reveal why certain features are slow to develop? + +**Cognitive Overload:** +- Can the module responsibility of each module be stated in one sentence from its name alone? +- Would a new developer know which module to add a new feature to? + +### Step 5: Testability Seam Assessment + +A *seam* is a place in the architecture where behavior can be altered without editing source +code — typically an interface, a configuration point, or a dependency injection boundary. +Seam density is a proxy for testability and evolvability. + +Scan for: +- **No seam at the infrastructure boundary**: can you replace a real database, file system, + or HTTP client with a test double without editing the module under test? If not, the + architecture forces integration tests where unit tests would suffice. +- **Seam collapse**: a module that was once testable in isolation has had its seams removed + (e.g., direct constructor instantiation replaced a dependency injection point, or a global + singleton replaced an injected collaborator). +- **Missing seam in legacy areas**: modules without an obvious injection point or interface + boundary — any change requires touching the entire call stack to substitute behavior. + +If all modules have clear seams at their infrastructure boundaries → no finding. + +If seams are absent or collapsed: flag as 🟡 Warning with a Remedy pointing to the specific +module and the injection point that needs to be restored or introduced. + +Source: Feathers — Working Effectively with Legacy Code, Ch. 4: The Seam Model + +### Step 6: Conway's Law Check + +After the six-risk scan, assess the relationship between architecture and team structure: + +- Does the module/service structure reflect the team structure? + (Conway's Law: "Organizations design systems that mirror their communication structure") +- If yes: is this intentional design or accidental coupling? +- A mismatch that causes cross-team coordination overhead for every feature is 🔴 Critical. +- A mismatch that is theoretical but not yet causing pain is 🟡 Warning. +- If team structure is unknown, note this as context missing and skip the check. + +**Calibration examples:** +- 🔴 Critical: the Payments module is owned by Team A but contains auth logic owned by Team B — + every Payments change requires a sync meeting with Team B +- 🟡 Warning: two separate teams own the `utils/` and `helpers/` directories which do the same + things — theoretically painful but not yet causing release coordination issues +- Not a finding: a single team owns a monorepo with multiple logical modules — Conway's Law + misalignment requires *separate teams* to be meaningful + +--- + +## Output + +Use the standard Report Template from `../_shared/common.md`. Mode: Architecture Audit. + +Place the Mermaid dependency graph FIRST under "Module Dependency Graph". Reference +relevant node names in findings. Add `classDef` color assignments LAST, after all +findings are identified. diff --git a/skills/thirdparty/brooks-audit/onboarding-guide.md b/skills/thirdparty/brooks-audit/onboarding-guide.md new file mode 100644 index 00000000..742292b2 --- /dev/null +++ b/skills/thirdparty/brooks-audit/onboarding-guide.md @@ -0,0 +1,89 @@ +# Codebase Onboarding Guide + +**Purpose:** Produce a newcomer-friendly tour of the codebase. This is NOT a diagnostic +report — no Health Score, no Iron Law findings. Focus on explanation and orientation. + +--- + +## Process + +### Step 1: Map the Territory + +- Read top-level structure (same as architecture-guide Step 0) +- Output: a plain-language overview of what each top-level module does (one sentence each) +- Group into layers: "Things users interact with", "Business logic", "Infrastructure" + +### Step 2: Draw the Dependency Map + +Draw the same Mermaid dependency graph as architecture audit Step 1, but color nodes by +**recommended reading order** using a DISTINCT palette from the severity palette +(which uses red/yellow/green). This avoids confusing "red = danger" with "red = read last": + +- 🔵 Blue (`#339af0`): start here — entry points, core domain +- 🟣 Purple (`#9775fa`): read next — supporting modules +- ⚪ Gray (`#ced4da`): read last — infrastructure, generated code, utilities + +Add numbered labels: `CoreModule["1. CoreModule"]` + +``` +classDef start fill:#339af0,color:#fff +classDef next fill:#9775fa,color:#fff +classDef last fill:#ced4da +``` + +### Step 3: Highlight Key Conventions + +Identify and document patterns the codebase follows: +- Naming conventions (file naming, class naming, variable naming) +- Directory organization pattern (feature-based? layer-based? hybrid?) +- Error handling pattern (exceptions? result types? error codes?) +- Testing convention (co-located? separate directory? naming pattern?) +- Dependency injection pattern (if any) + +### Step 4: Mark Danger Zones + +For each module with known complexity or coupling issues, add a brief warning: +- "OrderService: high complexity, only modify with full test suite running" +- "legacy/: no tests, use Characterization Tests before changing" + +Do NOT use Iron Law format — use plain warnings. This is orientation, not diagnosis. + +### Step 5: Build a Domain Glossary + +Extract 10-15 key domain terms from code (class names, method names, constants) and map +them to plain-language definitions. This applies Evans's Ubiquitous Language as documentation. + +### Step 6: Suggest First Tasks + +Based on the dependency map, suggest 2-3 low-risk areas where a new developer could make +their first contribution: modules with good test coverage, clear boundaries, low coupling. + +--- + +## Output Template + +``` +# Codebase Tour: [Project Name] + +## Overview +[2-3 sentence summary of what the project does and its tech stack] + +## Module Map +[Mermaid graph with reading-order colors] + +## Module Guide +[One paragraph per top-level module: what it does, what it depends on, key files to read] + +## Conventions +[Bullet list of patterns this codebase follows] + +## Danger Zones +[Bullet list of areas to be careful with, or "None identified" if the codebase is clean] + +## Domain Glossary +| Term | Meaning | +|------|---------| + +## Suggested First Tasks +[2-3 concrete suggestions for a new developer's first PR] +``` diff --git a/skills/thirdparty/brooks-debt/SKILL.md b/skills/thirdparty/brooks-debt/SKILL.md new file mode 100644 index 00000000..ff68d197 --- /dev/null +++ b/skills/thirdparty/brooks-debt/SKILL.md @@ -0,0 +1,35 @@ +--- +name: brooks-debt +description: > + Tech debt assessment that identifies, classifies, and prioritizes maintainability + problems — helping teams build a refactoring roadmap — drawing on twelve classic + engineering books. + Triggers when: user asks about tech debt, refactoring priorities, what to clean up + first, or asks "why is this so hard to change?", "where's the most painful part?", + "what should we fix first?", "how do I justify refactoring to management?", + "why is our velocity dropping?". + Do NOT trigger for: server health checks, HTTP /health endpoints, Kubernetes probes, + database health, or application uptime — "health" in those contexts is infrastructure, + not code quality. Also not for single-function refactoring questions. +--- + +# Brooks-Lint — Tech Debt Assessment + +## Setup + +1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules +2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs +3. Read `../_shared/decay-risks.md` for symptom definitions and source attributions +4. Read `debt-guide.md` in this directory for the debt classification framework + +## Process + +**If the user has not described the codebase or pointed to specific areas:** apply Auto +Scope Detection from `../_shared/common.md` to determine the assessment scope before proceeding. + +1. Scan for all six decay risks (Step 1 of the guide); list every finding before scoring +2. Apply the Pain × Spread priority formula and classify debt intent (Steps 2–3 of the guide) +3. Group findings by decay risk (Step 4 of the guide) +4. Output using the Report Template from common.md, plus the Debt Summary Table + +**Mode line in report:** `Tech Debt Assessment` diff --git a/skills/thirdparty/brooks-debt/debt-guide.md b/skills/thirdparty/brooks-debt/debt-guide.md new file mode 100644 index 00000000..65ee8e54 --- /dev/null +++ b/skills/thirdparty/brooks-debt/debt-guide.md @@ -0,0 +1,125 @@ +# Tech Debt Assessment Guide — Mode 3 + +**Purpose:** Identify, classify, and prioritize technical debt across the entire codebase. +Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy. + +--- + +## Evidence Gathering + +If you have insufficient evidence to assess the codebase, ask the user ONE question — +choose the single question most relevant to what you already know: + +1. "Which part of the codebase takes the longest to modify for a typical feature?" +2. "Which module do developers avoid touching, and why?" +3. "Which parts of the system have the fewest tests and the most bugs?" +4. "Is there a module that only one person fully understands?" + +After one answer, proceed. Do not ask more than one question. +If the user declines or says they don't know, proceed with available evidence and note +which areas could not be assessed. + +--- + +## Analysis Process + +Work through these four steps in order. + +### Step 1: Full Decay Risk Scan + +Scan for all six decay risks across the entire codebase. List every finding before scoring +any of them. This prevents anchoring on early findings and missing systemic patterns. + +For each risk, look for: + +**Cognitive Overload:** Are there widespread naming problems, deeply nested logic, or +excessively long functions spread across many modules? + +**Change Propagation:** Which modules cause the most ripple effects when changed? +Are there modules that everyone must modify when adding a new feature? + +**Knowledge Duplication:** How many times is the same concept implemented independently? +Is the domain vocabulary consistent across the codebase? + +**Accidental Complexity:** Are there architectural layers or abstractions that add no value? +Is the infrastructure overhead proportional to the problem being solved? + +**Dependency Disorder:** Are there dependency cycles? Does domain logic depend on infrastructure? +Are there modules with no clear layering position? + +**Domain Model Distortion:** Is business logic in the right layer? +Do code names match business names? Are domain objects anemic? + +### Step 2: Score Each Finding with Pain × Spread + +After listing all findings, score each one: + +**Pain score (1–3):** How much does this slow down development today? +- 3: Developers actively avoid touching this area; it causes bugs on most changes + *(e.g., "nobody wants to touch the billing module because it always breaks something")* +- 2: This area is noticeably slower to work in than the rest of the codebase + *(e.g., "adding a field takes 2–3x longer here than elsewhere")* +- 1: This is a quality issue but not currently causing active pain + *(e.g., "inconsistent naming, but we always know what we mean")* + +**Spread score (1–3):** How many files, modules, or developers does this affect? +- 3: Affects 5+ modules or all developers on the team + *(e.g., "every new feature touches the God class in core/")* +- 2: Affects 2–4 modules or a subset of the team + *(e.g., "the auth and notification modules are tightly coupled")* +- 1: Isolated to one module or one developer's area + *(e.g., "legacy parser that only one person maintains")* + +**Priority = Pain × Spread** (max 9) + +| Priority | Classification | Action | +|----------|---------------|--------| +| 7–9 | Critical debt | Address in next sprint | +| 4–6 | Scheduled debt | Plan within quarter | +| 1–3 | Monitored debt | Log and watch | + +### Step 3: Classify Debt Intent + +After scoring, classify each finding as intentional or accidental: + +**Intentional debt** — a conscious shortcut taken to meet a deadline, with the expectation +of paying it back. The team knows about it. It may be legitimate (a strategic prototype, +a known temporary workaround during a migration). + +**Accidental debt** — degradation that accumulated without a deliberate decision: the team +did not choose it and may not even know it exists. This is the kind Ward Cunningham's +original definition warned against — not a tactical trade-off, but structural erosion. + +Mark each finding with `[intentional]` or `[accidental]` in the Debt Summary Table. +Intentional debt with no visible payback plan — no linked ticket, no code comment, no +documented decision — should be treated as accidental for prioritization purposes. +Focus remediation energy on accidental debt first; intentional debt at least has an owner. + +### Step 4: Group by Decay Risk + +Report findings grouped by risk type, not by file or module. +Grouping by risk reveals systemic patterns: +- "Change Propagation is systemic" → architectural intervention needed +- "Cognitive Overload is isolated" → localized refactoring sufficient + +--- + +## Output + +Use the standard Report Template from `../_shared/common.md`. Mode: Tech Debt Assessment. + +After Findings, append a Debt Summary Table: + +``` +## Debt Summary +| Risk | Findings | Avg Priority | Classification | Intent | +|------|----------|-------------|----------------|--------| +| Cognitive Overload | N | X.X | Monitored/Scheduled/Critical | intentional/accidental | +| Change Propagation | N | X.X | ... | ... | +| Knowledge Duplication | N | X.X | ... | ... | +| Accidental Complexity | N | X.X | ... | ... | +| Dependency Disorder | N | X.X | ... | ... | +| Domain Model Distortion | N | X.X | ... | ... | + +**Recommended focus:** [risks with highest average priority] +``` diff --git a/skills/thirdparty/brooks-health/SKILL.md b/skills/thirdparty/brooks-health/SKILL.md new file mode 100644 index 00000000..b73e5717 --- /dev/null +++ b/skills/thirdparty/brooks-health/SKILL.md @@ -0,0 +1,37 @@ +--- +name: brooks-health +description: > + Combined codebase health dashboard that scores a project across all four quality + dimensions — PR quality, architecture, tech debt, and test quality — in a single + pass, drawing on twelve classic engineering books. + Triggers when: user wants an overall quality assessment, asks "how healthy is this + codebase?", "run all the checks", "give me a big-picture quality report", "I need a + health score before the release", "what's the overall state of our code?", or wants + to onboard a new team with a quality overview. + Do NOT trigger for: server health checks, HTTP health endpoints, Kubernetes + liveness/readiness probes, database health, or application uptime. Also do not + trigger when the user specifically requests only one dimension — use the + corresponding focused skill instead (brooks-review / brooks-audit / + brooks-debt / brooks-test). +--- + +# Brooks-Lint — Health Dashboard + +## Setup + +1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules +2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs +3. Read `../_shared/decay-risks.md` for production risk symptom definitions +4. Read `../_shared/test-decay-risks.md` for test risk symptom definitions +5. Read `health-guide.md` in this directory for the dashboard orchestration process + +## Process + +**If the user has not specified a project or directory:** apply Auto Scope Detection +from `../_shared/common.md` to determine the review scope before proceeding. + +1. Run abbreviated scans across all four dimensions (Step 1 of the guide) +2. Compute per-dimension and composite Health Scores with weighting (Step 2 of the guide) +3. Output the Health Dashboard using the dashboard report template (Step 3 of the guide) + +**Mode line in report:** `Health Dashboard` diff --git a/skills/thirdparty/brooks-health/health-guide.md b/skills/thirdparty/brooks-health/health-guide.md new file mode 100644 index 00000000..e873262f --- /dev/null +++ b/skills/thirdparty/brooks-health/health-guide.md @@ -0,0 +1,89 @@ +# Health Dashboard Guide — Mode 5 + +**Purpose:** Produce a cross-dimensional health dashboard for the codebase. +Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy. + +--- + +## Analysis Process + +### Step 1: Run Lightweight Scan Across Four Dimensions + +For each dimension, run an abbreviated scan using the decay-risks definitions +from `_shared/`. Do NOT read the individual mode guide files — use the abbreviated +checklists below. Cap each dimension at 3 findings; for Debt: cap at 2 per risk code and 3 across all risk codes. + +**PR dimension (if changes exist):** +- Apply Auto Scope Detection (common.md) +- Scan for R2 (Change Propagation) and R1 (Cognitive Overload) in the diff + +**Architecture dimension:** +- Gather codebase context: read top-level structure, entry points, import statements +- Draw a Mermaid dependency graph (follow standard graph rules from common.md) +- Scan for R5 (Dependency Disorder): circular deps, upward flows, fan-out > 5 +- INCLUDE the Mermaid graph in output + +**Debt dimension:** +- Scan for all six decay risks (R1-R6) across the codebase +- Skip Pain × Spread scoring (use severity tier only) + +**Test dimension:** +- Build the Test Suite Map (unit/integration/E2E counts) +- Scan for T1 (Test Obscurity) and T2 (Test Brittleness) in test files + +### Step 2: Compute Dashboard Scores + +Each dimension gets its own Health Score (base 100, same deduction rules from common.md). +Composite score = weighted average of dimension scores: + +| Dimension | Weight | Rationale | +|-----------|--------|-----------| +| PR (code quality) | 0.25 | Only applies if changes exist; skip if no diff | +| Architecture | 0.30 | Structural issues have highest blast radius | +| Debt | 0.25 | Systemic but slower-moving | +| Test | 0.20 | Supporting signal | + +If PR dimension is skipped (no changes), redistribute its 0.25 weight proportionally +across the remaining three dimensions by dividing each remaining weight by +(1 − 0.25) = 0.75. Compute redistribution dynamically — do not hardcode the values. + +**Redistributed weights (PR skipped):** + +| Dimension | Base Weight | Redistributed Weight | +|-----------|------------|---------------------| +| Architecture | 0.30 | 0.30 / 0.75 = 0.40 | +| Debt | 0.25 | 0.25 / 0.75 = 0.33 | +| Test | 0.20 | 0.20 / 0.75 = 0.27 | + +### Step 3: Output Dashboard + +Use the dashboard report template below instead of the standard common.md template. + +--- + +## Dashboard Report Template + +````markdown +# Brooks-Lint Health Dashboard + +**Mode:** Health Dashboard +**Scope:** [project name or directory] +**Composite Score:** XX/100 + +| Dimension | Score | Top Finding | +|-----------|-------|------------| +| Code Quality | XX/100 | [one-line summary or "Clean"] | +| Architecture | XX/100 | [one-line summary or "Clean"] | +| Tech Debt | XX/100 | [one-line summary or "Clean"] | +| Test Quality | XX/100 | [one-line summary or "Clean"] | + +## Module Dependency Graph +[Mermaid graph from architecture scan] + +## Top Findings (max 5 across all dimensions) +[Standard Iron Law format, sorted by severity] + +## Recommendation +[One paragraph: what to fix first, which dimension needs the most attention, + suggest running the full individual skill for the worst dimension] +```` diff --git a/skills/thirdparty/brooks-review/SKILL.md b/skills/thirdparty/brooks-review/SKILL.md new file mode 100644 index 00000000..7568fc1e --- /dev/null +++ b/skills/thirdparty/brooks-review/SKILL.md @@ -0,0 +1,37 @@ +--- +name: brooks-review +description: > + PR code review that surfaces decay risks, design smells, and maintainability + issues with concrete Symptom → Source → Consequence → Remedy findings, drawing + on twelve classic engineering books. + Triggers when: user asks to review code, check a PR, shares a diff or pastes + code asking "does this look right?" / "any issues here?" / "ready to merge?", + or asks for feedback on a function, class, or file. + Also triggers when user mentions: code smells / refactoring / clean architecture / + DDD / domain-driven design / SOLID principles / Hyrum's Law / deep modules / + tactical programming / conceptual integrity / Brooks's Law / Mythical Man-Month / + second system effect. + Do NOT trigger for: questions about how to write code from scratch, language syntax + questions, or framework/tool questions where no existing code is shared. +--- + +# Brooks-Lint — PR Review + +## Setup + +1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules +2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs +3. Read `../_shared/decay-risks.md` for symptom definitions and source attributions +4. Read `pr-review-guide.md` in this directory for the analysis process + +## Process + +**If the user has not specified files or pasted code:** apply Auto Scope Detection +from `../_shared/common.md` to determine the review scope before proceeding. + +1. Understand the review scope, then scan for each decay risk in the order specified (Steps 1–6 of the guide) +2. Run the Quick Test Check (Step 7 of the guide) — skip for docs-only or non-production changes +3. Apply the Iron Law to every finding +4. Output using the Report Template from common.md + +**Mode line in report:** `PR Review` diff --git a/skills/thirdparty/brooks-review/pr-review-guide.md b/skills/thirdparty/brooks-review/pr-review-guide.md new file mode 100644 index 00000000..aa0d4ae7 --- /dev/null +++ b/skills/thirdparty/brooks-review/pr-review-guide.md @@ -0,0 +1,163 @@ +# PR Review Guide — Mode 1 + +**Purpose:** Analyze a code diff or specific files for decay risks that are directly visible +in the changed code. Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy. + +--- + +## Before You Start + +**Auto-generated files:** If the diff contains generated files (protobuf stubs, OpenAPI clients, +ORM migrations, lock files, minified bundles), skip those files entirely. Generated code reflects +tool choices, not developer decisions. Note in the report which files were skipped and why. + +**Scope calibration:** Adjust analysis depth based on PR size before starting. + +| PR Size | Approach | +|---------|----------| +| < 50 lines | Focus on Steps 1–3 only; run Step 6a only if imports changed; run Step 6b if any class, method, or variable was renamed or introduced | +| 50–300 lines | Full process, all steps | +| > 300 lines | Full process; note in the Scope line that review is sampled — cover the highest-risk areas rather than every file | + +For PRs > 500 lines: flag in the Summary that a PR this size is itself a Change Propagation signal. A change that cannot be reviewed in one pass suggests tangled responsibilities. + +--- + +## Analysis Process + +Work through these seven steps in order. Do not skip steps. + +### Step 1: Understand the scope + +Read the diff or files and answer: +- What is the stated purpose of this change? +- Which files were modified? +- Flag immediately if the PR changes more than 10 unrelated files — that itself is a + 🟡 Warning: Change Propagation (a PR that touches many unrelated things is a sign + that responsibilities are tangled). + +### Step 2: Scan for Change Propagation + +*Scan this first — it is the most visible risk in a diff.* + +Look for: +- Does this change touch files in modules that have no conceptual connection to the stated purpose? +- Does any modified class change for more than one business reason in this diff? +- Does any method use more data from another class than from its own? + +If the diff shows no cross-module changes beyond what the feature requires → skip, no finding. + +### Step 3: Scan for Cognitive Overload + +Look for: +- Are any new or modified functions longer than 20 lines? +- Is there nesting deeper than 3 levels in new or modified code? +- Are there more than 4 parameters in any new function signature? +- Are there magic numbers or unexplained constants in new code? +- Do new variable or function names require reading the implementation to understand? +- Are there train-wreck chains (3+ method calls chained)? + +### Step 4: Scan for Knowledge Duplication + +Look for: +- Does this change introduce logic that already exists elsewhere in the codebase? +- Does this change introduce a new name for a concept that already has a name? +- Does this change add a class to a hierarchy that has a parallel in another module? + +### Step 5: Scan for Accidental Complexity + +Look for: +- Does this change add an abstraction with only one concrete use? +- Does this change add a class that only wraps another class or delegates everything? +- Does this change add configuration options or extension points that serve no current requirement? + +### Step 6a: Scan for Dependency Disorder + +- Do any new imports create a dependency from a high-level module to a low-level one? + (e.g., domain service now imports a database driver or HTTP client) +- Do any new imports introduce a cycle between modules? +- Does any new interface force callers to depend on methods they do not use? + +If no new imports and no structural changes → skip, no finding. + +### Step 6b: Scan for Domain Model Distortion + +- Do new class or variable names match the language the business uses for the same concept? +- Does any new class hold only data with no behavior (pure data bag), where behavior was expected? +- Does any new method put logic that belongs to the domain in a service or utility layer? + +--- + +## Severity Calibration + +Apply the Iron Law format from `../_shared/common.md`. Each risk in `../_shared/decay-risks.md` has its own Severity +Guide with numeric thresholds — use those as the primary reference. When a finding sits +on the boundary between two tiers, use this as a tiebreaker: +- 🔴 Critical — actively breaking velocity or creating production risk *today* +- 🟡 Warning — will if left unaddressed through the next few features +- 🟢 Suggestion — worth fixing when nearby, not urgent + +When multiple findings exist, list Critical items first. If there are more than 5 findings, +add a one-line "Recommended fix order" at the end of the Findings section. + +--- + +## Step 7: Quick Test Check + +*Run this last. Three signals only — this is not a full Mode 4 review.* + +If the diff contains only generated files, configuration, or documentation with no +production logic changes → skip Step 7 entirely. + +**Signal 1: Do tests exist for the changed behavior?** + +- Does the diff modify production code? +- Are corresponding test file changes included in the diff? +- If new public behavior was added with no new tests: + → 🟡 Warning: Coverage Illusion — new behavior is untested + → Source: Feathers — Working Effectively with Legacy Code, Ch. 1 +- If the change is a pure refactor and existing tests cover the behavior → no finding. + +**Signal 2: Quick Mock Abuse sniff** + +Only check if the diff includes test file changes. + +- Is mock setup code in new/modified tests obviously longer than the test logic? +- Are the primary assertions `expect(mock).toHaveBeenCalledWith(...)` with no behavior verification? +- Does the diff add any methods to production classes that are only called from test files? + +If any of these are true: + → 🟡 Warning: Mock Abuse — test complexity exceeds behavior complexity + → Source: Osherove — The Art of Unit Testing, mock usage guidelines + +**Signal 3: Quick Test Obscurity sniff** + +Only check if the diff includes test file changes. + +- Do new test names express scenario and expected outcome? + (Pattern: `methodName_scenario_expectedResult` or equivalent) +- Are there new tests with multiple assertions and no message strings on any of them? + +If test names are vague or assertions lack messages: + → 🟢 Suggestion: Test Obscurity — test intent is unclear from the test name or assertions + → Source: Meszaros — xUnit Test Patterns, Assertion Roulette (p.224) + +**Output rule:** + +If all three signals are clean → write no Test findings. Proceed directly to the report. + +If findings exist → add them to the Findings section using the standard Iron Law format. +Label the risk as the test decay risk name (e.g., "Coverage Illusion", "Mock Abuse", +"Test Obscurity"). + +> **Note:** Step 7 is a fast check, not a full test audit. When systemic test problems +> are found, note in the Summary: "Consider running `/brooks-lint:brooks-test` for a +> complete test quality diagnosis." + +--- + +## Output + +Use the standard Report Template from `../_shared/common.md`. +Mode: PR Review +Scope: list the files reviewed (excluding skipped generated files). diff --git a/skills/thirdparty/brooks-sweep/SKILL.md b/skills/thirdparty/brooks-sweep/SKILL.md new file mode 100644 index 00000000..a8a0e500 --- /dev/null +++ b/skills/thirdparty/brooks-sweep/SKILL.md @@ -0,0 +1,38 @@ +--- +name: brooks-sweep +description: > + Full-sweep mode: runs a unified analysis across all quality dimensions — code decay, + architecture, tech debt, and test quality — then applies fixes directly to the + codebase. Safe changes are auto-applied; risky changes are confirmed before + execution. Drawing on twelve classic engineering books. + Triggers when: user wants to "fix everything", "sweep the codebase", "auto-fix all + issues", "run all checks and fix them", "clean up the whole project", or asks for + a single command that both diagnoses and remediates quality problems. + Do NOT trigger for: read-only audits or health reports where the user only wants + findings without code changes; single-dimension reviews (use the focused skill + instead: brooks-review / brooks-audit / brooks-debt / brooks-test); server health + checks, HTTP /health endpoints, Kubernetes probes, or application uptime. +--- + +# Brooks-Lint — Full Sweep & Auto-Fix + +## Setup + +1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules +2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs +3. Read `../_shared/decay-risks.md` for production risk symptom definitions +4. Read `../_shared/test-decay-risks.md` for test risk symptom definitions +5. Read `sweep-guide.md` in this directory for the unified scan and fix process + +## Process + +**If the user has not specified a project or directory:** apply Auto Scope Detection +from `../_shared/common.md` to determine the review scope before proceeding. + +1. Show pre-flight consent notice and wait for the user's one-time approval (Step 0 of the guide) +2. Enumerate scope and initialize the `unresolvable` / `non_critical_rounds` / `fix_log` state (Step 1 of the guide) +3. Run the four dimensions in sequence — review, test, debt, audit — each scanning, classifying, applying Safe + Extended-Safe fixes, and verifying via the project test command (Steps 2–5 of the guide) +4. Iterate: re-scan modified files + same-module + static consumers; converge on a clean round, retire 3-retry failures to the `unresolvable` set, cap non-critical rounds at 3 (Step 6 of the guide) +5. Aggregate residual and unresolvable items and output the Full Sweep Report (Steps 7–8 of the guide) + +**Mode line in report:** `Full Sweep` diff --git a/skills/thirdparty/brooks-sweep/sweep-guide.md b/skills/thirdparty/brooks-sweep/sweep-guide.md new file mode 100644 index 00000000..73dbc36f --- /dev/null +++ b/skills/thirdparty/brooks-sweep/sweep-guide.md @@ -0,0 +1,264 @@ +# Brooks-Lint — Full Sweep Guide + +Sequential autonomous pipeline: **review → test → debt → audit**. Fixes findings +in place, iterates until clean or capped, reports residuals. One interaction point: +Step 0 (pre-flight consent) — after approval the pipeline runs hands-free until Step 8. + +Every finding follows the Iron Law: **Symptom → Source → Consequence → Remedy**. + +--- + +### Step 0 — Pre-flight consent gate + +**Goal:** State scope, cost, and irreversibility up front; get explicit consent +once so later steps never have to ask. + +0a. Estimate the file count using `git ls-files | wc -l` if in a git repo, or + `find . -type f -not -path '*/.git/*' -not -path '*/node_modules/*' -not -path '*/.venv/*' -not -path '*/build/*' -not -path '*/dist/*' -not -path '*/vendor/*' -not -path '*/target/*' | wc -l` otherwise. Order-of-magnitude is enough. + +0b. Show this notice verbatim with the estimate filled in. Do not paraphrase — + the user is agreeing to this exact scope. + + ``` + ⚠️ /brooks-sweep — Full Repository Sweep & Auto-Fix + + Scope: Four analysis dimensions run in sequence — PR code decay (R1–R6), + test quality (T1–T6), tech debt, architecture. Edits are made in + place inside the detected project scope. + Estimated files in scope: ~N + + Order: brooks-review → brooks-test → brooks-debt → brooks-audit. + Each dimension scans, queues, and fixes before the next starts. + + Autonomy: Fully autonomous. Safe single-file fixes apply directly. Multi-file + fixes that have test coverage AND do not break a public interface + also apply directly. High-risk fixes (public API break, cross-module + structural change, or no test coverage) are NOT applied — they are + recorded in the residual report for human review. + + Iteration: After each dimension pass, modified files + same-module + static + consumers are re-scanned. A finding that fails to fix 3 times is + retired to the unresolvable set and never re-queued. Non-critical + rounds cap at 3 iterations; critical findings iterate until + resolved or retired. + + Git impact: The pipeline edits files. It does NOT commit, push, or amend. + If you have uncommitted work you want to preserve, commit or stash + first. + + Proceed with full autonomous sweep? [Y/n] + ``` + +0c. Parse the reply (first match wins, evaluate rules in order): + 1. **Hard negation** (`no`, `n`, `abort`, `cancel`, `取消`, `不要`): abort with "Aborted before scan — no files modified." + 2. **Consent** (`Y`, `yes`, `ok`, `sure`, `proceed`, `go`, `continue`, `好`, `好的`, `行`, `可以`): proceed to Step 1. + 3. **Soft pause** (`wait`, `hold on`, `等一下`, `等我`, `let me`): acknowledge in one line ("Understood, waiting"), then wait for the user's next message and re-evaluate from rule 1. + 4. **Question**: answer it, then re-show the notice once and wait for the next reply. If the next reply is not Consent (rule 2) — whether a second question, another pause, or anything else — abort with "Aborted — did not receive consent after clarification." + +0d. After consent, do not ask further questions until Step 8. + +--- + +### Step 1 — Scope enumeration and state init + +1a. Apply Auto Scope Detection from `../_shared/common.md` if the user did not + specify files or a directory. Otherwise honor the user's explicit scope. + +1b. Read `.brooks-lint.yaml` from the project root if present. Apply `disable`, + `severity`, `ignore`, `focus`, and `custom_risks` per common.md. Record the + applied config values and reuse them across all iteration rounds — do not + re-read the file in Step 6 even if files were modified. + +1c. Initialize pipeline state (persists across all rounds): + + - **`unresolvable`** (set): findings retired after 3 failed attempts — keyed by `(file, line_range, risk_code)`; `signature` breaks ties. Never re-queued. + - **`non_critical_rounds`** (int, 0): incremented each round producing Warning/Suggestion; reset on clean round. + - **`fix_log`** (list): each fix with file, line range, risk code, description, and outcome (`applied` / `reverted` / `retired`). + +1d. Record the final scope file list in the Fix Report output buffer for Step 8. + +--- + +### Step 2 — brooks-review pass (R1–R6 code decay) + +Scan every file in scope against all R-series risks defined in +`../_shared/decay-risks.md`. + +2a. For each R-risk, apply its symptom checklist. Record each hit as a finding + with: risk code, file + approximate line range, Symptom, Source, + Consequence, Remedy, Severity (Critical / Warning / Suggestion), and + **Fix-Class** (see Step 2b). + +2b. Assign Fix-Class per finding: + + | Class | Criteria | + |-------|----------| + | **Safe** | Single-file AND fully local: rename a non-exported symbol, extract a constant, remove dead code, add a null guard at a leaf, add a test scaffold for an untested pure function. Any change that modifies or removes an exported symbol is NOT Safe even if in one file. | + | **Extended-Safe** | Multi-file but (a) a project test command exists and passes pre-fix, AND (b) the change does not rename, remove, or alter the signature of any publicly exported symbol, AND (c) touches ≤ 5 files in this pass. | + | **Residual** | Public API break, cross-service boundary change, no test coverage to fall back on, or remedy ambiguous. NOT applied — carried to the Step 8 residual report. | + +2c. Skip any finding that matches an entry in the `unresolvable` set. + +2d. Apply every Safe and Extended-Safe fix in this dimension, lowest risk + within each severity tier first. For each fix: Edit or Write, then append + one row to `fix_log` with outcome `applied`. If two fixes touch overlapping + line ranges in the same file, apply higher-severity first, re-read the file, + then apply the next. + +2e. After all fixes in this dimension, run the project test/lint command if one + exists (`package.json` scripts, `pytest`, `cargo test`, `go test ./...`, etc.). + If tests fail: revert fixes from this dimension in reverse order one at a + time, re-running the test command after each revert, until tests pass. + Mark each reverted fix with outcome `reverted` in `fix_log` and promote the + finding to **Residual**. If no test command is found, note this once in the + report and continue. + +2f. Record dimension summary: N scanned, M Safe applied, K Extended-Safe applied, + R reverted, P Residual. + +--- + +### Step 3 — brooks-test pass (T1–T6 test decay) + +Scan test files (and untested production code) against T-series risks defined +in `../_shared/test-decay-risks.md`. + +Follow the same sub-steps as Step 2 (classify → apply → verify → summarize), +using T-prefix risk codes. For production files with no test coverage at all, +record as T2 (Missing Tests). A test scaffold that adds a pure-function test is +**Safe**; adding tests that require new test infrastructure is **Residual**. + +--- + +### Step 4 — brooks-debt pass (tech debt accumulation) + +Re-classify R-findings through a debt lens — same symptoms at accumulation scale: +repeated duplication, layered workarounds, stale `TODO`/`FIXME` clusters, dead +flags. Score each with **Pain (1–3) × Spread (1–3)**; total 7–9 = Critical, +4–6 = Warning, 1–3 = Suggestion. Apply a severity bump for pattern-level +occurrences (isolated Suggestion → 4+ modules Warning). + +Follow the same sub-steps as Step 2. Debt findings often span multiple files +and are more likely to land in Extended-Safe or Residual than Safe. + +--- + +### Step 5 — brooks-audit pass (architecture integrity) + +Scan the full scope for architecture-level issues. The dependency-direction +symptoms (inverted dependencies, circular imports, cross-domain coupling) are +defined in `../_shared/decay-risks.md` Risk 5 — use that checklist. Step 5 +additionally covers architecture-only concerns that R5 does not: missing +abstraction layers, god modules, leaked infrastructure inside domain code, +and seam-boundary violations. + +Most architecture findings are **Residual** by definition — they require human +judgment on module boundaries. A few are Extended-Safe (e.g. extract a shared +constant used in 3+ modules into a new module that nothing else imports yet). +Do not auto-refactor module layouts, rename packages, or change public exports. + +Follow the same sub-steps as Step 2. + +--- + +### Step 6 — Iteration loop + +**Goal:** Re-scan what the fixes touched and converge. Stop on clean round, +cap, or no progress. + +6a. Build the re-scan scope: + - every file modified in Steps 2–5 of the current round, PLUS + - every file in the same module as a modified file, PLUS + - every file that statically imports from a modified file. + + Do not re-scan files whose dependencies were not touched. On monorepos + where a "module" may span hundreds of files, narrow the same-module bucket + to files that import from or are imported by a modified file (direct + dependency graph only). + +6b. Re-run Steps 2–5 on the re-scan scope. For each new finding in this round: + - If it matches an entry in `unresolvable` → skip. + - Else if 🔴 Critical → queue and fix; Critical findings iterate until + resolved OR retired (3 failed attempts → `unresolvable`). + - Else 🟡 Warning / 🟢 Suggestion → queue and fix, subject to cap below. + +6c. Classify the round after all fixes attempted: + - **Clean round** (no new findings outside `unresolvable`): pipeline + converged → proceed to Step 7. + - **Critical-only round**: do NOT increment `non_critical_rounds`; return + to 6a. + - **Mixed or non-critical round** (any Warning / Suggestion produced): + increment `non_critical_rounds` by 1. If it reaches the cap (default 3, + or `sweep.max_iterations` from `.brooks-lint.yaml`), proceed to Step 7 + with remaining non-critical findings recorded as + `"Unresolved — iteration cap reached"`. Otherwise return to 6a. + +6d. Fix-retry rule: if a single finding fails verification (Step 2e) 3 times + across any combination of rounds, retire it to `unresolvable` with reason + `"3-retry budget exhausted"` and stop attempting it. + +--- + +### Step 7 — Residual aggregation + +Collect everything that was NOT fixed in place, de-duplicated: + +- All Residual-class findings from Steps 2–5 (first round + re-scan rounds) +- All `unresolvable` entries with their retirement reason +- All iteration-cap residuals from Step 6c + +Sort Critical → Warning → Suggestion. Within each severity, list file path, +risk code, Symptom (one line), Remedy (one line), and the reason it was not +applied (`public API break` / `no test coverage` / `3-retry budget` / +`iteration cap`). + +--- + +### Step 8 — Sweep report + +Output the final report. Use the standard Report Template from +`../_shared/common.md` with these additions: + +``` +# Brooks-Lint — Full Sweep Report +Mode: Full Sweep | Scope: +Config: .brooks-lint.yaml applied (N risks disabled, M paths ignored) # omit if no config + +## Dimension Summary +| Dimension | Scanned | Safe Applied | Extended Applied | Reverted | Residual | +|-----------|---------|--------------|------------------|----------|----------| +| Review (R1–R6) | ... | ... | ... | ... | ... | +| Test (T1–T6) | ... | ... | ... | ... | ... | +| Debt | ... | ... | ... | ... | ... | +| Audit | ... | ... | ... | ... | ... | + +## Iteration History +Round 1: , new findings +Round 2: ... +Stopped at: clean round | iteration cap | no outstanding criticals + +## Fix Log +| # | File | Lines | Risk | Outcome | Change | +|---|------|-------|------|----------|--------| +| 1 | ... | ... | R2 | applied | Extract repeated constant | +| 2 | ... | ... | T4 | reverted | Test regression; promoted to Residual | +... + +## Health Score Delta +Before: /100 → After: /100 +(Re-run /brooks-health for an exact recalculation.) + +## Residual Items ( not applied) + + +## Summary +- Total findings detected: +- Fixed this sweep: +- Residual (needs human review): +- Unresolvable (3-retry exhausted): +``` + +If there are zero residual items and zero unresolvable entries, end with: +**"Sweep complete — codebase is clean."** + +**Mode line in report:** `Full Sweep` diff --git a/skills/thirdparty/brooks-test/SKILL.md b/skills/thirdparty/brooks-test/SKILL.md new file mode 100644 index 00000000..61a04795 --- /dev/null +++ b/skills/thirdparty/brooks-test/SKILL.md @@ -0,0 +1,36 @@ +--- +name: brooks-test +description: > + Test quality review drawing on twelve classic engineering books — with primary focus + on xUnit Test Patterns, The Art of Unit Testing, How Google Tests Software, and + Working Effectively with Legacy Code — that diagnoses structural problems in an + existing test suite: brittleness, mock abuse, coverage illusions, slow execution, + poor readability. + Triggers when: user asks about test quality, shares test files for review, or + expresses frustration: "tests keep breaking whenever I change anything", "our tests + take forever", "I can't understand what this test is doing", "tests pass but bugs + still reach production", "we have too many mocks". + Do NOT trigger for: writing new tests from scratch (use the regular test-writing + workflow) or testing framework/syntax questions — this skill reviews an existing + suite for structural quality problems, not individual test authoring. +--- + +# Brooks-Lint — Test Quality Review + +## Setup + +1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules +2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs +3. Read `../_shared/test-decay-risks.md` for test-space symptom definitions and source attributions +4. Read `test-guide.md` in this directory for the test quality review framework + +## Process + +**If the user has not shared test files or pointed to a test directory:** apply Auto +Scope Detection from `../_shared/common.md` to determine the review scope before proceeding. + +1. Build the test suite map (guide's "Before You Start" section) +2. Scan for each test decay risk in the order specified (Steps 1–4 of the guide) +3. Apply the Iron Law and output using the Report Template (Step 5 of the guide) + +**Mode line in report:** `Test Quality Review` diff --git a/skills/thirdparty/brooks-test/test-guide.md b/skills/thirdparty/brooks-test/test-guide.md new file mode 100644 index 00000000..09f12c9b --- /dev/null +++ b/skills/thirdparty/brooks-test/test-guide.md @@ -0,0 +1,147 @@ +# Test Quality Review Guide — Mode 4 + +**Purpose:** Diagnose the health of a test suite using six test-space decay risks. +Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy. + +--- + +## Before You Start: Build the Test Suite Map + +Before scanning for any risk, map the current test suite structure: + +``` +Unit tests: X files, ~N tests +Integration tests: X files, ~N tests +E2E tests: X files, ~N tests +Ratio: Unit X% : Integration X% : E2E X% +Coverage areas: [modules with tests] vs [modules without tests] +``` + +If you cannot access test files directly, ask the user **one question** — choose the +most relevant: +1. "Which module is hardest to test or has the least coverage?" +2. "When you make a change, how often do unrelated tests break?" +3. "Is there a part of the codebase your team avoids touching because it has no tests?" + +After one answer, proceed. Do not ask more than one question. + +--- + +## Analysis Process + +Work through these five steps in order. + +### Step 1: Scan for Test Obscurity + +*Scan this first — the most visible risk and the one that determines whether the suite +is maintainable at all.* + +Look for: +- Read 5–10 test names at random: can each one communicate subject + scenario + expected + outcome without opening the test body? +- Are there tests where a failure gives no clue which behavior broke (multiple assertions, + no message strings)? +- Does any test depend on external state (files, database rows, env variables, shared mutable + fixtures) that is invisible from within the test body? +- Is there a single massive setUp or beforeEach that every test inherits regardless of + what it actually needs? + +If all test names are clear and setups are minimal → no finding. + +### Step 2a: Scan for Test Brittleness + +*Brittle tests break on refactors that do not change observable behavior — they test +implementation, not contracts.* + +Look for: +- Ask (or check git history): did any recent refactor cause test failures with no + behavior change? +- Are there test methods where the name contains "and" or that assert on 3 or more + unrelated behaviors (Eager Test)? +- Do assertions specify mock call order or exact parameter values that are irrelevant + to the observable behavior? +- Are tests coupled to private methods or internal state directly? + +If brittleness is systemic (most tests in the file break on a rename) → 🔴 Critical. +If isolated (1–2 brittle tests) → 🟢 Suggestion. + +### Step 2b: Scan for Mock Abuse + +*Mock Abuse produces tests that pass regardless of whether the real behavior is correct. +Scan this separately from brittleness — over-mocking is often the cause of brittleness, +but it is a distinct problem worth its own finding.* + +**Sample 3–5 tests once for both steps 2a and 2b together** — read each test body and +check brittleness signals and mock-setup ratio in the same pass, then write separate +findings if both problems are present. + +Look for: +- Is mock setup code longer than the assertion logic in the sampled tests? +- Are the primary assertions `expect(mock).toHaveBeenCalledWith(...)` rather than + assertions on outputs, state, or observable events? +- Are there methods in production classes that are only called from test files + (test-induced design damage)? +- Does any single test create more than 3 mock objects? + +If mock setup-to-assertion ratio exceeds 3:1 → 🟡 Warning. +If production methods exist only for test access → 🔴 Critical (architecture is being +distorted by the test suite). + +### Step 3: Scan for Test Duplication + +Look for: +- Is the same setup block (same variables initialized the same way) repeated across + 5 or more test files without a shared helper? +- Are there multiple tests that pass identical inputs and assert identical outputs + with no differentiation (Lazy Test)? +- Is the same business scenario covered at unit, integration, and E2E level with no + difference in what each layer is testing? + +If duplication is systemic (10 or more instances) → Critical. +If localized (3–5 instances) → Warning. + +### Step 4: Scan for Coverage Illusion and Architecture Mismatch + +Look for Coverage Illusion: +- Pick the most recently modified core module. Are its error-handling branches and + null/boundary inputs covered by tests? +- Are there legacy areas (old functions, no test files nearby) that are actively + being changed? +- Do the tests assert on side effects (DB writes, events emitted, state transitions) + or only on return values? + +**Characterization Test check:** If legacy code is being modified without existing tests, +the team needs Characterization Tests before making the change — not after. Look for +this pattern and flag it when absent. + +A Characterization Test locks in current behavior (right or wrong) so future changes +do not silently regress it. Template: +``` +test("characterize: [module].[method] given [input], returns [current output]") { + // Call the code under test with realistic inputs + // Assert on whatever it currently returns — even if you suspect the output is wrong + // Add a comment: "This captures current behavior, not necessarily correct behavior" +} +``` +Source: Feathers — Working Effectively with Legacy Code, Ch. 13: Characterization Tests + +Look for Architecture Mismatch: +- Compare the suite map from the start: is the ratio close to 70% unit / 20% integration / 10% E2E? +- Are high-risk modules tested at higher density than trivial utilities? + +**Test suite performance:** A slow test suite is a first-class maintainability risk — it +breaks the fast-feedback loop and causes developers to skip running tests locally. +- If the full suite runtime is known and > 10 minutes → 🟡 Warning +- If the full suite runtime is > 30 minutes or unknown → 🔴 Critical (unknown suite time + means nobody is running it regularly) +- If tests that could be unit tests are integration tests, that is a Performance Mismatch: + each misclassified test adds seconds of avoidable wait time + +Source: Meszaros — xUnit Test Patterns, Slow Tests (p. 253) + +### Step 5: Apply Iron Law, Output Report + +Apply the Iron Law format from `../_shared/common.md` to each finding. + +Use the standard Report Template. Mode: Test Quality Review. +Include the Test Suite Map as a code block immediately before the `## Findings` heading, labeled "Test Suite Map". diff --git a/skills/thirdparty/codebase-migrate/SKILL.md b/skills/thirdparty/codebase-migrate/SKILL.md new file mode 100644 index 00000000..d657c3b2 --- /dev/null +++ b/skills/thirdparty/codebase-migrate/SKILL.md @@ -0,0 +1,128 @@ +--- +name: codebase-migrate +description: Run large codebase migrations and multi-file refactors. Uses the Composio CLI to coordinate issue tracking, batched PRs, and CI verification while the agent executes the transforms locally across hundreds of files. +metadata: + short-description: Codebase migrations + multi-file refactors +--- + +# Codebase Migrate + +Coordinate framework upgrades, API renames, config rewrites, and structural refactors across hundreds of files. Local edits are driven by the agent; the [Composio CLI](https://docs.composio.dev/docs/cli) handles the surrounding ceremony: tracking issues, per-batch PRs, and CI verification. + +## When to Use + +- Framework upgrade (React 17 → 19, Node 18 → 22, Django 4 → 5). +- API rename across a monorepo (e.g., `getUserById` → `users.byId`). +- Config/format migration (webpack → vite, eslint → biome, jest → vitest). +- Any "change 200 files the same way" task that needs to ship in reviewable slices. + +## Prereqs + +```bash +curl -fsSL https://composio.dev/install | bash +composio login +composio link github # for PRs + CI status +composio link linear # or jira — for migration tracking +``` + +Local tools the agent will use directly: `git`, `rg`, `jscodeshift`/`ts-morph`/`comby`/`ast-grep` (language-appropriate), and your test runner. + +## Planning Phase + +1. **Define the transform precisely.** Bad: "migrate to vitest." Good: "replace `jest.mock` with `vi.mock`, swap `jest.fn()` for `vi.fn()`, rename `jest.config.js` → `vitest.config.ts` using template X." +2. **Scope the blast radius:** + ```bash + rg -l 'jest\.(mock|fn|spyOn)' | wc -l + rg -l 'from "jest"' | sort + ``` +3. **File a tracking issue:** + ```bash + composio execute LINEAR_CREATE_ISSUE -d '{ + "teamId":"TEAM_ID", + "title":"Migrate test runner: jest → vitest", + "description":"Batches of ~25 files. Checkpoint after each PR lands green." + }' + ``` + +## Execute in Reviewable Batches + +Loop: pick N files → transform → test → PR → wait for green → merge → next batch. + +```bash +# Batch helper: first 25 untouched files matching the pattern +BATCH=$(rg -l 'jest\.mock' | grep -v done.list | head -25) +echo "$BATCH" > batch.list +``` + +The agent runs the codemod on `batch.list`, then: + +```bash +git checkout -b migrate/vitest-batch-03 +xargs < batch.list codemod-runner # e.g. jscodeshift / ts-morph / comby +npm test -- --changed +git add -A && git commit -m "migrate(test): jest → vitest (batch 3)" +git push -u origin migrate/vitest-batch-03 + +composio execute GITHUB_CREATE_A_PULL_REQUEST -d '{ + "owner":"acme","repo":"app", + "head":"migrate/vitest-batch-03","base":"main", + "title":"migrate(test): jest → vitest (batch 3)", + "body":"Part of LIN-482. 25 files. Codemod: `transforms/jest-to-vitest.ts`." +}' +``` + +Then poll CI and merge when green: + +```bash +composio execute GITHUB_LIST_WORKFLOW_RUNS_FOR_A_REPOSITORY \ + -d '{"owner":"acme","repo":"app","branch":"migrate/vitest-batch-03"}' +``` + +## Workflow Script + +`scripts/migrate-batch.ts`, run per batch via `composio run --file scripts/migrate-batch.ts -- --batch 3`: + +```ts +const batch = process.argv[process.argv.indexOf("--batch") + 1]; + +const pr = await execute("GITHUB_CREATE_A_PULL_REQUEST", { + owner: "acme", repo: "app", + head: `migrate/vitest-batch-${batch}`, base: "main", + title: `migrate(test): jest → vitest (batch ${batch})`, + body: `Part of LIN-482. See transforms/jest-to-vitest.ts.` +}); + +await execute("LINEAR_CREATE_COMMENT", { + issueId: "LIN-482", + body: `Opened PR #${pr.number}: ${pr.html_url}` +}); +``` + +## Safety Rails + +- **One transform per PR.** Never mix a rename with a format change. +- **Keep a `done.list`** of files already migrated so the next batch skips them. +- **Run the full test suite on the last batch**, even if per-batch PRs ran `--changed`. +- **Codemod first, hand-edit second.** If the codemod misses 3 files, patch them manually and note it in the PR body. +- **Roll back per-batch**, not globally. Each PR should revert cleanly. + +## Verification Loop + +After each merge: + +```bash +rg 'jest\.(mock|fn|spyOn)' | wc -l # should trend to 0 +npm test # full suite +composio execute GITHUB_LIST_WORKFLOW_RUNS_FOR_A_REPOSITORY \ + -d '{"owner":"acme","repo":"app","branch":"main","event":"push"}' \ + | jq '.workflow_runs[0].conclusion' +``` + +## Troubleshooting + +- **Codemod regex catches too much** → switch to AST-based tooling (`ast-grep`, `ts-morph`) for structural matches. +- **Tests pass locally, CI fails** → pin Node/Python version parity; check `.nvmrc` / `pyproject.toml`. +- **PR too big to review** → cut batch size in half; maintainers won't review 800-line diffs. +- **Conflicts between batches** → rebase the open batch before merging the current one; never force-push merged batches. + +Full CLI reference: [docs.composio.dev/docs/cli](https://docs.composio.dev/docs/cli) diff --git a/skills/thirdparty/codebase-recon/SKILL.md b/skills/thirdparty/codebase-recon/SKILL.md new file mode 100644 index 00000000..2206a1fd --- /dev/null +++ b/skills/thirdparty/codebase-recon/SKILL.md @@ -0,0 +1,266 @@ +--- +name: codebase-recon +description: This skill should be used when analyzing codebases, understanding architecture, or when "analyze", "investigate", "explore code", or "understand architecture" are mentioned. +metadata: + version: "1.0.0" +--- + +# Codebase Analysis + +Evidence-based investigation → findings → confidence-tracked conclusions. + +## Steps + +1. Gather evidence from multiple sources (code, docs, tests, history) +2. Track confidence level as investigation progresses +3. Based on findings: + - If pattern analysis needed → load the `outfitter:patterns` skill + - If root cause investigation → load the `outfitter:find-root-causes` skill + - If ready to report → load the `outfitter:report-findings` skill +4. Deliver findings with confidence level and caveats + + + +- Codebase exploration and understanding +- Architecture analysis and mapping +- Pattern extraction and recognition +- Technical research within code +- Performance or security analysis + +NOT for: wild guessing, assumptions without evidence, conclusions before investigation + + + + + +| Bar | Lvl | Name | Action | +|-----|-----|------|--------| +| `░░░░░` | 0 | Gathering | Collect initial evidence | +| `▓░░░░` | 1 | Surveying | Broad scan, surface patterns | +| `▓▓░░░` | 2 | Investigating | Deep dive, verify patterns | +| `▓▓▓░░` | 3 | Analyzing | Cross-reference, fill gaps | +| `▓▓▓▓░` | 4 | Synthesizing | Connect findings, high confidence | +| `▓▓▓▓▓` | 5 | Concluded | Deliver findings | + +*Calibration: 0=0–19%, 1=20–39%, 2=40–59%, 3=60–74%, 4=75–89%, 5=90–100%* + +Start honest. Clear codebase + focused question → level 2–3. Vague or complex → level 0–1. + +At level 4: "High confidence in findings. One more angle would reach full certainty. Continue or deliver now?" + +Below level 5: include `△ Caveats` section. + + + + + +## Core Methodology + +**Evidence over assumption** — investigate when you can, guess only when you must. + +**Multi-source gathering** — code, docs, tests, history, web research, runtime behavior. + +**Multiple angles** — examine from different perspectives before concluding. + +**Document gaps** — flag uncertainty with △, track what's unknown. + +**Show your work** — findings include supporting evidence, not just conclusions. + +**Calibrate confidence** — distinguish fact from inference from assumption. + + + + + +## Source Priority + +1. **Direct observation** — read code, run searches, examine files +2. **Documentation** — official docs, inline comments, ADRs +3. **Tests** — reveal intended behavior and edge cases +4. **History** — git log, commit messages, PR discussions +5. **External research** — library docs, Stack Overflow, RFCs +6. **Inference** — logical deduction from available evidence +7. **Assumption** — clearly flagged when other sources unavailable + +## Investigation Patterns + +**Start broad, then narrow:** +- File tree → identify relevant areas +- Search patterns → locate specific code +- Code structure → understand without full content +- Read targeted files → examine implementation +- Cross-reference → verify understanding + +**Layer evidence:** +- What does the code do? (direct observation) +- Why was it written this way? (history, comments) +- How does it fit the system? (architecture, dependencies) +- What are the edge cases? (tests, error handling) + +**Follow the trail:** +- Function calls → trace execution paths +- Imports/exports → map dependencies +- Test files → understand usage patterns +- Error messages → reveal assumptions +- Comments → capture historical context + + + + + +## During Investigation + +After each evidence-gathering step emit: + +- **Confidence:** {BAR} {NAME} +- **Found:** { key discoveries } +- **Patterns:** { emerging themes } +- **Gaps:** { what's still unclear } +- **Next:** { investigation direction } + +## At Delivery (Level 5) + +### Findings + +{ numbered list of discoveries with supporting evidence } + +1. {FINDING} — evidence: {SOURCE} +2. {FINDING} — evidence: {SOURCE} + +### Patterns + +{ recurring themes or structures identified } + +### Implications + +{ what findings mean for the question at hand } + +### Confidence Assessment + +Overall: {BAR} {PERCENTAGE}% + +High confidence areas: +- {AREA} — {REASON} + +Lower confidence areas: +- {AREA} — {REASON} + +### Supporting Evidence + +- Code: { file paths and line ranges } +- Docs: { references } +- Tests: { relevant test files } +- History: { commit SHAs if relevant } +- External: { URLs if applicable } + +## Below Level 5 + +### △ Caveats + +**Assumptions:** +- {ASSUMPTION} — { why necessary, impact if wrong } + +**Gaps:** +- {GAP} — { what's missing, how to fill } + +**Unknowns:** +- {UNKNOWN} — { noted for future investigation } + + + + + +Load skills for specialized analysis (see Steps section): + +- **Pattern analysis** → `outfitter:patterns` +- **Root cause investigation** → `outfitter:find-root-causes` +- **Research synthesis** → `outfitter:report-findings` +- **Architecture analysis** → see [architecture-analysis.md](references/architecture-analysis.md) + + + + + +Loop: Gather → Analyze → Update Confidence → Next step + +1. **Calibrate starting confidence** — what do we already know? +2. **Identify evidence sources** — where can we look? +3. **Gather systematically** — collect from multiple angles +4. **Cross-reference findings** — verify patterns hold +5. **Flag uncertainties** — mark gaps with △ +6. **Synthesize conclusions** — connect evidence to insights +7. **Deliver with confidence level** — clear about certainty + +At each step: +- Document what you found (evidence) +- Note what it means (interpretation) +- Track what's still unclear (gaps) +- Update confidence bar + + + + + +Before concluding (level 4+): + +**Check evidence quality:** +- ✓ Multiple sources confirm pattern? +- ✓ Direct observation vs inference clearly marked? +- ✓ Assumptions explicitly flagged? +- ✓ Counter-examples considered? + +**Check completeness:** +- ✓ Original question fully addressed? +- ✓ Edge cases explored? +- ✓ Alternative explanations ruled out? +- ✓ Known unknowns documented? + +**Check deliverable:** +- ✓ Findings supported by evidence? +- ✓ Confidence calibrated honestly? +- ✓ Caveats section included if <100%? +- ✓ Next steps clear if incomplete? + + + + + +ALWAYS: +- Investigate before concluding +- Cite evidence sources with file paths/URLs +- Use confidence bars to track certainty +- Flag assumptions and gaps with △ +- Cross-reference from multiple angles +- Document investigation trail +- Distinguish fact from inference +- Include caveats below level 5 + +NEVER: +- Guess when you can investigate +- State assumptions as facts +- Conclude from single source +- Hide uncertainty or gaps +- Skip validation checks +- Deliver without confidence assessment +- Conflate evidence with interpretation + + + + + +Core methodology: +- [confidence.md](../pathfinding/references/confidence.md) — confidence calibration (shared with pathfinding) + +Micro-skills (load as needed): +- `outfitter:patterns` — extracting and validating patterns +- `outfitter:find-root-causes` — systematic problem diagnosis +- `outfitter:report-findings` — multi-source research synthesis + +Local references: +- [architecture-analysis.md](references/architecture-analysis.md) — system structure mapping + +Related skills: +- `outfitter:pathfinding` — clarifying requirements before analysis +- `outfitter:debugging` — structured bug investigation + + diff --git a/skills/thirdparty/codebase-recon/references/architecture-analysis.md b/skills/thirdparty/codebase-recon/references/architecture-analysis.md new file mode 100644 index 00000000..7626f7f5 --- /dev/null +++ b/skills/thirdparty/codebase-recon/references/architecture-analysis.md @@ -0,0 +1,220 @@ +# Architecture Analysis + +Techniques for analyzing system structure, dependencies, and component relationships. + +## Dependency Mapping + +### Forward Dependencies + +What a component relies on: +1. **Direct imports** — explicit dependencies in code +2. **Indirect references** — called through interfaces +3. **Runtime dependencies** — configuration, environment +4. **Data dependencies** — shared state, databases + +### Reverse Dependencies + +What relies on this component: +1. **Direct dependents** — explicit imports from other modules +2. **Interface consumers** — components using this API +3. **Side effect consumers** — code relying on mutations +4. **Event subscribers** — listeners for this component's events + +### Circular Dependencies + +Red flags: +- A imports B, B imports A +- Longer cycles: A → B → C → A +- Implicit cycles through shared state + +Resolution strategies: +- Extract shared code to separate module +- Introduce interface/abstraction layer +- Invert dependency direction +- Break into smaller components + +## Layer Identification + +### Detecting Layers + +Look for: +- **Directional flow** — data/control flows one way +- **Abstraction levels** — concrete → abstract as you ascend +- **Responsibility clustering** — similar concerns grouped +- **Interface boundaries** — clear contracts between groups + +### Common Layer Patterns + +**Three-tier**: +- Presentation (UI, API endpoints) +- Business logic (domain, workflows) +- Data access (repositories, queries) + +**Hexagonal/Clean**: +- Core domain (entities, business rules) +- Application layer (use cases, orchestration) +- Infrastructure (frameworks, external services) +- Interfaces (controllers, adapters) + +**Microservices**: +- Service boundary (API gateway) +- Service logic (domain per service) +- Data layer (per-service database) +- Cross-cutting (auth, logging, monitoring) + +### Layer Violations + +Violations indicate architectural drift: +- Lower layer imports higher layer +- Business logic in presentation layer +- Data access code in domain entities +- Infrastructure concerns leaking into core + +## Interface Analysis + +### Contract Definition + +Examine: +- **Input types** — what does it accept? +- **Output types** — what does it return? +- **Error modes** — what can fail, how? +- **Side effects** — mutations, I/O, state changes +- **Invariants** — what must always be true? + +### API Quality + +Strong interfaces show: +- **Cohesion** — methods belong together +- **Minimal surface** — small, focused API +- **Clear contracts** — types tell the story +- **Stability** — changes don't cascade +- **Composability** — works well with others + +Weak interfaces show: +- **Kitchen sink** — unrelated methods bundled +- **Leaky abstractions** — implementation details exposed +- **Unstable** — frequent breaking changes +- **Rigid** — hard to extend or compose + +## Component Relationships + +### Relationship Types + +**Composition**: +- Component owns sub-components +- Lifecycles coupled +- Strong cohesion +- Example: `Page` owns `Header`, `Footer` + +**Aggregation**: +- Component references others +- Independent lifecycles +- Loose coupling +- Example: `ShoppingCart` references `Product` + +**Dependency**: +- Uses another component's interface +- No ownership +- Can be swapped +- Example: `AuthService` uses `Database` + +**Association**: +- Knows about but doesn't own +- Weak relationship +- Often bidirectional +- Example: `User` ↔ `Post` (many-to-many) + +### Coupling Analysis + +**Low coupling** (good): +- Communicate through interfaces +- Few shared assumptions +- Changes localized +- Easy to test in isolation + +**High coupling** (risky): +- Direct field access +- Shared mutable state +- Knowledge of implementation +- Changes ripple widely + +## Architectural Pattern Recognition + +### Layered Architecture + +Indicators: +- Unidirectional dependencies (top → bottom) +- Each layer uses only layer below +- Clear separation of concerns + +Trade-offs: +- ✓ Simple, well-understood +- ✓ Easy to enforce rules +- ✗ Can become rigid +- ✗ Performance overhead + +### Event-Driven Architecture + +Indicators: +- Pub/sub or message queues +- Decoupled components +- Asynchronous communication +- Event sourcing patterns + +Trade-offs: +- ✓ Scalable, resilient +- ✓ Loose coupling +- ✗ Harder to reason about flow +- ✗ Eventual consistency challenges + +### Microservices + +Indicators: +- Service per bounded context +- Independent deployment +- API-based communication +- Decentralized data + +Trade-offs: +- ✓ Independent scaling +- ✓ Technology diversity +- ✗ Distributed system complexity +- ✗ Operational overhead + +## Analysis Workflow + +### Top-Down + +Start broad, narrow focus: +1. **System boundaries** — what's in scope? +2. **Major components** — high-level modules +3. **Component interactions** — how they communicate +4. **Internal structure** — zoom into each component +5. **Implementation** — code-level details + +### Bottom-Up + +Start specific, build understanding: +1. **Entry point** — main(), server start, UI root +2. **Call graph** — trace execution paths +3. **Cluster calls** — group related functionality +4. **Extract components** — identify logical boundaries +5. **Map relationships** — connect the pieces + +### Targeted + +Focus on specific concern: +1. **Define question** — what are you trying to understand? +2. **Identify relevant code** — where does this happen? +3. **Trace dependencies** — what does it touch? +4. **Analyze impact** — what would changing this affect? +5. **Document findings** — capture insights + +## Documentation Extraction + +From architecture analysis, capture: +- **Component diagram** — boxes and arrows +- **Dependency graph** — what imports what +- **Layer diagram** — abstraction levels +- **Sequence diagrams** — interaction flows +- **Decision records** — why this structure?