# Audit Templates Templates for auditing and improving existing harness infrastructure. Advanced profile note: eval and observability sections in this reference apply only when the project explicitly enables advanced agent-platform capabilities. Core ECL harness audits should not fail or lose score just because `harness/eval`, `harness/trace`, `harness/memory`, `harness/checkpoints`, or `harness/metrics` are absent. ## Audit Checklist ### Documentation Audit (25%) | Item | Check | Score | |------|-------|-------| | AGENTS.md exists | `test -f AGENTS.md` | 0/10 | | AGENTS.md is ~100 lines (not monolithic) | `wc -l AGENTS.md` should be 80-120 | 0/10 | | docs/ARCHITECTURE.md exists | `test -f docs/ARCHITECTURE.md` | 0/10 | | Architecture matches reality | Compare layer hierarchy to `go list ./...` | 0/20 | | docs/DEVELOPMENT.md exists | `test -f docs/DEVELOPMENT.md` | 0/10 | | Build commands in DEVELOPMENT.md work | Run them and check | 0/10 | | docs/QUALITY.md exists | `test -f docs/QUALITY.md` | 0/10 | | Design docs cover major components | Check docs/design-docs/ | 0/10 | | Reference docs are complete | Check docs/references/ | 0/10 | **Total: /100 → Scale to 25%** ### Linter Audit (20%) | Item | Check | Score | |------|-------|-------| | scripts/lint-deps.go exists | `test -f scripts/lint-deps.go` | 0/15 | | Layer map covers all packages | Compare to `go list ./...` | 0/20 | | Introducing violation fails lint | Add bad import, run lint | 0/15 | | scripts/lint-quality.go exists | `test -f scripts/lint-quality.go` | 0/15 | | Quality rules match QUALITY.md | Compare documented rules to linter | 0/10 | | Makefile has lint-arch target | `grep lint-arch Makefile` | 0/10 | | `make lint-arch` passes | Run it | 0/15 | **Total: /100 → Scale to 20%** ### Observability Audit (15%) | Item | Check | Score | |------|-------|-------| | harness/trace/ exists | `test -d harness/trace` | 0/25 | | Trace format covers all tool types | Check ToolTrace struct | 0/25 | | harness/selftest/ exists | `test -d harness/selftest` | 0/25 | | Observability hook registered | Check hook wiring | 0/25 | **Total: /100 → Scale to 15%** ### Eval Audit (20%) | Item | Check | Score | |------|-------|-------| | harness/eval/framework.go exists | `test -f harness/eval/framework.go` | 0/10 | | harness/eval/runner.go exists | `test -f harness/eval/runner.go` | 0/10 | | harness/eval/scorer.go exists | `test -f harness/eval/scorer.go` | 0/10 | | harness/eval/reporter.go exists | `test -f harness/eval/reporter.go` | 0/10 | | file_ops/ has 5+ tasks | Count JSON files | 0/10 | | code_gen/ has 5+ tasks | Count JSON files | 0/10 | | debugging/ has 5+ tasks | Count JSON files | 0/10 | | refactoring/ has 5+ tasks | Count JSON files | 0/10 | | Tasks cover new features | Manual review | 0/10 | | All tasks still work | Run evals | 0/10 | **Total: /100 → Scale to 20%** ### Quality Automation Audit (10%) | Item | Check | Score | |------|-------|-------| | harness/quality/score.go exists | `test -f harness/quality/score.go` | 0/25 | | Quality score calculation works | Run it | 0/25 | | harness/cleanup/tasks.go exists | `test -f harness/cleanup/tasks.go` | 0/25 | | Cleanup tasks find real issues | Run dry-run | 0/25 | **Total: /100 → Scale to 10%** ### Integration Audit (10%) | Item | Check | Score | |------|-------|-------| | `go build ./...` passes | Run it | 0/40 | | `make lint-arch` passes | Run it | 0/30 | | CI runs harness checks | Check CI config | 0/30 | **Total: /100 → Scale to 10%** --- ## Scoring Rubric ### How to Score Each Item - **Binary items** (exists/doesn't): 0 or full points - **Quality items** (matches reality): Partial credit based on accuracy - 100%: Exact match - 75%: Minor discrepancies (1-2 items) - 50%: Moderate discrepancies (3-5 items) - 25%: Major discrepancies but structure is right - 0%: Completely wrong or missing ### Calculating Overall Score ``` Overall = (Doc × 0.25) + (Linter × 0.20) + (Obs × 0.15) + (Eval × 0.20) + (Quality × 0.10) + (Integration × 0.10) ``` ### Score Interpretation | Score | Status | Action | |-------|--------|--------| | 0-20% | Critical | Use Create Mode — build from scratch | | 21-40% | Poor | Major gaps — extensive improvement needed | | 41-60% | Fair | Multiple gaps — targeted improvement | | 61-80% | Good | Minor gaps — polish and expand | | 81-100% | Excellent | Maintenance mode — keep current | --- ## Gap Analysis Templates ### Documentation Drift Report ```markdown ## Documentation Drift Analysis ### ARCHITECTURE.md Layer Hierarchy **Documented Layers:** ``` [Copy from ARCHITECTURE.md] ``` **Actual Package Structure:** ```bash go list ./... | grep -v vendor ``` **Discrepancies:** | Documented | Actual | Issue | |------------|--------|-------| | core/types | core/types | ✓ Match | | core/agent | core/agent | ✓ Match | | - | core/newpkg | Missing from docs | ### Tool Catalog **Documented Tools:** [count] **Actual Tools:** [count] **Missing from docs:** - ToolA (added in commit abc123) - ToolB (added in commit def456) ### Error Codes **Documented Codes:** [count] **Actual Codes:** [count] **Missing from docs:** - 300105 NotFoundError (added in PR #123) ``` ### Linter Gap Report ```markdown ## Linter Gap Analysis ### Layer Map Coverage **Packages in layer map:** [count] **Packages in codebase:** [count] **Missing from layer map:** | Package | Suggested Layer | Reason | |---------|-----------------|--------| | core/newpkg | Layer 2 | Depends only on core/types | | api/v2 | Layer 4 | New API version | ### Violation Test Results | Test | Expected | Actual | Status | |------|----------|--------|--------| | Bad import in core/types | Fail | Fail | ✓ Pass | | Bad import in core/agent | Fail | Fail | ✓ Pass | | Bad import in api/v2 | Fail | Pass | ✗ Gap | ### Quality Rules Coverage **Rules in QUALITY.md:** [count] **Rules in lint-quality.go:** [count] **Missing enforcement:** - Rule 5: "No hardcoded timeouts" — not checked by linter ``` ### Eval Coverage Report ```markdown ## Eval Coverage Analysis ### Tasks per Category | Category | Count | Target | Status | |----------|-------|--------|--------| | file_ops | 3 | 5+ | ✗ Below target | | code_gen | 2 | 5+ | ✗ Below target | | debugging | 5 | 5+ | ✓ Meets target | | refactoring | 4 | 5+ | ✗ Below target | ### Feature Coverage | Feature | Has Eval | Priority | |---------|----------|----------| | File write | ✓ | - | | File read | ✓ | - | | JSON parsing | ✗ | P1 | | Error handling | ✓ | - | | New auth module | ✗ | P0 | ### Task Health | Task ID | Status | Issue | |---------|--------|-------| | file_ops_001 | ✓ Works | - | | code_gen_001 | ✗ Broken | Uses removed API | | debug_001 | ✓ Works | - | ``` --- ## Improvement Plan Template ```markdown ## Harness Improvement Plan **Project:** [Name] **Audit Date:** YYYY-MM-DD **Audit Score:** XX% **Target Score:** 80%+ ### Priority Gaps #### P0 — Critical (Fix Immediately) 1. [Gap description] - Impact: [Why this matters] - Fix: [Specific action] - Effort: [Hours estimate] #### P1 — High (Fix This Sprint) 1. [Gap description] - Impact: [Why this matters] - Fix: [Specific action] - Effort: [Hours estimate] #### P2 — Medium (Fix Next Sprint) 1. [Gap description] - Impact: [Why this matters] - Fix: [Specific action] - Effort: [Hours estimate] #### P3 — Low (Backlog) 1. [Gap description] - Impact: [Why this matters] - Fix: [Specific action] - Effort: [Hours estimate] ### Improvement Timeline | Week | Focus | Expected Score | |------|-------|----------------| | 1 | P0 gaps | 45% → 55% | | 2 | P1 gaps | 55% → 70% | | 3 | P2 gaps | 70% → 80% | | 4 | P3 gaps + polish | 80% → 85% | ### Success Metrics - [ ] Audit score ≥ 80% - [ ] No P0 or P1 gaps remaining - [ ] `make lint-arch` passes - [ ] All eval categories have 5+ tasks - [ ] Quality score trend is positive ``` --- ## Before/After Comparison Template ```markdown ## Improvement Results **Project:** [Name] **Improvement Period:** YYYY-MM-DD to YYYY-MM-DD ### Score Comparison | Dimension | Before | After | Delta | |-----------|--------|-------|-------| | Documentation | XX% | XX% | +XX% | | Linters | XX% | XX% | +XX% | | Observability | XX% | XX% | +XX% | | Evals | XX% | XX% | +XX% | | Quality | XX% | XX% | +XX% | | Integration | XX% | XX% | +XX% | | **Overall** | **XX%** | **XX%** | **+XX%** | ### Changes Made #### Documentation - Updated ARCHITECTURE.md with [changes] - Created design doc for [component] - Added [N] entries to tool catalog #### Linters - Added [N] packages to layer map - Created new linter for [pattern] - Fixed [N] false positives #### Evals - Added [N] new eval tasks - Removed [N] obsolete tasks - Updated [N] broken tasks #### Quality - Added cleanup task for [pattern] - Updated quality score weights - Fixed [N] golden principle violations ### Remaining Gaps [List any P2/P3 items not yet addressed] ### Recommendations [Next steps for maintaining/improving harness] ``` --- ## Automated Audit Script ```go // scripts/audit-harness.go // // Automated harness audit. Run: go run scripts/audit-harness.go // // Outputs JSON with scores per dimension. package main import ( "encoding/json" "fmt" "os" "path/filepath" ) type AuditResult struct { Dimension string `json:"dimension"` Score float64 `json:"score"` MaxScore float64 `json:"max_score"` Percent float64 `json:"percent"` Items []AuditItem `json:"items"` } type AuditItem struct { Name string `json:"name"` Score float64 `json:"score"` Max float64 `json:"max"` Notes string `json:"notes,omitempty"` } func main() { results := []AuditResult{ auditDocumentation(), auditLinters(), auditObservability(), auditEvals(), auditQuality(), auditIntegration(), } // Calculate overall weights := map[string]float64{ "Documentation": 0.25, "Linters": 0.20, "Observability": 0.15, "Evals": 0.20, "Quality": 0.10, "Integration": 0.10, } var overall float64 for _, r := range results { overall += r.Percent * weights[r.Dimension] } // Output output := map[string]interface{}{ "results": results, "overall": overall, } data, _ := json.MarshalIndent(output, "", " ") fmt.Println(string(data)) } func auditDocumentation() AuditResult { r := AuditResult{Dimension: "Documentation", MaxScore: 100} // Check files exist files := map[string]float64{ "AGENTS.md": 10, "docs/ARCHITECTURE.md": 10, "docs/DEVELOPMENT.md": 10, "docs/QUALITY.md": 10, } for file, points := range files { if _, err := os.Stat(file); err == nil { r.Score += points r.Items = append(r.Items, AuditItem{Name: file, Score: points, Max: points}) } else { r.Items = append(r.Items, AuditItem{Name: file, Score: 0, Max: points, Notes: "missing"}) } } // Check docs/design-docs/ has files (not just the index) if matches, _ := filepath.Glob("docs/design-docs/*.md"); len(matches) > 0 { // Exclude index.md from count actualDocs := 0 for _, m := range matches { if !strings.HasSuffix(m, "index.md") { actualDocs++ } } score := min(float64(actualDocs)*5, 20) r.Score += score r.Items = append(r.Items, AuditItem{Name: "docs/design-docs/", Score: score, Max: 20, Notes: fmt.Sprintf("%d design docs (excluding index)", actualDocs)}) } else { r.Items = append(r.Items, AuditItem{Name: "docs/design-docs/", Score: 0, Max: 20, Notes: "empty or missing"}) } // Check docs/references/ has files if matches, _ := filepath.Glob("docs/references/*.md"); len(matches) > 0 { score := min(float64(len(matches))*5, 20) r.Score += score r.Items = append(r.Items, AuditItem{Name: "docs/references/", Score: score, Max: 20, Notes: fmt.Sprintf("%d files", len(matches))}) } else { r.Items = append(r.Items, AuditItem{Name: "docs/references/", Score: 0, Max: 20, Notes: "empty or missing"}) } // Remaining 20 points for AGENTS.md line count if data, err := os.ReadFile("AGENTS.md"); err == nil { lines := len(strings.Split(string(data), "\n")) if lines >= 80 && lines <= 150 { r.Score += 20 r.Items = append(r.Items, AuditItem{Name: "AGENTS.md size", Score: 20, Max: 20, Notes: fmt.Sprintf("%d lines", lines)}) } else if lines < 80 { r.Items = append(r.Items, AuditItem{Name: "AGENTS.md size", Score: 10, Max: 20, Notes: fmt.Sprintf("%d lines (too short)", lines)}) r.Score += 10 } else { r.Items = append(r.Items, AuditItem{Name: "AGENTS.md size", Score: 5, Max: 20, Notes: fmt.Sprintf("%d lines (too long, should be map)", lines)}) r.Score += 5 } } r.Percent = (r.Score / r.MaxScore) * 100 return r } func auditLinters() AuditResult { r := AuditResult{Dimension: "Linters", MaxScore: 100} linters := []string{"scripts/lint-deps.go", "scripts/lint-quality.go"} for _, l := range linters { if _, err := os.Stat(l); err == nil { r.Score += 25 r.Items = append(r.Items, AuditItem{Name: l, Score: 25, Max: 25}) } else { r.Items = append(r.Items, AuditItem{Name: l, Score: 0, Max: 25, Notes: "missing"}) } } // Check Makefile if data, err := os.ReadFile("Makefile"); err == nil { if strings.Contains(string(data), "lint-arch") { r.Score += 25 r.Items = append(r.Items, AuditItem{Name: "Makefile lint-arch", Score: 25, Max: 25}) } else { r.Items = append(r.Items, AuditItem{Name: "Makefile lint-arch", Score: 0, Max: 25, Notes: "target missing"}) } } // Remaining 25 for additional linters if matches, _ := filepath.Glob("scripts/lint-*.go"); len(matches) > 2 { r.Score += 25 r.Items = append(r.Items, AuditItem{Name: "additional linters", Score: 25, Max: 25, Notes: fmt.Sprintf("%d total", len(matches))}) } else { r.Items = append(r.Items, AuditItem{Name: "additional linters", Score: 0, Max: 25, Notes: "only core linters"}) } r.Percent = (r.Score / r.MaxScore) * 100 return r } func auditObservability() AuditResult { r := AuditResult{Dimension: "Observability", MaxScore: 100} dirs := map[string]float64{ "harness/trace": 35, "harness/selftest": 35, } for dir, points := range dirs { if info, err := os.Stat(dir); err == nil && info.IsDir() { r.Score += points r.Items = append(r.Items, AuditItem{Name: dir, Score: points, Max: points}) } else { r.Items = append(r.Items, AuditItem{Name: dir, Score: 0, Max: points, Notes: "missing"}) } } // Check for observability hook if matches, _ := filepath.Glob("**/observability*.go"); len(matches) > 0 { r.Score += 30 r.Items = append(r.Items, AuditItem{Name: "observability hook", Score: 30, Max: 30}) } else { r.Items = append(r.Items, AuditItem{Name: "observability hook", Score: 0, Max: 30, Notes: "not found"}) } r.Percent = (r.Score / r.MaxScore) * 100 return r } func auditEvals() AuditResult { r := AuditResult{Dimension: "Evals", MaxScore: 100} // Framework files (40 points) files := []string{ "harness/eval/framework.go", "harness/eval/runner.go", "harness/eval/scorer.go", "harness/eval/reporter.go", } for _, f := range files { if _, err := os.Stat(f); err == nil { r.Score += 10 r.Items = append(r.Items, AuditItem{Name: f, Score: 10, Max: 10}) } else { r.Items = append(r.Items, AuditItem{Name: f, Score: 0, Max: 10, Notes: "missing"}) } } // Dataset categories (60 points, 15 each) categories := []string{"file_ops", "code_gen", "debugging", "refactoring"} for _, cat := range categories { pattern := fmt.Sprintf("harness/eval/datasets/%s/*.json", cat) matches, _ := filepath.Glob(pattern) if len(matches) >= 5 { r.Score += 15 r.Items = append(r.Items, AuditItem{Name: cat, Score: 15, Max: 15, Notes: fmt.Sprintf("%d tasks", len(matches))}) } else if len(matches) > 0 { score := float64(len(matches)) * 3 r.Score += score r.Items = append(r.Items, AuditItem{Name: cat, Score: score, Max: 15, Notes: fmt.Sprintf("%d tasks (need 5+)", len(matches))}) } else { r.Items = append(r.Items, AuditItem{Name: cat, Score: 0, Max: 15, Notes: "no tasks"}) } } r.Percent = (r.Score / r.MaxScore) * 100 return r } func auditQuality() AuditResult { r := AuditResult{Dimension: "Quality", MaxScore: 100} items := map[string]float64{ "harness/quality/score.go": 35, "harness/cleanup/tasks.go": 35, "docs/QUALITY.md": 30, } for item, points := range items { if _, err := os.Stat(item); err == nil { r.Score += points r.Items = append(r.Items, AuditItem{Name: item, Score: points, Max: points}) } else { r.Items = append(r.Items, AuditItem{Name: item, Score: 0, Max: points, Notes: "missing"}) } } r.Percent = (r.Score / r.MaxScore) * 100 return r } func auditIntegration() AuditResult { r := AuditResult{Dimension: "Integration", MaxScore: 100} // Check go.mod exists (build will work) if _, err := os.Stat("go.mod"); err == nil { r.Score += 40 r.Items = append(r.Items, AuditItem{Name: "go.mod", Score: 40, Max: 40}) } else { r.Items = append(r.Items, AuditItem{Name: "go.mod", Score: 0, Max: 40, Notes: "missing"}) } // Check Makefile exists if _, err := os.Stat("Makefile"); err == nil { r.Score += 30 r.Items = append(r.Items, AuditItem{Name: "Makefile", Score: 30, Max: 30}) } else { r.Items = append(r.Items, AuditItem{Name: "Makefile", Score: 0, Max: 30, Notes: "missing"}) } // Check for CI config ciConfigs := []string{".github/workflows", ".gitlab-ci.yml", "Jenkinsfile", ".circleci"} found := false for _, ci := range ciConfigs { if _, err := os.Stat(ci); err == nil { found = true r.Score += 30 r.Items = append(r.Items, AuditItem{Name: "CI config", Score: 30, Max: 30, Notes: ci}) break } } if !found { r.Items = append(r.Items, AuditItem{Name: "CI config", Score: 0, Max: 30, Notes: "not found"}) } r.Percent = (r.Score / r.MaxScore) * 100 return r } func min(a, b float64) float64 { if a < b { return a } return b } import "strings" ``` Note: The script above has a deliberate syntax issue (import at the end) — move the `import "strings"` to the import block at the top when using.