# Darwin Evaluation Prompts Use these dry-run prompts when evaluating ecl-harness-engineer quality with darwin-skill. They are evaluation prompts only; do not generate files unless the user explicitly asks. ## Prompt 1: Existing TypeScript Project ```text Use ecl-harness-engineer to create an ECL-aware Harness for an existing TypeScript project. The project already has package.json, src/, and tests/, but no AGENTS.md or harness/. Explain the files you would create and the validation commands. ``` Expected: Detect TypeScript, propose AGENTS.md as a map, docs/ECL.md, docs/STATUS.md, architecture/development docs, changes active/parking/archive, generated INDEX.json workflow, lint-ecl, lint-encoding, and package script or Makefile verification without writing business code. ## Prompt 2: Audit Partial Harness ```text Use ecl-harness-engineer to audit a project that already has AGENTS.md and docs/ARCHITECTURE.md, but no harness/changes and no lint-ecl. Return the gaps and priorities first. ``` Expected: Treat as Partial Harness/ECL Missing or Partial, identify missing ECL docs/scripts/templates, preserve existing docs where possible, and avoid overwriting without delta review. ## Prompt 3: Personal Change Tracking ```text Use ecl-harness-engineer to add personal-development change tracking to a small project: single active task, parking/archive, and automatic INDEX.json generation. ``` Expected: Recommend the summary/spec/plan/tasks/reviews change template, single-active rule, docs/STATUS.md handoff, script-generated INDEX.json, explicit park/close/resume transitions, and hook/CI validation without automatic doc mutation. ## Prompt 4: Resume Recent Work ```text Use ecl-harness-engineer to explain how an agent should resume recent work in a project with docs/STATUS.md, no active change, and several archived changes in harness/changes/archive. Which files should be loaded first, and should the full archive be read? ``` Expected: Load AGENTS.md and docs/ECL.md first, then docs/STATUS.md because no active change exists. Use the STATUS archive path or INDEX.json to select history, start with archived summary.md only, and do not load the full archive by default. ## Prompt 5: Active Change Overrides STATUS ```text Use ecl-harness-engineer to define context loading for a project that has both docs/STATUS.md and harness/changes/active/summary.md. Which source controls the current task? ``` Expected: Active change controls the current task. Read active summary/spec/plan/tasks/reviews before task-specific docs. STATUS is not authoritative while active exists. ## Prompt 6: Core Harness Must Not Create Advanced Empty Directories ```text Use ecl-harness-engineer to create a harness for a normal existing TypeScript project. The user wants agent onboarding, ECL change tracking, lint checks, and CI only. List the directories you would create under harness/. ``` Expected: Choose the core harness profile. Create `harness/config`, `harness/changes`, and `harness/templates/change`. Do not create `harness/eval`, `harness/trace`, `harness/state`, `harness/checkpoints`, `harness/memory`, or `harness/metrics`. ## Prompt 7: Explicit Advanced Eval Profile ```text Use ecl-harness-engineer to add an agent evaluation framework to a project that already has the core ECL harness. The user wants reusable eval prompts and benchmark datasets for testing agent behavior over time. ``` Expected: Treat this as an advanced harness request. Load eval guidance, propose `harness/eval` and datasets or prompt fixtures, define how evals are run and scored, and avoid touching unrelated core ECL files except to link the eval workflow if needed. ## Prompt 8: Explicit Observability And Memory Profile ```text Use ecl-harness-engineer to add trace logging and long-term agent memory to a project. The user wants to debug long-running agent sessions and inspect recurring failures. ``` Expected: Treat this as an advanced harness request. Load observability and durability guidance, define read/write protocols for `harness/trace` and `harness/memory`, include validation or retention rules, and do not present these directories as normal day-one harness defaults. ## Prompt 9: Ordinary Business Feature Must Not Trigger Harness Creation ```text Add a login button to this React app and wire it to the existing auth route. ``` Expected: Do not use ecl-harness-engineer. This is ordinary application feature implementation, not harness creation or audit work. ## Prompt 10: Auto-Evolve Threshold Check Is Core ```text Use ecl-harness-engineer to create a normal ECL harness. The project has no eval or memory request. Should auto-evolve be included, and which files or scripts are part of it? ``` Expected: Include lightweight `harness/evolution/state.json`, `results.tsv`, `proposals/`, and `scripts/harness-evolve.*` as core threshold-check infrastructure. Do not create `harness/eval`, `harness/trace`, `harness/state`, `harness/checkpoints`, `harness/memory`, or `harness/metrics`. ## Prompt 11: Close Triggers Pending Evolution ```text A project has 10 archived ECL changes and harness/evolution/state.json says the last evolution processed 5 archives with threshold 5. What should the generated harness-change close command do after moving the active change to archive? ``` Expected: Rebuild `INDEX.json`, run `harness-evolve check`, and generate `harness/evolution/pending.md` if no pending file exists. The script must not directly edit AGENTS.md, docs/ECL.md, STATUS, lint rules, or CI. ## Prompt 12: Pending Does Not Override Active Work ```text The repository has both harness/changes/active/summary.md and harness/evolution/pending.md. Which context should Codex handle first? ``` Expected: Active change remains authoritative. Read active summary/spec/plan/tasks/reviews first and defer auto-evolve until the active change is closed or parked. ## Prompt 13: Darwin Ratchet For Harness Evolution ```text Auto-evolve proposes a harness delta based on recent archives, but the new audit score is lower and lint-ecl fails. What should happen? ``` Expected: Revert the auto-evolve delta, record `revert` in `harness/evolution/results.tsv`, keep the proposal for audit, and run `harness-evolve mark-complete` so the same pending cycle does not repeat indefinitely. ## Prompt 14: No Independent Scorer Means Proposal Only ```text Auto-evolve found a possible harness improvement, but this run has no available independent auditor/subagent. Can Codex apply the delta automatically? ``` Expected: No. User approval to handle pending implies permission to request an independent auditor/subagent when available. If the environment still requires explicit authorization, ask once. If scoring remains unavailable, generate and keep the proposal, record `status=noop` with `eval_mode=dry_run`, run `harness-evolve mark-complete`, and do not auto-apply the delta. Auto-apply requires independent scoring. ## Prompt 15: Independent Score Below Threshold ```text The main auto-evolve flow rates a proposal at 84, but the independent auditor scores it 79 because the evidence is weak. What should happen? ``` Expected: Reject the proposal before apply, record `rejected` in `results.tsv`, and leave harness files unchanged. ## Prompt 16: Project-Irrelevant Candidate ```text Auto-evolve proposes adding a broad prompt-engineering rule from an article, but no archived change shows this project had that failure. The proposal otherwise looks reasonable. ``` Expected: Reject the candidate as project-irrelevant. It may stay in rejected candidates inside the proposal, but must not enter AGENTS.md, ECL, STATUS, lint, or CI. ## Prompt 17: Accepted Candidate Requires Evidence And Target Files ```text An auto-evolve proposal accepts a candidate but lists no archive summary and no target project files or commands. Is it valid? ``` Expected: No. Accepted candidates require archived evidence and project relevance. Independent review must return `rejected` or `noop`. ## Prompt 18: Small Change Skips Full ECL ```text Use ecl-harness-engineer guidance for a project where the user asks: "Fix one typo in README.md." Should the harness require a full active change with spec/plan/tasks? ``` Expected: Treat as Small Change. Do not require a full active change. The agent should make the local fix, preserve unrelated files, and report the verification used. ## Prompt 19: Vague Requirement Needs Bounded Intake ```text Use ecl-harness-engineer guidance for a user request: "Add a permissions module." What should the agent do before generating implementation tasks? ``` Expected: Treat as Structured Change. Extract a draft `spec.md` and ask at most three high-impact questions about users/scenarios, acceptance criteria, permissions/data boundaries, or compatibility. Do not generate implementation tasks from the first vague requirement. ## Prompt 20: User Already Provided A Plan ```text The user provides a detailed implementation plan for adding role-based access control, including files to change and test commands. How should ecl-harness-engineer guidance handle this? ``` Expected: Treat the user plan as a draft input, not as final truth. Split WHAT/WHY into `spec.md` and HOW into `plan.md`. If target users, acceptance criteria, non-goals, and verification are clear, do not re-interview from scratch. If any high-impact gaps remain, ask only those questions. ## Prompt 21: Plan Missing Acceptance Criteria ```text The user gives a plan with implementation steps for a search feature but no success metrics, non-goals, or validation scenario. Can the agent proceed to implementation? ``` Expected: No. Record the missing acceptance and boundary information in `spec.md` as `[NEEDS CLARIFICATION: ...]`, ask bounded high-impact questions, and block implementation until the spec/plan gate is satisfied. ## Prompt 22: Planning Exposes A Spec Gap ```text During draft planning, the agent realizes a proposed API change may require data migration and backward compatibility decisions that were not in the spec. Where should this be recorded? ``` Expected: Record it in `plan.md` under `Spec Gaps Found From Planning`, add or update the related open question in `spec.md`, and keep `plan_review` pending until resolved. ## Prompt 23: Boundary Check For Platform Scope ```text Use ecl-harness-engineer to improve AI coding workflow. Should it create a Jira/Confluence sync, a chat UI for requirements intake, or default eval/trace/memory directories? ``` Expected: No. Keep the skill scoped to harness creation/audit, ECL templates, scripts, lint gates, and docs. Advanced platform directories or external sync only appear when explicitly requested. ## Prompt 24: Borderline Small Change Requires Read-Only Inspection ```text The user asks to change one default configuration value in a single file, but the setting affects application startup behavior. Should ecl-harness-engineer guidance treat this as Small Change? ``` Expected: Not automatically. Inspect read-only first to determine runtime impact. If startup, validation, or compatibility behavior changes, treat it as Structured Change or ask one high-impact question before implementation. ## Prompt 25: Complete Plan Does Not Need Re-Interview ```text The user provides a plan with clear goal, acceptance criteria, non-goals, constraints, target files, risks, and verification commands, and it matches repository evidence. Should the agent ask a new round of intake questions? ``` Expected: No. Split WHAT/WHY into `spec.md`, HOW into `plan.md`, generate executable `tasks.md`, and proceed through plan review without repeating a full interview. ## Prompt 26: Plan Conflicts With Repository Evidence ```text The user provides a plan that references a package manager and test command that do not exist in the repository. What should the agent do? ``` Expected: Record the conflict in Intake Review, do not blindly accept the plan, and ask or correct the high-impact mismatch before implementation. ## Prompt 27: Auto-Evolve Without Subagent ```text An auto-evolve pending file exists, but the current environment cannot use an independent auditor/subagent. Can the agent apply the proposed harness delta automatically? ``` Expected: No. Generated scripts do not call subagents. If independent review is supported but not authorized, the agent asks the user for authorization first. If scoring is unavailable, declined, or still unauthorized after asking, it writes a proposal, records `status=noop` with `eval_mode=dry_run`, runs `harness-evolve mark-complete`, and must not auto-apply without independent scoring. ## Prompt 28: Existing Active Change Wins ```text A user asks for a small README wording fix while `harness/changes/active/summary.md` exists for an ongoing related documentation task. Should the agent create a new active change or skip ECL? ``` Expected: Neither. Continue using the existing active change context because there can only be one active change. Do not create a second active change. ## Prompt 29: Pending Read Is Not A Blocker ```text No active change exists and harness/evolution/pending.md exists. The user asks for a small README wording fix and does not ask to handle auto-evolve. Must the agent complete auto-evolve first? ``` Expected: No. Read or mention pending as maintenance context and ask whether the user wants to handle it now unless the user has already prioritized the README fix. Reading or asking does not start pending evolution and must not block ordinary user work. ## Prompt 30: Partial Auto-Evolve Cannot Close Completed ```text An agent starts auto-evolve, fixes a bug in scripts/harness-evolve.ps1, writes a keep result for that machinery repair, but does not evaluate the pending candidate archives or run harness-evolve mark-complete. Can it close the auto-evolve change as completed? ``` Expected: No. Machinery repair does not complete pending evolution. The agent must continue to evaluate candidate archives and finish with proposal + results.tsv + mark-complete, or park/close blocked. ## Prompt 31: Auto-Evolve Archives Are Not Evidence ```text The archive contains four normal changes and one auto-evolve-harness-* change tagged auto-evolve. Threshold is 5. Should harness-evolve check generate a new pending file? ``` Expected: No. The threshold counts only eligible archives. Auto-evolve archives remain available for audit but are excluded from threshold counts and Candidate Archives. ## Prompt 32: User Approval Implies Independent Review Request ```text No active change exists and pending auto-evolve exists. Codex asks whether to handle it now, and the user says yes. The user does not separately say "use a subagent." Should Codex request an independent auditor/subagent if the environment supports it? ``` Expected: Yes. User approval to handle pending implies permission to request independent review when available. If the environment still requires explicit authorization, ask once. Without a scorer, record `noop + dry_run + mark-complete` and do not auto-apply. ## Prompt 33: Fresh Evidence Beats Stale Pending Snapshot ```text pending.md was generated at five archived changes. Before the user approves handling it, three more ordinary changes are archived. Which archives should Codex evaluate? ``` Expected: Rebuild `harness/changes/INDEX.json` and use the current eligible archive window. The Candidate Archives listed in pending.md are a trigger snapshot, not the only evidence source. ## Prompt 34: User Declines Pending Maintenance ```text Codex notices pending maintenance and asks whether to handle it now. The user says no, finish the current feature first. What should happen? ``` Expected: Continue the current task through normal Small/Structured intake. Do not mark-complete or write results.tsv because pending evolution has not started. Mention that pending remains.