---
name: dos-verify-done-claims
description: "Before accepting an agent's 'done / shipped / fixed' claim, verify it against ground truth (git ancestry + the commit's own diff) using the DOS kernel's `dos verify` and `dos commit-audit`, never the agent's own narration."
category: quality
risk: safe
source: community
source_repo: anthony-chaudhary/dos-kernel
source_type: community
date_added: "2026-06-12"
author: anthony-chaudhary
tags: [verification, git, ai-agents, trust, quality-gate]
tools: [claude, cursor, gemini]
license: "MIT"
license_source: "https://github.com/anthony-chaudhary/dos-kernel/blob/master/LICENSE"
---

# Verify done-claims against ground truth, not the agent's word

## Overview

When an AI agent says "done", "shipped", or "fixed", that is a **claim**, not a fact, and a claim the agent checks by re-reading its own work is *consistency, not grounding*. This skill replaces that self-report with a verdict from a witness the agent did not author: it shells out to the **DOS kernel** (`dos verify`, `dos commit-audit`) to confirm the claimed effect from git ancestry and the commit's actual diff. DOS is deterministic: no API key, no LLM. The verdict is git-only and offline as used here; the one exception is `dos verify` in a workspace that wires a CI oracle, which `--no-ci` suppresses (see Security & Safety Notes).

This skill adapts the DOS reference "witness-claim" pattern (`anthony-chaudhary/dos-kernel`) into a host-agnostic screenplay.

## When to Use This Skill

- Use when an agent reports a task/phase/feature as **complete** and you want that "done" confirmed from evidence before building on it.
- Use right after a commit, to confirm the commit's **message matches its diff** (catch a `fix:` that only touched a README, or a "tests pass" that deleted the assertions).
- Use when folding many sub-agents' results: verify each claimed effect instead of trusting the return string.
- **Do not** use it to judge whether code is *correct*; that is what the test suite is for. This skill checks whether the claimed thing actually shipped.

## How It Works

### Step 1: Install the kernel (once)

```bash
pip install dos-kernel   # provides the `dos` CLI; deterministic, no key
```

### Step 2: Audit the latest commit's claim vs its diff

A commit subject is forgeable (whoever wrote the message authored it); the files it touched are not (git recorded them). `dos commit-audit` grades the subject against the actual diff:

```bash
dos commit-audit --workspace . HEAD --json
```

`commit-audit --json` prints a JSON **array** of audited commits (one element even for a single `HEAD`), so read `verdict` from the first element, e.g. `dos commit-audit --workspace . HEAD --json | jq -r '.[0].verdict'`. (Without `--json` the same verdict prints as a one-line text row: `· OK …`, `⚑ UNWITNESSED …`, or `· abstain …`.)

The verdicts are: `OK` (the diff backs the claim's *kind*), `CLAIM_UNWITNESSED` (the subject's claim is not evidenced by the diff; treat the "done" as unproven), or `ABSTAIN`. This judges the *kind* of change, never correctness; run the tests for that.

### Step 3: Verify a named phase actually shipped

If the agent claims a specific plan/phase landed, confirm it from git history rather than the transcript:

```bash
dos verify --workspace . PLAN PHASE --json --no-ci
```

`--no-ci` keeps the verdict git-only (see Security & Safety Notes). With `--json` you get the `shipped` and `source` fields. (The default text form prints `SHIPPED PLAN PHASE (via grep)` or `NOT_SHIPPED PLAN PHASE (via none)`; it is the same verdict, and the process exit code is non-zero when not shipped.)

Grade `shipped: true` by the `source`, because the git fallback grades its own evidence by **forgeability**, and forgeable evidence is exactly what this skill exists to distrust:

- `registry` or `grep-artifact`: **non-forgeable** (a registry row, or an artefact/diff rung). This closes the claim.
- `grep-subject` (or bare `grep`): **forgeable**. A commit *subject* or body carried the phase token, which an agent can write without doing the work (even on an empty commit). Treat this as *shipped-per-the-subject*, not confirmed; corroborate it (run `dos commit-audit` on that commit, as in Step 2) before you close.
- `none`: no positive evidence; accept as "not shipped", not as a tool failure.

### Step 4: Fold only confirmed effects

Accept the agent's "done" **only** when Step 2/3 corroborate it. If the result is `CLAIM_UNWITNESSED` or `shipped: false`, the work is not done regardless of how confidently the agent narrated it. Send it back.

## Examples

### Example 1: gate an agent's "I fixed the bug" claim

```bash
# The agent committed and said it's fixed. Check the diff backs the claim.
# commit-audit --json returns an array, so read the first element's verdict:
dos commit-audit --workspace . HEAD --json | jq -r '.[0].verdict'
# OK                 -> the change is of the claimed kind; now run the tests
# CLAIM_UNWITNESSED  -> the commit doesn't do what it says; reject
```

### Example 2: confirm a feature phase shipped before closing a ticket

```bash
dos verify --workspace . AUTH AUTH2 --json --no-ci
# shipped: true,  source: registry|grep-artifact -> non-forgeable; safe to close
# shipped: true,  source: grep-subject|grep      -> forgeable subject/body match;
#     shipped-per-the-subject only -> corroborate with commit-audit before closing
# shipped: false, source: none                   -> no evidence; keep the ticket open
```

## Best Practices

- ✅ Run `dos commit-audit HEAD` immediately after every agent commit.
- ✅ Treat `source: none` / `CLAIM_UNWITNESSED` as "not done", not as a tool error.
- ✅ Close a claim on a **non-forgeable** `source` (`registry`, `grep-artifact`). Treat `grep-subject` / bare `grep` as forgeable (an agent can write the subject text) and corroborate before closing.
- ✅ Keep the test suite as the separate correctness gate; this skill checks shipping, not correctness.
- ❌ Don't accept a "done" because the agent's prose was confident.
- ❌ Don't use this to replace code review or testing.

## Limitations

- This skill does not replace environment-specific validation, testing, or expert review.
- It checks whether a claimed change *shipped* / matches its diff, not whether the code is *correct*.
- `dos verify` reads git history; in a repo with no commits there is nothing to witness (it will honestly report `source: none`).
- Stop and ask for clarification if required inputs (a git repo, the `dos` CLI) are missing.

## Security & Safety Notes

- This skill runs shell commands: `pip install dos-kernel` and the read-only `dos` verbs (`dos commit-audit`, `dos verify`). These verbs never **mutate** the repo or push. `dos commit-audit` only reads git history and the working tree (no network). `dos verify` is also git-only **unless** the workspace has wired a CI oracle (`[verify] non_git_oracle` in its `dos.toml`), in which case it may shell a network check (e.g. `gh api`) for the verdict. Pass `--no-ci` (as the examples above do) to force the git-only path and guarantee no network.
- `pip install dos-kernel` installs from PyPI. The distribution name is `dos-kernel` (the bare `dos` on PyPI is an unrelated package; do not install it). Pin a version in locked environments.
- Run in the repository you intend to adjudicate; the `--workspace .` argument scopes every verdict to that repo.

## Common Pitfalls

- **Problem:** `dos verify` returns `source: none` and it looks like a failure.
  **Solution:** That is the honest "no evidence" verdict. It means the phase has no ship commit, so the claim is unproven. Re-stamp the real commit or keep the task open.
- **Problem:** Installing the wrong package.
  **Solution:** The PyPI name is `dos-kernel`, not `dos`.
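
To make the `source: none` handling and the other grading rules from Steps 2 to 4 harder to get wrong, they can be collected into two small shell helpers. This is a minimal sketch and not part of DOS: the function names `grade_verify` and `grade_audit` and the action words they print (`close`, `corroborate`, `keep-open`, `run-tests`, `reject`, `unconfirmed`) are hypothetical labels chosen for illustration. Only the `shipped`, `source`, and verdict values come from the steps above. Any value the helpers do not recognize falls through to a non-accepting branch, in keeping with Step 4.

```bash
# grade_verify SHIPPED SOURCE
# Maps the `shipped` and `source` fields of a `dos verify --json` result to an
# action word. The function name and action words are illustrative only.
grade_verify() {
  if [ "$1" != "true" ]; then
    echo "keep-open"                               # shipped: false -> no evidence
    return 0
  fi
  case "$2" in
    registry|grep-artifact) echo "close" ;;        # non-forgeable evidence
    grep-subject|grep)      echo "corroborate" ;;  # forgeable subject/body match
    *)                      echo "keep-open" ;;    # unrecognized source: do not accept
  esac
}

# grade_audit VERDICT
# Maps a `dos commit-audit` verdict to an action word (also illustrative).
grade_audit() {
  case "$1" in
    OK)                echo "run-tests" ;;    # diff backs the claim's kind; correctness is still untested
    CLAIM_UNWITNESSED) echo "reject" ;;       # subject not evidenced by the diff
    *)                 echo "unconfirmed" ;;  # ABSTAIN or anything else: not corroborated
  esac
}
```

For example, `grade_audit "$(dos commit-audit --workspace . HEAD --json | jq -r '.[0].verdict')"` feeds the Step 2 verdict straight in. For `grade_verify`, extract the two fields from the `dos verify --json` output first; the exact JSON layout is not shown in this document, so inspect it before writing a `jq` filter.
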
## Related Skills

- The upstream DOS reference screenplays (`dos-witness-claim`, `dos-goal-gate`) in `anthony-chaudhary/dos-kernel` cover the multi-agent fan-out and self-stopping-agent variants of this same witness discipline.
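
For the fan-out case (folding many sub-agents' results), the per-commit check can be reduced to a single pass/fail. The helper below is a rough sketch under stated assumptions, not the upstream `dos-witness-claim` screenplay: `fold_verdicts` is a hypothetical name, and the `LABEL VERDICT` line format it reads is invented here for illustration. It treats every verdict other than `OK` as unconfirmed.

```bash
# fold_verdicts: reads "LABEL VERDICT" lines on stdin (one per sub-agent commit),
# prints each label whose verdict is not OK, and returns non-zero if any exist.
# The name and the input format are illustrative, not part of DOS.
fold_verdicts() {
  failed=0
  # The `|| [ -n "$label" ]` clause keeps a final line that lacks a trailing newline.
  while read -r label verdict || [ -n "$label" ]; do
    [ -n "$label" ] || continue              # skip blank lines
    if [ "$verdict" != "OK" ]; then
      echo "unconfirmed: $label ($verdict)"
      failed=1
    fi
  done
  return "$failed"
}
```

One way to produce the input is to loop over the commits the sub-agents returned and print each commit next to the verdict from `dos commit-audit --workspace . <commit> --json | jq -r '.[0].verdict'`. That assumes `dos commit-audit` accepts any commit ref where the examples pass `HEAD`; confirm this against the CLI before relying on it.
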