---
name: dos-verify-done-claims
description: "Before accepting an agent's 'done / shipped / fixed' claim, verify it against ground truth (git ancestry + the commit's own diff) using the DOS kernel's `dos verify` and `dos commit-audit`, never the agent's own narration."
category: quality
risk: safe
source: community
source_repo: anthony-chaudhary/dos-kernel
source_type: community
date_added: "2026-06-12"
author: anthony-chaudhary
tags: [verification, git, ai-agents, trust, quality-gate]
tools: [claude, cursor, gemini]
license: "MIT"
license_source: "https://github.com/anthony-chaudhary/dos-kernel/blob/master/LICENSE"
---

# Verify done-claims against ground truth, not the agent's word

## Overview

When an AI agent says "done", "shipped", or "fixed", that is a **claim**, not a
fact. An agent that checks its claim by re-reading its own work gets
*consistency, not grounding*. This skill replaces that self-report with a
verdict from a witness the agent did not author: it shells out to the
**DOS kernel** (`dos verify`, `dos commit-audit`) to confirm the claimed effect
from git ancestry and the commit's actual diff. DOS is deterministic: no API
key, no LLM. The verdict is git-only and offline as used here; the one
exception is `dos verify` in a workspace that wires a CI oracle, which `--no-ci`
suppresses (see Security & Safety Notes).

This skill adapts the DOS reference "witness-claim" pattern
(`anthony-chaudhary/dos-kernel`) into a host-agnostic screenplay.

## When to Use This Skill

- Use when an agent reports a task/phase/feature as **complete** and you want
  that "done" confirmed from evidence before building on it.
- Use right after a commit, to confirm the commit's **message matches its diff**
  (catch a `fix:` that only touched a README, or a "tests pass" that deleted the
  assertions).
- Use when folding many sub-agents' results: verify each claimed effect instead
  of trusting the return string.
- **Do not** use it to judge whether code is *correct*; that is what the test
  suite is for. This skill checks whether the claimed thing actually shipped.

## How It Works

### Step 1: Install the kernel (once)

```bash
pip install dos-kernel   # provides the `dos` CLI; deterministic, no key
```

### Step 2: Audit the latest commit's claim vs its diff

A commit subject is forgeable (whoever wrote the message authored it); the files
it touched are not (git recorded them). `dos commit-audit` grades the subject
against the actual diff:

```bash
dos commit-audit --workspace . HEAD --json
```

`commit-audit --json` prints a JSON **array** of audited commits (one element
even for a single `HEAD`), so read `verdict` from the first element, e.g.
`dos commit-audit --workspace . HEAD --json | jq -r '.[0].verdict'`. (Without
`--json` the same verdict prints as a one-line text row: `· OK …`,
`⚑ UNWITNESSED …`, or `· abstain …`.) The verdicts are: `OK` (the diff backs the
claim's *kind*), `CLAIM_UNWITNESSED` (the subject's claim is not evidenced by the
diff, so treat the "done" as unproven), or `ABSTAIN`. This judges the *kind* of
change, never correctness; run the tests for that.

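The verdict handling described above can be sketched as a small shell function. `verdict_action` and its action names (`run-tests`, `reject`, `review`, `unknown`) are hypothetical and not part of DOS. Routing `ABSTAIN` to a human review is an assumption, since this skill does not spell out what that verdict means.

```bash
# Hypothetical helper (not part of DOS): map a commit-audit verdict to a next action.
# $1 = verdict string, one of the three values named in this step.
verdict_action() {
  case "$1" in
    OK)                echo "run-tests" ;;  # kind of change matches; correctness is still unchecked
    CLAIM_UNWITNESSED) echo "reject" ;;     # the subject's claim is not evidenced by the diff
    ABSTAIN)           echo "review" ;;     # assumption: no verdict either way, so a human looks
    *)                 echo "unknown" ;;    # unexpected or empty output is never treated as success
  esac
}

# Usage (needs `dos` and `jq`, exactly as in the command shown in this step):
#   verdict=$(dos commit-audit --workspace . HEAD --json | jq -r '.[0].verdict')
#   verdict_action "$verdict"
```

Keeping the mapping in one `case` makes the fall-through explicit: anything that is not a known verdict lands in `unknown` rather than being read as a pass.
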
### Step 3: Verify a named phase actually shipped

If the agent claims a specific plan/phase landed, confirm it from git history
rather than the transcript:

```bash
dos verify --workspace . PLAN PHASE --json --no-ci
```

`--no-ci` keeps the verdict git-only (see Security & Safety Notes). With `--json`
you get the `shipped` and `source` fields. (The default text form prints
`SHIPPED PLAN PHASE (via grep)` or `NOT_SHIPPED PLAN PHASE (via none)`, which is
the same verdict, and the process exit code is non-zero when not shipped.)

Grade `shipped: true` by its `source`, because the git fallback grades its
evidence by **forgeability**, and forgeable evidence is exactly what this skill
exists to distrust:

- `registry` or `grep-artifact`: **non-forgeable** (a registry row, or an
  artefact/diff rung). This closes the claim.
- `grep-subject` (or bare `grep`): **forgeable**. A commit *subject* or body
  carried the phase token, which an agent can write without doing the work (even
  on an empty commit). Treat this as *shipped-per-the-subject*, not confirmed;
  corroborate it (run `dos commit-audit` on that commit, as in Step 2) before you
  close.
- `none`: no positive evidence. Accept it as "not shipped", not as a tool failure.

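The source grading above can be sketched as one more shell function. `grade_shipped` and its outputs (`close`, `corroborate`, `keep-open`) are hypothetical names, not DOS output. The function takes the two field values as arguments, because the exact JSON shape of `dos verify --json` beyond the `shipped` and `source` fields is not shown here.

```bash
# Hypothetical helper (not part of DOS): turn a `dos verify` result into a ticket decision.
# $1 = value of `shipped` ("true" or "false"), $2 = value of `source`.
grade_shipped() {
  if [ "$1" != "true" ]; then
    echo "keep-open"                               # not shipped: no positive evidence
    return 0
  fi
  case "$2" in
    registry|grep-artifact) echo "close" ;;        # non-forgeable evidence closes the claim
    grep-subject|grep)      echo "corroborate" ;;  # forgeable subject/body match
    *)                      echo "corroborate" ;;  # assumption: unrecognised sources are not trusted
  esac
}
```

For example, `grade_shipped true grep-subject` prints `corroborate`, which is the cue to run `dos commit-audit` on the matching commit before closing.
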
### Step 4: Fold only confirmed effects

Accept the agent's "done" **only** when Step 2/3 corroborate it. If the verdict
is `CLAIM_UNWITNESSED` or `shipped` is `false`, the work is not done regardless
of how confidently the agent narrated it; send it back.

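One way to express this fold rule is a single predicate over both results. The sketch below is a simplification under stated assumptions: `accept_done` is a hypothetical name, it expects both a commit-audit verdict and a `dos verify` result to be available, and it treats a forgeable source as confirmed only when the audit verdict is `OK`.

```bash
# Hypothetical predicate (not part of DOS): exit 0 to accept the "done", non-zero to send it back.
# $1 = commit-audit verdict, $2 = `shipped` value, $3 = `source` value.
accept_done() {
  # Either negative result settles it: the work is not done.
  if [ "$1" = "CLAIM_UNWITNESSED" ] || [ "$2" != "true" ]; then
    return 1
  fi
  case "$3" in
    registry|grep-artifact) return 0 ;;         # non-forgeable source: accept
    grep-subject|grep)      [ "$1" = "OK" ] ;;  # forgeable source: accept only if the audit backs it
    *)                      return 1 ;;         # assumption: unknown source is not accepted
  esac
}
```

A caller might write `accept_done "$verdict" "$shipped" "$source" || echo "send it back"`, so that every path other than an explicit accept returns the task to the agent.
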
## Examples

### Example 1: gate an agent's "I fixed the bug" claim

```bash
# The agent committed and said it's fixed. Check the diff backs the claim.
# commit-audit --json returns an array, so read the first element's verdict:
dos commit-audit --workspace . HEAD --json | jq -r '.[0].verdict'
#   OK                -> the change is of the claimed kind; now run the tests
#   CLAIM_UNWITNESSED -> the commit doesn't do what it says; reject
```

### Example 2: confirm a feature phase shipped before closing a ticket

```bash
dos verify --workspace . AUTH AUTH2 --json --no-ci
#   shipped: true,  source: registry|grep-artifact -> non-forgeable; safe to close
#   shipped: true,  source: grep-subject|grep      -> forgeable subject/body match;
#       shipped-per-the-subject only -> corroborate with commit-audit before closing
#   shipped: false, source: none                   -> no evidence; keep the ticket open
```

## Best Practices

- ✅ Run `dos commit-audit HEAD` immediately after every agent commit.
- ✅ Treat `source: none` / `CLAIM_UNWITNESSED` as "not done", not as a tool error.
- ✅ Close a claim on a **non-forgeable** `source` (`registry`, `grep-artifact`).
  Treat `grep-subject` / bare `grep` as forgeable (an agent can write the subject
  text) and corroborate before closing.
- ✅ Keep the test suite as the separate correctness gate; this skill checks shipping, not correctness.
- ❌ Don't accept a "done" because the agent's prose was confident.
- ❌ Don't use this to replace code review or testing.

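If the audit should run after every commit without anyone remembering to invoke it, one option is a git `post-commit` hook. The sketch below is an assumption about how you might wire it, not something DOS provides: `install_audit_hook` is a hypothetical helper, and it overwrites any existing `post-commit` hook in the directory it is given.

```bash
# Hypothetical installer (not part of DOS): write a post-commit hook that audits HEAD.
# $1 = hooks directory, normally .git/hooks. Overwrites any existing post-commit hook.
install_audit_hook() {
  mkdir -p "$1" || return 1
  printf '%s\n' \
    '#!/bin/sh' \
    '# post-commit runs after the commit exists, so this can report but not block.' \
    'dos commit-audit --workspace . HEAD' \
    > "$1/post-commit" || return 1
  chmod +x "$1/post-commit"
}

# Usage, from the root of the repository you want audited:
#   install_audit_hook .git/hooks
```

Git runs hooks from the root of the working tree in a non-bare repository, which is why `--workspace .` points at the right repo. Because a `post-commit` hook cannot reject a commit, treat its output as a prompt to act on the verdict, not as a gate.
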
## Limitations

- This skill does not replace environment-specific validation, testing, or expert review.
- It checks whether a claimed change *shipped* or matches its diff, not whether the code is *correct*.
- `dos verify` reads git history; in a repo with no commits there is nothing to witness (it will honestly report `source: none`).
- Stop and ask for clarification if required inputs (a git repo, the `dos` CLI) are missing.

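The last limitation can be turned into a preflight check that runs before any verdict is requested. This is a minimal sketch: `preflight` is a hypothetical helper, and its output format (`ok`, or `missing:` followed by what was not found) is invented for illustration.

```bash
# Hypothetical preflight (not part of DOS): report which required inputs are missing.
# $1 = CLI name to look for (normally "dos"), $2 = directory to check (normally ".").
preflight() {
  _missing=""
  # The CLI must be on PATH.
  command -v "$1" >/dev/null 2>&1 || _missing="$_missing cli:$1"
  # The directory must be inside a git working tree (this also fails if git is absent).
  git -C "$2" rev-parse --is-inside-work-tree >/dev/null 2>&1 || _missing="$_missing git-repo:$2"
  if [ -n "$_missing" ]; then
    echo "missing:$_missing"
    return 1
  fi
  echo "ok"
}

# Usage:
#   preflight dos . || echo "stop and ask for clarification"
```

Checking both inputs before reporting means a single run lists everything that is missing, instead of failing on the first gap.
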
## Security & Safety Notes

- This skill runs shell commands: `pip install dos-kernel` and the read-only
  `dos` verbs (`dos commit-audit`, `dos verify`). These verbs never **mutate**
  the repo or push. `dos commit-audit` only reads git history and the working
  tree (no network). `dos verify` is also git-only **unless** the workspace has
  wired a CI oracle (`[verify] non_git_oracle` in its `dos.toml`), in which case
  it may shell out to a network check (e.g. `gh api`) for the verdict. Pass
  `--no-ci` (as the examples above do) to force the git-only path and guarantee
  no network.
- `pip install dos-kernel` installs from PyPI. The distribution name is
  `dos-kernel` (the bare `dos` on PyPI is an unrelated package; do not install
  it). Pin a version in locked environments.
- Run in the repository you intend to adjudicate; the `--workspace .` argument
  scopes every verdict to that repo.

## Common Pitfalls

- **Problem:** `dos verify` returns `source: none` and it looks like a failure.
  **Solution:** That is the honest "no evidence" verdict: the phase has no ship
  commit, so the claim is unproven. Re-stamp the real commit or keep the task
  open.
- **Problem:** Installing the wrong package.
  **Solution:** The PyPI name is `dos-kernel`, not `dos`.

## Related Skills

- The upstream DOS reference screenplays (`dos-witness-claim`, `dos-goal-gate`)
  in `anthony-chaudhary/dos-kernel` cover the multi-agent fan-out and
  self-stopping-agent variants of this same witness discipline.