playbook/antigravity-awesome-skills/skills/yao-meta-skill/references/eval-playbook.md

70 lines
1.3 KiB
Markdown

# Trigger And Eval Playbook
Use this playbook for skills that matter enough to test.
## A. Trigger Evaluation
Create three prompt buckets:
### 1. Should Trigger
Prompts that clearly need the skill.
Goal:
- verify recall
### 2. Should Not Trigger
Prompts that are clearly outside the skill boundary.
Goal:
- verify precision
### 3. Near Neighbors
Prompts that look similar but should use another skill or no skill.
Goal:
- catch false positives and ambiguous routing
## B. Execution Evaluation
For each important use case, create 1 to 3 realistic prompts with:
- user-like phrasing
- representative inputs or file types
- expected output description
- key checks
## C. Revision Loop
When a skill underperforms:
1. Fix boundary or description problems before adding more body text.
2. Move brittle logic into scripts or templates.
3. Split reference content if `SKILL.md` becomes bloated.
4. Re-run the same eval set before expanding scope.
## D. Minimum QA By Skill Tier
### Personal skill
- 2 realistic prompts
- manual review
### Team skill
- 3 to 5 realistic prompts
- trigger positives and negatives
- one revision loop
### Infrastructure or meta-skill
- 5+ execution prompts
- trigger positives, negatives, and near neighbors
- benchmark notes across revisions
- ownership and drift review