Trigger And Eval Playbook
Use this playbook for skills that matter enough to test.
A. Trigger Evaluation
Create three prompt buckets:
1. Should Trigger
Prompts that clearly need the skill.
Goal:
- verify recall
2. Should Not Trigger
Prompts that are clearly outside the skill boundary.
Goal:
- verify precision
3. Near Neighbors
Prompts that look similar but should use another skill or no skill.
Goal:
- catch false positives and ambiguous routing
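The three buckets above can be scored with a small script. This is a minimal sketch, assuming you have already recorded whether the skill fired for each prompt; the `TriggerCase` type, the bucket names, and `trigger_metrics` are hypothetical names introduced here for illustration and are not part of any skill tooling.

```python
from dataclasses import dataclass


@dataclass
class TriggerCase:
    prompt: str
    bucket: str      # "should_trigger", "should_not_trigger", or "near_neighbor"
    triggered: bool  # observed result: did the skill fire for this prompt?


def trigger_metrics(cases):
    """Score recall on the should-trigger bucket and precision over all fires."""
    positives = [c for c in cases if c.bucket == "should_trigger"]
    others = [c for c in cases if c.bucket != "should_trigger"]

    true_pos = sum(c.triggered for c in positives)
    false_neg = len(positives) - true_pos
    false_pos = sum(c.triggered for c in others)

    # None signals "not enough data" instead of dividing by zero.
    recall = true_pos / (true_pos + false_neg) if positives else None
    fired = true_pos + false_pos
    precision = true_pos / fired if fired else None

    # Near-neighbor fires are listed separately because they point at
    # ambiguous routing rather than an obviously wrong boundary.
    near_misroutes = [
        c.prompt for c in others if c.bucket == "near_neighbor" and c.triggered
    ]
    return {
        "recall": recall,
        "precision": precision,
        "near_neighbor_false_positives": near_misroutes,
    }
```

Listing the near-neighbor false positives by prompt, rather than folding them into a single number, makes it easier to see which neighboring skill the description is colliding with.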
B. Execution Evaluation
For each important use case, create 1 to 3 realistic prompts with:
- user-like phrasing
- representative inputs or file types
- expected output description
- key checks
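One way to keep execution cases consistent is to store them as structured records and lint them before running anything. The sketch below is an assumption about how such records might look; `ExecutionCase` and `find_problems` are hypothetical names, and the field set simply mirrors the four items listed above.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ExecutionCase:
    use_case: str
    prompt: str           # user-like phrasing
    inputs: list          # representative inputs or file types
    expected_output: str  # description of what a good result looks like
    key_checks: list = field(default_factory=list)


def find_problems(cases):
    """Return human-readable problems found in a list of execution cases."""
    problems = []
    for case in cases:
        if not case.expected_output.strip():
            problems.append(f"{case.use_case}: missing expected output description")
        if not case.key_checks:
            problems.append(f"{case.use_case}: no key checks")

    # The playbook asks for 1 to 3 prompts per important use case.
    for use_case, count in Counter(c.use_case for c in cases).items():
        if count > 3:
            problems.append(f"{use_case}: {count} prompts, expected 1 to 3")
    return problems
```

An empty result from `find_problems` only means the cases are well-formed; whether the prompts are realistic still needs a human read.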
C. Revision Loop
When a skill underperforms:
- Fix boundary or description problems before adding more body text.
- Move brittle logic into scripts or templates.
- Split reference content if SKILL.md becomes bloated.
- Re-run the same eval set before expanding scope.
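Re-running the same eval set is only useful if the two runs are compared prompt by prompt. A minimal sketch, assuming each run is stored as a mapping from a prompt identifier to a pass/fail boolean; `compare_runs` and the result keys are hypothetical names chosen for this example.

```python
def compare_runs(before, after):
    """Compare pass/fail results from two revisions over the same eval set."""
    if not before:
        raise ValueError("eval set is empty")
    if before.keys() != after.keys():
        # Changing the prompts between runs hides regressions.
        raise ValueError("eval sets differ; re-run the same prompts before comparing")

    regressions = sorted(p for p in before if before[p] and not after[p])
    fixes = sorted(p for p in before if not before[p] and after[p])
    return {
        "regressions": regressions,
        "fixes": fixes,
        "pass_rate_before": sum(before.values()) / len(before),
        "pass_rate_after": sum(after.values()) / len(after),
    }
```

Refusing to compare runs with different prompt sets is a deliberate choice here: it keeps scope expansion and revision checks as two separate steps.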
D. Minimum QA By Skill Tier
Personal skill
- 2 realistic prompts
- manual review
Team skill
- 3 to 5 realistic prompts
- trigger positives and negatives
- one revision loop
Infrastructure or meta-skill
- 5+ execution prompts
- trigger positives, negatives, and near neighbors
- benchmark notes across revisions
- ownership and drift review
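The tier minimums above can be written down as data so a checklist tool can report what is still missing. This is a sketch under stated assumptions: the tier keys, the item names in `required`, and the `qa_gaps` function are all hypothetical labels for the bullets in this section, and "3 to 5" is encoded as a minimum of 3.

```python
MINIMUM_QA = {
    "personal": {
        "min_prompts": 2,
        "required": {"manual_review"},
    },
    "team": {
        "min_prompts": 3,
        "required": {"trigger_positives", "trigger_negatives", "revision_loop"},
    },
    "infrastructure": {
        "min_prompts": 5,
        "required": {
            "trigger_positives",
            "trigger_negatives",
            "near_neighbors",
            "benchmark_notes",
            "ownership_and_drift_review",
        },
    },
}


def qa_gaps(tier, prompt_count, completed):
    """List what a skill still needs to meet the minimum QA for its tier."""
    spec = MINIMUM_QA[tier]
    gaps = []
    if prompt_count < spec["min_prompts"]:
        gaps.append(
            f"need at least {spec['min_prompts']} realistic prompts, have {prompt_count}"
        )
    # Sorted so the report is stable across runs.
    gaps.extend(sorted(spec["required"] - set(completed)))
    return gaps
```

For example, `qa_gaps("team", 2, {"trigger_positives"})` would report the prompt shortfall plus the missing trigger negatives and revision loop.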