Skill Engineering Method
This doctrine defines the default method for turning messy workflow material into a reusable skill without bloating the entrypoint.
Core Loop
- Decide whether the request should become a skill at all.
- Run a short intent dialogue to capture the real job, outputs, exclusions, and constraints.
- Choose the smallest viable archetype.
- Set one clear capability boundary.
- Write and test the trigger description before expanding the body.
- Apply authoring discipline: name unresolved assumptions, keep scope small, and tie meaningful changes to checks.
- Add an output risk profile for user-facing artifacts.
- Add an artifact design profile for reports, tutorials, viewers, dashboards, screenshots, and visual pages.
- Add only the gates that match the risk.
- Ship the first routeable package, then pick the three highest-value next iteration directions.
- Package and govern the skill only as far as real reuse demands.
Phase 1: Qualification
Promote a request into a skill only when at least one of these is true:
- the workflow will be reused
- the workflow is easy to route incorrectly
- deterministic scripts reduce repeated effort
- governance or portability matters
Reject skill creation when the request is only:
- explanation
- summary
- translation
- brainstorming
- documentation without agent execution
- a one-off answer with no reuse value
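The promotion criteria above can be sketched as a small predicate. This is a minimal illustration, not part of the method's tooling: the `Request` class and its field names are hypothetical, and the rejection list is not modeled separately because a request that is only an explanation, summary, or one-off answer would have every flag unset.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical summary of an incoming request; field names are illustrative."""
    reusable: bool = False                   # the workflow will be reused
    easy_to_misroute: bool = False           # the workflow is easy to route incorrectly
    scripts_reduce_effort: bool = False      # deterministic scripts reduce repeated effort
    governance_or_portability: bool = False  # governance or portability matters


def qualifies_as_skill(req: Request) -> bool:
    """Promote the request when at least one qualification criterion holds."""
    return any((
        req.reusable,
        req.easy_to_misroute,
        req.scripts_reduce_effort,
        req.governance_or_portability,
    ))
```

A request with no flags set, such as a one-off translation, would not qualify under this sketch.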
Phase 1.5: Authoring Discipline
Before expanding the package, apply the execution discipline that keeps the work grounded.
- clarify only the assumptions that change the package design
- do not add speculative features, generic configurability, or decorative structure
- when editing an existing skill, touch only files that directly serve the requested change
- connect each meaningful change to a check: route evidence, sample run, resource-boundary check, governance check, or reviewer note
See Authoring Discipline.
Phase 1.6: Problem Diagnosis
When the user gives a fuzzy pain point instead of a clear skill request, diagnose the likely package shape before asking for structure.
- infer whether the need is best served by a scaffold, production workflow, library capability, governed asset, or no skill
- recommend at most three candidate directions
- explain why each candidate fits, where it is limited, and what the first light version should prove
- prefer a concrete recommendation over a menu when the intent is clear enough
This keeps discovery useful for users who can describe the problem but not the skill architecture.
Phase 2: Intent Dialogue
Before deep authoring, ask only the questions that change the package design.
- open with a human, teacher-like framing rather than a cold field list
- let the user answer naturally first; offer a tiny template only as an optional shortcut
- what recurring job should the skill own
- what real inputs will users hand to it
- what outputs must it produce
- what near-neighbor requests should stay out of scope
- whether the user has reference systems, repos, or products worth learning from
- what constraints matter: privacy, naming, portability, governance, or local fit
See Intent Dialogue.
Phase 3: Archetype Selection
Choose the lightest archetype that fits the job.
- Scaffold: exploratory, personal, or short-lived
- Production: team-reused, quality-sensitive, but still compact
- Library: broad reuse, visible evidence, portability, and maintenance expectations
- Governed: organizationally sensitive or operationally critical; lifecycle and review are explicit
See Skill Archetypes.
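One way to read "lightest archetype that fits" is as a check from the heaviest requirement downward, falling back to the scaffold. The sketch below assumes three hypothetical boolean signals; the real selection may weigh more factors than these.

```python
def choose_archetype(team_reused: bool = False,
                     broad_reuse: bool = False,
                     sensitive_or_critical: bool = False) -> str:
    """Return the lightest archetype whose description covers the stated needs.

    The three flags are illustrative assumptions, not fields defined by the method.
    """
    if sensitive_or_critical:
        return "governed"    # lifecycle and review must be explicit
    if broad_reuse:
        return "library"     # portability and maintenance expectations
    if team_reused:
        return "production"  # quality-sensitive but still compact
    return "scaffold"        # exploratory, personal, or short-lived
```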
Phase 4: Boundary Design
Every skill should answer four questions clearly:
- what recurring job does it own
- what outputs does it produce
- what near-neighbor requests should not route here
- what detail belongs outside SKILL.md
Boundary work comes before polishing prose.
Phase 5: Reference Scan
Run a short benchmark pass before deep authoring.
- scan 3-5 reference objects at most
- prioritize high-star external GitHub and official benchmark sources first
- ask for user-supplied references second, but extract only patterns and standards
- use local files third, only for fit, privacy, and compatibility calibration
- choose from method, structure, execution, portability, and domain patterns
- extract only what passes the pattern test: recurrence, generativity, distinctiveness, and boundary clarity
- record what not to borrow so the new skill stays light
See Reference Scan Strategy and Pattern Extraction Doctrine.
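The pattern test can be sketched as a filter that keeps a candidate only when it passes all four criteria and records the rest as patterns not to borrow. The `Pattern` class and the way each criterion is judged are assumptions here; in practice the judgment is made by the author, not computed.

```python
from dataclasses import dataclass

# The four criteria named in the pattern test.
PATTERN_TESTS = ("recurrence", "generativity", "distinctiveness", "boundary_clarity")


@dataclass
class Pattern:
    """Hypothetical record of a candidate pattern and the criteria it passed."""
    name: str
    passed: frozenset


def extract_patterns(candidates):
    """Split candidates into patterns to borrow and patterns to leave out."""
    keep, skip = [], []
    for pattern in candidates:
        if set(PATTERN_TESTS) <= pattern.passed:
            keep.append(pattern)  # passes every criterion
        else:
            skip.append(pattern)  # recorded as "not to borrow"
    return keep, skip
```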
Phase 5.5: Output Risk Profiling
Before treating the package as usable, predict the likely mistakes in its final user-facing output.
- tutorial skills should guard against generic headings, vague examples, and missing success checks
- report and Markdown skills should guard against weak tables, dense lists, and poor hierarchy
- screenshot or visual skills should guard against wrong captures, missing assets, and invented visual evidence
- research or citation-heavy skills should guard against footnote clutter and unsupported claims
- code or command skills should guard against hidden cwd, input, output, and side-effect assumptions
Generate reports/output-risk-profile.md and expose it to the reviewer before adding more structure.
See Output Quality Risk.
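As a rough illustration, the risk families listed above can be held in a lookup table and rendered as a short profile. The table keys and the rendered layout are assumptions; the actual format of reports/output-risk-profile.md is not specified in this document.

```python
# Risks per skill family, taken from the list above; the keys are illustrative labels.
OUTPUT_RISKS = {
    "tutorial": ["generic headings", "vague examples", "missing success checks"],
    "report or Markdown": ["weak tables", "dense lists", "poor hierarchy"],
    "screenshot or visual": ["wrong captures", "missing assets", "invented visual evidence"],
    "research or citation-heavy": ["footnote clutter", "unsupported claims"],
    "code or command": ["hidden cwd, input, output, and side-effect assumptions"],
}


def render_risk_profile(family: str) -> str:
    """Render a plain-text risk profile for one skill family (hypothetical layout)."""
    lines = [f"Output risk profile: {family}", ""]
    lines += [f"- guard against {risk}" for risk in OUTPUT_RISKS[family]]
    return "\n".join(lines)
```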
Phase 5.6: Artifact Design Profiling
Before approving generated reports or visual outputs, define how the artifact should read and scan.
- choose the artifact family: tutorial, report, review viewer, dashboard, visual evidence, slide-like narrative, or code/CLI guide
- let the content choose the visual system instead of copying a fixed house style
- borrow document discipline from Kami: route by document type, distill content, verify layout-critical assumptions
- borrow slide discipline from presentation skills: plan hierarchy, density, rhythm, and quality gates before writing HTML
- reject generic headings, noisy citations, weak tables, wrong screenshots, repeated card grids, and decorative visual defaults
Generate reports/artifact-design-profile.md and expose it in the overview and review viewer.
See Artifact Design Doctrine and Output Visual Quality.
Phase 5.7: Prompt Quality Profiling
When a skill depends on prompt behavior, role design, dialogue quality, or output contracts, turn prompt methodology into an evidence profile.
- identify the explicit need, implicit need, scenario, user level, and success standard
- map Role, Task, and Format into skill behavior instead of copying a full meta-prompt into SKILL.md
- classify the task family and complexity before adding structure
- score completeness, clarity, consistency, practicality, and specificity
- expose the full reasoning to reviewers while keeping the user-facing flow recommendation-led
Generate reports/prompt-quality-profile.md and expose it in the overview and review viewer.
See Prompt Engineering Doctrine.
Phase 6: Trigger-First Authoring
Author the frontmatter description before expanding the body.
- start with the recurring job
- include the trigger actions that should route here
- include exclusions when confusion is plausible
- test the route before growing the file tree
Trigger quality is improved through:
- trigger_eval.py
- optimize_description.py
- blind holdout
- judge-backed blind holdout
- adversarial holdout
- route confusion
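To make "test the route" concrete, here is a toy route check. It is not trigger_eval.py and does not reflect how that script works: real routing is decided by the agent reading the description, and the keyword matcher below is only a stand-in. All function names and the case format are hypothetical.

```python
def routes_here(prompt, triggers, exclusions):
    """Toy stand-in for routing: exclusions win, then any trigger term matches."""
    text = prompt.lower()
    if any(term in text for term in exclusions):
        return False
    return any(term in text for term in triggers)


def route_accuracy(cases, triggers, exclusions):
    """Fraction of (prompt, expected) cases where the toy router agrees."""
    hits = sum(
        routes_here(prompt, triggers, exclusions) == expected
        for prompt, expected in cases
    )
    return hits / len(cases)
```

Running the same labeled cases before and after a description change gives a simple signal of whether the route improved or regressed.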
Phase 7: Gate Selection
Add gates by risk, not by habit.
- low-risk scaffolds: validate structure and context size
- production skills: trigger eval plus resource-boundary checks
- library skills: description optimization, route confusion, packaging checks
- governed skills: governance scoring, lifecycle metadata, regression history
See Gate Selection.
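The risk-to-gate mapping above can be sketched as a lookup table. The sketch assumes the gate sets are taken exactly as listed and are not cumulative across archetypes; the document does not say whether heavier archetypes also inherit the lighter gates.

```python
# Gates per archetype as listed above; gate names are labels, not script names.
GATES_BY_ARCHETYPE = {
    "scaffold": ["structure validation", "context size"],
    "production": ["trigger eval", "resource-boundary checks"],
    "library": ["description optimization", "route confusion", "packaging checks"],
    "governed": ["governance scoring", "lifecycle metadata", "regression history"],
}


def select_gates(archetype: str) -> list:
    """Return a copy of the gates for one archetype (non-cumulative by assumption)."""
    return list(GATES_BY_ARCHETYPE[archetype])
```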
Phase 8: First Iteration Philosophy
The first package is a routeable baseline, not the final answer.
- improve trigger and exclusions before growing prose
- add one execution asset before adding many documents
- surface the three highest-value next moves so authors do not expand in every direction at once
- prefer the smallest step that increases reliability more than context cost
- move unverifiable ideas into next-step candidates instead of shipping them as baseline structure
See Iteration Philosophy and Authoring Discipline.
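Picking the three highest-value next moves can be sketched as a ranking by net gain. The `Move` class and its numeric estimates are hypothetical; the method does not define units for reliability or context cost, so any scores would be the author's own judgment.

```python
from dataclasses import dataclass


@dataclass
class Move:
    """Hypothetical next-step candidate with author-estimated scores."""
    name: str
    reliability_gain: float
    context_cost: float


def next_moves(candidates, limit=3):
    """Keep moves whose reliability gain exceeds context cost, best net gain first."""
    worthwhile = [m for m in candidates if m.reliability_gain > m.context_cost]
    worthwhile.sort(key=lambda m: m.reliability_gain - m.context_cost, reverse=True)
    return worthwhile[:limit]
```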
Phase 9: Promotion
A candidate route or package is promotable only when:
- visible holdout does not regress
- blind holdout does not regress
- judge-backed blind holdout does not regress
- adversarial holdout does not regress
- route confusion stays clean
- context and governance gates still pass
See Promotion Policy.
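The promotion conditions can be sketched as a single check. The sketch assumes each holdout yields one numeric score where higher is better, and that route confusion and the context and governance gates are reported as booleans; the key names are hypothetical.

```python
# Holdout sets that must not regress; key names are illustrative.
HOLDOUTS = ("visible", "blind", "judge_backed_blind", "adversarial")


def is_promotable(baseline, candidate, route_confusion_clean, gates_pass):
    """Return True only when no holdout regresses and the remaining checks pass."""
    no_regression = all(candidate[name] >= baseline[name] for name in HOLDOUTS)
    return no_regression and route_confusion_clean and gates_pass
```

A candidate that improves three holdouts but drops on the fourth would be rejected under this sketch.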
Design Principle
The method is only correct if rigor grows faster than context cost. If a new check or document makes the skill heavier without making it more reliable, remove or relocate it.