211 lines
9.0 KiB
Markdown
211 lines
9.0 KiB
Markdown
# Skill Engineering Method
|
|
|
|
This doctrine defines the default method for turning messy workflow material into a reusable skill without bloating the entrypoint.
|
|
|
|
## Core Loop
|
|
|
|
1. Decide whether the request should become a skill at all.
|
|
2. Run a short intent dialogue to capture the real job, outputs, exclusions, and constraints.
|
|
3. Choose the smallest viable archetype.
|
|
4. Set one clear capability boundary.
|
|
5. Write and test the trigger description before expanding the body.
|
|
6. Apply authoring discipline: name unresolved assumptions, keep scope small, and tie meaningful changes to checks.
|
|
7. Add an output risk profile for user-facing artifacts.
|
|
8. Add an artifact design profile for reports, tutorials, viewers, dashboards, screenshots, and visual pages.
|
|
9. Add only the gates that match the risk.
|
|
10. Ship the first routeable package, then pick the three highest-value next iteration directions.
|
|
11. Package and govern the skill only as far as real reuse demands.
|
|
|
|
## Phase 1: Qualification
|
|
|
|
Promote a request into a skill only when at least one of these is true:
|
|
|
|
- the workflow will be reused
|
|
- the workflow is easy to route incorrectly
|
|
- deterministic scripts reduce repeated effort
|
|
- governance or portability matters
|
|
|
|
Reject skill creation when the request is only:
|
|
|
|
- explanation
|
|
- summary
|
|
- translation
|
|
- brainstorming
|
|
- documentation without agent execution
|
|
- a one-off answer with no reuse value
|
|
|
|
See [Non-Skill Decision Tree](non-skill-decision-tree.md).
|
|
|
|
## Phase 1.5: Authoring Discipline
|
|
|
|
Before expanding the package, apply the execution discipline that keeps the work grounded.
|
|
|
|
- clarify only the assumptions that change the package design
|
|
- do not add speculative features, generic configurability, or decorative structure
|
|
- when editing an existing skill, touch only files that directly serve the requested change
|
|
- connect each meaningful change to a check: route evidence, sample run, resource-boundary check, governance check, or reviewer note
|
|
|
|
See [Authoring Discipline](authoring-discipline.md).
|
|
|
|
## Phase 1.6: Problem Diagnosis
|
|
|
|
When the user gives a fuzzy pain point instead of a clear skill request, diagnose the likely package shape before asking for structure.
|
|
|
|
- infer whether the need is best served by a scaffold, production workflow, library capability, governed asset, or no skill
|
|
- recommend at most three candidate directions
|
|
- explain why each candidate fits, where it is limited, and what the first light version should prove
|
|
- prefer a concrete recommendation over a menu when the intent is clear enough
|
|
|
|
This keeps discovery useful for users who can describe the problem but not the skill architecture.
|
|
|
|
## Phase 2: Intent Dialogue
|
|
|
|
Before deep authoring, ask only the questions that change the package design.
|
|
|
|
- open with a human, teacher-like framing rather than a cold field list
|
|
- let the user answer naturally first; offer a tiny template only as an optional shortcut
|
|
- what recurring job should the skill own
|
|
- what real inputs will users hand to it
|
|
- what outputs must it produce
|
|
- what near-neighbor requests should stay out of scope
|
|
- whether the user has reference systems, repos, or products worth learning from
|
|
- what constraints matter: privacy, naming, portability, governance, or local fit
|
|
|
|
See [Intent Dialogue](intent-dialogue.md).
|
|
|
|
## Phase 3: Archetype Selection
|
|
|
|
Choose the lightest archetype that fits the job.
|
|
|
|
- `Scaffold`: exploratory, personal, or short-lived
|
|
- `Production`: team-reused, quality-sensitive, but still compact
|
|
- `Library`: broad reuse, visible evidence, portability, and maintenance expectations
|
|
- `Governed`: organizationally sensitive or operationally critical; lifecycle and review are explicit
|
|
|
|
See [Skill Archetypes](skill-archetypes.md).
|
|
|
|
## Phase 4: Boundary Design
|
|
|
|
Every skill should answer four questions clearly:
|
|
|
|
- what recurring job does it own
|
|
- what outputs does it produce
|
|
- what near-neighbor requests should not route here
|
|
- what detail belongs outside `SKILL.md`
|
|
|
|
Boundary work comes before polishing prose.
|
|
|
|
## Phase 5: Reference Scan
|
|
|
|
Run a short benchmark pass before deep authoring.
|
|
|
|
- scan `3-5` reference objects at most
|
|
- prioritize high-star external GitHub and official benchmark sources first
|
|
- ask for user-supplied references second, but extract only patterns and standards
|
|
- use local files third, only for fit, privacy, and compatibility calibration
|
|
- choose from method, structure, execution, portability, and domain patterns
|
|
- extract only what passes the pattern test: recurrence, generativity, distinctiveness, and boundary clarity
|
|
- record what not to borrow so the new skill stays light
|
|
|
|
See [Reference Scan Strategy](reference-scan.md) and [Pattern Extraction Doctrine](pattern-extraction-doctrine.md).
|
|
|
|
## Phase 5.5: Output Risk Profiling
|
|
|
|
Before treating the package as usable, predict the likely mistakes in its final user-facing output.
|
|
|
|
- tutorial skills should guard against generic headings, vague examples, and missing success checks
|
|
- report and Markdown skills should guard against weak tables, dense lists, and poor hierarchy
|
|
- screenshot or visual skills should guard against wrong captures, missing assets, and invented visual evidence
|
|
- research or citation-heavy skills should guard against footnote clutter and unsupported claims
|
|
- code or command skills should guard against hidden cwd, input, output, and side-effect assumptions
|
|
|
|
Generate `reports/output-risk-profile.md` and expose it to the reviewer before adding more structure.
|
|
|
|
See [Output Quality Risk](output-quality-risk.md).
|
|
|
|
## Phase 5.6: Artifact Design Profiling
|
|
|
|
Before approving generated reports or visual outputs, define how the artifact should read and scan.
|
|
|
|
- choose the artifact family: tutorial, report, review viewer, dashboard, visual evidence, slide-like narrative, or code/CLI guide
|
|
- let the content choose the visual system instead of copying a fixed house style
|
|
- borrow document discipline from Kami: route by document type, distill content, verify layout-critical assumptions
|
|
- borrow slide discipline from presentation skills: plan hierarchy, density, rhythm, and quality gates before writing HTML
|
|
- reject generic headings, noisy citations, weak tables, wrong screenshots, repeated card grids, and decorative visual defaults
|
|
|
|
Generate `reports/artifact-design-profile.md` and expose it in the overview and review viewer.
|
|
|
|
See [Artifact Design Doctrine](artifact-design-doctrine.md) and [Output Visual Quality](output-visual-quality.md).
|
|
|
|
## Phase 5.7: Prompt Quality Profiling
|
|
|
|
When a skill depends on prompt behavior, role design, dialogue quality, or output contracts, turn prompt methodology into an evidence profile.
|
|
|
|
- identify the explicit need, implicit need, scenario, user level, and success standard
|
|
- map Role, Task, and Format into skill behavior instead of copying a full meta-prompt into `SKILL.md`
|
|
- classify the task family and complexity before adding structure
|
|
- score completeness, clarity, consistency, practicality, and specificity
|
|
- expose the full reasoning to reviewers while keeping the user-facing flow recommendation-led
|
|
|
|
Generate `reports/prompt-quality-profile.md` and expose it in the overview and review viewer.
|
|
|
|
See [Prompt Engineering Doctrine](prompt-engineering-doctrine.md).
|
|
|
|
## Phase 6: Trigger-First Authoring
|
|
|
|
Author the frontmatter `description` before expanding the body.
|
|
|
|
- start with the recurring job
|
|
- include the trigger actions that should route here
|
|
- include exclusions when confusion is plausible
|
|
- test the route before growing the file tree
|
|
|
|
Trigger quality is improved through:
|
|
|
|
- `trigger_eval.py`
|
|
- `optimize_description.py`
|
|
- blind holdout
|
|
- judge-backed blind holdout
|
|
- adversarial holdout
|
|
- route confusion
|
|
|
|
## Phase 7: Gate Selection
|
|
|
|
Add gates by risk, not by habit.
|
|
|
|
- low-risk scaffolds: validate structure and context size
|
|
- production skills: trigger eval plus resource-boundary checks
|
|
- library skills: description optimization, route confusion, packaging checks
|
|
- governed skills: governance scoring, lifecycle metadata, regression history
|
|
|
|
See [Gate Selection](gate-selection.md).
|
|
|
|
## Phase 8: First Iteration Philosophy
|
|
|
|
The first package is a routeable baseline, not the final answer.
|
|
|
|
- improve trigger and exclusions before growing prose
|
|
- add one execution asset before adding many documents
|
|
- surface the three highest-value next moves so authors do not expand in every direction at once
|
|
- prefer the smallest step that increases reliability more than context cost
|
|
- move unverifiable ideas into next-step candidates instead of shipping them as baseline structure
|
|
|
|
See [Iteration Philosophy](iteration-philosophy.md) and [Authoring Discipline](authoring-discipline.md).
|
|
|
|
## Phase 9: Promotion
|
|
|
|
A candidate route or package is promotable only when:
|
|
|
|
- visible holdout does not regress
|
|
- blind holdout does not regress
|
|
- judge-backed blind holdout does not regress
|
|
- adversarial holdout does not regress
|
|
- route confusion stays clean
|
|
- context and governance gates still pass
|
|
|
|
See [Promotion Policy](../evals/promotion_policy.md).
|
|
|
|
## Design Principle
|
|
|
|
The method is only correct if rigor grows faster than context cost. If a new check or document makes the skill heavier without making it more reliable, remove or relocate it.
|