Skill Engineering Method

This doctrine defines the default method for turning messy workflow material into a reusable skill without bloating the entrypoint.

Core Loop

Decide whether the request should become a skill at all.
Run a short intent dialogue to capture the real job, outputs, exclusions, and constraints.
Choose the smallest viable archetype.
Set one clear capability boundary.
Write and test the trigger description before expanding the body.
Apply authoring discipline: name unresolved assumptions, keep scope small, and tie meaningful changes to checks.
Add an output risk profile for user-facing artifacts.
Add an artifact design profile for reports, tutorials, viewers, dashboards, screenshots, and visual pages.
Add only the gates that match the risk.
Ship the first routeable package, then pick the three highest-value next iteration directions.
Package and govern the skill only as far as real reuse demands.

Phase 1: Qualification

Promote a request into a skill only when at least one of these is true:

the workflow will be reused
the workflow is easy to route incorrectly
deterministic scripts reduce repeated effort
governance or portability matters

Reject skill creation when the request is only:

explanation
summary
translation
brainstorming
documentation without agent execution
a one-off answer with no reuse value

See Non-Skill Decision Tree.
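The qualification rule can be read as a small predicate. This is a minimal sketch, assuming each signal has already been judged true or false by the author. The `Request` fields and the `NON_SKILL_KINDS` labels are hypothetical, and the Non-Skill Decision Tree stays the authority on edge cases.

```python
from dataclasses import dataclass

# Hypothetical labels for requests that should not become skills.
NON_SKILL_KINDS = {
    "explanation",
    "summary",
    "translation",
    "brainstorming",
    "documentation",  # documentation without agent execution
    "one-off",        # a one-off answer with no reuse value
}


@dataclass
class Request:
    kind: str                          # e.g. "workflow" or "summary"
    reused: bool = False               # the workflow will be reused
    easy_to_misroute: bool = False     # the workflow is easy to route incorrectly
    scripts_save_effort: bool = False  # deterministic scripts reduce repeated effort
    needs_governance: bool = False     # governance or portability matters


def should_become_skill(req: Request) -> bool:
    """Reject pure non-skill kinds; otherwise require at least one promotion signal."""
    if req.kind in NON_SKILL_KINDS:
        return False
    return any(
        (req.reused, req.easy_to_misroute, req.scripts_save_effort, req.needs_governance)
    )
```

A reused summary request still fails, because the reject list describes what the request is, not how often it happens.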

Phase 1.5: Authoring Discipline

Before expanding the package, apply the execution discipline that keeps the work grounded.

clarify only the assumptions that change the package design
do not add speculative features, generic configurability, or decorative structure
when editing an existing skill, touch only files that directly serve the requested change
connect each meaningful change to a check: route evidence, sample run, resource-boundary check, governance check, or reviewer note

See Authoring Discipline.

Phase 1.6: Problem Diagnosis

When the user gives a fuzzy pain point instead of a clear skill request, diagnose the likely package shape before asking for structure.

infer whether the need is best served by a scaffold, production workflow, library capability, governed asset, or no skill
recommend at most three candidate directions
explain why each candidate fits, where it is limited, and what the first light version should prove
prefer a concrete recommendation over a menu when the intent is clear enough

This keeps discovery useful for users who can describe the problem but not the skill architecture.

Phase 2: Intent Dialogue

Before deep authoring, ask only the questions that change the package design.

Open with a human, teacher-like framing rather than a cold field list. Let the user answer naturally first, and offer a tiny template only as an optional shortcut. Then make sure the answers cover:

what recurring job should the skill own
what real inputs will users hand to it
what outputs must it produce
what near-neighbor requests should stay out of scope
whether the user has reference systems, repos, or products worth learning from
what constraints matter: privacy, naming, portability, governance, or local fit

See Intent Dialogue.

Phase 3: Archetype Selection

Choose the lightest archetype that fits the job.

Scaffold: exploratory, personal, or short-lived
Production: team-reused, quality-sensitive, but still compact
Library: broad reuse, visible evidence, portability, and maintenance expectations
Governed: organizationally sensitive or operationally critical; lifecycle and review are explicit

See Skill Archetypes.
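Choosing the lightest archetype that fits amounts to taking the heaviest level that any single need demands. A sketch under that reading; the boolean need flags and the `choose_archetype` name are illustrative, not part of any existing tooling.

```python
from enum import IntEnum


class Archetype(IntEnum):
    # Ordered from lightest to heaviest.
    SCAFFOLD = 0
    PRODUCTION = 1
    LIBRARY = 2
    GOVERNED = 3


def choose_archetype(
    team_reused: bool = False,
    quality_sensitive: bool = False,
    broad_reuse: bool = False,
    portability: bool = False,
    org_sensitive: bool = False,
    operationally_critical: bool = False,
) -> Archetype:
    """Return the lightest archetype whose expectations cover every stated need."""
    level = Archetype.SCAFFOLD
    if team_reused or quality_sensitive:
        level = max(level, Archetype.PRODUCTION)
    if broad_reuse or portability:
        level = max(level, Archetype.LIBRARY)
    if org_sensitive or operationally_critical:
        level = max(level, Archetype.GOVERNED)
    return level
```

With no flags set the result is `Archetype.SCAFFOLD`, which matches the default of starting light.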

Phase 4: Boundary Design

Every skill should answer four questions clearly:

what recurring job does it own
what outputs does it produce
what near-neighbor requests should not route here
what detail belongs outside SKILL.md

Boundary work comes before polishing prose.
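The four boundary questions can double as a completeness check before any prose is polished. In this hypothetical sketch, `SkillBoundary` and `missing_answers` are illustrative names, and an empty field is treated as unanswered, which is stricter than necessary when a skill genuinely has no external detail.

```python
from dataclasses import dataclass, field


@dataclass
class SkillBoundary:
    # One field per boundary question (field names are illustrative).
    job: str = ""
    outputs: list[str] = field(default_factory=list)
    exclusions: list[str] = field(default_factory=list)
    external_detail: list[str] = field(default_factory=list)  # detail kept outside SKILL.md


def missing_answers(boundary: SkillBoundary) -> list[str]:
    """List the boundary questions that are still unanswered, in the order above."""
    checks = {
        "what recurring job does it own": bool(boundary.job.strip()),
        "what outputs does it produce": bool(boundary.outputs),
        "what near-neighbor requests should not route here": bool(boundary.exclusions),
        "what detail belongs outside SKILL.md": bool(boundary.external_detail),
    }
    return [question for question, answered in checks.items() if not answered]
```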

Phase 5: Reference Scan

Run a short benchmark pass before deep authoring.

scan 3-5 reference objects at most
prioritize high-star external GitHub and official benchmark sources first
ask for user-supplied references second, but extract only patterns and standards
use local files third, only for fit, privacy, and compatibility calibration
draw candidate patterns from five families: method, structure, execution, portability, and domain
extract only what passes the pattern test: recurrence, generativity, distinctiveness, and boundary clarity
record what not to borrow so the new skill stays light

See Reference Scan Strategy and Pattern Extraction Doctrine.
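The scan limits and the pattern test can be expressed as two small filters. This sketch assumes references are labeled by source (`external`, `user`, `local`, hypothetical labels, with external GitHub and official benchmark sources grouped under `external`) and that the four pattern-test judgments are made by the author as booleans. It does not reproduce the definitions in Pattern Extraction Doctrine.

```python
from dataclasses import dataclass

# Lower number = scanned earlier, following the priority order above.
SOURCE_PRIORITY = {"external": 0, "user": 1, "local": 2}
MAX_REFERENCES = 5


@dataclass
class Reference:
    name: str
    source: str  # "external", "user", or "local" (hypothetical labels)


@dataclass
class Pattern:
    name: str
    # The four pattern-test judgments, supplied by the author.
    recurrence: bool
    generativity: bool
    distinctiveness: bool
    boundary_clarity: bool


def select_references(refs: list[Reference]) -> list[Reference]:
    """Order by source priority (stable within a source) and keep at most MAX_REFERENCES."""
    return sorted(refs, key=lambda r: SOURCE_PRIORITY[r.source])[:MAX_REFERENCES]


def split_patterns(patterns: list[Pattern]) -> tuple[list[str], list[str]]:
    """Return (borrow, do_not_borrow): only patterns passing all four tests are borrowed."""
    borrow, skip = [], []
    for p in patterns:
        passes = p.recurrence and p.generativity and p.distinctiveness and p.boundary_clarity
        (borrow if passes else skip).append(p.name)
    return borrow, skip
```

The second list is the raw material for the "what not to borrow" record.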

Phase 5.5: Output Risk Profiling

Before treating the package as usable, predict the likely mistakes in its final user-facing output.

tutorial skills should guard against generic headings, vague examples, and missing success checks
report and Markdown skills should guard against weak tables, dense lists, and poor hierarchy
screenshot or visual skills should guard against wrong captures, missing assets, and invented visual evidence
research or citation-heavy skills should guard against footnote clutter and unsupported claims
code or command skills should guard against hidden cwd, input, output, and side-effect assumptions

Generate reports/output-risk-profile.md and expose it to the reviewer before adding more structure.

See Output Quality Risk.
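The risk checklist can be generated mechanically from the skill family, leaving the author to decide which risks actually apply. A minimal sketch; the family keys, `OUTPUT_RISKS`, and `render_risk_profile` are hypothetical, and the checklist text is taken from the list above.

```python
# Risk checklists from the list above; the family keys are hypothetical labels.
OUTPUT_RISKS = {
    "tutorial": ["generic headings", "vague examples", "missing success checks"],
    "report": ["weak tables", "dense lists", "poor hierarchy"],
    "visual": ["wrong captures", "missing assets", "invented visual evidence"],
    "research": ["footnote clutter", "unsupported claims"],
    "code": [
        "hidden cwd assumptions",
        "hidden input assumptions",
        "hidden output assumptions",
        "hidden side-effect assumptions",
    ],
}


def render_risk_profile(skill_name: str, families: list[str]) -> str:
    """Render a Markdown body for reports/output-risk-profile.md."""
    lines = [f"# Output risk profile: {skill_name}", ""]
    for family in families:
        lines.append(f"## {family}")
        lines.extend(f"- [ ] guard against {risk}" for risk in OUTPUT_RISKS[family])
        lines.append("")
    return "\n".join(lines)
```

Writing the report is then one call such as `Path("reports/output-risk-profile.md").write_text(render_risk_profile("release-notes", ["report", "code"]))` using `pathlib.Path`, once the `reports/` directory exists.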

Phase 5.6: Artifact Design Profiling

Before approving generated reports or visual outputs, define how the artifact should read and scan.

choose the artifact family: tutorial, report, review viewer, dashboard, visual evidence, slide-like narrative, or code/CLI guide
let the content choose the visual system instead of copying a fixed house style
borrow document discipline from Kami: route by document type, distill content, verify layout-critical assumptions
borrow slide discipline from presentation skills: plan hierarchy, density, rhythm, and quality gates before writing HTML
reject generic headings, noisy citations, weak tables, wrong screenshots, repeated card grids, and decorative visual defaults

Generate reports/artifact-design-profile.md and expose it in the overview and review viewer.

See Artifact Design Doctrine and Output Visual Quality.

Phase 5.7: Prompt Quality Profiling

When a skill depends on prompt behavior, role design, dialogue quality, or output contracts, turn prompt methodology into an evidence profile.

identify the explicit need, implicit need, scenario, user level, and success standard
map Role, Task, and Format into skill behavior instead of copying a full meta-prompt into SKILL.md
classify the task family and complexity before adding structure
score completeness, clarity, consistency, practicality, and specificity
expose the full reasoning to reviewers while keeping the user-facing flow recommendation-led

Generate reports/prompt-quality-profile.md and expose it in the overview and review viewer.

See Prompt Engineering Doctrine.
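One way to turn the five scoring dimensions into reviewer evidence is to record a rating per dimension and surface the weakest one. The 1 to 5 scale, the 4.0 threshold, and the `score_prompt` name are assumptions for illustration; Prompt Engineering Doctrine may define scoring differently.

```python
DIMENSIONS = ("completeness", "clarity", "consistency", "practicality", "specificity")


def score_prompt(scores: dict[str, int], threshold: float = 4.0) -> dict:
    """Summarize per-dimension ratings (assumed 1-5 scale) into evidence for reviewers."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for d in DIMENSIONS:
        if not 1 <= scores[d] <= 5:
            raise ValueError(f"{d} must be between 1 and 5")
    average = sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
    weakest = min(DIMENSIONS, key=lambda d: scores[d])  # first dimension wins ties
    return {"average": average, "weakest": weakest, "passes": average >= threshold}
```

Reporting the weakest dimension alongside the average keeps one strong score from hiding a vague or impractical prompt.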

Phase 6: Trigger-First Authoring

Author the frontmatter description before expanding the body.

start with the recurring job
include the trigger actions that should route here
include exclusions when confusion is plausible
test the route before growing the file tree

Trigger quality is improved through:

trigger_eval.py
optimize_description.py
blind holdout
judge-backed blind holdout
adversarial holdout
route confusion
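Each of these checks asks the same question of a labeled prompt set: does the skill trigger when it should and stay quiet on near neighbors? A minimal sketch of that scoring, assuming a `route` callable that stands in for the real router. It is not the implementation of `trigger_eval.py`, whose interface is not described here.

```python
from typing import Callable


def evaluate_trigger(
    route: Callable[[str], bool],
    cases: list[tuple[str, bool]],
) -> dict[str, float]:
    """Score routing decisions against prompts labeled True (should route) or False."""
    if not cases:
        raise ValueError("need at least one labeled case")
    tp = fp = fn = tn = 0
    for prompt, expected in cases:
        predicted = route(prompt)
        if predicted and expected:
            tp += 1
        elif predicted:
            fp += 1  # a near-neighbor request leaked into this skill
        elif expected:
            fn += 1  # a real request failed to trigger
        else:
            tn += 1
    # With an empty denominator there is nothing to be wrong about; 1.0 is a convention.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return {"precision": precision, "recall": recall, "accuracy": (tp + tn) / len(cases)}
```

With a toy keyword router that fires on "release notes", a prompt such as "Translate the release notes into French" is counted as a false positive, the kind of near-neighbor leak that explicit exclusions in the description are meant to prevent.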

Phase 7: Gate Selection

Add gates by risk, not by habit.

low-risk scaffolds: validate structure and context size
production skills: trigger eval plus resource-boundary checks
library skills: description optimization, route confusion, packaging checks
governed skills: governance scoring, lifecycle metadata, regression history

See Gate Selection.
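The risk-to-gate mapping fits in a lookup table. A sketch; the gate strings are descriptive labels rather than identifiers of real tools, and the `inherit` option, which also applies the lighter tiers' gates, is an assumption the text above does not state.

```python
# Gates per archetype, from the list above (descriptive labels only).
GATES = {
    "scaffold": ["structure validation", "context size"],
    "production": ["trigger eval", "resource-boundary checks"],
    "library": ["description optimization", "route confusion", "packaging checks"],
    "governed": ["governance scoring", "lifecycle metadata", "regression history"],
}
ORDER = ["scaffold", "production", "library", "governed"]


def gates_for(archetype: str, inherit: bool = False) -> list[str]:
    """Return the gates for an archetype; inherit=True also adds lighter tiers' gates."""
    if archetype not in GATES:
        raise ValueError(f"unknown archetype: {archetype}")
    tiers = ORDER[: ORDER.index(archetype) + 1] if inherit else [archetype]
    return [gate for tier in tiers for gate in GATES[tier]]
```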

Phase 8: First Iteration Philosophy

The first package is a routeable baseline, not the final answer.

improve trigger and exclusions before growing prose
add one execution asset before adding many documents
surface the three highest-value next moves so authors do not expand in every direction at once
prefer the smallest step that increases reliability more than context cost
move unverifiable ideas into next-step candidates instead of shipping them as baseline structure

See Iteration Philosophy and Authoring Discipline.
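Picking the three next moves can be framed as a filter plus a ranking: keep moves whose estimated reliability gain exceeds their context cost, rank by net gain, and break ties toward the cheaper step. The numeric estimates and the `Move` and `next_moves` names are hypothetical; real estimates are judgment calls, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Move:
    name: str
    reliability_gain: float  # estimated, in the same arbitrary units as context_cost
    context_cost: float


def next_moves(candidates: list[Move], limit: int = 3) -> list[str]:
    """Return up to `limit` moves that add more reliability than context cost."""
    worthwhile = [m for m in candidates if m.reliability_gain > m.context_cost]
    # Highest net gain first; among equal net gains, prefer the smaller step.
    ranked = sorted(
        worthwhile,
        key=lambda m: (-(m.reliability_gain - m.context_cost), m.context_cost),
    )
    return [m.name for m in ranked[:limit]]
```

Moves that fail the filter are not discarded; they stay on the candidate list until evidence changes their estimate.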

Phase 9: Promotion

A candidate route or package is promotable only when:

visible holdout does not regress
blind holdout does not regress
judge-backed blind holdout does not regress
adversarial holdout does not regress
route confusion stays clean
context and governance gates still pass

See Promotion Policy.
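The promotion rule is a conjunction: every holdout must hold or improve, and every other gate must pass. A minimal sketch, assuming each holdout result is a single score where higher is better and that "clean" route confusion means zero confusions; the `HOLDOUTS` keys and the `is_promotable` signature are hypothetical.

```python
# Hypothetical keys for the four holdout sets named above.
HOLDOUTS = ("visible", "blind", "judge_blind", "adversarial")


def is_promotable(
    baseline: dict[str, float],
    candidate: dict[str, float],
    route_confusions: int,
    context_ok: bool,
    governance_ok: bool,
) -> bool:
    """Promote only if no holdout score drops and every remaining gate passes."""
    no_regression = all(candidate[h] >= baseline[h] for h in HOLDOUTS)
    return no_regression and route_confusions == 0 and context_ok and governance_ok
```

An improvement on the visible holdout cannot buy back a regression on the adversarial one; any single drop blocks promotion.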

Design Principle

The method is only correct if rigor grows faster than context cost. If a new check or document makes the skill heavier without making it more reliable, remove or relocate it.