# Agent Skills Best Practices

Community-sourced patterns, techniques, and pitfalls from practitioners and official documentation.

## Table of Contents

- [Progressive Disclosure Architecture](#progressive-disclosure-architecture)
- [Skill Composition Patterns](#skill-composition-patterns)
- [Description Optimization](#description-optimization)
- [Common Pitfalls](#common-pitfalls)
- [Testing Strategies](#testing-strategies)
- [Advanced Techniques](#advanced-techniques)
- [Security Considerations](#security-considerations)
- [Organization-Wide Patterns](#organization-wide-patterns)

## Progressive Disclosure Architecture

**Three-tier information model**: Discovery → Activation → Execution

### Discovery Layer (~50 tokens)

YAML frontmatter that helps agents find the right skill without loading full content.

```yaml
---
name: pdf-processing
description: Extracts text and tables from PDF files, fills forms, and merges documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
---
```

**Keys to effective discovery**:

- Include WHAT the skill does AND WHEN to use it
- Use third-person voice
- Include specific trigger terms users might mention
- Keep under 100 tokens

### Activation Layer (~2-5K tokens)

Core SKILL.md instructions loaded when skill is invoked.

**Structure**:

```markdown
# Skill Name

Clear criteria for when this skill applies

Step-by-step process (numbered or structured)

- ALWAYS: Mandatory behaviors
- NEVER: Prohibited actions
- PREFER: Recommended approaches

Links to deep-dive docs in references/ subdirectory
```

**Keys to effective activation**:

- **Assume intelligence**: Claude doesn't need basic concepts explained
- **Be directive, not comprehensive**: Focus on what makes THIS approach different
- **Keep under 500 lines**: Move details to references/
- **Use examples sparingly**: Only for non-obvious patterns

### Execution Layer (dynamic)

Deep-dive content loaded on-demand from references/ subdirectory.

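
The three tiers can be pictured as three separate read operations over one skill folder. The following is a minimal sketch under stated assumptions, not how any agent runtime actually loads skills: `SkillFiles`, `discover`, `activate`, and `loadReference` are hypothetical names, the `forms.md` reference is made up for illustration, and the frontmatter reader handles only single-line `key: value` pairs rather than full YAML.

```typescript
// Sketch of the Discovery -> Activation -> Execution tiers.
// All type and function names here are hypothetical.

interface SkillFiles {
  skillMd: string;                    // full text of SKILL.md
  references: Record<string, string>; // file name -> contents of references/*.md
}

interface Discovery {
  name: string;
  description: string;
}

// Split SKILL.md into its frontmatter block and its markdown body.
function splitFrontmatter(skillMd: string): { frontmatter: string; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(skillMd);
  if (!match) {
    throw new Error("SKILL.md must start with a --- delimited frontmatter block");
  }
  return { frontmatter: match[1] ?? "", body: match[2] ?? "" };
}

// Tier 1 (discovery): read only `name` and `description`.
// Assumption: simple single-line `key: value` pairs, not full YAML.
function discover(files: SkillFiles): Discovery {
  const { frontmatter } = splitFrontmatter(files.skillMd);
  const fields: Record<string, string> = {};
  for (const line of frontmatter.split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx > 0) {
      fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
    }
  }
  const name = fields["name"];
  const description = fields["description"];
  if (!name || !description) {
    throw new Error("frontmatter needs both name and description");
  }
  return { name, description };
}

// Tier 2 (activation): the SKILL.md body, read only when the skill is invoked.
function activate(files: SkillFiles): string {
  return splitFrontmatter(files.skillMd).body;
}

// Tier 3 (execution): one reference file, read only when the workflow asks for it.
function loadReference(files: SkillFiles, fileName: string): string {
  const content = files.references[fileName];
  if (content === undefined) {
    throw new Error(`unknown reference: ${fileName}`);
  }
  return content;
}

// Illustrative skill folder held in memory; forms.md is a made-up reference file.
const exampleSkill: SkillFiles = {
  skillMd: [
    "---",
    "name: pdf-processing",
    "description: Extracts text and tables from PDF files. Use when working with PDF files.",
    "---",
    "# PDF Processing",
    "See references/forms.md for form filling.",
  ].join("\n"),
  references: { "forms.md": "# Forms\nDetails on filling PDF forms." },
};

const discovered = discover(exampleSkill);
```

Only `discover` would run for every installed skill; `activate` and `loadReference` run when a task calls for them, which is what keeps the per-skill discovery cost small.
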
**Pattern from practitioners**:

```
skill-name/
├── SKILL.md                    # Core workflow (500 lines max)
├── references/
│   ├── configuration.md        # Detailed config options
│   ├── error-handling.md       # Edge cases and recovery
│   ├── advanced-patterns.md    # Expert techniques
│   └── examples.md             # Worked examples
└── scripts/                    # Helper utilities
```

**Why this works** (source: Juan C Olamendy, skillmatic-ai):

- Prevents context rot from loading irrelevant information
- Allows targeted follow-up ("show me the advanced patterns")
- Keeps initial load fast and focused
- Scales to complex domains without overwhelming context

## Skill Composition Patterns

### Skills Invoking Skills

**Pattern**: Reference other skills in instructions rather than duplicating methodology.

```markdown
## Error Investigation

Load the **outfitter:debugging** skill using the Skill tool to investigate this authentication failure systematically.

Pass these parameters to the debugging workflow:
- Error context: [collected error details]
- Hypothesis: Token validation timing issue
```

**Why this works**:

- Reuses established methodologies
- Maintains single source of truth
- Allows skills to evolve independently
- Reduces duplication across skill library

**Anti-pattern**: Embedding another skill's instructions inline.

### Subagent Architecture

For orchestrating specialized work with context isolation, see [claude-code.md](./claude-code.md#master-clone-architecture) for Claude Code-specific patterns.

### Skill + External Service Integration

Skills can integrate with external services (APIs, MCP servers) by separating concerns:

- **External service**: Handles authentication, rate limiting, data access
- **Skill**: Handles business logic, formatting, workflows

This separation enables reuse across similar domains.

## Description Optimization

**Goal**: Help Claude discover your skill without loading it.

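
The discovery criteria (state what and when, use third person, stay short) lend themselves to a quick automated check. The sketch below is a heuristic only: `lintDescription` and its rule names are hypothetical, the 100-token cap is taken from the discovery guidance in this document, and the four-characters-per-token estimate is a rough assumption rather than a real tokenizer.

```typescript
// Hypothetical lint for a skill description. Heuristic only.

type DescriptionRule = "missing-when" | "not-third-person" | "too-long";

interface DescriptionIssue {
  rule: DescriptionRule;
  message: string;
}

// Rough token estimate. Assumption: about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function lintDescription(description: string, maxTokens: number = 100): DescriptionIssue[] {
  const text = description.trim();
  const issues: DescriptionIssue[] = [];

  // WHEN: expect an explicit trigger clause such as "Use when ...".
  if (!/\buse when\b/i.test(text)) {
    issues.push({ rule: "missing-when", message: 'Add a "Use when ..." clause.' });
  }

  // Third person: flag first- and second-person pronouns.
  if (/\b(i|me|my|we|our|you|your)\b/i.test(text)) {
    issues.push({ rule: "not-third-person", message: "Rewrite in third person." });
  }

  // Length: keep the discovery layer small.
  if (estimateTokens(text) > maxTokens) {
    issues.push({ rule: "too-long", message: `Estimated tokens exceed ${maxTokens}.` });
  }

  return issues;
}

// Rules reported: ["missing-when"]
const vagueIssues = lintDescription("Processes PDFs");

// Rules reported: ["missing-when", "not-third-person"]
const firstPersonIssues = lintDescription("Use me when you need to debug");

// No issues reported
const specificIssues = lintDescription(
  "Extracts text and tables from PDF files, fills forms, and merges documents. " +
    "Use when working with PDF files or when the user mentions PDFs, forms, or document extraction."
);
```

A check like this cannot judge whether the trigger terms are the right ones; it only catches the mechanical omissions.
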
### Include Both WHAT and WHEN

❌ **Vague**: "Processes PDFs"

✅ **Specific**: "Extracts text and tables from PDF files, fills forms, and merges documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction."

### Use Third-Person Voice

❌ "Use me when you need to debug"

✅ "Debugs issues using systematic root cause analysis. Use when encountering errors, unexpected behavior, or test failures."

### Include Trigger Terms

Think about what users actually say:

```yaml
# Folded block scalar: the "Keywords: ..." text contains ": ", which a plain
# single-line YAML value cannot hold.
description: >-
  Creates weekly team status reports with wins, challenges, and priorities.
  Use when the user asks for a team update, standup report, weekly summary,
  or status email. Keywords: standup, weekly update, team report, status.
```

### Be Specific About Scope

❌ "Helps with testing"

✅ "Implements test-driven development using Red-Green-Refactor cycles. Use when implementing new features with tests first, refactoring with test coverage, or reproducing bugs as failing tests."

**Source**: Official Anthropic best practices emphasize that specificity prevents Claude from loading irrelevant skills.

## Common Pitfalls

### 1. Making SKILL.md Too Verbose

**Symptom**: 1000+ line SKILL.md files with exhaustive explanations.

**Why it's a problem**:

- Wastes context window on every invocation
- Buries key directives in noise
- Assumes Claude needs basic concepts explained

**Fix**:

- Keep SKILL.md under 500 lines
- Move deep dives to references/
- Trust Claude's base knowledge
- Focus on WHAT makes THIS approach unique

**Example** (source: Anthropic best practices):

❌ **Verbose**:

```markdown
## What is Test-Driven Development?

Test-Driven Development (TDD) is a software development methodology where you write tests before writing the actual code. This approach was popularized by Kent Beck and has become a cornerstone of modern software engineering practices...

[500 lines of TDD philosophy]
```

✅ **Concise**:

```markdown
## TDD Workflow

1. **Red**: Write a failing test for the next small piece of functionality
2. **Green**: Write minimal code to make the test pass
3. **Refactor**: Improve code while keeping tests green

ALWAYS write the test first. NEVER skip the refactor step.
```

### 2. Negative-Only Constraints

**Symptom**: Instructions full of "NEVER do X" without alternatives.

❌ **Problem**:

```markdown
- NEVER use any types
- NEVER skip error handling
- NEVER commit without tests
```

**Why it's a problem**: Tells Claude what NOT to do but not what TO do.

✅ **Fix**: Pair constraints with positive alternatives:

```markdown
- ALWAYS use strict types; NEVER use `any`
- ALWAYS handle errors with Result types; NEVER let exceptions propagate silently
- ALWAYS run tests before committing; NEVER push untested code
```

### 3. Deeply Nested File References

**Symptom**: Skills referencing files that reference other files 3+ levels deep.

**Why it's a problem**:

- Context explosion
- Circular references
- Hard to maintain

**Fix** (source: skillmatic-ai research):

- Keep references ONE level deep
- Use table of contents in long reference files
- Let Claude request additional detail if needed

❌ **Deep nesting**:

```
SKILL.md
  → references/patterns.md
    → references/examples/auth.md
      → references/examples/auth/jwt.md
```

✅ **Flat structure**:

```
SKILL.md
  → references/auth-patterns.md (with ToC for JWT, OAuth, etc.)
```

### 4. Not Treating Skills Like Code

**Symptom**: Skills maintained as loose documents without version control, testing, or reviews.

**Why it's a problem**:

- Skills drift from reality
- Breaking changes go unnoticed
- No way to roll back problematic versions

**Fix** (source: blog.sshh.io, Nate's newsletter):

- **Version control**: Skills in git repos with semantic versioning
- **Testing**: Build evaluations to validate skill behavior
- **Reviews**: Treat skill PRs like code reviews
- **Changelog**: Document what changed and why

**Pattern from practitioners**:

```markdown
---
name: api-integration
version: 2.1.0
changelog: |
  2.1.0 - Added retry logic for rate limiting
  2.0.0 - Switched to streaming responses (breaking)
  1.5.0 - Added webhook verification
---
```

### 5. Over-Relying on Auto-Compaction

**Symptom**: Never manually clearing context, letting auto-compaction handle everything.

**Why it's a problem** (source: blog.sshh.io practitioner experience):

- Important context gets compressed or dropped
- Skill instructions get summarized incorrectly
- Debugging becomes harder when full skill isn't visible

**Fix**: Manual context management strategy:

1. Start complex tasks with `/clear` for clean slate
2. Use `/catchup` with explicit context about what skills are active
3. Let auto-compaction handle routine continuations
4. Force reload skills after compaction if behavior seems off

**When to manually clear**:

- Starting new major feature
- Switching between unrelated tasks
- After hitting context limits on complex debugging
- When skill behavior seems inconsistent

### 6. Unclear Skill Boundaries

**Symptom**: Skill tries to do too many unrelated things.

**Example**: "code-helper" that does linting, testing, documentation, deployment, and debugging.

**Why it's a problem**:

- Hard to discover (description too generic)
- Loads unnecessary context
- Becomes maintenance nightmare

**Fix**: **One skill, one job**.

✅ **Well-scoped skills**:

- `linting-workflow`: Code quality checks and fixes
- `tdd`: TDD methodology
- `api-documentation`: API reference generation
- `deployment-automation`: Deploy and rollback workflows
- `debugging`: Root cause investigation

**Exception**: Orchestrator skills that explicitly load other skills (like `feature-development` that loads TDD → documentation → deployment in sequence).

### 7. No Usage Examples

**Symptom**: Skill has abstract instructions but no concrete examples.

**Why it's a problem**: Claude may misinterpret intent without seeing desired output.

**Fix**: Include 1-2 examples in references/examples.md

**Pattern**:

```markdown
# Examples

## Example 1: Simple Case

**Input**: User asks to add login endpoint

**Workflow**:
1. Load TDD skill
2. Write failing test for /login POST
3. Implement minimal auth logic
4. Refactor to service layer

**Output**: [Show actual test code + implementation]

## Example 2: Edge Case

**Input**: User asks to add login with OAuth and JWT and refresh tokens

**Workflow**:
1. Load pathfinding skill to break down requirements
2. Load TDD skill for each component separately
3. OAuth integration → JWT generation → Refresh logic
4. Each gets its own test cycle

**Output**: [Show breakdown and test structure]
```

**Source**: Official Anthropic best practices recommend examples for non-obvious patterns.

## Testing Strategies

### Eval-Driven Development

**Pattern**: Build evaluations BEFORE extensive documentation (source: Nate's newsletter).

**Workflow**:

1. Create minimal skill version
2. Build test suite with target inputs/outputs
3. Iterate skill until evals pass consistently
4. THEN write comprehensive docs

**Why this works**:

- Prevents documenting the wrong approach
- Faster iteration cycles
- Forces clarity about success criteria
- Builds regression test suite automatically

**Implementation** (from Nate's debugging toolkit):

```typescript
// skill-testing-framework pattern
interface SkillEval {
  name: string;
  input: string;
  expectedBehavior: string[];
  forbiddenBehavior: string[];
  targetModels: ('haiku' | 'sonnet' | 'opus')[];
}

const tddSkillEvals: SkillEval[] = [
  {
    name: "basic-tdd-workflow",
    input: "Add a login endpoint",
    expectedBehavior: [
      "Writes test first",
      "Test fails initially (red phase)",
      "Implements minimal solution",
      "Test passes (green phase)",
      "Refactors with tests passing"
    ],
    forbiddenBehavior: [
      "Writes implementation before test",
      "Skips refactor step",
      "Makes test pass by modifying test"
    ],
    targetModels: ['haiku', 'sonnet', 'opus']
  }
];
```

### Multi-Model Testing

**Pattern**: Test skills with all target models.

**Why**: Haiku, Sonnet, and Opus interpret instructions differently:

- **Haiku**: Needs more explicit instructions, less inference
- **Sonnet**: Balanced reasoning, good for most workflows
- **Opus**: Handles complex context, better with ambiguity

**Testing strategy** (source: Anthropic best practices):

| Aspect | Haiku Test | Sonnet Test | Opus Test |
|--------|------------|-------------|-----------|
| Clarity | Do instructions work with minimal reasoning? | Do instructions balance brevity and clarity? | Do instructions leverage advanced reasoning? |
| Context | Works with small context? | Handles moderate references? | Manages large cross-references? |
| Edge cases | Explicit handling? | Reasonable inference? | Sophisticated judgment? |

**Fix pattern**: If Haiku fails but Sonnet passes, instructions likely assume too much inference.

### Real-World Usage Testing

**Pattern**: Test skills with actual users/agents in production-like scenarios.

**Anti-pattern**: Only testing with constructed examples.

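
To move beyond constructed examples, the same expected and forbidden behaviors used in an eval can be checked against what happened in a real session. The sketch below assumes the observed behaviors have already been labeled by a reviewer or a separate judging step, using the exact strings from the eval; `scoreSession` and `EvalResult` are hypothetical names, and the `SkillEval` interface is repeated from the eval pattern in this section so the block stands alone.

```typescript
// Sketch: score behaviors observed in a real session against a SkillEval.
// Assumption: `observed` holds labels that match the eval's strings exactly.

interface SkillEval {
  name: string;
  input: string;
  expectedBehavior: string[];
  forbiddenBehavior: string[];
  targetModels: ("haiku" | "sonnet" | "opus")[];
}

interface EvalResult {
  passed: boolean;
  missing: string[];    // expected behaviors that were not observed
  violations: string[]; // forbidden behaviors that were observed
}

function scoreSession(evalCase: SkillEval, observed: string[]): EvalResult {
  const wasObserved = (behavior: string): boolean => observed.indexOf(behavior) !== -1;
  const missing = evalCase.expectedBehavior.filter((b) => !wasObserved(b));
  const violations = evalCase.forbiddenBehavior.filter((b) => wasObserved(b));
  return {
    passed: missing.length === 0 && violations.length === 0,
    missing,
    violations,
  };
}

// Illustrative eval, trimmed from the TDD example.
const tddEval: SkillEval = {
  name: "basic-tdd-workflow",
  input: "Add a login endpoint",
  expectedBehavior: ["Writes test first", "Implements minimal solution"],
  forbiddenBehavior: ["Writes implementation before test"],
  targetModels: ["sonnet"],
};

// Illustrative labels for two sessions (made-up data).
const followedSkill = scoreSession(tddEval, [
  "Writes test first",
  "Implements minimal solution",
]);
const ignoredSkill = scoreSession(tddEval, [
  "Writes implementation before test",
  "Implements minimal solution",
]);
```

Keeping `missing` and `violations` separate makes it possible to tell a skill that was loaded but not followed apart from one that was followed incompletely.
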
**Strategy** (from practitioner experience):

1. **Dogfooding**: Use your own skills for real work
2. **Iteration tracking**: Log when skills are loaded but not followed
3. **Confusion signals**: Detect when Claude asks for clarification (skill might be unclear)
4. **Outcome validation**: Did the skill achieve its intended result?

**Metrics to track**:

- Skill load frequency (is it discoverable?)
- Completion rate (do workflows finish?)
- User satisfaction (did it solve the problem?)
- Iteration count (how many tries to get it right?)

**From blog.sshh.io**: "Built 10 debugging tools after watching 100 people hit the same problems in their first week."

### Systematic Evaluation Framework

**Components** (source: Nate's newsletter, skillmatic-ai research):

1. **skill-debugging-assistant**: Identifies where skills fail
2. **skill-security-analyzer**: Checks for security risks in skill code
3. **skill-gap-analyzer**: Finds missing skills in your library
4. **skill-performance-profiler**: Tracks context usage and latency
5. **prompt-optimization-analyzer**: Improves skill descriptions for discovery
6. **skill-testing-framework**: Automated test runner for skills

**Pattern**: Build tools to test tools.

## Advanced Techniques

### Hook-Based Validation

For platform-specific hook implementation patterns, see [claude-code.md](./claude-code.md#hook-based-validation).

**General principle**: Use hooks to enforce constraints at decision points: prevent destructive operations, enforce testing requirements, validate configuration before deployment.

### Organization-Wide Skill Libraries

**Pattern**: Centralized skill repository as institutional knowledge (source: Juan C Olamendy, Medium).

**Structure**:

```
company-skills/
├── engineering/
│   ├── deployment-workflow/
│   ├── incident-response/
│   └── architecture-review/
├── product/
│   ├── user-story-creation/
│   └── feature-planning/
└── business/
    ├── team-standup/
    └── quarterly-planning/
```

**Benefits**:

- Codifies company processes
- Onboarding material becomes executable
- Process improvements propagate automatically
- Consistency across teams

**Implementation** (from practitioners):

1. **Central registry**: Marketplace or internal skill server
2. **Contribution guidelines**: Templates for creating company skills
3. **Review process**: Skills reviewed like code before publishing
4. **Version management**: Semantic versioning for breaking changes
5. **Deprecation policy**: How to sunset old patterns

**Pattern from blog.sshh.io**:

```markdown
# Company Skill Manifest

## Deployment
- `deployment-staging`: Deploy to staging with rollback plan
- `deployment-production`: Production deploy with checklist
- `deployment-rollback`: Emergency rollback procedures

## Code Review
- `pr-review-backend`: Backend code review checklist
- `pr-review-frontend`: Frontend code review standards
- `security-review`: Security-focused code review

## Documentation
- `api-documentation`: OpenAPI spec generation
- `readme-maintenance`: README updates for features
```

**Anti-pattern**: Every team building their own version of the same workflows.

### Progressive Skill Disclosure in Practice

**Advanced pattern**: Table of contents in reference files for targeted loading.

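
One way to picture targeted loading is a helper that returns a single `##` section of a reference file instead of the whole file. This is a sketch under assumptions: `extractSection` is a hypothetical helper, the slug rule is a simplification of how markdown renderers build heading anchors, and headings inside fenced code blocks are not handled.

```typescript
// Sketch: pull one level-2 section out of a markdown reference file.
// extractSection and slugify are hypothetical helpers, not part of any skill runtime.

// Simplified anchor slug: lowercase, non-alphanumerics collapsed to "-".
function slugify(text: string): string {
  return text
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Accepts either a heading ("Rate Limiting") or an anchor ("#rate-limiting").
// Returns the section from its "## " heading up to the next "## " heading, or null.
function extractSection(markdown: string, headingOrAnchor: string): string | null {
  const wanted = slugify(headingOrAnchor);
  const lines = markdown.split(/\r?\n/);
  let start = -1;

  for (let i = 0; i < lines.length; i++) {
    const match = /^##\s+(.*)$/.exec(lines[i] ?? "");
    if (!match) continue; // not a level-2 heading ("###" does not match)

    if (start === -1) {
      if (slugify(match[1] ?? "") === wanted) start = i;
    } else {
      // The next level-2 heading closes the section.
      return lines.slice(start, i).join("\n").trim();
    }
  }

  return start === -1 ? null : lines.slice(start).join("\n").trim();
}

// Illustrative reference file content.
const referenceDoc = [
  "# API Integration Patterns",
  "",
  "## REST Basics",
  "Standard CRUD operations.",
  "",
  "## Rate Limiting",
  "Back off and retry.",
  "### Retry budget",
  "Cap total attempts.",
  "",
  "## Authentication",
  "OAuth, JWT, API keys.",
].join("\n");

const rateLimiting = extractSection(referenceDoc, "#rate-limiting");
const authentication = extractSection(referenceDoc, "Authentication");
const notPresent = extractSection(referenceDoc, "GraphQL");
```

Subsections (`###`) stay with their parent section, so a single anchor still returns a self-contained unit.
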
**Example** (source: skillmatic-ai architecture):

```markdown
# API Integration Patterns

## Table of Contents
- [REST Basics](#rest-basics) - Standard CRUD operations
- [GraphQL](#graphql) - Query and mutation patterns
- [Webhooks](#webhooks) - Event-driven integrations
- [Rate Limiting](#rate-limiting) - Backoff and retry
- [Authentication](#authentication) - OAuth, JWT, API keys
- [Error Handling](#error-handling) - Retry logic and fallbacks

## REST Basics
[Focused content on REST]

## GraphQL
[Focused content on GraphQL]
```

**Usage**: Skill says "See references/api-patterns.md#rate-limiting for retry logic" rather than loading entire file.

**Why it works**:

- Claude can navigate to specific section
- Preserves context for other tasks
- User can request more depth if needed

### Skills as Living Documentation

**Pattern**: Skills replace static documentation that goes stale.

**Traditional docs**: "Here's how to deploy" (written once, outdated quickly)

**Skill**: Executes deployment with current best practices

**Benefits** (source: Juan C Olamendy):

- **Always current**: If process changes, skill changes
- **Executable**: Not just instructions but enforcement
- **Testable**: Verify the process actually works
- **Discoverable**: Claude can find relevant process

**Example transformation**:

❌ **Static doc** (docs/deployment.md):

```markdown
# Deployment Process

1. Run tests
2. Update version number
3. Build production bundle
4. Upload to S3
5. Clear CDN cache
6. Notify team in Slack

[This gets outdated when we switch to Vercel]
```

✅ **Skill** (skills/deployment/SKILL.md):

```markdown
---
name: deployment-production
description: Deploys to production with safety checks
---

# Production Deployment

1. Verify all tests pass: `bun test`
2. Run build: `bun run build`
3. Deploy to Vercel: `vercel --prod`
4. Verify deployment: Check /api/health
5. Notify team: Use Slack MCP to post to #deployments

ALWAYS wait for health check before considering deploy complete.
```

**When process changes**: Update skill, test it, deploy new version. Documentation stays current.

### Skill Chaining for Complex Workflows

**Pattern**: Master skill orchestrates sequence of specialized skills.

**Example** (source: practitioner patterns):

```markdown
---
name: feature-development
description: End-to-end feature development workflow
---

# Feature Development Workflow

## Stage 1: Planning
Load **pathfinding** skill to clarify requirements and architecture.

## Stage 2: Implementation
Load **tdd** skill to implement with tests.

## Stage 3: Documentation
Load **api-documentation** skill to generate API docs.

## Stage 4: Review
Load **code-review** skill to validate implementation.

## Stage 5: Deployment
Load **deployment-staging** skill to deploy for testing.

Each stage must complete successfully before proceeding to next.
```

**Advantage**: Each specialized skill can evolve independently. Feature-development orchestrates but doesn't duplicate.

**Related pattern - Conditional chaining**:

```markdown
## Error Recovery

If tests fail in Stage 2:
  Load **debugging** skill to investigate
  Return to Stage 2 after fixes

If code review finds issues in Stage 4:
  Return to Stage 2 for fixes
  Re-run Stage 3 to update docs
  Re-run Stage 4 to re-review
```

## Security Considerations

**Critical warning** (source: Sid Bharath tutorial, security research): Skills can execute arbitrary code and access files. Only use skills from trusted sources.

### Risks

1. **Code execution**: Skills can include scripts that run on your machine
2. **File access**: Skills can read/write files in project
3. **Network access**: Skills can make HTTP requests
4. **Credential access**: Skills can access environment variables, config files
5. **Social engineering**: Malicious skills disguised as helpful tools

### Protection Strategies

**1. Source verification**:

- Only install skills from trusted authors
- Review skill code before using
- Check community reputation and reviews
- Verify skill matches description (no hidden behavior)

**2. Code review checklist** (from security research):

```markdown
## Skill Security Review

- [ ] Review all scripts in scripts/ directory
- [ ] Check for file system access patterns
- [ ] Verify network requests are legitimate
- [ ] Confirm no credential harvesting
- [ ] Check for obfuscated code
- [ ] Validate external dependencies
- [ ] Test in isolated environment first
```

**3. Sandbox testing**:

- Test new skills in isolated project first
- Use throwaway credentials for initial testing
- Monitor file system and network activity
- Check for unexpected side effects

**4. Minimal permissions**:

```yaml
# Proposed security metadata (from research)
permissions:
  file_read: ['src/**', 'docs/**']
  file_write: ['docs/**']
  network: ['https://api.company.com']
  environment: []
```

**5. Audit logging**: Track what skills do in production:

- What files were accessed?
- What commands were executed?
- What network requests were made?

**From security papers**: "Skills are code execution with conversational interface. Treat them with same security rigor as any code dependency."

## Organization-Wide Patterns

### Skill as Institutional Knowledge

**Pattern**: Replace tribal knowledge with executable skills (source: Juan C Olamendy).

**Traditional problem**:

- "How do we deploy?" → Ask Sarah, she knows
- "What's the PR review process?" → Different on every team
- "How do we handle incidents?" → Check the wiki (outdated)

**Skill solution**:

- **deployment-production skill**: Encodes Sarah's knowledge
- **pr-review skill**: Standardizes review process
- **incident-response skill**: Current playbook, always up to date

**Implementation strategy**:

1. **Identify critical workflows**: What knowledge is locked in people's heads?
2. **Interview experts**: How do they actually do the work?
3. **Create skills**: Encode process as executable workflow
4. **Test with novices**: Can someone unfamiliar complete the task?
5. **Iterate**: Refine based on real usage
6. **Deprecate docs**: Point to skills instead of wikis

**Example from blog.sshh.io**:

```markdown
---
name: internal-deploy
description: Company deployment process with all safety checks
---

# Internal Deployment Workflow

## Pre-Deploy Checklist
1. Verify Jira ticket is in "Ready for Deploy" status
2. Confirm tests pass in CI: `check-ci-status`
3. Get approval in #deploy-requests Slack channel

## Deploy
1. Run staging deploy: `npm run deploy:staging`
2. Verify staging health: `curl https://staging.company.com/health`
3. Run smoke tests: `npm run smoke-test:staging`
4. Deploy to production: `npm run deploy:prod`
5. Monitor for 5 minutes: Watch Datadog dashboard

## Post-Deploy
1. Verify production health: `curl https://company.com/health`
2. Post to #deployments: "Deployed [feature] to prod"
3. Update Jira ticket to "Deployed"

NEVER skip smoke tests. ALWAYS monitor after deploy.
```

**Benefit**: New team members can deploy safely on day one.

### Contribution Guidelines

**Pattern**: Treat skills like open source contributions.

**Template** (from ComposioHQ awesome-claude-skills):

```markdown
# Contributing Skills

## Before Submitting

1. **Test thoroughly**: Run skill with Haiku, Sonnet, and Opus
2. **Follow structure**: Use provided skill template
3. **Document clearly**: Include description, when to use, examples
4. **Security review**: No malicious code or credential access
5. **License**: MIT or Apache 2.0

## Skill Requirements

- [ ] Descriptive name (kebab-case)
- [ ] Clear description with trigger terms
- [ ] SKILL.md under 500 lines
- [ ] References in references/ subdirectory
- [ ] At least one example in examples/
- [ ] Testing results documented
- [ ] README.md with usage instructions

## Review Process

1. Submit PR with skill in skills/your-skill-name/
2. Maintainers review for quality and security
3. Address feedback
4. Approved skills merged and published
```

### Versioning Strategy

**Pattern**: Semantic versioning for skills (from practitioners).

**Format**: MAJOR.MINOR.PATCH

```yaml
---
name: api-integration
version: 2.1.0
---
```

**Versioning rules**:

- **MAJOR**: Breaking changes (workflow steps changed, different inputs required)
- **MINOR**: New features (additional optional steps, new references added)
- **PATCH**: Bug fixes (typos, clarifications, small improvements)

**Breaking change example**:

```markdown
# Version 1.x: Required user to provide API key
---
name: api-client
version: 1.5.0
description: Make API calls with provided credentials
---

# Version 2.x: Uses MCP server for authentication (breaking)
---
name: api-client
version: 2.0.0
description: Make API calls using Linear MCP server
---
```

**Migration guide pattern**:

```markdown
# Migration Guide: 1.x → 2.0

## Breaking Changes
- No longer accepts `api_key` parameter
- Now requires Linear MCP server configured
- Response format changed from JSON to structured objects

## Migration Steps
1. Install Linear MCP server: `/mcp install linear`
2. Update skill invocations to remove `api_key`
3. Update code expecting JSON to handle structured objects
```

## Summary: Hierarchy of Best Practices

### Essential (Do These Always)

1. **Progressive disclosure**: Keep SKILL.md under 500 lines, use references/
2. **Clear descriptions**: Include what AND when, with trigger terms
3. **Assume intelligence**: Claude doesn't need basics explained
4. **Test with real usage**: Dogfood your own skills
5. **Version control**: Track changes, review like code

### Important (Do These Usually)

6. **Multi-model testing**: Verify Haiku, Sonnet, Opus behavior
7. **Positive constraints**: Say what TO do, not just what NOT to do
8. **Examples for non-obvious**: Show expected behavior
9. **Composition over duplication**: Reference other skills
10. **Security review**: Audit code execution and file access

### Advanced (Do These for Scale)

11. **Eval-driven development**: Build tests before extensive docs
12. **Hook-based enforcement**: Use PreToolUse for quality gates
13. **Organization-wide libraries**: Centralized skill registry
14. **Semantic versioning**: Track breaking changes
15. **Skills as living docs**: Replace static documentation

### Expert (Do These for Excellence)

16. **Systematic evaluation framework**: Build tools to test tools
17. **Master-Clone architecture**: Optimize context usage
18. **Conditional skill chaining**: Orchestrate complex workflows
19. **Audit logging**: Track skill execution in production
20. **Community contribution**: Share patterns, learn from others

## Sources

Research synthesized from:

- **Official Documentation**: Anthropic Claude Agent Skills Best Practices
- **Community Repositories**: ComposioHQ/awesome-claude-skills, skillmatic-ai/awesome-agent-skills
- **Practitioner Blogs**: blog.sshh.io (Claude Code at scale), Juan C Olamendy (Medium), Sid Bharath
- **Research**: Security considerations from academic papers, progressive disclosure architecture
- **Tooling**: Nate's Newsletter (debugging toolkit), evaluation frameworks

Last updated: 2026-01-10