Evaluation Prompts for This Skill

Use these to verify an agent is applying CLI best practices (not just "making something that runs").

Scoring rubric

Pass:
- Produces a clear CLI contract (commands, flags, IO, exit codes, examples).
- Explicitly addresses stdout/stderr, exit codes, help behavior, and interactivity.
- Includes stable scripting output modes (--json/--plain) when relevant.
- Avoids secret leaks (no secrets via flags/env).
Strong pass:
- Uses the checklist and/or the audit script.
- Highlights trade-offs and backwards-compatibility risks.
- Provides ready-to-ship help output and error message patterns.

Task:
- "Design a CLI called logship that tails logs from multiple sources (local files and HTTP endpoints), filters by regex, and outputs either human-friendly colored logs or machine-readable JSON."
Must include:
- Subcommands or flags decision (and rationale)
- stdout vs stderr behavior
- --json output definition (shape)
- Color behavior (NO_COLOR, --no-color, TTY detection)
- Timeouts for HTTP, progress/status messages
- Example-first help outline
- Exit codes

Task:
- "Here's the current --help output for acmectl. It's 200 lines of flags, no examples, and no description. Rewrite it to be discoverable."
Must include:
- Concise default help vs full help structure
- Examples near the top
- Group common flags first
- Support path / docs link

Task:
- "This command prints progress bars to stdout and the JSON result to stderr. Fix the output contract."
Must include:
- Machine output on stdout
- Human/progress on stderr
- Behavior when piped/captured (no animations)

Task:
- "Add a delete command that can delete remote projects. Make it safe for humans but scriptable."
Must include:
- Confirmation levels (moderate vs severe)
- --dry-run
- --force and/or --confirm="exact-name"
- --no-input behavior

Task:
- "Add auth to the CLI. It currently accepts --token <secret> and reads MYAPP_TOKEN env var. Fix the design."
Must include:
- --token-file and/or --token-stdin
- Recommendation for OS keychain / secret manager
- Explain why flags/env are unsafe

Task:
- "Run scripts/cli_audit.py against ./mycli and address the FAIL/WARN items."
Must include:
- Interpreting the audit output
- Fixing highest severity issues first
- Updating help text and/or flags accordingly