playbook/outfitter-agents/plugins/cli-dev/skills/cli-development-guidelines/references/EVAL_PROMPTS.md

2.6 KiB

Evaluation Prompts for This Skill

Use these to verify an agent is applying CLI best practices (not just "making something that runs").

Scoring rubric

  • Pass:
    • Produces a clear CLI contract (commands, flags, IO, exit codes, examples).
    • Explicitly addresses stdout/stderr, exit codes, help behavior, and interactivity.
    • Includes stable scripting output modes (--json/--plain) when relevant.
    • Avoids secret leaks (no secrets via flags/env).
  • Strong pass:
    • Uses the checklist and/or the audit script.
    • Highlights trade-offs and backwards-compatibility risks.
    • Provides ready-to-ship help output and error message patterns.

Prompt: design a new CLI

  • Task:
    • "Design a CLI called logship that tails logs from multiple sources (local files and HTTP endpoints), filters by regex, and outputs either human-friendly colored logs or machine-readable JSON."
  • Must include:
    • Subcommands or flags decision (and rationale)
    • stdout vs stderr behavior
    • --json output definition (shape)
    • Color behavior (NO_COLOR, --no-color, TTY detection)
    • Timeouts for HTTP, progress/status messages
    • Example-first help outline
    • Exit codes

Prompt: review a flawed CLI help output

  • Task:
    • "Here's the current --help output for acmectl. It's 200 lines of flags, no examples, and no description. Rewrite it to be discoverable."
  • Must include:
    • Concise default help vs full help structure
    • Examples near the top
    • Group common flags first
    • Support path / docs link

Prompt: fix stdout/stderr separation

  • Task:
    • "This command prints progress bars to stdout and the JSON result to stderr. Fix the output contract."
  • Must include:
    • Machine output on stdout
    • Human/progress on stderr
    • Behavior when piped/captured (no animations)

Prompt: safe destructive action

  • Task:
    • "Add a delete command that can delete remote projects. Make it safe for humans but scriptable."
  • Must include:
    • Confirmation levels (moderate vs severe)
    • --dry-run
    • --force and/or --confirm="exact-name"
    • --no-input behavior

Prompt: secret handling

  • Task:
    • "Add auth to the CLI. It currently accepts --token <secret> and reads MYAPP_TOKEN env var. Fix the design."
  • Must include:
    • --token-file and/or --token-stdin
    • Recommendation for OS keychain / secret manager
    • Explain why flags/env are unsafe

Prompt: run the audit script

  • Task:
    • "Run scripts/cli_audit.py against ./mycli and address the FAIL/WARN items."
  • Must include:
    • Interpreting the audit output
    • Fixing highest severity issues first
    • Updating help text and/or flags accordingly