---
name: root-cause-tracing
description:
  "Root cause analysis (RCA) and tracing failures back to the original trigger
  across layers. Triggers: root cause, RCA, tracing, 回溯, 根因, 追溯,
  为什么会发生."
---
# Root Cause Tracing (RCA)
## When to Use
- Incidents, regressions, flaky tests, recurring bugs
- “Fix the symptom” patches where the underlying trigger is unknown
- Multi-layer failures (client → service → DB → async jobs)
## Inputs (required)
- Evidence: logs, stack traces, metrics, failing test output
- Timeline: when it started, what changed, rollout events
- Scope: affected users/paths, frequency, severity
- Verification: how to reproduce (or how to detect reliably)
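The Timeline input is most useful when it is lined up against the earliest bad signal. The following is a minimal Python sketch. It assumes change events (deploys, config pushes, dependency bumps) have already been collected with timestamps. `ChangeEvent`, `candidate_changes`, the 24-hour default lookback, and the sample events are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical record of one change: deploy, config push, dependency bump, data migration."""

    at: datetime
    kind: str
    description: str


def candidate_changes(
    events: list[ChangeEvent],
    first_bad: datetime,
    lookback: timedelta = timedelta(hours=24),
) -> list[ChangeEvent]:
    """Return changes inside [first_bad - lookback, first_bad], newest first.

    Changes that landed after the first bad signal cannot have triggered it,
    so they are excluded. The lookback window absorbs detection lag.
    """
    window_start = first_bad - lookback
    in_window = [e for e in events if window_start <= e.at <= first_bad]
    return sorted(in_window, key=lambda e: e.at, reverse=True)


# Hypothetical sample timeline.
events = [
    ChangeEvent(datetime(2024, 5, 1, 9, 0), "deploy", "api v1.4.2"),
    ChangeEvent(datetime(2024, 5, 1, 13, 30), "config", "raise pool size"),
    ChangeEvent(datetime(2024, 5, 1, 16, 0), "deploy", "api v1.4.3"),
]
first_bad = datetime(2024, 5, 1, 14, 5)

for event in candidate_changes(events, first_bad, lookback=timedelta(hours=6)):
    print(event.at.isoformat(), event.kind, event.description)
# 2024-05-01T13:30:00 config raise pool size
# 2024-05-01T09:00:00 deploy api v1.4.2
```

Changes that landed after the first bad signal may still be contributing factors. They are not candidates for the trigger, though, so leaving them out keeps the shortlist small.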
## Procedure (default)
1. **Frame the Failure**
   - Define expected vs actual behavior
   - Identify the earliest known bad signal
2. **Trace Backwards**
   - Walk back through layers: surface error → caller → upstream trigger
   - Look for the first point where invariants were violated
3. **Find the Trigger**
   - What input/state/sequence causes it?
   - What changed around that area (code/config/deps/data)?
4. **Fix at the Right Layer**
   - Prefer root-cause fix + defense-in-depth guardrails
   - Add regression test or a deterministic repro harness
5. **Validate**
   - Reproduce before fix; verify after fix
   - Add monitoring/alerts if appropriate
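Step 3 sometimes narrows the trigger to one change within an ordered range. In that case, a binary search finds the first bad change in about log2(n) checks. This is the idea behind `git bisect`, and `git bisect run <cmd>` automates the search over commits. Below is a generic Python sketch that works under stated assumptions:

- `is_bad` must be a deterministic repro.
- The sequence must go from good to bad exactly once.
- `find_first_bad`, `reproduces`, and the sample history are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def find_first_bad(versions: Sequence[T], is_bad: Callable[[T], bool]) -> int:
    """Return the index of the first version for which is_bad() is True.

    Assumes versions are ordered oldest to newest, versions[0] is known good,
    versions[-1] is known bad, and there is a single good -> bad transition.
    """
    if len(versions) < 2:
        raise ValueError("need at least one known-good and one known-bad version")
    if is_bad(versions[0]) or not is_bad(versions[-1]):
        raise ValueError("expected versions[0] good and versions[-1] bad")

    good, bad = 0, len(versions) - 1  # invariant: versions[good] good, versions[bad] bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid
        else:
            good = mid
    return bad


# Hypothetical history: the regression landed in c6.
history = [f"c{i}" for i in range(10)]
probes: list[str] = []


def reproduces(version: str) -> bool:
    probes.append(version)  # record each check so the probe count is visible
    return int(version[1:]) >= 6


print(history[find_first_bad(history, reproduces)])  # c6
print(len(probes))  # 5
```

A flaky repro breaks the determinism assumption and can send bisection to the wrong change. Stabilize the repro first, for example by running it several times per probe.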
## Output Contract (stable)
- Summary: what broke and impact
- Root cause: the earliest causal violation + why it happened
- Trigger: minimal repro steps / conditions
- Fix: what changed and why it prevents recurrence
- Verification: tests/commands + evidence
- Follow-ups: guardrails/observability/rollout notes
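The contract can also be kept as a structured record, which makes an incomplete RCA easy to spot before it is shared. This is a minimal Python sketch. `RcaReport` and its field names are hypothetical and simply mirror the sections listed above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class RcaReport:
    """Hypothetical record mirroring the Output Contract sections."""

    summary: str = ""
    root_cause: str = ""
    trigger: str = ""
    fix: str = ""
    verification: str = ""
    follow_ups: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def to_markdown(self) -> str:
        """Render one bullet per section, flagging gaps instead of hiding them."""
        lines: list[str] = []
        for f in fields(self):
            title = f.name.replace("_", " ").capitalize()
            value = getattr(self, f.name)
            if isinstance(value, list):
                lines.append(f"- {title}:")
                lines.extend(f"  - {item}" for item in value or ["(none)"])
            else:
                lines.append(f"- {title}: {value or '(missing)'}")
        return "\n".join(lines)


draft = RcaReport(summary="Hypothetical: checkout requests returning 500s")
print(draft.missing_sections())
# ['root_cause', 'trigger', 'fix', 'verification', 'follow_ups']
print(draft.to_markdown())
```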
## Guardrails
- Don't stop at “where it crashed”; find “why the bad state existed”
- Separate contributing factors from the root cause
- Avoid speculative RCA; label assumptions and request missing evidence
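The following sketch illustrates the first guardrail and step 4's "fix at the right layer" with a hypothetical bug. A config loader silently returns `None` for a missing `timeout_ms` key. The crash only surfaces later, as a `TypeError` at the point where the timeout is added to a timestamp. All names (`load_timeout_ms`, `retry_deadline_ms`, `ConfigError`) are invented for this example.

```python
class ConfigError(ValueError):
    """Raised at the layer that would otherwise create the bad state."""


def load_timeout_ms(config: dict) -> int:
    """Root-cause fix: validate where the value is produced.

    The hypothetical buggy version was `return config.get("timeout_ms")`,
    which returned None for a missing key and let the bad state travel on.
    """
    value = config.get("timeout_ms")
    if not isinstance(value, int) or value <= 0:
        raise ConfigError(f"timeout_ms must be a positive int, got {value!r}")
    return value


def retry_deadline_ms(now_ms: int, timeout_ms: int) -> int:
    """Crash site: a None timeout used to surface here as a TypeError.

    Defense-in-depth guard: re-check the invariant and fail loudly, rather than
    masking it with a symptom patch such as `if timeout_ms is None: return now_ms`.
    """
    if not isinstance(timeout_ms, int) or timeout_ms <= 0:
        raise ValueError(f"invariant violated: timeout_ms={timeout_ms!r}")
    return now_ms + timeout_ms


def test_missing_timeout_fails_at_load() -> None:
    """Regression test pinned to the trigger: a config without timeout_ms."""
    try:
        load_timeout_ms({})
    except ConfigError:
        return
    raise AssertionError("missing timeout_ms should be rejected at load time")


test_missing_timeout_fails_at_load()
print(retry_deadline_ms(1_000, load_timeout_ms({"timeout_ms": 250})))  # 1250
```

The root-cause fix lives in the loader, where the bad state was created. The check in `retry_deadline_ms` is defense in depth. The test pins the specific trigger so the bug cannot quietly return.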