---
name: root-cause-tracing
description:
  "Root cause analysis (RCA) and tracing failures back to the original trigger
  across layers. Triggers: root cause, RCA, tracing, 回溯, 根因, 追溯,
  为什么会发生."
---

# Root Cause Tracing (RCA)

## When to Use

- Incidents, regressions, flaky tests, recurring bugs
- “Fix the symptom” patches where the underlying trigger is unknown
- Multi-layer failures (client → service → DB → async jobs)

## Inputs (required)

- Evidence: logs, stack traces, metrics, failing test output
- Timeline: when it started, what changed, rollout events
- Scope: affected users/paths, frequency, severity
- Verification: how to reproduce (or how to detect reliably)

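One way to enforce the required inputs is to check for gaps before tracing starts. The following is a minimal sketch, not a prescribed format: the `RcaInputs` class, its field names, and the sample values are hypothetical, chosen only to mirror the four bullets above.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


@dataclass
class RcaInputs:
    """Hypothetical container mirroring the four required inputs."""
    evidence: list[str] = field(default_factory=list)  # logs, traces, test output
    timeline: list[str] = field(default_factory=list)  # start time, changes, rollouts
    scope: str = ""                                    # who/what is affected
    verification: str = ""                             # repro or detection method

    def missing(self) -> list[str]:
        """Return the names of inputs that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative values only; they do not come from a real incident.
inputs = RcaInputs(
    evidence=["stack trace from the failing checkout test"],
    scope="checkout requests on the new pricing path",
)
print(inputs.missing())  # ['timeline', 'verification']
```

Anything returned by `missing()` is a prompt to request that evidence rather than guess at it.
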
## Procedure (default)

1. **Frame the Failure**
   - Define expected vs. actual behavior
   - Identify the earliest known bad signal

2. **Trace Backwards**
   - Walk back through layers: surface error → caller → upstream trigger
   - Look for the first point where invariants were violated

3. **Find the Trigger**
   - What input/state/sequence causes it?
   - What changed around that area (code/config/deps/data)?

4. **Fix at the Right Layer**
   - Prefer a root-cause fix plus defense-in-depth guardrails
   - Add a regression test or a deterministic repro harness

5. **Validate**
   - Reproduce before the fix; verify after the fix
   - Add monitoring/alerts if appropriate

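The "Trace Backwards" step can be sketched in code. This is an illustration under stated assumptions, not a definitive implementation: it assumes a state snapshot is available for each layer and that the invariant can be written as a predicate. The `Hop` class, `earliest_violation`, and the `quantity` example are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Hop:
    """One layer the request passed through (hypothetical structure)."""
    layer: str
    state: dict
    invariant: Callable[[dict], bool]  # returns True while the state is valid


def earliest_violation(hops: list[Hop]) -> Optional[Hop]:
    """Walk back from the crash site (last hop) toward the upstream caller.

    Returns the earliest hop in the unbroken run of violated invariants that
    ends at the crash site, or None if the last hop's invariant holds.
    """
    earliest = None
    for hop in reversed(hops):
        if hop.invariant(hop.state):
            break  # state was still valid here, so the violation began later
        earliest = hop
    return earliest


def quantity_is_positive(state: dict) -> bool:
    return state["quantity"] > 0


# Hops listed in causal order: upstream first, crash site last.
hops = [
    Hop("client", {"quantity": 2}, quantity_is_positive),
    Hop("service", {"quantity": 0}, quantity_is_positive),
    Hop("db", {"quantity": 0}, quantity_is_positive),
    Hop("async job", {"quantity": 0}, quantity_is_positive),
]

culprit = earliest_violation(hops)
print(culprit.layer if culprit else "no violation found")  # service
```

In this example the async job is where the failure surfaces, but the bad state first appears in the service layer, so that is where to look for the trigger and apply the fix.
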
## Output Contract (stable)

- Summary: what broke and its impact
- Root cause: the earliest causal violation + why it happened
- Trigger: minimal repro steps / conditions
- Fix: what changed and why it prevents recurrence
- Verification: tests/commands + evidence
- Follow-ups: guardrails/observability/rollout notes

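To keep the report shape stable, the contract above could be rendered from a fixed structure. This is a sketch only: `RcaReport`, `SECTIONS`, and the sample text are hypothetical. The `assumptions` field is an addition that is not part of the contract itself, included so that unverified claims stay visibly labeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Section order mirrors the Output Contract so every report has the same shape.
SECTIONS = ["Summary", "Root cause", "Trigger", "Fix", "Verification", "Follow-ups"]


@dataclass
class RcaReport:
    """Hypothetical report structure; one field per contract section."""
    summary: str
    root_cause: str
    trigger: str
    fix: str
    verification: str
    follow_ups: str
    assumptions: list[str] = field(default_factory=list)  # extra, not in the contract

    def render(self) -> str:
        """Render the report as a Markdown bullet list in contract order."""
        values = [self.summary, self.root_cause, self.trigger,
                  self.fix, self.verification, self.follow_ups]
        lines = [f"- {name}: {value}" for name, value in zip(SECTIONS, values)]
        lines += [f"- Assumption (unverified): {a}" for a in self.assumptions]
        return "\n".join(lines)


# Illustrative content only; it does not describe a real incident.
report = RcaReport(
    summary="Checkout job crashed on zero-quantity orders",
    root_cause="Service layer rounded a discounted quantity down to 0",
    trigger="Order with quantity 1 and a fractional bundle discount",
    fix="Validate quantity > 0 in the service before persisting",
    verification="Regression test added for the rounding path",
    follow_ups="Alert on zero-quantity rows",
    assumptions=["No other writer produces zero-quantity rows"],
)
print(report.render())
```
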
## Guardrails

- Don’t stop at “where it crashed”; find “why the bad state existed”
- Separate contributing factors from the root cause
- Avoid speculative RCA; label assumptions and request missing evidence