name: root-cause-tracing
description: Root cause analysis (RCA) and tracing failures back to the original trigger across layers. Triggers: root cause, RCA, tracing, 回溯, 根因, 追溯, 为什么会发生.

Root Cause Tracing (RCA)

When to Use

Incidents, regressions, flaky tests, recurring bugs
“Fix the symptom” patches where the underlying trigger is unknown
Multi-layer failures (client → service → DB → async jobs)

Inputs (required)

Evidence: logs, stack traces, metrics, failing test output
Timeline: when it started, what changed, rollout events
Scope: affected users/paths, frequency, severity
Verification: how to reproduce (or how to detect reliably)
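
The timeline input is most useful when it can be lined up against the earliest bad signal. The sketch below shows one way to shortlist recent changes; the function name `changes_before`, the two-hour window, and every event in the sample data are hypothetical and only serve as illustration.

```python
from datetime import datetime, timedelta


def changes_before(first_bad_signal, events, window=timedelta(hours=2)):
    """Return (timestamp, label) events inside `window` before the first bad signal, newest first."""
    start = first_bad_signal - window
    candidates = [
        (ts, label) for ts, label in events if start <= ts <= first_bad_signal
    ]
    # Newest first: the change closest to the failure is usually checked first.
    return sorted(candidates, reverse=True)


# Hypothetical timeline data, not taken from any real incident.
events = [
    (datetime(2024, 5, 1, 9, 0), "deploy api v41"),
    (datetime(2024, 5, 1, 13, 30), "config: cache TTL changed"),
    (datetime(2024, 5, 1, 14, 10), "deploy worker v17"),
    (datetime(2024, 5, 1, 15, 0), "dependency bump"),
]
first_bad_signal = datetime(2024, 5, 1, 14, 20)
suspects = changes_before(first_bad_signal, events)
```

Proximity in time only produces suspects, not a root cause; each one still needs evidence before it goes into the report.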

Procedure (default)

Frame the Failure
- Define expected vs actual behavior
- Identify the earliest known bad signal
Trace Backwards
- Walk back through layers: surface error → caller → upstream trigger
- Look for the first point where invariants were violated
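
Within a single Python process, walking back from the surface error can be sketched by following the exception chain. This is a minimal illustration, assuming the layers chain their errors with `raise ... from ...`; `exception_chain`, `load_config`, and `handle_request` are hypothetical names.

```python
def exception_chain(exc):
    """Return exceptions ordered from the surface error back to the earliest known cause."""
    chain = []
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        chain.append(current)
        # Prefer the explicit cause (raise ... from ...), else the implicit context.
        if current.__cause__ is not None:
            current = current.__cause__
        else:
            current = current.__context__
    return chain


def load_config(raw):
    # Hypothetical lower layer: the invariant "timeout is present" is violated here.
    try:
        return int(raw["timeout"])
    except KeyError as err:
        raise ValueError("config missing timeout") from err


def handle_request(raw):
    # Hypothetical surface layer: this is where the failure becomes visible.
    try:
        return load_config(raw)
    except ValueError as err:
        raise RuntimeError("request failed") from err


try:
    handle_request({})
except RuntimeError as surface:
    chain = exception_chain(surface)
    earliest = chain[-1]
```

The last element of `chain` is only the earliest recorded exception. Why the bad input existed in the first place may sit further upstream, in another service or in data, and still has to be traced.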
Find the Trigger
- What input/state/sequence causes it?
- What changed around that area (code/config/deps/data)?
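
When the candidate changes form an ordered history and a reliable check exists, the question "what changed" can be narrowed with a binary search, the same idea `git bisect` applies to commits. The sketch below assumes the history goes from good to bad exactly once; `first_bad_change` and the change labels are hypothetical.

```python
def first_bad_change(changes, is_bad):
    """Return the first change for which is_bad is True, or None if none is bad.

    Assumes `changes` is ordered oldest to newest and that once a change is bad,
    every later change is bad too.
    """
    if not changes or not is_bad(changes[-1]):
        return None
    lo, hi = 0, len(changes) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(changes[mid]):
            hi = mid  # the first bad change is at mid or earlier
        else:
            lo = mid + 1  # the first bad change is after mid
    return changes[lo]


# Hypothetical history: the failure first appears at change "c6".
history = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
culprit = first_bad_change(history, lambda change: int(change[1:]) >= 6)
```

For flaky failures the `is_bad` check is not deterministic, so it would need to run several times per candidate before the search result can be trusted.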
Fix at the Right Layer
- Prefer root-cause fix + defense-in-depth guardrails
- Add regression test or a deterministic repro harness
Validate
- Reproduce before fix; verify after fix
- Add monitoring/alerts if appropriate
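
The "reproduce before fix; verify after fix" step can be made explicit with one deterministic repro that runs against both versions. The example below is a sketch built around an invented bug (a missing `timeout` silently defaulting to 0); all function names are hypothetical.

```python
def parse_timeout_buggy(config):
    # Bug: a missing key silently becomes 0, which downstream code reads as "no timeout".
    return int(config.get("timeout", 0))


def parse_timeout_fixed(config):
    # Root-cause fix: reject the invalid state where it first enters the system.
    if "timeout" not in config:
        raise ValueError("timeout is required")
    timeout = int(config["timeout"])
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    return timeout


def reproduces_bug(parse):
    """Deterministic repro: True if the parser lets a missing timeout through."""
    try:
        parse({})
    except ValueError:
        return False
    return True


before = reproduces_bug(parse_timeout_buggy)  # expected to reproduce the bug
after = reproduces_bug(parse_timeout_fixed)   # expected not to reproduce it
```

Keeping the repro as a regression test means it fails if the buggy behavior ever returns.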

Output Contract (stable)

Summary: what broke and impact
Root cause: the earliest causal violation + why it happened
Trigger: minimal repro steps / conditions
Fix: what changed and why it prevents recurrence
Verification: tests/commands + evidence
Follow-ups: guardrails/observability/rollout notes
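
One way to keep the output stable is to model the six fields as a small data structure and render them in a fixed order. This is an illustrative sketch, not a required format; `RcaReport`, its field names, and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RcaReport:
    summary: str
    root_cause: str
    trigger: str
    fix: str
    verification: str
    follow_ups: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the fields in the same order as the output contract."""
        follow_ups = "; ".join(self.follow_ups) if self.follow_ups else "none"
        lines = [
            f"Summary: {self.summary}",
            f"Root cause: {self.root_cause}",
            f"Trigger: {self.trigger}",
            f"Fix: {self.fix}",
            f"Verification: {self.verification}",
            f"Follow-ups: {follow_ups}",
        ]
        return "\n".join(lines)


# Hypothetical report content for illustration only.
report = RcaReport(
    summary="Requests hung after a config rollout",
    root_cause="Missing timeout defaulted to 0 and was treated as unlimited",
    trigger="Any config without a timeout key",
    fix="Reject configs that lack a positive timeout",
    verification="Repro fails before the fix and passes after it",
    follow_ups=["Alert on request duration"],
)
text = report.render()
```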

Guardrails

Don’t stop at “where it crashed”; find “why the bad state existed”
Separate contributing factors vs root cause
Avoid speculative RCA; label assumptions and request missing evidence
