4.5 KiB
4.5 KiB
Documentation Templates
Templates for investigation logging and root cause reports.
Investigation Log
Real-time documentation during investigation.
Format
[TIMESTAMP] STAGE: Action → Result
[10:15] DISCOVERY: Gathered error logs → Found NullPointerException in UserService
[10:22] HYPOTHESIS: Suspect user object not initialized when accessed
[10:28] TEST: Added null check logging → Confirmed user is null
[10:35] EVIDENCE: Traced call path → findById returning null for valid ID
[10:42] HYPOTHESIS: Database connection issue
[10:48] TEST: Direct DB query → Returns data correctly
[10:52] HYPOTHESIS: Caching returning stale null
[10:58] TEST: Disabled cache → Issue resolved
[11:05] ROOT CAUSE: Cache not invalidated on user update
Entry Types
| Prefix | Use For |
|---|---|
| DISCOVERY | New information gathered |
| HYPOTHESIS | New theory formed |
| TEST | Experiment executed |
| EVIDENCE | Data that supports/refutes hypothesis |
| ROOT CAUSE | Final determination |
| BLOCKED | Cannot proceed, need help |
Benefits
- Prevents revisiting same ground
- Enables handoff to others
- Creates learning artifact
- Catches circular investigation
- Documents what was tried
Root Cause Report
Post-investigation documentation.
Template
# Root Cause Analysis: {Issue Title}
## Summary
Brief description of issue and resolution in 2-3 sentences.
## Timeline
- {DATE/TIME}: Issue first observed
- {DATE/TIME}: Investigation started
- {DATE/TIME}: Root cause identified
- {DATE/TIME}: Fix deployed
- {DATE/TIME}: Issue verified resolved
## Symptoms
What users/systems experienced:
- {Observable symptom 1}
- {Observable symptom 2}
## Root Cause
**Primary cause**: {What ultimately caused the issue}
**Contributing factors**:
- {Factor that made issue worse or harder to catch}
- {Factor that enabled issue to occur}
## Evidence
How we confirmed this was the root cause:
- {Evidence 1}
- {Evidence 2}
- {Test that confirmed}
## Resolution
**Immediate fix**: {What was done to resolve}
**Code/config changes**: {Links to PRs, commits}
## Prevention
**How to prevent recurrence**:
- {Preventive measure 1}
- {Preventive measure 2}
**Detection improvements**:
- {How to catch this earlier next time}
## Lessons Learned
- {What went well in investigation}
- {What could improve}
- {Knowledge gap discovered}
## Appendix
- Investigation log
- Relevant logs/screenshots
- Related incidents
Quick Incident Notes
For minor issues not requiring full RCA.
Template
## Incident: {Brief title}
**Date**: {When}
**Duration**: {How long}
**Impact**: {Who/what affected}
**Cause**: {One sentence}
**Fix**: {One sentence}
**Prevention**: {One sentence}
Hypothesis Tracking
When testing multiple hypotheses.
Template
## Hypotheses
### H1: {Description}
- **Likelihood**: High/Medium/Low
- **Evidence for**: {Supporting data}
- **Evidence against**: {Contradicting data}
- **Test**: {How to verify}
- **Status**: Pending/Testing/Confirmed/Ruled Out
### H2: {Description}
...
Status Flow
Pending → Testing → Confirmed
→ Ruled Out
Environment Snapshot
Document system state at time of issue.
Template
## Environment Snapshot
**System**: {Service/application name}
**Instance**: {Server/container ID}
**Time**: {Timestamp}
### Versions
- Application: {version}
- Runtime: {version}
- Key dependencies: {versions}
### Configuration
- {Relevant config values}
### State
- Memory: {usage}
- CPU: {usage}
- Connections: {counts}
- Queue depth: {if applicable}
### Recent Changes
- {Deploy/config change within last 24-48h}
Postmortem Meeting Notes
For team discussions.
Agenda Template
## Postmortem: {Incident}
**Date**: {Meeting date}
**Attendees**: {Names}
### Summary (5 min)
{Owner presents incident summary}
### Timeline Review (10 min)
{Walk through what happened when}
### Root Cause Discussion (15 min)
- Confirmed root cause
- Contributing factors
- Why wasn't this caught earlier?
### Action Items (15 min)
| Action | Owner | Due Date |
|--------|-------|----------|
| {Preventive measure} | {Name} | {Date} |
### Follow-up
- Next review: {Date if needed}
- Documentation: {Where RCA will be stored}
Checklist: Documentation Quality
Before closing investigation:
- Root cause clearly stated
- Evidence documented
- Alternative hypotheses addressed
- Resolution steps recorded
- Prevention measures identified
- Learning captured
- Stakeholders informed