playbook/outfitter-agents/plugins/outfitter/skills/find-root-causes/references/elimination-techniques.md

5.4 KiB

Elimination Techniques

Systematic methods for narrowing problem scope.

Halving the problem space with each test.

When to Use

  • Large problem space
  • Changes have clear ordering (time, code versions, config options)
  • Tests are quick relative to problem size

Process

1. Identify range: known-good state → known-bad state
2. Test midpoint: does issue exist here?
3. Narrow range: move to half containing issue
4. Repeat: until single change identified

Example: Git Bisect

# Automated binary search through commits
git bisect start
git bisect bad HEAD           # Current commit is bad
git bisect good v1.2.0        # Known good version
git bisect run ./test.sh      # Automatically find breaking commit

Example: Configuration

50 config options, one causes issue

Round 1: Test with first 25 options only
  → Issue present → problem in first 25
Round 2: Test with first 12 options only
  → Issue absent → problem in options 13-25
Round 3: Test with options 13-18
  → Issue present → problem in 13-18
...continue until single option found

Efficiency

Problem Size Binary Search Steps Linear Search Steps
10 items ~4 10
100 items ~7 100
1000 items ~10 1000

Variable Isolation

Changing one thing at a time.

When to Use

  • Multiple variables could be cause
  • Interactions between variables possible
  • Need to establish clear causation

Process

1. Baseline: measure with all defaults
2. Change X only: measure impact
3. Revert X, change Y only: measure impact
4. Repeat for each variable
5. If interactions suspected: test combinations

Example: Performance Degradation

Suspects: new library version, config change, increased data volume

Test 1: Revert library only → no change → not library
Test 2: Revert config only → improvement → config contributes
Test 3: Reduce data volume → improvement → data also contributes
Test 4: Both config + data → full improvement → both factors

Root cause: Config change + data growth interaction

Common Mistakes

  • Changing multiple variables at once
  • Not reverting between tests
  • Assuming first positive result is complete answer
  • Not testing combinations when interactions possible

Process of Elimination

Systematically ruling out possibilities.

When to Use

  • Finite set of possible causes
  • Can definitively rule things out
  • Structured environment

Process

Start with: All possible causes
For each possibility:
  - Design test to rule out
  - Execute test
  - If ruled out: remove from list
  - If not ruled out: keep on list
Continue until: single possibility remains

Documentation Format

Possible causes:
✗ Component A — ruled out: reproduced without A present
✗ Component B — ruled out: tested in isolation, worked
✗ External factor — ruled out: reproduced in clean environment
○ Component C — not yet tested
✓ Component D — confirmed: removing D fixes issue

Example: Integration Failure

System: API → Queue → Worker → Database

Test 1: Call API directly, bypass queue
  → Issue persists → not queue-related

Test 2: Worker processes test message
  → Success → worker + database OK

Test 3: Examine API-to-queue handoff
  → Found: message format incorrect

Root cause: API serialization bug

Divide and Conquer

Breaking complex system into testable segments.

When to Use

  • Complex multi-component systems
  • Don't know which area to focus on
  • Want to parallelize investigation

Process

1. Map system components
2. Identify boundaries between components
3. Test at each boundary: is data correct here?
4. Find boundary where data becomes incorrect
5. Focus investigation on that component

Example: Data Pipeline

Source → Ingestion → Transform → Validation → Storage → API

Check at each stage:
- After Ingestion: data correct ✓
- After Transform: data correct ✓
- After Validation: data INCORRECT ✗

Root cause is in Validation stage.

Environment Bisection

Isolating environment-specific factors.

When to Use

  • "Works on my machine" situations
  • Environment-dependent bugs
  • Deployment issues

Process

1. List environment differences (OS, versions, config, resources)
2. Create minimal diff between working and failing
3. Test with progressive alignment
4. Identify minimum difference causing failure

Difference Checklist

Category Working Failing
OS/Version
Runtime version
Dependencies
Config files
Environment variables
Network/ports
Permissions
Resource limits

Technique Selection Guide

Situation Recommended Technique
Many commits to check Binary search (git bisect)
Multiple config options Variable isolation
Finite component list Process of elimination
Multi-stage pipeline Divide and conquer
"Works elsewhere" Environment bisection
Unknown scope Start with divide and conquer, then specialize

Combining Techniques

Often multiple techniques used together:

1. Divide and conquer: narrow to subsystem
2. Process of elimination: rule out components in subsystem
3. Variable isolation: identify specific configuration
4. Binary search: find when it broke

Each technique narrows scope; combine for efficiency.