# Durability Patterns for Agent Harnesses

> "The 100th tool call is where agents fail. Non-trivial agent tasks involve dozens to hundreds of tool calls. Context windows fill up, summarization kicks in, and the agent quietly forgets something essential."

This reference documents patterns for building durable agent infrastructure — harnesses that survive failures, manage context across long sessions, and maintain agent effectiveness over extended workstreams.

Advanced profile only. Load this reference only when the user explicitly requests resumable
execution, checkpoints, durable state, long-running agent orchestration, or long-term memory. Do
not create `harness/state`, `harness/checkpoints`, or `harness/memory` as part of the default core
harness.

## 1. The Durability Problem

### Why Agents Drift

| Phase | Tool Calls | Session State | Context State | Risk |
|-------|-----------|---------------|---------------|------|
| Start | 0-20 | Fresh | Full context available | Low |
| Middle | 20-50 | Active | Context filling up | Medium |
| Extended | 50-100 | Sustained | Summarization active | High |
| Long-running | 100+ | Prolonged | Critical details lost | Very High |
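
A harness can turn the table above into a cheap runtime check. The sketch below is a minimal, hypothetical helper (`drift_risk` is not part of any existing harness API). It assumes the boundary values 20, 50, and 100 belong to the lower band, because the table's ranges overlap at those points.

```python
def drift_risk(tool_calls: int) -> str:
    """Map a running tool-call count to the risk level from the table above."""
    if tool_calls < 0:
        raise ValueError("tool_calls must be non-negative")
    # Assumption: the overlapping boundaries (20, 50, 100) fall in the lower band.
    if tool_calls <= 20:
        return "Low"
    if tool_calls <= 50:
        return "Medium"
    if tool_calls <= 100:
        return "High"
    return "Very High"
```

A harness might call this after each tool call and trigger the context strategies in the next section once the result reaches "High".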

### Symptoms of Context Drift

- Agent forgets initial requirements
- Repeated attempts at already-completed tasks
- Contradictory decisions compared to earlier choices
- "Hallucinating" APIs or file structures that don't exist
- Losing track of what's changed vs. what's remaining

## 2. Context Engineering Strategies

### Strategy 1: Progressive Summarization

**When to use:** Context window approaching 50% capacity

**Implementation:**
```json
// harness/state/context-summary.json
{
  "task_id": "implement-auth-flow",
  "phase": "execution",
  "completed_steps": [
    {"step": 1, "summary": "Created UserService interface in types/user.go:15-30"},
    {"step": 2, "summary": "Implemented DefaultUserService in core/user.go:20-80"},
    {"step": 3, "summary": "Added JWT middleware in api/middleware/auth.go"}
  ],
  "current_step": 4,
  "remaining_steps": ["Add tests", "Update ARCHITECTURE.md"],
  "critical_context": {
    "architecture_layer": "L2 (Core)",
    "key_interfaces": ["UserService", "AuthProvider"],
    "validation_command": "make lint-arch && go test ./..."
  },
  "timestamp": "2026-03-23T10:30:00Z"
}
```

**Agent instruction in exec-plan:**
```markdown
## Context Management

After completing each major step:
1. Update `harness/state/context-summary.json` with step summary
2. Prune verbose details from working memory
3. Keep only: current file being edited, validation commands, critical constraints

If context feels unclear, re-read:
- AGENTS.md (navigation)
- harness/state/context-summary.json (progress)
- Current step details in this exec-plan
```
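
The update in step 1 can be scripted so the agent does not hand-edit JSON. The following is a minimal sketch, assuming the file layout shown above and assuming that the first entry of `remaining_steps` describes the current step; `record_step` is a hypothetical helper name.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_step(summary_path: Path, summary: str) -> dict:
    """Mark the current step complete and advance to the next one."""
    state = json.loads(summary_path.read_text())
    state["completed_steps"].append(
        {"step": state["current_step"], "summary": summary}
    )
    state["current_step"] += 1
    # Assumption: remaining_steps[0] is the step that was just completed.
    if state["remaining_steps"]:
        state["remaining_steps"].pop(0)
    state["timestamp"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    # Write to a temp file in the same directory, then rename it into place,
    # so an interrupted write never leaves a half-written summary behind.
    fd, tmp_name = tempfile.mkstemp(dir=summary_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_name, summary_path)
    return state
```

The write-then-rename step matters here because this file is what the agent re-reads after losing context; a truncated summary would be worse than a stale one.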

### Strategy 2: State Offloading

**When to use:** Multi-phase tasks with intermediate artifacts

**File structure:**
```
harness/
├── state/
│   ├── current-task.json      # What the agent is doing now
│   ├── context-summary.json   # Summarized progress
│   └── decisions.json         # Key decisions made (for consistency)
├── checkpoints/
│   ├── phase-1-complete.json  # Snapshot after phase 1
│   ├── phase-2-complete.json  # Snapshot after phase 2
│   └── latest.json            # Symlink to most recent checkpoint
└── artifacts/
    └── generated/             # Intermediate files the agent produced
```

**Decision log format:**
```json
// harness/state/decisions.json
{
  "decisions": [
    {
      "id": "D001",
      "timestamp": "2026-03-23T09:00:00Z",
      "context": "Choosing authentication method",
      "decision": "JWT with refresh tokens",
      "rationale": "Stateless, works with microservices architecture",
      "alternatives_rejected": ["Sessions (stateful)", "OAuth only (too complex for MVP)"]
    },
    {
      "id": "D002",
      "timestamp": "2026-03-23T09:30:00Z",
      "context": "Where to put auth middleware",
      "decision": "api/middleware/auth.go",
      "rationale": "Follows existing middleware pattern in api/middleware/",
      "alternatives_rejected": ["internal/auth/ (wrong layer)"]
    }
  ]
}
```
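
The log only helps consistency if it is consulted before a new decision is recorded. Below is a minimal sketch of that check, operating on the already-parsed log. The function names are hypothetical, and matching decisions by exact (case-insensitive) `context` string is a simplifying assumption; a real harness would likely need fuzzier matching.

```python
from typing import List, Optional


def find_decision(log: dict, context: str) -> Optional[dict]:
    """Return the earlier decision recorded for this context, if any."""
    wanted = context.strip().lower()
    for entry in log["decisions"]:
        if entry["context"].strip().lower() == wanted:
            return entry
    return None


def add_decision(log: dict, context: str, decision: str, rationale: str,
                 alternatives_rejected: List[str], timestamp: str) -> dict:
    """Append a decision, refusing to silently contradict an earlier one."""
    existing = find_decision(log, context)
    if existing is not None:
        if existing["decision"] == decision:
            return existing  # already recorded, nothing to do
        raise ValueError(
            f"conflicts with {existing['id']}: {existing['decision']}"
        )
    entry = {
        "id": f"D{len(log['decisions']) + 1:03d}",
        "timestamp": timestamp,
        "context": context,
        "decision": decision,
        "rationale": rationale,
        "alternatives_rejected": alternatives_rejected,
    }
    log["decisions"].append(entry)
    return entry
```

Raising on a conflict forces the agent to either reuse the earlier choice or explicitly revisit it, rather than drifting into a contradictory one.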

### Strategy 3: Task Isolation via Sub-Agents

**When to use:** Independent subtasks that don't need full context

**Pattern:**
```markdown
## Phase 3: Parallel Work

### 3.1 Test Writing (spawn sub-agent)
**Context needed:** UserService interface, existing test patterns
**Context NOT needed:** Implementation details of phases 1-2
**Output:** Test files in internal/core/*_test.go

### 3.2 Documentation Update (spawn sub-agent)
**Context needed:** Public API changes, ARCHITECTURE.md
**Context NOT needed:** Internal implementation decisions
**Output:** Updated docs/design-docs/auth.md
```

**Sub-agent context injection:**
```json
{
  "task": "write-tests-for-userservice",
  "minimal_context": {
    "interface_file": "internal/types/user.go",
    "test_patterns_file": "internal/core/example_test.go",
    "output_directory": "internal/core/",
    "validation": "go test ./internal/core/..."
  },
  "excluded_context": ["implementation details", "other phases", "decision history"]
}
```
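
One way to apply the spec above is to assemble the sub-agent's prompt from the listed files only. This is a sketch under stated assumptions: `build_subagent_prompt` is a hypothetical helper, keys ending in `_file` are treated as paths to inline, and the file reader is passed in so the function does not depend on any particular agent runtime.

```python
from typing import Callable


def build_subagent_prompt(spec: dict, read_file: Callable[[str], str]) -> str:
    """Build a prompt that contains only the files named in minimal_context."""
    context = spec["minimal_context"]
    parts = [f"Task: {spec['task']}"]
    for key, value in context.items():
        # Assumption: any key ending in "_file" names a file to inline.
        if key.endswith("_file"):
            parts.append(f"--- {value} ---\n{read_file(value)}")
    parts.append(f"Write output under: {context['output_directory']}")
    parts.append(f"Validate with: {context['validation']}")
    # excluded_context is deliberately never read or forwarded.
    return "\n\n".join(parts)
```

Because nothing outside `minimal_context` is read, the parent agent's decision history and earlier phases cannot leak into the sub-agent's window.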

### Strategy 4: Checkpoint Resumption

**When to use:** Tasks that may be interrupted (restarts, errors, timeouts)

**Checkpoint schema:**
```json
// harness/checkpoints/phase-2-complete.json
{
  "checkpoint_id": "cp-20260323-103000",
  "task_id": "implement-auth-flow",
  "phase_completed": 2,
  "timestamp": "2026-03-23T10:30:00Z",

  "state_snapshot": {
    "files_created": ["internal/types/user.go", "internal/core/user.go"],
    "files_modified": ["go.mod"],
    "validation_status": {"build": "pass", "lint": "pass", "test": "pass"}
  },

  "resume_instructions": {
    "next_phase": 3,
    "context_to_reload": ["AGENTS.md", "docs/exec-plans/active/auth-flow.md"],
    "state_to_restore": "harness/state/context-summary.json",
    "first_step": "Run validation to confirm state is intact"
  },

  "critical_invariants": [
    "UserService interface must have GetUser, CreateUser, UpdateUser methods",
    "All new code must be in L2 (Core) layer"
  ]
}
```
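
Saving a checkpoint and repointing `latest.json` can be done so that a crash mid-write never leaves a broken pointer. The sketch below assumes a POSIX filesystem (symlinks plus atomic rename) and the `phase-N-complete.json` naming used above; `save_checkpoint` is a hypothetical helper.

```python
import json
import os
from pathlib import Path


def save_checkpoint(checkpoint_dir: Path, checkpoint: dict) -> Path:
    """Write phase-N-complete.json and repoint latest.json at it."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    name = f"phase-{checkpoint['phase_completed']}-complete.json"
    target = checkpoint_dir / name

    # Write the snapshot to a temp file first, then rename it into place.
    tmp = checkpoint_dir / (name + ".tmp")
    tmp.write_text(json.dumps(checkpoint, indent=2))
    os.replace(tmp, target)

    # Build the new symlink under a temp name and rename it over latest.json,
    # so latest.json always points at a complete checkpoint.
    link_tmp = checkpoint_dir / "latest.json.tmp"
    if link_tmp.is_symlink() or link_tmp.exists():
        link_tmp.unlink()
    os.symlink(name, link_tmp)  # relative link, survives moving the directory
    os.replace(link_tmp, checkpoint_dir / "latest.json")
    return target
```

On platforms without symlink support, copying the file to `latest.json` is a reasonable substitute, at the cost of the atomicity guarantee.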

**Resume flow:**
````markdown
## Resumption Protocol

If resuming from checkpoint:

1. Read checkpoint file: `harness/checkpoints/latest.json`
2. Verify state is intact:
   ```bash
   make lint-arch && go build ./...
   ```
3. If verification fails:
   - Check which files are corrupted
   - Restore from checkpoint's expected state
   - Re-run from last known good state
4. If verification passes:
   - Load context from checkpoint's `context_to_reload`
   - Continue from `next_phase`
````
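
The branching in the resumption protocol can be expressed as a small planning function. This is a minimal sketch, assuming the checkpoint schema shown earlier; `plan_resume` and `shell_verifier` are hypothetical names, and the verification command is injected so the logic can be exercised without a real build.

```python
import subprocess
from typing import Callable


def shell_verifier(command: str) -> Callable[[], bool]:
    """Wrap a shell command as a pass/fail check (exit code 0 means pass)."""
    return lambda: subprocess.run(command, shell=True).returncode == 0


def plan_resume(checkpoint: dict, verify: Callable[[], bool]) -> dict:
    """Decide whether to continue from the checkpoint or restore first."""
    resume = checkpoint["resume_instructions"]
    snapshot = checkpoint["state_snapshot"]
    if not verify():
        # Step 3: verification failed, so these files need to be inspected
        # and restored before any further work.
        return {
            "action": "restore",
            "files_to_check": snapshot["files_created"] + snapshot["files_modified"],
        }
    # Step 4: state is intact, reload context and move on.
    return {
        "action": "continue",
        "next_phase": resume["next_phase"],
        "reload": resume["context_to_reload"] + [resume["state_to_restore"]],
    }
```

A harness would typically call `plan_resume(checkpoint, shell_verifier("make lint-arch && go build ./..."))` after loading `harness/checkpoints/latest.json`.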

## 3. Memory Architecture

> "A vector database alone is not memory. That's like giving someone a filing cabinet and calling it a brain."

### Three Types of Agent Memory

| Type | What It Stores | How to Implement | When to Access |
|------|---------------|------------------|----------------|
| **Episodic** | What happened | `harness/memory/episodes/` — timestamped event logs | When debugging why something went wrong |
| **Semantic** | What I know | `harness/memory/knowledge/` — facts about this codebase | Before making architectural decisions |
| **Procedural** | How I do things | `harness/memory/procedures/` — learned workflows | When starting familiar task types |

### Episodic Memory Implementation

```json
// harness/memory/episodes/2026-03-23-auth-implementation.json
{
  "episode_id": "ep-20260323-auth",
  "task": "Implement JWT authentication",
  "outcome": "success",
  "duration_minutes": 45,
  "tool_calls": 87,

  "key_events": [
    {
      "timestamp": "2026-03-23T09:15:00Z",
      "event": "lint_failure",
      "details": "Attempted to import core/config from types/user.go",
      "resolution": "Moved config-dependent logic to core/user.go",
      "lesson": "Layer 0 (types/) cannot import Layer 2 (core/)"
    },
    {
      "timestamp": "2026-03-23T09:45:00Z",
      "event": "test_failure",
      "details": "TestGetUser failed: expected mock to be called",
      "resolution": "Added missing dependency injection in test setup",
      "lesson": "Always use constructor injection for testability"
    }
  ],

  "patterns_discovered": [
    "This codebase uses constructor injection, not field injection",
    "Test files should mirror source file structure"
  ]
}
```
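
Episode logs become useful when their lessons are pulled back out before similar work begins. The sketch below gathers the `lesson` fields across all episode files; `collect_lessons` is a hypothetical helper and assumes every file in the directory follows the schema above.

```python
import json
from pathlib import Path
from typing import List, Optional


def collect_lessons(episodes_dir: Path, event_type: Optional[str] = None) -> List[str]:
    """Return unique lessons from all episodes, oldest file name first."""
    lessons: List[str] = []
    for path in sorted(episodes_dir.glob("*.json")):
        episode = json.loads(path.read_text())
        for event in episode.get("key_events", []):
            if event_type is not None and event.get("event") != event_type:
                continue
            lesson = event.get("lesson")
            # Skip events without a lesson and lessons already collected.
            if lesson and lesson not in lessons:
                lessons.append(lesson)
    return lessons
```

For example, `collect_lessons(Path("harness/memory/episodes"), "lint_failure")` would surface only the lessons learned from lint failures.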

### Semantic Memory Implementation

```json
// harness/memory/knowledge/codebase-facts.json
{
  "updated": "2026-03-23T10:00:00Z",

  "architecture": {
    "layers": ["L0: types/", "L1: utils/", "L2: core/", "L3: api/, cmd/"],
    "forbidden_imports": "Lower layers cannot import higher layers",
    "key_interfaces": ["UserService", "AuthProvider", "Logger"]
  },

  "conventions": {
    "naming": "camelCase for variables, PascalCase for exported identifiers",
    "testing": "Table-driven tests, testify for assertions",
    "error_handling": "Wrap errors with fmt.Errorf, use typed errors from errors/"
  },

  "gotchas": [
    "chi router middleware order matters — auth before logging",
    "Database migrations must be idempotent",
    "Config is loaded once at startup, not reloaded"
  ]
}
```
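
The `layers` list and the rule "Lower layers cannot import higher layers" are enough to check a proposed import before writing it. This is a minimal sketch, assuming the `"Ln: dir/, dir/"` string format shown above and that a file's layer is determined by the first path segment that matches a layer directory; both function names are hypothetical.

```python
from typing import List

LAYERS = ["L0: types/", "L1: utils/", "L2: core/", "L3: api/, cmd/"]


def layer_of(layers: List[str], path: str) -> int:
    """Return the layer index (0 = lowest) of the given file or package path."""
    dir_to_layer = {}
    for index, entry in enumerate(layers):
        _, dirs = entry.split(":", 1)
        for name in dirs.split(","):
            dir_to_layer[name.strip().rstrip("/")] = index
    # Assumption: the first matching path segment decides the layer.
    for part in path.split("/"):
        if part in dir_to_layer:
            return dir_to_layer[part]
    raise ValueError(f"{path} is not under a known layer")


def import_allowed(layers: List[str], importer: str, imported: str) -> bool:
    """A file may import only from its own layer or from lower layers."""
    return layer_of(layers, imported) <= layer_of(layers, importer)
```

With the sample data, `import_allowed(LAYERS, "types/user.go", "core/config")` is false, which matches the lint failure recorded in the episodic example.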

### Procedural Memory Implementation

```json
// harness/memory/procedures/add-new-endpoint.json
{
  "procedure": "Add a new API endpoint",
  "last_used": "2026-03-22",
  "success_rate": "4/4",

  "steps": [
    {"step": 1, "action": "Define types in internal/types/", "notes": "Keep Layer 0 pure"},
    {"step": 2, "action": "Implement business logic in internal/core/", "notes": "Use interfaces for deps"},
    {"step": 3, "action": "Add handler in api/handlers/", "notes": "Follow existing handler patterns"},
    {"step": 4, "action": "Register route in api/routes.go", "notes": "Check middleware ordering"},
    {"step": 5, "action": "Write tests for each layer", "notes": "Use table-driven tests"},
    {"step": 6, "action": "Update docs/references/api.md", "notes": "Include request/response examples"}
  ],

  "validation": "make lint-arch && go test ./... && make build"
}
```
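
Keeping `success_rate` and `last_used` current is what lets an agent judge whether a procedure is still trustworthy. Below is a small sketch of that bookkeeping, assuming the `"passed/total"` string format shown above; `record_run` is a hypothetical helper.

```python
def record_run(procedure: dict, succeeded: bool, used_on: str) -> dict:
    """Return a copy of the procedure with its usage statistics updated."""
    passed, total = (int(n) for n in procedure["success_rate"].split("/"))
    updated = dict(procedure)  # shallow copy, the input is left unchanged
    updated["success_rate"] = f"{passed + int(succeeded)}/{total + 1}"
    updated["last_used"] = used_on
    return updated
```

A failed run on the sample procedure would move it from `4/4` to `4/5`, a signal that the steps may need revisiting.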

## 4. Goal Management

Real agents don't just respond to prompts — they pursue outcomes. A harness should support goal tracking, progress monitoring, and intelligent escalation.

### Goal Tracking Structure

```json
// harness/state/goals.json
{
  "primary_goal": {
    "id": "G001",
    "description": "Implement user authentication with JWT",
    "success_criteria": [
      "Users can register with email/password",
      "Users can log in and receive JWT",
      "Protected endpoints reject invalid tokens",
      "All tests pass"
    ],
    "status": "in_progress",
    "progress": "75%"
  },

  "subgoals": [
    {"id": "G001.1", "description": "Create user types", "status": "completed"},
    {"id": "G001.2", "description": "Implement user service", "status": "completed"},
    {"id": "G001.3", "description": "Add JWT middleware", "status": "completed"},
    {"id": "G001.4", "description": "Write tests", "status": "in_progress"},
    {"id": "G001.5", "description": "Update documentation", "status": "pending"}
  ],

  "blockers": [],

  "escalation_triggers": [
    {"condition": "3+ failures on same step", "action": "Report blocker to user"},
    {"condition": "Goal unchanged for 30 minutes", "action": "Check if stuck"},
    {"condition": "Dependencies missing", "action": "Request clarification"}
  ]
}
```
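
The first two escalation triggers are mechanical and can be evaluated by the harness rather than left to the agent's judgment. The sketch below is one possible reading: the function names are hypothetical, progress is computed from completed subgoals only (in-progress work counts as zero, which is an assumption and may differ from a hand-estimated value), and "goal unchanged" is approximated by the time since the last subgoal status change.

```python
from datetime import datetime, timedelta
from typing import List


def progress_percent(goals: dict) -> int:
    """Share of subgoals marked completed, as a whole percentage."""
    subgoals = goals["subgoals"]
    if not subgoals:
        return 0
    done = sum(1 for goal in subgoals if goal["status"] == "completed")
    return round(100 * done / len(subgoals))


def escalations(failures_on_step: int, last_progress_at: datetime,
                now: datetime,
                stall_after: timedelta = timedelta(minutes=30)) -> List[str]:
    """Return the actions whose trigger conditions currently hold."""
    actions = []
    if failures_on_step >= 3:
        actions.append("Report blocker to user")
    if now - last_progress_at >= stall_after:
        actions.append("Check if stuck")
    return actions
```

The third trigger ("Dependencies missing") depends on task-specific knowledge and is left out of this sketch.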

### When to Escalate

| Signal | Interpretation | Action |
|--------|---------------|--------|
| 3+ lint failures on the same rule | Possible misunderstanding | Re-read architecture docs; if still failing → escalate |
| Test passed earlier, now fails | Regression introduced | Review recent changes; if unclear → escalate |
| Unclear requirements | Ambiguity in task | Document uncertainty, ask user for clarification |
| Conflicting constraints | Impossible task | Report constraint conflict, propose alternatives |
| Progress stalled | Stuck or context lost | Resume from checkpoint; if still stuck → escalate |

### Good Agent Behavior: Restraint

> "Good agent behavior is often about restraint, not verbosity."

**Example from Hugo's article:** An agent detected stress signals (urgency, fragmented input) and, instead of sending a detailed response, sent a simple emoji check-in.

**Harness implication:** Include signals for when to simplify:
- User seems rushed → shorter responses, focus on essentials
- Many rapid corrections → agent may be overcomplicating
- Explicit "just do it" → minimize explanation, maximize action

## 5. Harness Directory Structure for Durability

```
harness/
├── eval/              # Eval framework (existing)
├── trace/             # Observability (existing)
├── state/             # Current task state
│   ├── current-task.json
│   ├── context-summary.json
│   ├── decisions.json
│   └── goals.json
├── checkpoints/       # Resumption points
│   ├── latest.json -> phase-2-complete.json
│   └── phase-*.json
├── memory/            # Agent learning
│   ├── episodes/      # What happened (episodic)
│   ├── knowledge/     # What I know (semantic)
│   └── procedures/    # How I do things (procedural)
└── metrics/           # Performance tracking
    └── task-completion-times.json
```
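
Because the durability directories are opt-in, a scaffolding step should create them only on request. The following is a minimal sketch; `scaffold_durability` is a hypothetical helper, and `metrics/` is left out because this reference does not say which profile it belongs to.

```python
from pathlib import Path
from typing import List

# Directories reserved for the advanced (durability) profile.
DURABILITY_DIRS = [
    "state",
    "checkpoints",
    "memory/episodes",
    "memory/knowledge",
    "memory/procedures",
]


def scaffold_durability(harness_root: Path, enabled: bool = False) -> List[Path]:
    """Create the durability directories only when explicitly enabled."""
    if not enabled:
        return []  # default core harness: create nothing
    created = []
    for relative in DURABILITY_DIRS:
        directory = harness_root / relative
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```

Defaulting `enabled` to `False` keeps the core harness free of `state/`, `checkpoints/`, and `memory/` unless the user asks for them.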

## 6. Integration with Exec-Plans

When the durability profile is enabled, each exec-plan should include a durability section:

```markdown
## Durability Configuration

### Context Management
- **Summary interval:** After each phase
- **Offload location:** harness/state/context-summary.json
- **Critical context:** Layer constraints, validation commands

### Checkpoints
- **Save after:** Each phase completion
- **Location:** harness/checkpoints/
- **Resume from:** Latest checkpoint if interrupted

### Memory Updates
- **Episodic:** Log significant events (failures, key decisions)
- **Semantic:** Update if new codebase facts discovered
- **Procedural:** Update if workflow variation proven successful

### Escalation Triggers
- 3+ failures on same step → Report blocker
- Unclear requirement → Ask user
- Conflicting constraints → Propose alternatives
```
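
A harness can check that an exec-plan actually carries these sections before a long run starts. This sketch assumes the sections appear as level-3 headings with exactly the names used above; `missing_durability_sections` is a hypothetical helper.

```python
import re
from typing import List

REQUIRED_SECTIONS = [
    "Context Management",
    "Checkpoints",
    "Memory Updates",
    "Escalation Triggers",
]


def missing_durability_sections(plan_markdown: str) -> List[str]:
    """Return the required section names that the plan does not contain."""
    headings = {
        match.group(1).strip()
        for match in re.finditer(r"^###\s+(.+)$", plan_markdown, re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name not in headings]
```

An empty result means the plan is ready; anything else can be reported back to the plan author before execution begins.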

---

## Summary: The Durable Agent Checklist

When designing a harness, verify:

- [ ] **Context engineering:** Does the harness manage context for long sessions?
- [ ] **State persistence:** Can tasks survive restarts and failures?
- [ ] **Checkpoint resumption:** Can agents resume from intermediate states?
- [ ] **Memory architecture:** Does the harness help agents learn across sessions?
- [ ] **Goal tracking:** Does the harness support goal pursuit, not just prompt response?
- [ ] **Escalation paths:** Does the harness know when to ask for help?
- [ ] **Trajectory capture:** Does every run produce usable training data?