---
name: patterns
description: Reference document for monopoly patterns.
risk: safe
reports-to: monopoly
---

# MONOPOLY — Design Patterns Deep Dive

## Table of Contents
1. CQRS
2. Event Sourcing
3. Saga Pattern
4. Circuit Breaker
5. Bulkhead
6. Strangler Fig
7. Outbox Pattern
8. Consistent Hashing
9. Backpressure
10. Leader Election
11. Two-Phase Commit
12. Read-Through / Write-Through / Write-Behind Cache

---

## 1. CQRS (Command Query Responsibility Segregation)

**What it is:** Separate the read model (Query) from the write model (Command) into distinct services, databases, or code paths.

**When to use:**
- Read load is 10×+ write load (most web apps)
- Read queries are complex aggregations over write data
- Need to optimize read and write paths independently
- Domain model is complex (DDD contexts)

**Implementation:**
```
Write Path:  Client → Command API → Write DB (normalized, PostgreSQL)
Read Path:   Client → Query API  → Read DB (denormalized, Redis / Elasticsearch)
Sync:        Write DB → CDC (Debezium) → Message Queue → Read DB updater
```
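
The flow above can be sketched in memory. This is a minimal illustration, not a production design: `OrderSystem`, its fields, and the `OrderPlaced` event name are hypothetical, and a plain `deque` stands in for CDC plus the message queue.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class OrderSystem:
    write_db: dict = field(default_factory=dict)   # normalized: order_id -> row
    read_db: dict = field(default_factory=dict)    # denormalized: customer -> summary
    changes: deque = field(default_factory=deque)  # stand-in for CDC + message queue

    # Command side: validates and writes, never serves queries.
    def place_order(self, order_id: str, customer: str, total: float) -> None:
        if order_id in self.write_db:
            raise ValueError("duplicate order")
        self.write_db[order_id] = {"customer": customer, "total": total}
        self.changes.append(("OrderPlaced", order_id, customer, total))

    # Sync: the read-DB updater drains pending change events.
    def project(self) -> None:
        while self.changes:
            _, _, customer, total = self.changes.popleft()
            summary = self.read_db.setdefault(customer, {"orders": 0, "spent": 0.0})
            summary["orders"] += 1
            summary["spent"] += total

    # Query side: reads only the denormalized view.
    def customer_summary(self, customer: str) -> dict:
        return self.read_db.get(customer, {"orders": 0, "spent": 0.0})
```

Until `project()` runs, `customer_summary` still returns the old view; that gap is the eventual consistency listed in the trade-offs.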

**Trade-offs:**
- ✅ Independent scaling of read and write
- ✅ Optimized schemas for each operation type
- ❌ Eventual consistency between write and read models
- ❌ Increased complexity; two models to maintain

**Real-world users:** Amazon (order service), LinkedIn (feed)

---

## 2. Event Sourcing

**What it is:** Store state as a sequence of immutable events rather than current state. Rebuild current state by replaying events.

**When to use:**
- Full audit trail is a regulatory requirement (fintech, healthcare)
- Need to replay history for debugging or analytics
- Complex domain with many state transitions
- Need to derive multiple read projections from same data

**Implementation:**
```
Event Store: append-only log (Kafka, EventStoreDB)
Snapshots:   periodic snapshots to speed up state rebuild
Projections: consumers build read models from events
```
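
A minimal sketch of replay with a snapshot, assuming a hypothetical bank-account domain. The event kinds `Deposited` and `Withdrawn` and the `(events_covered, balance)` snapshot shape are illustrative choices, and a Python list stands in for the event store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # "Deposited" or "Withdrawn" (hypothetical event types)
    amount: int


def apply(balance: int, event: Event) -> int:
    """Pure state transition: current state + event -> next state."""
    if event.kind == "Deposited":
        return balance + event.amount
    if event.kind == "Withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def rebuild(log: list, snapshot: tuple = (0, 0)) -> int:
    """Replay only the events after the snapshot (events_covered, balance)."""
    position, balance = snapshot
    for event in log[position:]:
        balance = apply(balance, event)
    return balance


log = [Event("Deposited", 100), Event("Withdrawn", 30), Event("Deposited", 5)]
snapshot = (2, rebuild(log[:2]))  # state after the first two events
```

Replaying from the snapshot and replaying from the start give the same balance; the snapshot only shortens the replay.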

**Trade-offs:**
- ✅ Complete audit history; perfect for compliance
- ✅ Replay and time-travel debugging
- ❌ Querying current state requires projection maintenance
- ❌ Event schema evolution is hard
- ❌ High storage overhead over time

---

## 3. Saga Pattern

**What it is:** Manage distributed transactions across microservices via a sequence of local transactions, each publishing an event. If a step fails, compensating transactions undo previous steps.

**Two variants:**
- **Choreography:** Services react to events autonomously (decentralized)
- **Orchestration:** A central Saga Orchestrator coordinates steps (centralized)

**When to use:**
- Multi-service workflows where ACID across services is impossible
- Long-running business transactions (order → payment → inventory → shipping)
- Need rollback across service boundaries

**Choreography Example:**
```
OrderService creates order →
  [event: OrderCreated] →
    PaymentService charges card →
      [event: PaymentProcessed] →
        InventoryService reserves stock →
          [event: StockReserved] →
            ShippingService books courier
```

**Compensating Transactions (on failure):**
```
ShippingService fails →
  [event: ShippingFailed] →
    InventoryService releases stock →
      PaymentService refunds card →
        OrderService marks order failed
```
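
The compensation order can be sketched with the orchestration variant, which is easier to show in one place than choreography. `run_saga` and the `(name, action, compensation)` step shape are hypothetical; real steps would be calls to other services.

```python
from typing import Callable

# (name, action, compensation); all three are illustrative.
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step], log: list[str]) -> bool:
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()  # local transaction in one service
            log.append(f"done:{name}")
            completed.append(step)
        except Exception:
            log.append(f"failed:{name}")
            # Undo the already-committed steps in reverse order.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            return False
    return True
```

This sketch assumes compensations never fail. In practice they need retries and must be idempotent, which is a large part of why they are hard to implement correctly.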

**Trade-offs:**
- ✅ No distributed locking; high availability
- ✅ Scales well across services
- ❌ Hard to debug; distributed trace required
- ❌ Compensating transactions are complex to implement correctly

---

## 4. Circuit Breaker

**What it is:** A proxy that monitors calls to a service. If the failure rate exceeds a threshold, the circuit "opens" and calls fail fast instead of waiting for a timeout.

**States:**
```
CLOSED    → calls pass through; monitor failure rate
OPEN      → calls fail immediately; no calls to downstream
HALF-OPEN → let a probe call through; on success, close; on failure, re-open
```

**When to use:**
- Calling any external service (payment gateway, SMS, email)
- Microservices calling each other
- Preventing timeout cascade when downstream is slow

**Implementation tools:** Hystrix (deprecated), Resilience4j, Polly (.NET), Envoy proxy

**Thresholds (starting point):**
- Open after 50% failure rate over 10 requests
- Stay open for 30 seconds
- Half-open: allow 1 probe request
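
A single-threaded sketch of the state machine using the starting thresholds above. `CircuitBreaker` and `CircuitOpenError` are hypothetical names, not the API of Resilience4j or Polly, and the injectable `clock` parameter is an assumption added for testability.

```python
import time


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    def __init__(self, window=10, failure_rate=0.5, open_seconds=30.0,
                 clock=time.monotonic):
        self.window, self.failure_rate = window, failure_rate
        self.open_seconds, self.clock = open_seconds, clock
        self.state = "CLOSED"
        self.results = []      # recent outcomes, True = success
        self.opened_at = 0.0

    def call(self, fn):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.open_seconds:
                raise CircuitOpenError("failing fast")
            self.state = "HALF_OPEN"  # cool-down elapsed: allow one probe
        try:
            value = fn()
        except Exception:
            self._record(False)
            raise
        self._record(True)
        return value

    def _record(self, ok):
        if self.state == "HALF_OPEN":
            # The probe decides: success closes, failure re-opens.
            if ok:
                self.state, self.results = "CLOSED", []
            else:
                self._open()
            return
        self.results = (self.results + [ok])[-self.window:]
        failures = self.results.count(False)
        if len(self.results) >= self.window and \
                failures / len(self.results) >= self.failure_rate:
            self._open()

    def _open(self):
        self.state, self.opened_at, self.results = "OPEN", self.clock(), []
```

Callers should catch `CircuitOpenError` and serve a fallback. A concurrent version would also need a lock around the state transitions.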

**Trade-offs:**
- ✅ Prevents cascade failures
- ✅ Gives downstream time to recover
- ❌ Adds a small per-call overhead for tracking call outcomes
- ❌ Requires fallback behavior when circuit is open

---

## 5. Bulkhead

**What it is:** Isolate components so a failure in one doesn't consume resources of others. Named after the watertight compartments in ship hulls.

**Types:**
- **Thread Pool Bulkhead:** Separate thread pools per service call
- **Semaphore Bulkhead:** Limit concurrent calls per service
- **Process Bulkhead:** Separate processes/containers per service type

**When to use:**
- Multiple tenants sharing infrastructure (SaaS)
- One slow service consuming all connection pool slots
- Protecting critical services from being starved by non-critical ones

**Example:**
```
Without bulkhead:
  [Recommendation Service hangs] → fills shared thread pool → [Payment Service starves]

With bulkhead:
  [Recommendation Service hangs] → fills its own thread pool (10 threads) → [Payment Service unaffected, has its own 50 threads]
```
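
A semaphore bulkhead can be sketched as follows. `Bulkhead` and `BulkheadFullError` are hypothetical names; the limits of 10 and 50 mirror the example above.

```python
import threading


class BulkheadFullError(Exception):
    pass


class Bulkhead:
    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args):
        # Non-blocking acquire: reject at once instead of queueing callers.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("compartment is full")
        try:
            return fn(*args)
        finally:
            self._slots.release()


# One compartment per downstream dependency.
recommendations = Bulkhead(max_concurrent=10)
payments = Bulkhead(max_concurrent=50)
```

If every `recommendations` slot is held by a hung call, further recommendation calls are rejected immediately while `payments` keeps its own 50 slots.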

---

## 6. Strangler Fig Pattern

**What it is:** Incrementally replace a legacy monolith by routing new functionality to new microservices, while keeping the monolith alive for unchanged features.

**Migration steps:**
```
Phase 1: Deploy proxy in front of monolith (no user impact)
Phase 2: Route one feature to new microservice
Phase 3: Verify; deprecate that feature in monolith
Phase 4: Repeat for each feature
Phase 5: Monolith is empty; decommission
```
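
The routing proxy behind these phases can be sketched as a prefix table. `StranglerProxy` and its handlers are hypothetical; in practice this role is usually played by an API gateway or reverse proxy rather than application code.

```python
class StranglerProxy:
    def __init__(self, legacy, migrated=None):
        self.legacy = legacy                  # handler for the monolith
        self.migrated = dict(migrated or {})  # path prefix -> new service handler

    def migrate(self, prefix, handler):
        """Phase 2: start routing one feature to its new service."""
        self.migrated[prefix] = handler

    def rollback(self, prefix):
        """Verification failed: send the feature back to the monolith."""
        self.migrated.pop(prefix, None)

    def handle(self, path):
        # Longest matching prefix wins; anything unmatched stays on the monolith.
        for prefix in sorted(self.migrated, key=len, reverse=True):
            if path.startswith(prefix):
                return self.migrated[prefix](path)
        return self.legacy(path)
```

Keeping `rollback` cheap is what makes the risk incremental: a failed verification in Phase 3 is a routing change, not a redeploy.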

**When to use:**
- Migrating legacy monolith to microservices
- Can't do a big-bang rewrite (too risky)
- Need to ship new features during migration

**Trade-offs:**
- ✅ Zero downtime migration
- ✅ Incremental risk
- ❌ Dual maintenance burden during migration (monolith + new services)
- ❌ Proxy adds latency; must be managed carefully

---

## 7. Outbox Pattern

**What it is:** Solve the dual-write problem (write to DB AND publish to queue atomically) by writing the event to an "outbox" table in the same DB transaction, then having a separate process relay it to the queue.

**Problem it solves:**
```
❌ WRONG (dual-write race):
  BEGIN;
  UPDATE orders SET status='paid';
  COMMIT;
  // Crash here → event never published, DB and queue are inconsistent
  publish(PaymentProcessed);
```

```
✅ CORRECT (outbox):
  BEGIN;
  UPDATE orders SET status='paid';
  INSERT INTO outbox (event_type, payload) VALUES ('PaymentProcessed', {...});
  COMMIT;
  // Relay process reads outbox and publishes to Kafka
  // At-least-once delivery guaranteed; make consumers idempotent
```

**Relay options:** Debezium (CDC), polling relay, transaction log tailing
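
A runnable sketch of the polling-relay option, using an in-memory SQLite database in place of the service's real database. The schema, the `published` flag, and the `mark_paid` / `relay_once` helpers are assumptions for illustration; `publish` stands in for a Kafka producer.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         event_type TEXT, payload TEXT,
                         published INTEGER DEFAULT 0);
    INSERT INTO orders (id, status) VALUES (1, 'pending');
""")


def mark_paid(order_id: int) -> None:
    # One transaction: the state change and the event commit or roll back together.
    with db:
        db.execute("UPDATE orders SET status = 'paid' WHERE id = ?", (order_id,))
        db.execute("INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
                   ("PaymentProcessed", json.dumps({"order_id": order_id})))


def relay_once(publish) -> int:
    """Publish unpublished rows in insertion order, then mark each as published."""
    rows = db.execute("SELECT id, event_type, payload FROM outbox "
                      "WHERE published = 0 ORDER BY id").fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # a crash after this re-publishes later
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because a row is marked only after `publish` returns, a crash between the two steps produces a duplicate on the next poll. That is the at-least-once behavior consumers must tolerate.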

---

## 8. Consistent Hashing

**What it is:** A hashing scheme where adding or removing a node remaps only about K/N keys on average (K = keys, N = nodes), instead of remapping nearly all keys as plain `hash(key) % N` does.

**When to use:**
- Distributing cache keys across Redis cluster nodes
- Routing requests to servers in a distributed system
- Partitioning data across database nodes

**Virtual nodes:** Assign multiple positions per physical node on the hash ring to ensure even distribution even with few nodes.
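
A minimal hash ring with virtual nodes, as a sketch. `HashRing`, the `node#i` naming of virtual nodes, and the choice of SHA-256 truncated to 64 bits are all assumptions; production libraries differ in these details.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # First 8 bytes of SHA-256 as a 64-bit ring position.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each physical node gets `vnodes` positions on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def lookup(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node at or after the key's position, wrapping around.
        index = bisect.bisect_left(self._ring, (_hash(key),))
        return self._ring[index % len(self._ring)][1]
```

When a node is added, the only keys that change owner are the ones that move to the new node; every other key keeps its previous assignment.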

---

## 9. Backpressure

**What it is:** A mechanism for consumers to signal producers to slow down when they can't keep up, preventing memory exhaustion and cascade failures.

**Strategies:**
- **Drop:** Discard overflow messages (acceptable for metrics, logs)
- **Buffer:** Queue up to a limit, then block or drop
- **Block:** Producer waits until the consumer catches up (simplest; may cause upstream timeouts)
- **Rate Limit:** Throttle producers at ingestion point
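
The drop, buffer, and block strategies can be sketched around a bounded queue. `BoundedChannel` and its `rejected` counter are hypothetical; the standard-library `queue.Queue` provides the bounded buffer.

```python
import queue


class BoundedChannel:
    def __init__(self, capacity: int, strategy: str = "block", timeout: float = 1.0):
        if strategy not in ("block", "drop"):
            raise ValueError("strategy must be 'block' or 'drop'")
        self._queue = queue.Queue(maxsize=capacity)  # Buffer: queue up to a limit
        self.strategy, self.timeout = strategy, timeout
        self.rejected = 0

    def send(self, item) -> bool:
        """Return True if the item was accepted, False if it was rejected."""
        try:
            if self.strategy == "drop":
                self._queue.put_nowait(item)  # Drop: never wait for space
            else:
                # Block: wait for the consumer, but only up to `timeout` seconds.
                self._queue.put(item, timeout=self.timeout)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def receive(self):
        return self._queue.get_nowait()
```

The `False` return value is the backpressure signal: the producer learns that the consumer is behind and can slow down, retry later, or shed load.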

**When to use:**
- Message queue consumers are slower than producers
- Real-time data pipeline ingestion spikes
- API rate limiting for upstream clients

---

## 10. Leader Election

**What it is:** In a distributed system, elect a single node to perform a privileged task (e.g., writing to the DB, running scheduled jobs, coordinating work).

**Algorithms:**
- **Raft:** Used by etcd, CockroachDB, Consul. Practical and well-understood.
- **ZooKeeper (ZAB):** Used by HBase, and by Kafka before KRaft mode. Mature but operationally heavy.
- **Bully Algorithm:** Simple; highest ID wins. Assumes reliable failure detection, so it is unsafe under network partitions.

**When to use:**
- Scheduled jobs that should only run once (cron replacement)
- Primary/replica database failover coordination
- Distributed lock management

**Tools:** etcd, ZooKeeper, Consul, Redis (Redlock — use with caution)
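
Most of these tools expose leadership as a lease with a time-to-live. The sketch below shows that idea only; it does not implement Raft or ZAB. `LeaseStore` is an in-memory stand-in with a hypothetical API, whereas a real store such as etcd makes the acquire step atomic across nodes.

```python
import time


class LeaseStore:
    """In-memory stand-in for a consistent store (hypothetical API)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, node_id: str, ttl: float) -> bool:
        now = self.clock()
        # Grant if the lease is free, expired, or being renewed by its holder.
        if self.holder in (None, node_id) or now >= self.expires_at:
            self.holder, self.expires_at = node_id, now + ttl
            return True
        return False


def run_if_leader(store: LeaseStore, node_id: str, job, ttl: float = 10.0):
    """Run `job` only on the node that currently holds the lease."""
    if store.try_acquire(node_id, ttl):
        return job()
    return None
```

A job that runs longer than `ttl` can overlap with a newly elected leader, so real deployments typically add fencing tokens or keep jobs idempotent.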

---

## 11. Two-Phase Commit (2PC)

**What it is:** A distributed algorithm that ensures all participants in a transaction either all commit or all abort.

**Phases:**
```
Phase 1 (Prepare): Coordinator asks all participants "can you commit?"
  All say YES → proceed to Phase 2
  Any says NO → abort

Phase 2 (Commit): Coordinator tells all participants to commit
```
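
The two phases can be sketched in-process. `Participant` and `two_phase_commit` are hypothetical names, and the sketch leaves out exactly the parts that make 2PC hard: durable logging of votes and decisions, timeouts, and coordinator recovery.

```python
class Participant:
    def __init__(self, name: str, can_commit: bool = True):
        self.name, self.can_commit = name, can_commit
        self.state = "INIT"

    def prepare(self) -> bool:
        # A YES vote is a promise that a later commit will succeed.
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self) -> None:
        self.state = "COMMITTED"

    def abort(self) -> None:
        self.state = "ABORTED"


def two_phase_commit(participants) -> bool:
    # Phase 1 (Prepare): collect every vote.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2 (Commit): every vote was YES, so commit everywhere.
        for p in participants:
            p.commit()
        return True
    # Any NO vote aborts the transaction on all participants.
    for p in participants:
        p.abort()
    return False
```

Between `prepare()` and `commit()`, a participant in the `PREPARED` state cannot decide on its own; it waits for the coordinator.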

**When to use (sparingly):**
- Strong consistency is an absolute requirement across services
- Data loss is catastrophic (financial settlements)

**Why to avoid:**
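- Coordinator is a single point of failure (SPOF)
- Blocking: if the coordinator fails after Prepare, participants that voted YES must hold their locks until it recovers
- Low throughput under contention, because locks are held across both phases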
- Prefer Saga Pattern in most microservice architectures

---

## 12. Read-Through / Write-Through / Write-Behind Cache

**Read-Through:**
```
Client → Cache (miss) → Cache fetches from DB → Returns to client
```
The cache itself loads from the DB on a miss, so clients stay simple. Risk: slow reads on a cold start.

**Write-Through:**
```
Client → Cache → Cache writes to DB synchronously → Confirms
```
Cache and DB stay consistent. Higher write latency. Good for read-heavy workloads that need consistency.

**Write-Behind (Write-Back):**
```
Client → Cache → Confirms immediately → Async flush to DB
```
Very low write latency. Risk of data loss if cache fails before flush. Good for high-throughput counters, analytics.

**Cache-Aside (Lazy Loading):**
```
Client → Cache (miss) → Client fetches from DB → Client writes to Cache
```
Most common. Application owns cache logic. Risk: thundering herd on cold start.
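
Two of these strategies, sketched with plain dicts standing in for the cache and the database. `CacheAside`, `WriteBehindCache`, and the `db_reads` counter are hypothetical; eviction, TTLs, and concurrency are left out.

```python
class CacheAside:
    """The application checks the cache, then falls back to the database."""

    def __init__(self, db: dict):
        self.db, self.cache, self.db_reads = db, {}, 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db[key]      # miss: the client fetches from the DB
        self.cache[key] = value   # and populates the cache itself
        return value


class WriteBehindCache:
    """Writes are acknowledged from the cache and flushed to the DB later."""

    def __init__(self, db: dict):
        self.db, self.cache, self.dirty = db, {}, set()

    def put(self, key, value) -> None:
        self.cache[key] = value
        self.dirty.add(key)       # lost if the cache dies before flush()

    def flush(self) -> int:
        """Write dirty keys to the DB; return how many keys were flushed."""
        for key in self.dirty:
            self.db[key] = self.cache[key]
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed
```

Repeated writes to the same key collapse into one DB write at flush time, which is why write-behind suits high-throughput counters.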


## Limitations
- This is a reference document and may not cover all edge cases. Always verify architectures before production.