---
name: patterns
description: Reference document for monopoly patterns.
risk: safe
reports-to: monopoly
---

# MONOPOLY — Design Patterns Deep Dive

## Table of Contents
1. CQRS
2. Event Sourcing
3. Saga Pattern
4. Circuit Breaker
5. Bulkhead
6. Strangler Fig
7. Outbox Pattern
8. Consistent Hashing
9. Backpressure
10. Leader Election
11. Two-Phase Commit
12. Read-Through / Write-Through / Write-Behind Cache

---

## 1. CQRS (Command Query Responsibility Segregation)

**What it is:** Separate the read model (Query) from the write model (Command) into distinct services, databases, or code paths.

**When to use:**
- Read load is at least 10× write load (common in web apps)
- Read queries are complex aggregations over write data
- Read and write paths need to be optimized independently
- Domain model is complex (DDD contexts)

**Implementation:**
```
Write Path: Client → Command API → Write DB (normalized, PostgreSQL)
Read Path: Client → Query API → Read DB (denormalized, Redis / Elasticsearch)
Sync: Write DB → CDC (Debezium) → Message Queue → Read DB updater
```

**Trade-offs:**
- ✅ Independent scaling of read and write
- ✅ Optimized schemas for each operation type
- ❌ Eventual consistency between write and read models
- ❌ Increased complexity; two models to maintain

**Real-world users:** Amazon (order service), LinkedIn (feed)

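The write path, read path, and sync step above can be sketched in memory. This is a minimal illustration, not a production design: `WriteStore`, `ReadStore`, and `sync` are hypothetical names, and a plain deque stands in for the CDC feed and message queue. Until `sync` runs, the read model lags the write model, which is the eventual consistency listed under trade-offs.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WriteStore:
    """Stands in for the normalized write DB plus its change feed."""
    orders: dict = field(default_factory=dict)
    changes: deque = field(default_factory=deque)

    def place_order(self, order_id: str, customer: str, total: float) -> None:
        # Command path: mutate the write model and record the change.
        self.orders[order_id] = {"customer": customer, "total": total}
        self.changes.append(("order_placed", order_id, customer, total))


@dataclass
class ReadStore:
    """Stands in for the denormalized read DB: total spend per customer."""
    spend_by_customer: dict = field(default_factory=dict)

    def apply(self, change: tuple) -> None:
        kind, _order_id, customer, total = change
        if kind == "order_placed":
            self.spend_by_customer[customer] = (
                self.spend_by_customer.get(customer, 0.0) + total
            )


def sync(write: WriteStore, read: ReadStore) -> int:
    """Drain pending changes into the read model; returns the number applied."""
    applied = 0
    while write.changes:
        read.apply(write.changes.popleft())
        applied += 1
    return applied
```
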
---

## 2. Event Sourcing

**What it is:** Store state as a sequence of immutable events rather than as current state. Rebuild current state by replaying events.

**When to use:**
- Full audit trail is a regulatory requirement (fintech, healthcare)
- Need to replay history for debugging or analytics
- Complex domain with many state transitions
- Need to derive multiple read projections from the same data

**Implementation:**
```
Event Store: append-only log (Kafka, EventStoreDB)
Snapshots: periodic snapshots to speed up state rebuild
Projections: consumers build read models from events
```

**Trade-offs:**
- ✅ Complete audit history; well suited to compliance
- ✅ Replay and time-travel debugging
- ❌ Querying current state requires projection maintenance
- ❌ Event schema evolution is hard
- ❌ High storage overhead over time

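Replay and snapshots can be illustrated with a toy account balance. This is a sketch under assumptions: the event kinds (`deposited`, `withdrawn`), the `Event` type, and the `(balance, events_already_applied)` snapshot shape are all hypothetical, and a Python list stands in for the append-only event store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str      # "deposited" or "withdrawn" (hypothetical event types)
    amount: int


def apply(balance: int, event: Event) -> int:
    """Pure transition function: old state + event -> new state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events: list[Event], snapshot: tuple[int, int] = (0, 0)) -> int:
    """Rebuild state, starting from a snapshot (balance, events_already_applied)."""
    balance, offset = snapshot
    for event in events[offset:]:
        balance = apply(balance, event)
    return balance


# Append-only log; events are never updated or deleted.
log = [Event("deposited", 100), Event("withdrawn", 30), Event("deposited", 5)]
```
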
---

## 3. Saga Pattern

**What it is:** Manage distributed transactions across microservices via a sequence of local transactions, each publishing an event. If a step fails, compensating transactions undo the previous steps.

**Two variants:**
- **Choreography:** Services react to events autonomously (decentralized)
- **Orchestration:** A central Saga Orchestrator coordinates steps (centralized)

**When to use:**
- Multi-service workflows where ACID across services is impossible
- Long-running business transactions (order → payment → inventory → shipping)
- Need rollback across service boundaries

**Choreography Example:**
```
OrderService creates order →
[event: OrderCreated] →
PaymentService charges card →
[event: PaymentProcessed] →
InventoryService reserves stock →
[event: StockReserved] →
ShippingService books courier
```

**Compensating Transactions (on failure):**
```
ShippingService fails →
[event: ShippingFailed] →
InventoryService releases stock →
PaymentService refunds card →
OrderService marks order failed
```

**Trade-offs:**
- ✅ No distributed locking; high availability
- ✅ Scales well across services
- ❌ Hard to debug; distributed tracing required
- ❌ Compensating transactions are complex to implement correctly

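The orchestration variant is the easier one to show in a few lines. The sketch below assumes a hypothetical `run_saga` helper in which each step is a `(name, action, compensation)` triple; real orchestrators also persist saga state so they can resume after a crash, which is omitted here. On failure, completed steps are compensated in reverse order, mirroring the compensation flow above.

```python
from typing import Callable

# (name, action, compensation); all three are illustrative assumptions.
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step], log: list[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    done: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"{name}: ok")
            done.append(step)
        except Exception:
            log.append(f"{name}: failed")
            for prev_name, _, compensate in reversed(done):
                compensate()
                log.append(f"{prev_name}: compensated")
            return False
    return True
```
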
---

## 4. Circuit Breaker

**What it is:** A proxy that monitors calls to a service. If the failure rate exceeds a threshold, the circuit "opens" and calls fail fast instead of waiting for a timeout.

**States:**
```
CLOSED → calls pass through; monitor failure rate
OPEN → calls fail immediately; no calls to downstream
HALF-OPEN → let a probe call through; if it succeeds, close; if it fails, reopen
```

**When to use:**
- Calling any external service (payment gateway, SMS, email)
- Microservices calling each other
- Preventing timeout cascade when downstream is slow

**Implementation tools:** Hystrix (deprecated), Resilience4j, Polly (.NET), Envoy proxy

**Thresholds (starting point):**
- Open after 50% failure rate over 10 requests
- Stay open for 30 seconds
- Half-open: allow 1 probe request

**Trade-offs:**
- ✅ Prevents cascade failures
- ✅ Gives downstream time to recover
- ❌ Adds a small per-call overhead for monitoring
- ❌ Requires fallback behavior when circuit is open

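The three states and the starting thresholds above fit in a small class. This is a single-threaded sketch, not the API of any of the listed tools: `CircuitBreaker`, `CircuitOpenError`, and the injectable `clock` parameter are assumptions made for illustration, and a production breaker would also need locking and a time-based failure window.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_rate=0.5, window=10, open_seconds=30.0,
                 clock=time.monotonic):
        self.failure_rate = failure_rate
        self.window = window
        self.open_seconds = open_seconds
        self.clock = clock
        self.results = []          # recent outcomes, True = success
        self.state = "CLOSED"
        self.opened_at = 0.0

    def call(self, fn):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.open_seconds:
                raise CircuitOpenError("failing fast")
            self.state = "HALF-OPEN"   # cool-down elapsed: allow one probe
        try:
            result = fn()
        except Exception:
            self._record(False)
            raise
        self._record(True)
        return result

    def _record(self, ok):
        if self.state == "HALF-OPEN":
            # The single probe decides: success closes, failure reopens.
            if ok:
                self.state, self.results = "CLOSED", []
            else:
                self._open()
            return
        self.results = (self.results + [ok])[-self.window:]
        failures = self.results.count(False)
        if (len(self.results) >= self.window
                and failures / len(self.results) >= self.failure_rate):
            self._open()

    def _open(self):
        self.state, self.opened_at, self.results = "OPEN", self.clock(), []
```
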
---

## 5. Bulkhead

**What it is:** Isolate components so a failure in one doesn't consume resources of others. Named after the watertight compartments in ship hulls.

**Types:**
- **Thread Pool Bulkhead:** Separate thread pools per service call
- **Semaphore Bulkhead:** Limit concurrent calls per service
- **Process Bulkhead:** Separate processes/containers per service type

**When to use:**
- Multiple tenants sharing infrastructure (SaaS)
- One slow service consuming all connection pool slots
- Protecting critical services from being starved by non-critical ones

**Example:**
```
Without bulkhead:
[Recommendation Service hangs] → fills shared thread pool → [Payment Service starves]

With bulkhead:
[Recommendation Service hangs] → fills its own thread pool (10 threads) → [Payment Service unaffected, has its own 50 threads]
```

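Of the three types above, the semaphore bulkhead is the shortest to sketch. The `Bulkhead` class and `BulkheadFullError` below are hypothetical names; the limits of 10 and 50 reuse the numbers from the example above. The sketch rejects a call immediately when no slot is free rather than queueing it, which is one possible policy among several.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when the bulkhead has no free slot."""


class Bulkhead:
    """Semaphore bulkhead: at most `max_concurrent` calls in flight."""

    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Reject immediately instead of waiting, so a hung dependency
        # cannot pile up callers behind it.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("no free slot")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()


# One bulkhead per dependency: a hung recommendation service can exhaust
# only its own 10 slots, never the 50 reserved for payments.
recommendations = Bulkhead(max_concurrent=10)
payments = Bulkhead(max_concurrent=50)
```
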
---

## 6. Strangler Fig Pattern

**What it is:** Incrementally replace a legacy monolith by routing new functionality to new microservices, while keeping the monolith alive for unchanged features.

**Migration steps:**
```
Phase 1: Deploy proxy in front of monolith (no user impact)
Phase 2: Route one feature to new microservice
Phase 3: Verify; deprecate that feature in monolith
Phase 4: Repeat for each feature
Phase 5: Monolith is empty; decommission
```

**When to use:**
- Migrating legacy monolith to microservices
- Can't do a big-bang rewrite (too risky)
- Need to ship new features during migration

**Trade-offs:**
- ✅ Zero downtime migration
- ✅ Incremental risk
- ❌ Dual maintenance burden during migration (monolith + new services)
- ❌ Proxy adds latency; must be managed carefully

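The proxy from Phase 1 is essentially a routing table that grows as features migrate. The sketch below is illustrative only: `StranglerProxy`, `legacy_monolith`, and `new_billing_service` are hypothetical names, and in practice this role is usually played by an API gateway or reverse proxy rather than application code.

```python
def legacy_monolith(path: str) -> str:
    return f"monolith handled {path}"


def new_billing_service(path: str) -> str:
    return f"billing-service handled {path}"


class StranglerProxy:
    def __init__(self, fallback):
        self.fallback = fallback
        self.routes = {}   # path prefix -> handler for migrated features

    def migrate(self, prefix: str, handler) -> None:
        """Phase 2: start routing one feature to its new service."""
        self.routes[prefix] = handler

    def handle(self, path: str) -> str:
        # Longest matching prefix wins; everything else stays on the monolith.
        for prefix in sorted(self.routes, key=len, reverse=True):
            if path.startswith(prefix):
                return self.routes[prefix](path)
        return self.fallback(path)
```
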
---

## 7. Outbox Pattern

**What it is:** Solve the dual-write problem (write to DB AND publish to queue atomically) by writing the event to an "outbox" table in the same DB transaction, then having a separate process relay it to the queue.

**Problem it solves:**
```
❌ WRONG (dual-write race):
BEGIN;
UPDATE orders SET status='paid';
COMMIT;
// Crash here → event never published, DB and queue are inconsistent
publish(PaymentProcessed);
```

```
✅ CORRECT (outbox):
BEGIN;
UPDATE orders SET status='paid';
INSERT INTO outbox (event_type, payload) VALUES ('PaymentProcessed', {...});
COMMIT;
// Relay process reads outbox and publishes to Kafka
// At-least-once delivery guaranteed; make consumers idempotent
```

**Relay options:** Debezium (CDC), polling relay, transaction log tailing

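The outbox transaction and a polling relay can be tried end to end with the standard-library `sqlite3` module. This is a sketch under assumptions: the table layout, the `published` flag, and the `mark_paid` and `relay` helpers are hypothetical, and `publish` is any callable standing in for a Kafka producer. A crash between `publish` and the flag update causes a re-send on the next poll, which is why consumers must be idempotent.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT,
        payload TEXT,
        published INTEGER DEFAULT 0
    );
    INSERT INTO orders (id, status) VALUES (1, 'pending');
""")


def mark_paid(order_id: int) -> None:
    # One transaction: the state change and the event commit or roll back together.
    with db:
        db.execute("UPDATE orders SET status = 'paid' WHERE id = ?", (order_id,))
        db.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("PaymentProcessed", json.dumps({"order_id": order_id})),
        )


def relay(publish) -> int:
    """Polling relay: publish unpublished rows in order, then mark them."""
    rows = db.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))   # may repeat after a crash
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```
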
---

## 8. Consistent Hashing

**What it is:** A hashing scheme where adding or removing a node remaps only about K/N keys on average (K = keys, N = nodes), instead of remapping nearly all keys as `hash(key) mod N` would.

**When to use:**
- Distributing cache keys across Redis cluster nodes
- Routing requests to servers in a distributed system
- Partitioning data across database nodes

**Virtual nodes:** Assign multiple positions per physical node on the hash ring so that keys stay evenly distributed even with few nodes.

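A hash ring with virtual nodes takes only a sorted list and a binary search. The sketch below assumes a hypothetical `HashRing` class, MD5 as the position hash (chosen for its spread, not for security), and 100 virtual nodes per physical node; all three are illustrative choices. Lookup on an empty ring is not handled.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []   # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each physical node owns `vnodes` positions on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def lookup(self, key: str) -> str:
        # First virtual node at or after the key's position, wrapping around.
        index = bisect.bisect(self._ring, (_hash(key),))
        return self._ring[index % len(self._ring)][1]
```
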
---

## 9. Backpressure

**What it is:** A mechanism for consumers to signal producers to slow down when they can't keep up, preventing memory exhaustion and cascade failures.

**Strategies:**
- **Drop:** Discard overflow messages (acceptable for metrics, logs)
- **Buffer:** Queue up to a limit, then block or drop
- **Block:** Producer waits until consumer catches up (simplest, may cause timeout)
- **Rate Limit:** Throttle producers at ingestion point

**When to use:**
- Message queue consumers are slower than producers
- Real-time data pipeline ingestion spikes
- API rate limiting for upstream clients

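The buffer, drop, and block strategies above can all be built on a bounded queue. The sketch below wraps the standard-library `queue.Queue`; the `BoundedBuffer` class and its method names are hypothetical. The bound itself is the backpressure signal: once the buffer is full, the producer either loses the item or is made to wait.

```python
import queue


class BoundedBuffer:
    def __init__(self, capacity: int):
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def offer_or_drop(self, item) -> bool:
        """Drop strategy: discard the item when the buffer is full."""
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def offer_or_block(self, item, timeout: float) -> None:
        """Block strategy: wait for space; raises queue.Full on timeout."""
        self._queue.put(item, timeout=timeout)

    def take(self):
        """Consumer side: remove the oldest item; raises queue.Empty if none."""
        return self._queue.get_nowait()
```
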
---

## 10. Leader Election

**What it is:** In a distributed system, elect a single node to perform a privileged task (e.g., writing to DB, sending scheduled jobs, coordinating work).

**Algorithms:**
- **Raft:** Used by etcd, CockroachDB, Consul. Practical and well-understood.
- **ZooKeeper (ZAB):** Used by HBase and by Kafka in its pre-KRaft mode. Mature but operationally heavy.
- **Bully Algorithm:** Simple; the highest-ID live node wins. Assumes reliable failure detection, so it copes poorly with network partitions.

**When to use:**
- Scheduled jobs that should only run once (cron replacement)
- Primary/replica database failover coordination
- Distributed lock management

**Tools:** etcd, ZooKeeper, Consul, Redis (Redlock; use with caution)

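Applications rarely implement Raft or ZAB themselves; a common approach is to hold a time-limited lease in one of the tools above and treat the lease holder as leader. The sketch below shows that lease logic only, and it is not one of the algorithms listed: `LeaseStore` is a hypothetical in-memory stand-in for the coordination service, and the injected `clock` sidesteps the clock-skew problems a real deployment must handle.

```python
class LeaseStore:
    """In-memory stand-in for a lease key in a coordination service."""

    def __init__(self, clock):
        self.clock = clock
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, node_id: str, ttl: float) -> bool:
        now = self.clock()
        # Grant if the lease is free, expired, or being renewed by its holder.
        if self.holder in (None, node_id) or now >= self.expires_at:
            self.holder, self.expires_at = node_id, now + ttl
            return True
        return False

    def leader(self):
        """Current leader, or None if the lease has lapsed."""
        return self.holder if self.clock() < self.expires_at else None
```
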
---

## 11. Two-Phase Commit (2PC)

**What it is:** A distributed algorithm that ensures the participants in a transaction either all commit or all abort.

**Phases:**
```
Phase 1 (Prepare): Coordinator asks all participants "can you commit?"
  All say YES → proceed to Phase 2
  Any says NO → abort

Phase 2 (Commit): Coordinator tells all participants to commit
```

**When to use (sparingly):**
- Strong consistency is an absolute requirement across services
- Data loss is catastrophic (financial settlements)

**Why to avoid:**
- Coordinator is a single point of failure (SPOF)
- Blocks when a participant or the coordinator fails mid-protocol
- Very low throughput under contention
- Prefer Saga Pattern in most microservice architectures

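The two phases can be traced with in-process objects. This is a sketch of the happy path and the vote-NO path only: `Participant` and `two_phase_commit` are hypothetical names, and the hard parts of real 2PC (durable logging of votes and decisions, timeouts, coordinator recovery) are deliberately left out.

```python
class Participant:
    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # Vote YES only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only on a unanimous YES, otherwise abort everyone.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```
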
---

## 12. Read-Through / Write-Through / Write-Behind Cache

**Read-Through:**
```
Client → Cache (miss) → Cache fetches from DB → Returns to client
```
Cache is always populated on miss. Simple for clients. Risk: cold start.

**Write-Through:**
```
Client → Cache → Cache writes to DB synchronously → Confirms
```
Keeps cache and DB consistent. Higher write latency. Good for read-heavy workloads that need consistency.

**Write-Behind (Write-Back):**
```
Client → Cache → Confirms immediately → Async flush to DB
```
Very low write latency. Risk of data loss if the cache fails before the flush. Good for high-throughput counters and analytics.

**Cache-Aside (Lazy Loading):**
```
Client → Cache (miss) → Client fetches from DB → Client writes to Cache
```
Most common. Application owns cache logic. Risk: thundering herd on cold start.

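Read-through and write-through are often combined in one cache layer, which the sketch below does. It makes several simplifying assumptions: `ReadThroughCache` is a hypothetical class, a plain dict stands in for the database, and there is no eviction, TTL, or concurrency control. The difference from cache-aside is visible in `get`: the cache, not the client, loads the missing value.

```python
class ReadThroughCache:
    """Loads from the backing store on a miss (read-through) and writes
    to it synchronously before confirming (write-through)."""

    def __init__(self, db: dict):
        self.db = db
        self.data = {}
        self.misses = 0

    def get(self, key):
        if key not in self.data:
            self.misses += 1
            self.data[key] = self.db[key]   # the cache fetches from the DB itself
        return self.data[key]

    def put(self, key, value) -> None:
        self.db[key] = value                # synchronous write to the DB first
        self.data[key] = value              # then update the cached copy
```
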
## Limitations
- This is a reference document and may not cover all edge cases. Validate any architecture against your own workload and failure modes before production use.