# Scalability

## Performance Modeling

Key metrics:
- **Latency**: p50, p95, p99 response times
- **Throughput**: requests per second
- **Resource utilization**: CPU, memory, network, disk I/O
- **Error rates**: 4xx, 5xx responses
- **Saturation**: queue depths, connection pools
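
As an illustration of the latency percentiles listed above, here is a minimal sketch that computes p50, p95, and p99 from raw samples using the nearest-rank definition. The function name `percentile` and the sample data are hypothetical, and monitoring tools often use interpolated or histogram-based estimates, so their numbers may differ slightly.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in milliseconds (illustrative only).
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 18, 900]

summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
print(summary)
```

With so few samples, the two slow requests dominate p95 and p99 while p50 stays low, which is why tail percentiles are tracked alongside the median.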

Capacity planning:
1. **Baseline**: measure current performance under normal load
2. **Load test**: use realistic traffic patterns (gradual ramp, spike, sustained)
3. **Find limits**: identify bottlenecks (CPU? DB? Network?)
4. **Model growth**: project based on business metrics (users, transactions)
5. **Plan headroom**: maintain a 30-50% capacity buffer at projected peak load
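
Steps 4 and 5 can be sketched as simple arithmetic. The example below assumes compound monthly growth and interprets headroom as the fraction of total capacity held in reserve at peak. Both are assumptions rather than the only reasonable model, and the names `project_peak` and `instances_needed` and all input numbers are hypothetical.

```python
import math


def project_peak(current_peak_rps, monthly_growth, months):
    """Project peak load assuming compound monthly growth (an assumption)."""
    return current_peak_rps * (1 + monthly_growth) ** months


def instances_needed(peak_rps, per_instance_rps, headroom=0.3):
    """Instances required so that peak load uses at most
    (1 - headroom) of total measured capacity."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    usable_rps = per_instance_rps * (1 - headroom)
    return math.ceil(peak_rps / usable_rps)


# Hypothetical inputs: 1000 rps peak today, 5% monthly growth,
# 200 rps per instance as measured in a load test.
future_peak = project_peak(1000, 0.05, 12)
print(round(future_peak), instances_needed(future_peak, 200, headroom=0.3))
```

The per-instance figure should come from the load test in step 2, measured at the point where latency or error rate starts to degrade, not at the point of failure.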

## Bottleneck Identification

| Resource | Symptoms | Solutions |
|----------|----------|-----------|
| Database | High query latency, connection pool exhaustion | Indexing, query optimization, read replicas, caching, sharding |
| CPU | High utilization, slow processing | Horizontal scaling, algorithm optimization, caching, async processing |
| Memory | OOM errors, high GC pressure | Memory profiling, data structure optimization, streaming processing |
| Network | High bandwidth utilization, slow transfers | Compression, CDN, protocol optimization (HTTP/2, gRPC) |
| Disk I/O | High disk queue depth, slow reads/writes | SSD, batching, async I/O, caching |

## Scaling Strategies

### Vertical scaling (bigger machines)

- **Pros**: Simple, no code changes
- **Cons**: Expensive, hard limits, single point of failure
- **Use when**: A quick fix is needed, or the workload has not yet been optimized

### Horizontal scaling (more machines)

- **Pros**: Cost-effective at scale, no single-machine ceiling, fault tolerant
- **Cons**: Requires stateless design, load balancing complexity
- **Requirements**: Stateless services, shared state in DB/cache

### Caching layers

| Layer | Location | Tradeoffs |
|-------|----------|-----------|
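| L1 | In-process (application memory) | Fastest; per-instance, so entries can be stale or inconsistent across instances |
| L2 | Distributed (Redis, Memcached) | Shared across instances; adds a network hop |
| L3 | CDN (Cloudflare, CloudFront) | Cached at the edge, close to users; best suited to static or cacheable responses |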

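Strategies (choose based on consistency and write-latency needs):
- **Cache-aside**: the application reads the cache first and, on a miss, loads from the data store and populates the cache
- **Write-through**: each write updates the cache and the data store synchronously
- **Write-behind**: each write updates the cache, and the data store is updated asynchronously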
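
Of the strategies above, cache-aside is the simplest to sketch. The example below is a minimal in-process version with a time-to-live. The class `CacheAside`, its `loader` callback, and the `load_user` function are hypothetical names for illustration, and a production setup would more likely put Redis or Memcached behind the same read path.

```python
import time


class CacheAside:
    """Read path: check the cache, fall back to the loader on a miss,
    then populate the cache with the loaded value."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # fetches from the source of truth
        self._ttl = ttl_seconds
        self._clock = clock        # injectable for testing
        self._entries = {}         # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]        # hit: entry exists and is not expired
        value = self._loader(key)  # miss: load from the backing store
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        """Call after writing to the backing store to limit staleness."""
        self._entries.pop(key, None)


# Hypothetical loader standing in for a database query.
def load_user(user_id):
    return {"id": user_id, "name": f"user-{user_id}"}


users = CacheAside(load_user, ttl_seconds=30)
print(users.get(42))  # first call loads from the store
print(users.get(42))  # second call is served from the cache
```

The TTL bounds how stale an entry can get if an invalidation is missed. Shorter values trade hit rate for freshness.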

### Database scaling

- **Read replicas**: Route reads to replicas, writes to primary
- **Sharding**: Partition data by a shard key (customer, geography, or hash)
- **Connection pooling**: Reuse connections through a pooler such as PgBouncer
- **Query optimization**: Indexes, query tuning, explain plans
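
The read-replica and hash-sharding ideas above can be sketched as two small routing helpers. This is a sketch under stated assumptions, not a definitive implementation: `shard_for`, `ReadWriteRouter`, and the connection names are hypothetical, and real deployments usually rely on a driver, proxy, or the database's own routing.

```python
import hashlib
import itertools


def shard_for(key, shard_count):
    """Map a shard key to a shard index using a stable hash.
    Python's built-in hash() is randomized per process for strings,
    so a digest is used to keep the mapping consistent across instances."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ReadWriteRouter:
    """Send writes to the primary and spread reads across replicas
    in round-robin order."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, is_write):
        if is_write or self._replicas is None:
            return self.primary
        return next(self._replicas)


# Hypothetical connection names (illustrative only).
router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
print(router.target_for(is_write=True))
print(router.target_for(is_write=False))
print(shard_for("customer-42", 4))
```

Two caveats apply. Replicas can lag behind the primary, so a read that must see a just-completed write may need to go to the primary. Modulo-based sharding also remaps most keys when the shard count changes, which is why consistent hashing or a lookup directory is often used instead.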