Scalability
Performance Modeling
Key metrics:
- Latency: p50, p95, p99 response times
- Throughput: requests per second
- Resource utilization: CPU, memory, network, disk I/O
- Error rates: 4xx, 5xx responses
- Saturation: queue depths, connection pool utilization and wait times
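Percentile latencies are computed from raw request durations, not averages. A minimal sketch using the nearest-rank definition, assuming durations are already collected as a list of milliseconds; monitoring tools often use interpolated or histogram-based percentiles instead, so their numbers may differ slightly. The function names are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank; computing p * n first keeps integer inputs exact
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Return p50/p95/p99 for a list of request durations in milliseconds."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}


if __name__ == "__main__":
    # 100 synthetic latencies: 1 ms .. 100 ms
    print(latency_summary(list(range(1, 101))))  # {'p50': 50, 'p95': 95, 'p99': 99}
```

Tail percentiles (p95, p99) need enough samples to be meaningful; with only a handful of requests, p99 is simply the slowest one.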
Capacity planning:
- Baseline: measure current performance under normal load
- Load test: use realistic traffic patterns (gradual ramp, spike, sustained)
- Find limits: identify bottlenecks (CPU? DB? Network?)
- Model growth: project based on business metrics (users, transactions)
- Plan headroom: maintain a 30-50% capacity buffer, i.e. plan to run at no more than 50-70% of measured limits at projected peak
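The capacity planning steps combine into a simple sizing calculation. A minimal sketch, assuming throughput scales roughly linearly with instance count and that `per_instance_max_rps` is the limit one instance reached in a load test; the function and all numbers are hypothetical.

```python
import math


def instances_needed(current_peak_rps, growth_factor, per_instance_max_rps, headroom=0.3):
    """Instances required to serve projected peak while keeping `headroom` of capacity unused.

    per_instance_max_rps: throughput limit one instance reached under load test.
    headroom: fraction of capacity held in reserve (e.g. 0.3-0.5).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    projected_peak_rps = current_peak_rps * growth_factor
    # Each instance is planned to run at only (1 - headroom) of its measured limit.
    usable_rps_per_instance = per_instance_max_rps * (1 - headroom)
    return math.ceil(projected_peak_rps / usable_rps_per_instance)
```

For example, with a hypothetical 1,200 rps peak today, 1.5x projected growth, and a 400 rps per-instance limit, a 30% buffer gives ceil(1800 / 280) = 7 instances, and a 50% buffer gives 1800 / 200 = 9. The linear-scaling assumption breaks down once a shared dependency such as the database becomes the bottleneck.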
Bottleneck Identification
| Resource | Symptoms | Solutions |
|---|---|---|
| Database | High query latency, connection pool exhaustion | Indexing, query optimization, read replicas, caching, sharding |
| CPU | High utilization, slow processing | Horizontal scaling, algorithm optimization, caching, async processing |
| Memory | OOM errors, high GC pressure | Memory profiling, data structure optimization, streaming processing |
| Network | Bandwidth saturation, slow transfers | Compression, CDN, protocol optimization (HTTP/2, gRPC) |
| I/O | Disk queue depth, slow reads/writes | SSD, batching, async I/O, caching |
Scaling Strategies
Vertical scaling (bigger machines)
- Pros: Simple, no code changes
- Cons: Expensive, hard limits, single point of failure
- Use when: Quick fix needed, or the system is not yet optimized
Horizontal scaling (more machines)
- Pros: Cost-effective, no hard limits, fault tolerant
- Cons: Requires stateless design, load balancing complexity
- Requirements: Stateless services, shared state in DB/cache
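Statelessness is what lets a load balancer send any request to any instance. A minimal round-robin sketch, assuming interchangeable stateless backends identified by hypothetical names; it skips backends marked unhealthy. Production load balancers typically add health checks, weighting, and connection draining on top of this.

```python
import itertools


class RoundRobinBalancer:
    """Cycles requests across interchangeable backends, skipping unhealthy ones."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try at most one full pass over the ring before giving up.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")
```

If instances kept per-user state in local memory, round-robin would scatter a user's requests across instances that lack that state; moving it to the DB or a shared cache removes the need for sticky routing.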
Caching layers
| Layer | Location | Tradeoffs |
|---|---|---|
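| L1 | In-process (application memory) | Fastest; not shared across instances, stale risk |
| L2 | Distributed (Redis, Memcached) | Shared across instances; adds a network hop |
| L3 | CDN (Cloudflare, CloudFront) | Edge caching close to users; suited to static or cacheable HTTP responses |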
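Strategies (choose based on consistency and latency needs):
- Cache-aside: read from cache; on a miss, load from the store and populate the cache
- Write-through: writes update the cache and the store synchronously
- Write-behind: writes update the cache and are flushed to the store asynchronously (faster writes, risk of losing unflushed data)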
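Cache-aside is the most common of these patterns. A minimal sketch of an in-process (L1-style) TTL cache used cache-aside, assuming a plain dict stands in for the database; `TTLCache`, `read_user`, and `update_user` are hypothetical names. The injectable clock exists only so expiry can be tested.

```python
import time

_MISSING = object()  # sentinel so a cached None is distinguishable from a miss


class TTLCache:
    """In-process cache whose entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return _MISSING
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return _MISSING
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def invalidate(self, key):
        self._store.pop(key, None)


def read_user(cache, db, user_id):
    """Cache-aside read: check the cache, load from the store on a miss, then populate."""
    user = cache.get(user_id)
    if user is _MISSING:
        user = db[user_id]  # stands in for a real database query
        cache.set(user_id, user)
    return user


def update_user(cache, db, user_id, user):
    """Cache-aside write: update the store, then drop the cached copy."""
    db[user_id] = user
    cache.invalidate(user_id)
```

A write that bypasses `update_user` leaves a stale entry until the TTL expires, which is the "stale risk" noted for L1 caches; the TTL bounds how long that staleness can last.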
Database scaling
- Read replicas: Route reads to replicas, writes to primary
- Sharding: Partition data by customer, geography, or a hash of the key
- Connection pooling: Reuse connections via an in-app pool or an external pooler such as PgBouncer
- Query optimization: Indexes, query tuning, EXPLAIN plans
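Sharding and read replicas can be combined in one routing layer. A minimal sketch, assuming hash-based sharding on a customer key and one primary plus optional replicas per shard; `ShardRouter` and the host names are hypothetical. It uses SHA-256 rather than Python's built-in `hash()`, which is randomized per process for strings and so would not route consistently across instances.

```python
import hashlib
import itertools


class ShardRouter:
    """Picks a shard by hashing a key; writes go to its primary, reads rotate over its replicas."""

    def __init__(self, shards):
        # shards: list of {"primary": host, "replicas": [host, ...]}
        self.shards = shards
        # Shards without replicas fall back to reading from the primary.
        self._read_cycles = [itertools.cycle(s["replicas"] or [s["primary"]]) for s in shards]

    def shard_index(self, key):
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def route(self, key, is_write):
        i = self.shard_index(key)
        if is_write:
            return self.shards[i]["primary"]
        return next(self._read_cycles[i])


if __name__ == "__main__":
    router = ShardRouter([
        {"primary": "db0-primary", "replicas": ["db0-replica1", "db0-replica2"]},
        {"primary": "db1-primary", "replicas": []},
    ])
    print(router.route("customer-42", is_write=True))
    print(router.route("customer-42", is_write=False))
```

Two caveats apply to this design: plain modulo hashing remaps most keys when the shard count changes (consistent hashing reduces that movement), and replicas lag the primary, so a read routed to a replica right after a write may not see it.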