269 lines
12 KiB
Markdown
269 lines
12 KiB
Markdown
---
|
||
name: tech-matrix
|
||
description: Reference document for monopoly tech-matrix.
|
||
risk: safe
|
||
reports-to: monopoly
|
||
---
|
||
|
||
# MONOPOLY — Technology Decision Matrix
|
||
|
||
## Table of Contents
|
||
1. Database Selection
|
||
2. Cache Selection
|
||
3. Message Queue / Event Streaming
|
||
4. API Protocol
|
||
5. Search Engine
|
||
6. Object Storage
|
||
7. Container Orchestration
|
||
8. Load Balancer
|
||
9. Observability Stack
|
||
10. CDN
|
||
|
||
---
|
||
|
||
## 1. Database Selection
|
||
|
||
### Relational (SQL)
|
||
|
||
| Database | Best For | Avoid When | Scale Ceiling |
|
||
|----------|----------|------------|---------------|
|
||
| **PostgreSQL** | Complex queries, JSONB, GIS, strong consistency, most default use cases | Ultra-high write throughput (>100K writes/s) | ~10TB single node; use Citus for horizontal |
|
||
| **MySQL / MariaDB** | Read-heavy apps, legacy systems, WordPress/Drupal ecosystem | Complex queries, full ACID at scale | ~10TB; use Vitess for sharding |
|
||
| **CockroachDB** | Global distributed SQL, geo-partitioning, multi-region | Simple single-region apps (overkill) | Petabyte-scale |
|
||
| **PlanetScale** | MySQL-compatible, serverless, branch-based workflow | Complex JOINs (foreign keys removed by design) | Very high — Vitess based |
|
||
| **Amazon Aurora** | AWS-native apps, managed PostgreSQL/MySQL, high availability | Non-AWS environments | Up to 128TB, 15 replicas |
|
||
|
||
### NoSQL
|
||
|
||
| Database | Best For | Avoid When | Scale Ceiling |
|
||
|----------|----------|------------|---------------|
|
||
| **MongoDB** | Flexible schema, document model, prototyping | Financial transactions requiring ACID | Petabyte-scale with sharding |
|
||
| **DynamoDB** | Key-value at massive scale, AWS-native, serverless, predictable latency | Complex queries, ad-hoc analytics, JOINs | Unlimited (AWS-managed) |
|
||
| **Cassandra** | Write-heavy, time-series, wide-column, geographically distributed | Read-heavy with complex queries | Petabyte-scale; used at Apple, Netflix |
|
||
| **Redis** | Cache, sessions, leaderboards, pub/sub, rate limiting | Primary data store for complex models | ~1TB per node; cluster for more |
|
||
| **Elasticsearch** | Full-text search, log aggregation, analytics | Primary database (durability risk) | Petabyte-scale with clusters |
|
||
| **InfluxDB** | Time-series metrics, IoT, monitoring data | General-purpose data | Very high write throughput |
|
||
| **Neo4j** | Graph data, social networks, recommendation engines, fraud detection | Non-graph data (overhead not worth it) | Billions of nodes |
|
||
|
||
### Decision Framework
|
||
|
||
```
|
||
Is your data relational (joins, foreign keys, transactions)?
|
||
YES → Start with PostgreSQL
|
||
NO → Continue below
|
||
|
||
Is your primary access pattern key-value?
|
||
YES, need extreme scale → DynamoDB or Cassandra
|
||
YES, need speed/cache → Redis
|
||
|
||
Is your data document-shaped (nested, flexible schema)?
|
||
YES → MongoDB
|
||
|
||
Is it time-series (metrics, logs, IoT)?
|
||
YES → InfluxDB or TimescaleDB
|
||
|
||
Is it graph (relationships are the data)?
|
||
YES → Neo4j
|
||
|
||
Is it search?
|
||
YES → Elasticsearch / OpenSearch
|
||
```
|
||
|
||
---
|
||
|
||
## 2. Cache Selection
|
||
|
||
| Technology | Best For | Max Single Node | Cluster Support |
|
||
|------------|----------|----------------|----------------|
|
||
| **Redis** | Sessions, leaderboards, pub/sub, complex data structures, Lua scripting | ~1TB RAM | Yes (Redis Cluster, Redis Sentinel) |
|
||
| **Memcached** | Simple key-value, multi-threaded, large object cache | ~64GB RAM | Yes (client-side sharding) |
|
||
| **Varnish** | HTTP reverse proxy cache, full-page caching | RAM bound | Limited |
|
||
| **CloudFront / CDN** | Static assets, edge caching globally | N/A (distributed) | Built-in global distribution |
|
||
|
||
**Default recommendation: Redis** — more features, better ecosystem, active development.
|
||
|
||
Use **Memcached** only when: you need multi-threading for CPU-bound caching workloads and don't need data structures beyond string.
|
||
|
||
---
|
||
|
||
## 3. Message Queue / Event Streaming
|
||
|
||
| Technology | Model | Best For | Throughput | Retention |
|
||
|------------|-------|----------|------------|-----------|
|
||
| **Apache Kafka** | Log-based streaming | Event sourcing, high-throughput pipelines, replay, audit | Millions msg/s | Days to forever |
|
||
| **RabbitMQ** | AMQP message broker | Task queues, RPC, routing, fanout | 50K–100K msg/s | Until consumed |
|
||
| **AWS SQS** | Managed queue | AWS-native, simple task queue, serverless | Very high (managed) | Up to 14 days |
|
||
| **AWS SNS** | Pub/sub notification | Fan-out to many subscribers (email, SMS, Lambda, SQS) | Very high (managed) | No retention |
|
||
| **Google Pub/Sub** | Managed streaming | GCP-native, global, serverless | Very high (managed) | Up to 7 days |
|
||
| **Redis Pub/Sub** | In-memory pub/sub | Real-time notifications, low latency, fire-and-forget | Very high | None (no retention) |
|
||
| **NATS** | Lightweight messaging | IoT, microservices, low latency | Very high | JetStream adds retention |
|
||
|
||
### Decision Matrix
|
||
|
||
```
|
||
Need event replay / audit trail?
|
||
YES → Kafka or Kinesis
|
||
|
||
Need simple task queue with retries and DLQ?
|
||
AWS shop → SQS
|
||
Self-hosted → RabbitMQ
|
||
|
||
Need real-time pub/sub with no persistence?
|
||
Redis Pub/Sub or NATS
|
||
|
||
Need fan-out to multiple consumers?
|
||
Kafka (consumer groups) or SNS → SQS fan-out
|
||
|
||
Need < 5 minutes guaranteed delivery, AWS-native, zero ops?
|
||
SQS
|
||
|
||
Volume > 1 million messages/second?
|
||
Kafka (self-hosted) or Kinesis (managed)
|
||
```
|
||
|
||
---
|
||
|
||
## 4. API Protocol
|
||
|
||
| Protocol | Best For | Avoid When |
|
||
|----------|----------|------------|
|
||
| **REST (HTTP/JSON)** | Public APIs, CRUD, browser clients, simplicity | Strict typing required; high-performance internal services |
|
||
| **GraphQL** | Complex client data requirements, mobile (reduce over-fetching), BFF pattern | Simple CRUD; not worth the complexity |
|
||
| **gRPC (HTTP/2 + Protobuf)** | Internal microservice communication, low latency, strict contracts, streaming | Public browser APIs (needs gRPC-web) |
|
||
| **WebSocket** | Real-time bidirectional (chat, live dashboards, multiplayer games) | One-way server push (use SSE instead) |
|
||
| **SSE (Server-Sent Events)** | Server → client push (notifications, live feeds) | Bidirectional communication |
|
||
| **GraphQL Subscriptions** | Real-time with GraphQL schema consistency | Simple push scenarios |
|
||
|
||
**Default recommendation:**
|
||
- External / public: **REST**
|
||
- Internal service-to-service: **gRPC**
|
||
- Real-time features: **WebSocket** or **SSE**
|
||
|
||
---
|
||
|
||
## 5. Search Engine
|
||
|
||
| Technology | Best For | Avoid When |
|
||
|------------|----------|------------|
|
||
| **Elasticsearch** | Full-text search, log analytics (ELK), complex aggregations | Simple lookups; operational overhead is high |
|
||
| **OpenSearch** | AWS-native Elasticsearch alternative | Non-AWS preferred setups |
|
||
| **Typesense** | Simple, fast full-text search, typo tolerance, easy ops | Complex aggregations at massive scale |
|
||
| **Algolia** | Managed search-as-a-service, fast setup, great UI | High volume (expensive); self-hosted preference |
|
||
| **Meilisearch** | Self-hosted, developer-friendly, fast relevancy | Enterprise-scale analytics |
|
||
| **PostgreSQL FTS** | Basic full-text search, already using PostgreSQL | High relevancy requirements or large datasets |
|
||
|
||
**Rule of thumb:** Use PostgreSQL FTS under 1M documents. Move to Typesense or Elasticsearch above that.
|
||
|
||
---
|
||
|
||
## 6. Object Storage
|
||
|
||
| Service | Best For | Egress Cost |
|
||
|---------|----------|------------|
|
||
| **AWS S3** | AWS-native apps, de facto standard, massive ecosystem | $0.09/GB (expensive) |
|
||
| **Cloudflare R2** | S3-compatible, **zero egress cost**, global | $0.00 egress |
|
||
| **GCS** | GCP-native | $0.12/GB |
|
||
| **Azure Blob** | Azure-native | $0.087/GB |
|
||
| **Backblaze B2** | Cost-sensitive, S3-compatible | Free with Cloudflare |
|
||
| **MinIO** | Self-hosted S3-compatible | Self-managed |
|
||
|
||
**Cost optimization tip:** Use **Cloudflare R2** for user-facing media delivery (zero egress). Use **S3** for internal/AWS-integrated storage.
|
||
|
||
---
|
||
|
||
## 7. Container Orchestration
|
||
|
||
| Technology | Best For | Avoid When |
|
||
|------------|----------|------------|
|
||
| **Kubernetes (K8s)** | Large teams, complex deployments, multi-cloud, full control | Small teams (ops overhead is very high) |
|
||
| **AWS ECS + Fargate** | AWS-native, serverless containers, simpler than K8s | Multi-cloud or K8s ecosystem tools needed |
|
||
| **AWS EKS** | Managed K8s on AWS, best of both | Small teams; Fargate may be enough |
|
||
| **GKE (Google)** | Best managed K8s, GCP-native, Autopilot mode | Non-GCP environments |
|
||
| **Docker Compose** | Local dev, small single-server deployments | Production at any meaningful scale |
|
||
| **Nomad** | HashiCorp ecosystem, simpler than K8s, multi-workload | K8s ecosystem tools required |
|
||
|
||
**Startup default:** ECS + Fargate (zero cluster management).
|
||
**Scale default:** EKS or GKE once team > 5 engineers or services > 10.
|
||
|
||
---
|
||
|
||
## 8. Load Balancer
|
||
|
||
| Technology | Layer | Best For |
|
||
|------------|-------|----------|
|
||
| **AWS ALB** | L7 (HTTP/HTTPS) | AWS apps, path-based routing, WebSocket, HTTP/2 |
|
||
| **AWS NLB** | L4 (TCP/UDP) | Ultra-low latency, static IP, non-HTTP protocols |
|
||
| **GCP GLB** | L7 global | GCP apps, global anycast, single IP worldwide |
|
||
| **Nginx** | L4/L7 | Self-hosted, reverse proxy, flexible config |
|
||
| **HAProxy** | L4/L7 | High performance self-hosted, advanced routing |
|
||
| **Cloudflare** | L7 global + DDoS | DDoS protection + CDN + load balancing combined |
|
||
| **Traefik** | L7 | Kubernetes-native, automatic SSL, service discovery |
|
||
|
||
---
|
||
|
||
## 9. Observability Stack
|
||
|
||
### Metrics
|
||
| Tool | Best For |
|
||
|------|----------|
|
||
| **Prometheus + Grafana** | Self-hosted, open-source, Kubernetes-native |
|
||
| **Datadog** | Managed, APM + infra + logs unified, expensive |
|
||
| **CloudWatch** | AWS-native, zero setup, integrated with AWS services |
|
||
| **New Relic** | APM-focused, good for application-level insights |
|
||
|
||
### Logging
|
||
| Tool | Best For |
|
||
|------|----------|
|
||
| **ELK Stack** (Elasticsearch + Logstash + Kibana) | Self-hosted, powerful, high volume |
|
||
| **Loki + Grafana** | Lightweight, Kubernetes-native, cheap |
|
||
| **Splunk** | Enterprise, compliance, expensive |
|
||
| **AWS CloudWatch Logs** | AWS-native, zero setup |
|
||
| **Datadog Logs** | Unified with metrics, expensive |
|
||
|
||
### Distributed Tracing
|
||
| Tool | Best For |
|
||
|------|----------|
|
||
| **Jaeger** | Open-source, Kubernetes-native, OpenTelemetry |
|
||
| **Zipkin** | Simple, lightweight, good integrations |
|
||
| **AWS X-Ray** | AWS-native, integrates with Lambda, ECS |
|
||
| **Datadog APM** | Managed, unified with metrics and logs |
|
||
| **Honeycomb** | High-cardinality event-based observability |
|
||
|
||
**Recommended open-source stack:** Prometheus + Grafana + Loki + Jaeger (all integrate via OpenTelemetry)
|
||
**Recommended managed stack:** Datadog (expensive but unified) or Grafana Cloud
|
||
|
||
---
|
||
|
||
## 10. CDN
|
||
|
||
| Technology | Best For | Edge Locations |
|
||
|------------|----------|----------------|
|
||
| **Cloudflare** | DDoS protection + CDN + DNS, best free tier, edge workers | 300+ |
|
||
| **AWS CloudFront** | AWS-native, deep S3 and API GW integration | 450+ |
|
||
| **Akamai** | Enterprise, highest performance, expensive | 4000+ |
|
||
| **Fastly** | Real-time purging, streaming, VCL customization | 90+ |
|
||
| **Vercel Edge / Netlify** | Jamstack, frontend-first, zero config | 100+ |
|
||
|
||
**Default recommendation:** Cloudflare for most use cases (best value, DDoS included, free SSL, Workers for edge compute).
|
||
|
||
---
|
||
|
||
## Scale Benchmarks Quick Reference
|
||
|
||
| Technology | Write Throughput | Read Throughput | Notes |
|
||
|------------|-----------------|----------------|-------|
|
||
| PostgreSQL (single) | ~10K writes/s | ~50K reads/s | With connection pooling |
|
||
| PostgreSQL (replicas) | ~10K writes/s | ~200K reads/s | 4 replicas |
|
||
| MySQL (single) | ~15K writes/s | ~60K reads/s | |
|
||
| Cassandra | ~1M writes/s | ~500K reads/s | 10-node cluster |
|
||
| Redis | ~1M ops/s | ~1M ops/s | Single node in-memory |
|
||
| Kafka | ~1M msgs/s | ~1M msgs/s | Per partition |
|
||
| Elasticsearch | ~50K docs/s | ~10K queries/s | Per node |
|
||
| MongoDB | ~50K writes/s | ~100K reads/s | Per replica set |
|
||
|
||
*All benchmarks are approximate and depend heavily on hardware, payload size, and query complexity.*
|
||
|
||
|
||
## Limitations
|
||
- This is a reference document and may not cover all edge cases. Always verify architectures before production.
|