--- name: tech-matrix description: Reference document for monopoly tech-matrix. risk: safe reports-to: monopoly --- # MONOPOLY — Technology Decision Matrix ## Table of Contents 1. Database Selection 2. Cache Selection 3. Message Queue / Event Streaming 4. API Protocol 5. Search Engine 6. Object Storage 7. Container Orchestration 8. Load Balancer 9. Observability Stack 10. CDN --- ## 1. Database Selection ### Relational (SQL) | Database | Best For | Avoid When | Scale Ceiling | |----------|----------|------------|---------------| | **PostgreSQL** | Complex queries, JSONB, GIS, strong consistency, most default use cases | Ultra-high write throughput (>100K writes/s) | ~10TB single node; use Citus for horizontal | | **MySQL / MariaDB** | Read-heavy apps, legacy systems, WordPress/Drupal ecosystem | Complex queries, full ACID at scale | ~10TB; use Vitess for sharding | | **CockroachDB** | Global distributed SQL, geo-partitioning, multi-region | Simple single-region apps (overkill) | Petabyte-scale | | **PlanetScale** | MySQL-compatible, serverless, branch-based workflow | Complex JOINs (foreign keys removed by design) | Very high — Vitess based | | **Amazon Aurora** | AWS-native apps, managed PostgreSQL/MySQL, high availability | Non-AWS environments | Up to 128TB, 15 replicas | ### NoSQL | Database | Best For | Avoid When | Scale Ceiling | |----------|----------|------------|---------------| | **MongoDB** | Flexible schema, document model, prototyping | Financial transactions requiring ACID | Petabyte-scale with sharding | | **DynamoDB** | Key-value at massive scale, AWS-native, serverless, predictable latency | Complex queries, ad-hoc analytics, JOINs | Unlimited (AWS-managed) | | **Cassandra** | Write-heavy, time-series, wide-column, geographically distributed | Read-heavy with complex queries | Petabyte-scale; used at Apple, Netflix | | **Redis** | Cache, sessions, leaderboards, pub/sub, rate limiting | Primary data store for complex models | ~1TB per node; cluster for more | | **Elasticsearch** | Full-text search, log aggregation, analytics | Primary database (durability risk) | Petabyte-scale with clusters | | **InfluxDB** | Time-series metrics, IoT, monitoring data | General-purpose data | Very high write throughput | | **Neo4j** | Graph data, social networks, recommendation engines, fraud detection | Non-graph data (overhead not worth it) | Billions of nodes | ### Decision Framework ``` Is your data relational (joins, foreign keys, transactions)? YES → Start with PostgreSQL NO → Continue below Is your primary access pattern key-value? YES, need extreme scale → DynamoDB or Cassandra YES, need speed/cache → Redis Is your data document-shaped (nested, flexible schema)? YES → MongoDB Is it time-series (metrics, logs, IoT)? YES → InfluxDB or TimescaleDB Is it graph (relationships are the data)? YES → Neo4j Is it search? YES → Elasticsearch / OpenSearch ``` --- ## 2. Cache Selection | Technology | Best For | Max Single Node | Cluster Support | |------------|----------|----------------|----------------| | **Redis** | Sessions, leaderboards, pub/sub, complex data structures, Lua scripting | ~1TB RAM | Yes (Redis Cluster, Redis Sentinel) | | **Memcached** | Simple key-value, multi-threaded, large object cache | ~64GB RAM | Yes (client-side sharding) | | **Varnish** | HTTP reverse proxy cache, full-page caching | RAM bound | Limited | | **CloudFront / CDN** | Static assets, edge caching globally | N/A (distributed) | Built-in global distribution | **Default recommendation: Redis** — more features, better ecosystem, active development. Use **Memcached** only when: you need multi-threading for CPU-bound caching workloads and don't need data structures beyond string. --- ## 3. Message Queue / Event Streaming | Technology | Model | Best For | Throughput | Retention | |------------|-------|----------|------------|-----------| | **Apache Kafka** | Log-based streaming | Event sourcing, high-throughput pipelines, replay, audit | Millions msg/s | Days to forever | | **RabbitMQ** | AMQP message broker | Task queues, RPC, routing, fanout | 50K–100K msg/s | Until consumed | | **AWS SQS** | Managed queue | AWS-native, simple task queue, serverless | Very high (managed) | Up to 14 days | | **AWS SNS** | Pub/sub notification | Fan-out to many subscribers (email, SMS, Lambda, SQS) | Very high (managed) | No retention | | **Google Pub/Sub** | Managed streaming | GCP-native, global, serverless | Very high (managed) | Up to 7 days | | **Redis Pub/Sub** | In-memory pub/sub | Real-time notifications, low latency, fire-and-forget | Very high | None (no retention) | | **NATS** | Lightweight messaging | IoT, microservices, low latency | Very high | JetStream adds retention | ### Decision Matrix ``` Need event replay / audit trail? YES → Kafka or Kinesis Need simple task queue with retries and DLQ? AWS shop → SQS Self-hosted → RabbitMQ Need real-time pub/sub with no persistence? Redis Pub/Sub or NATS Need fan-out to multiple consumers? Kafka (consumer groups) or SNS → SQS fan-out Need < 5 minutes guaranteed delivery, AWS-native, zero ops? SQS Volume > 1 million messages/second? Kafka (self-hosted) or Kinesis (managed) ``` --- ## 4. API Protocol | Protocol | Best For | Avoid When | |----------|----------|------------| | **REST (HTTP/JSON)** | Public APIs, CRUD, browser clients, simplicity | Strict typing required; high-performance internal services | | **GraphQL** | Complex client data requirements, mobile (reduce over-fetching), BFF pattern | Simple CRUD; not worth the complexity | | **gRPC (HTTP/2 + Protobuf)** | Internal microservice communication, low latency, strict contracts, streaming | Public browser APIs (needs gRPC-web) | | **WebSocket** | Real-time bidirectional (chat, live dashboards, multiplayer games) | One-way server push (use SSE instead) | | **SSE (Server-Sent Events)** | Server → client push (notifications, live feeds) | Bidirectional communication | | **GraphQL Subscriptions** | Real-time with GraphQL schema consistency | Simple push scenarios | **Default recommendation:** - External / public: **REST** - Internal service-to-service: **gRPC** - Real-time features: **WebSocket** or **SSE** --- ## 5. Search Engine | Technology | Best For | Avoid When | |------------|----------|------------| | **Elasticsearch** | Full-text search, log analytics (ELK), complex aggregations | Simple lookups; operational overhead is high | | **OpenSearch** | AWS-native Elasticsearch alternative | Non-AWS preferred setups | | **Typesense** | Simple, fast full-text search, typo tolerance, easy ops | Complex aggregations at massive scale | | **Algolia** | Managed search-as-a-service, fast setup, great UI | High volume (expensive); self-hosted preference | | **Meilisearch** | Self-hosted, developer-friendly, fast relevancy | Enterprise-scale analytics | | **PostgreSQL FTS** | Basic full-text search, already using PostgreSQL | High relevancy requirements or large datasets | **Rule of thumb:** Use PostgreSQL FTS under 1M documents. Move to Typesense or Elasticsearch above that. --- ## 6. Object Storage | Service | Best For | Egress Cost | |---------|----------|------------| | **AWS S3** | AWS-native apps, de facto standard, massive ecosystem | $0.09/GB (expensive) | | **Cloudflare R2** | S3-compatible, **zero egress cost**, global | $0.00 egress | | **GCS** | GCP-native | $0.12/GB | | **Azure Blob** | Azure-native | $0.087/GB | | **Backblaze B2** | Cost-sensitive, S3-compatible | Free with Cloudflare | | **MinIO** | Self-hosted S3-compatible | Self-managed | **Cost optimization tip:** Use **Cloudflare R2** for user-facing media delivery (zero egress). Use **S3** for internal/AWS-integrated storage. --- ## 7. Container Orchestration | Technology | Best For | Avoid When | |------------|----------|------------| | **Kubernetes (K8s)** | Large teams, complex deployments, multi-cloud, full control | Small teams (ops overhead is very high) | | **AWS ECS + Fargate** | AWS-native, serverless containers, simpler than K8s | Multi-cloud or K8s ecosystem tools needed | | **AWS EKS** | Managed K8s on AWS, best of both | Small teams; Fargate may be enough | | **GKE (Google)** | Best managed K8s, GCP-native, Autopilot mode | Non-GCP environments | | **Docker Compose** | Local dev, small single-server deployments | Production at any meaningful scale | | **Nomad** | HashiCorp ecosystem, simpler than K8s, multi-workload | K8s ecosystem tools required | **Startup default:** ECS + Fargate (zero cluster management). **Scale default:** EKS or GKE once team > 5 engineers or services > 10. --- ## 8. Load Balancer | Technology | Layer | Best For | |------------|-------|----------| | **AWS ALB** | L7 (HTTP/HTTPS) | AWS apps, path-based routing, WebSocket, HTTP/2 | | **AWS NLB** | L4 (TCP/UDP) | Ultra-low latency, static IP, non-HTTP protocols | | **GCP GLB** | L7 global | GCP apps, global anycast, single IP worldwide | | **Nginx** | L4/L7 | Self-hosted, reverse proxy, flexible config | | **HAProxy** | L4/L7 | High performance self-hosted, advanced routing | | **Cloudflare** | L7 global + DDoS | DDoS protection + CDN + load balancing combined | | **Traefik** | L7 | Kubernetes-native, automatic SSL, service discovery | --- ## 9. Observability Stack ### Metrics | Tool | Best For | |------|----------| | **Prometheus + Grafana** | Self-hosted, open-source, Kubernetes-native | | **Datadog** | Managed, APM + infra + logs unified, expensive | | **CloudWatch** | AWS-native, zero setup, integrated with AWS services | | **New Relic** | APM-focused, good for application-level insights | ### Logging | Tool | Best For | |------|----------| | **ELK Stack** (Elasticsearch + Logstash + Kibana) | Self-hosted, powerful, high volume | | **Loki + Grafana** | Lightweight, Kubernetes-native, cheap | | **Splunk** | Enterprise, compliance, expensive | | **AWS CloudWatch Logs** | AWS-native, zero setup | | **Datadog Logs** | Unified with metrics, expensive | ### Distributed Tracing | Tool | Best For | |------|----------| | **Jaeger** | Open-source, Kubernetes-native, OpenTelemetry | | **Zipkin** | Simple, lightweight, good integrations | | **AWS X-Ray** | AWS-native, integrates with Lambda, ECS | | **Datadog APM** | Managed, unified with metrics and logs | | **Honeycomb** | High-cardinality event-based observability | **Recommended open-source stack:** Prometheus + Grafana + Loki + Jaeger (all integrate via OpenTelemetry) **Recommended managed stack:** Datadog (expensive but unified) or Grafana Cloud --- ## 10. CDN | Technology | Best For | Edge Locations | |------------|----------|----------------| | **Cloudflare** | DDoS protection + CDN + DNS, best free tier, edge workers | 300+ | | **AWS CloudFront** | AWS-native, deep S3 and API GW integration | 450+ | | **Akamai** | Enterprise, highest performance, expensive | 4000+ | | **Fastly** | Real-time purging, streaming, VCL customization | 90+ | | **Vercel Edge / Netlify** | Jamstack, frontend-first, zero config | 100+ | **Default recommendation:** Cloudflare for most use cases (best value, DDoS included, free SSL, Workers for edge compute). --- ## Scale Benchmarks Quick Reference | Technology | Write Throughput | Read Throughput | Notes | |------------|-----------------|----------------|-------| | PostgreSQL (single) | ~10K writes/s | ~50K reads/s | With connection pooling | | PostgreSQL (replicas) | ~10K writes/s | ~200K reads/s | 4 replicas | | MySQL (single) | ~15K writes/s | ~60K reads/s | | | Cassandra | ~1M writes/s | ~500K reads/s | 10-node cluster | | Redis | ~1M ops/s | ~1M ops/s | Single node in-memory | | Kafka | ~1M msgs/s | ~1M msgs/s | Per partition | | Elasticsearch | ~50K docs/s | ~10K queries/s | Per node | | MongoDB | ~50K writes/s | ~100K reads/s | Per replica set | *All benchmarks are approximate and depend heavily on hardware, payload size, and query complexity.* ## Limitations - This is a reference document and may not cover all edge cases. Always verify architectures before production.