apollosolutions/apollo-subscriptions-at-scale

Apollo Subscriptions at Scale

A reference architecture demonstrating Apollo GraphQL Federated Subscriptions at scale using the HTTP Callback Protocol, with a React web app, multiple Router and subgraph pods, Kafka for event streaming, Redis for subscription state management, and a full observability stack.

The project contains two subgraphs that implement the HTTP Callback Protocol in different ways, so the two approaches can be compared side by side:

| Subgraph | Approach | Scalability |
|---|---|---|
| notifications | ApolloServerPluginSubscriptionCallback, in-process state | Single-pod (plugin holds state in memory) |
| orders | Manual callback protocol, Redis-backed state | Multi-pod (any pod can service any event) |

Architecture

┌────────────────────────────────────────────────────────────────────────┐
│  React Web App  (localhost:3000)                                        │
│  Apollo Client 4 · HTTP multipart subscriptions · no WebSocket         │
└─────────────────────────────┬──────────────────────────────────────────┘
                              │  HTTP multipart  (chunked streaming)
                              ▼
┌────────────────────────────────────────────────────────────────────────┐
│  nginx Load Balancer  (localhost:4000)                                  │
│  proxy_buffering off · proxy_read_timeout 3600s · resolver 127.0.0.11  │
└───────────────────┬──────────────────────────┬─────────────────────────┘
                    │  round-robin              │  round-robin
         ┌──────────▼──────────┐    ┌──────────▼──────────┐
         │  Apollo Router      │    │  Apollo Router       │
         │  router-0:4000      │    │  router-1:4000       │
         │  CALLBACK_PUBLIC_URL│    │  CALLBACK_PUBLIC_URL │
         │  http://router-0:   │    │  http://router-1:    │
         │  4000/callback      │    │  4000/callback       │
         └──────────┬──────────┘    └──────────┬───────────┘
                    └──────────┬───────────────┘
                               │  HTTP Callback Protocol
          ┌────────────────────┼────────────────────┐
          ▼                    ▼                     ▼
┌──────────────────┐  ┌─────────────────┐  ┌──────────────────┐
│  Notifications   │  │  Orders ×2      │  │  Users           │
│  (4001)          │  │  (4003)         │  │  (4002)          │
│                  │  │                 │  │                  │
│  Plugin approach │  │  Manual Redis   │  │  Query/Mutation  │
│  In-process state│  │  approach       │  │  only (no subs)  │
│  Single-pod only │  │  Stateless      │  │  Stateless       │
│                  │  │  Any pod →      │  │                  │
│  KafkaJS consumer│  │  any delivery   │  │                  │
└────────┬─────────┘  └────────┬────────┘  └──────────────────┘
         │                     │  KafkaJS consumer group
         │                     │  (3 partitions / 2 pods)
         └──────────┬──────────┘
                    ▼
┌────────────────────────────────────────────────────────────────────────┐
│  Apache Kafka 4.x  (KRaft mode, no ZooKeeper)                          │
│                                                                         │
│  notification-events   3 partitions  keyed by userId                   │
│  system-alerts         1 partition   keyed by userId                   │
│  order-status-changed  3 partitions  keyed by orderId                  │
│                        (same orderId → same partition → ordered events) │
└─────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────┐  ┌──────────────────────────┐
│  Redis 7.4                               │  │  Subscription Manager ×2  │
│                                          │  │                           │
│  substate:{id}   Hash  TTL 60s           │  │  Redis leader election:   │
│    callbackUrl                           │  │  SET sub:manager:lock     │
│    verifier                              │  │      NX EX 35             │
│    orderId                               │  │                           │
│    indexKey  ← full Redis key            │  │  Every 20s (leader only): │
│    subgraph  ← for observability         │  │  SCAN substate:*          │
│                                          │  │  → POST check to Router   │
│  subindex:{orderId}  Set  TTL 1h         │  │  → 200: EXPIRE 60s        │
│    {sub-id-1, sub-id-2, ...}             │  │  → 404: DEL + SREM        │
│                                          │  │                           │
│  sub:manager:lock  String  TTL 35s       │  │  Standby takes over       │
│    manager-<hostname>-<pid>              │  │  within 35s on crash      │
└──────────────────────────────────────────┘  └──────────────────────────┘

┌────────────────────────────────────────────────────────────────────────┐
│  Observability                                                          │
│                                                                         │
│  Subgraphs + Router ──OTLP/HTTP:4318──▶ OTel Collector                 │
│                                              │ traces       │ metrics   │
│                                              ▼              ▼           │
│                                           Zipkin       Prometheus       │
│                                           :9411          :9090          │
│                                                            │             │
│                                                         Grafana          │
│                                                         :3001            │
└─────────────────────────────────────────────────────────────────────────┘
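The Subscription Manager box above describes a lock taken with `SET sub:manager:lock NX EX 35` and a leader-only sweep that extends or deletes subscription state. The following is a minimal sketch of that logic, not the repository's implementation. It uses plain in-memory maps in place of Redis, and every name in it (`Store`, `tryAcquireLeadership`, `sweep`, `Check`) is hypothetical. Letting the current owner renew its own lock is an assumption; the diagram only states the initial `NX EX 35` acquisition.

```typescript
// Hypothetical in-memory model of the keys shown in the Redis box.
type SubState = {
  callbackUrl: string;
  verifier: string;
  orderId: string;
  indexKey: string; // full Redis key of the owning set, e.g. "subindex:42"
  subgraph: string;
};

interface Store {
  lock: { owner: string; expiresAt: number } | null;           // sub:manager:lock
  substate: Map<string, { fields: SubState; expiresAt: number }>; // substate:{id}
  subindex: Map<string, Set<string>>;                           // subindex:{orderId}
}

const LOCK_TTL_MS = 35000;  // EX 35
const STATE_TTL_MS = 60000; // EXPIRE 60s

// Models SET sub:manager:lock <id> NX EX 35.
// Assumption: the current owner may renew its own lock on each tick.
function tryAcquireLeadership(store: Store, managerId: string, now: number): boolean {
  const held = store.lock !== null && store.lock.expiresAt > now ? store.lock : null;
  if (held !== null && held.owner !== managerId) return false;
  store.lock = { owner: managerId, expiresAt: now + LOCK_TTL_MS };
  return true;
}

// Stand-in for "POST check to Router": resolves to the HTTP status code.
type Check = (state: SubState, id: string) => Promise<number>;

// Leader-only sweep: 200 extends the TTL, 404 removes the state and the index entry.
async function sweep(store: Store, check: Check, now: number): Promise<void> {
  for (const [key, entry] of Array.from(store.substate.entries())) {
    const id = key.slice("substate:".length);
    const status = await check(entry.fields, id);
    if (status === 200) {
      entry.expiresAt = now + STATE_TTL_MS; // EXPIRE substate:{id} 60
    } else if (status === 404) {
      store.substate.delete(key);                            // DEL substate:{id}
      store.subindex.get(entry.fields.indexKey)?.delete(id); // SREM subindex:{orderId} id
    }
  }
}
```

With a 35-second lock and a 20-second sweep interval, a leader that keeps running refreshes the lock before it lapses, while a crashed leader's lock expires on its own, which matches the "standby takes over within 35s" note in the diagram.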

The core invariant

Callback URLs are pod-specific. When router-0 accepts a client subscription, it registers http://router-0:4000/callback/{id} with the subgraph, and that URL is stored in Redis. Any orders pod that consumes a Kafka event reads the URL from Redis and POSTs directly to router-0, bypassing nginx. Because the URL names a specific Router pod, events always reach the pod that holds the client's open HTTP response.
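As an illustration of this invariant, here is a minimal sketch of how an orders pod might fan out one Kafka event to every subscriber of an order. It is not the repository's code. The maps stand in for the `substate:{id}` and `subindex:{orderId}` Redis keys, and `deliverOrderEvent`, `Post`, and `NextMessage` are hypothetical names. The message shape (`kind`, `action`, `id`, `verifier`, `payload`) follows the HTTP Callback Protocol's `next` message as commonly documented, and treating any 2xx status as a successful delivery is an assumption.

```typescript
type SubscriptionState = { callbackUrl: string; verifier: string; orderId: string };

// Shape of a callback-protocol "next" message (sketch; see lead-in for assumptions).
type NextMessage = {
  kind: "subscription";
  action: "next";
  id: string;
  verifier: string;
  payload: { data: unknown };
};

// Hypothetical transport: POSTs the body to the URL and resolves to the status code.
type Post = (url: string, body: NextMessage) => Promise<number>;

async function deliverOrderEvent(
  substate: Map<string, SubscriptionState>, // stand-in for substate:{id} hashes
  subindex: Map<string, Set<string>>,       // stand-in for subindex:{orderId} sets
  orderId: string,
  data: unknown,
  post: Post,
): Promise<string[]> {
  const ids = subindex.get(`subindex:${orderId}`) ?? new Set<string>();
  const delivered: string[] = [];
  for (const id of Array.from(ids)) {
    const state = substate.get(`substate:${id}`);
    if (!state) continue; // state expired or was cleaned up; nothing to deliver
    // The stored URL names the Router pod that owns the client connection,
    // so the POST goes straight to that pod rather than through nginx.
    const status = await post(state.callbackUrl, {
      kind: "subscription",
      action: "next",
      id,
      verifier: state.verifier,
      payload: { data },
    });
    if (status >= 200 && status < 300) delivered.push(id);
  }
  return delivered;
}
```

Because the function reads everything it needs from shared state, any orders pod could run it for any event, which is the property the "Multi-pod" description of the orders subgraph relies on. In a real service the `post` parameter could be backed by an HTTP client such as `fetch`.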


Quick Start

# 1. Copy environment template and fill in your Apollo Studio credentials
cp .env.example .env

# 2. Generate the supergraph schema (requires Rover CLI)
bash scripts/compose-supergraph.sh

# 3. Start the full scaled stack
docker compose -f docker-compose.scale.yml up -d --build \
  --scale orders=2 \
  --scale subscription-manager=2

# 4. Verify all checks pass
bash scripts/test-scale.sh

| Service | URL |
|---|---|
| Web App | http://localhost:3000 |
| Apollo Sandbox | http://localhost:4000 |
| Kafka UI | http://localhost:8080 |
| Zipkin (traces) | http://localhost:9411 |
| Prometheus | http://localhost:9090 |
| Grafana (admin/admin) | http://localhost:3001 |

Version Matrix

| Component | Version |
|---|---|
| Apollo Federation | v2.12 |
| Apollo Router | v2.11.x |
| Apollo Server | v5.4.0 |
| Apollo Client | v4.x |
| Redis | 7.4.x |
| Kafka | 4.x (KRaft, official Apache image) |
| Node.js | 22 (bookworm-slim) |

See VERSION_MATRIX.md for all pinned dependency versions.


Project Structure

apollo-subscriptions-at-scale/
├── docker-compose.infra.yml       # Phase 1: Redis + Kafka only
├── docker-compose.yml             # Full single-pod stack (dev)
├── docker-compose.scale.yml       # Phase 6+: scaled stack (2 Routers, 2 Orders, 2 Managers)
├── supergraph-config.yaml         # Rover supergraph composition config
├── subgraphs/
│   ├── notifications/             # Plugin approach: ApolloServerPluginSubscriptionCallback
│   ├── orders/                    # Manual approach: Redis-backed callback protocol
│   └── users/                     # Query/Mutation only (federation entity resolution)
├── subscription-manager/          # Dedicated heartbeat service (Redis leader election)
├── router/                        # Apollo Router config (router.yaml + supergraph.graphql)
├── nginx/                         # nginx load balancer config
├── web-app/                       # React + Apollo Client 4 frontend
├── observability/                 # OTel Collector, Prometheus, Grafana configs
├── helm/                          # Kubernetes Helm charts
└── scripts/                       # Verification and utility scripts

Documentation

| File | Purpose |
|---|---|
| WALKTHROUGH.md | Full system walkthrough: every component explained with diagrams, design decisions, and a complete request lifecycle trace |
| CLAUDE.md | Architecture reference, critical config facts, key insights, testing guide |
| implementation-roadmap.md | Phased implementation plan |
| subscription-at-scale-plan.md | Original architecture design |

License

MIT
