Level 3 Architectural Context & Learning Guide
Purpose: This document explains the "why" behind Level 3 architectural decisions. It's educational material for understanding trade-offs and design patterns.
Note: This is reference material for human learning, not required for AI implementation. The actual step-by-step implementation plans are in LEVEL_3_MVP_PLAN.md and LEVEL_3_PRODUCTION_PLAN.md.
In a production queue-based system, a common but subtle bug occurs when the worker finishes a job before the client has subscribed to status updates:
Timeline (milliseconds):
0ms: Client POST /book → API creates job → API returns 202 + jobId
10ms: Worker picks up job → Processes in 10ms → Publishes "completed" event
50ms: Client receives 202 response → Starts establishing SSE connection
60ms: Client connects to GET /api/v1/bookings/status/:jobId
→ Subscribes to Redis channel for updates
→ BUG: The "completed" event was published at 10ms, before subscription at 60ms
Client waits forever, never receives the result
- Client hangs indefinitely waiting for status that already happened
- User sees infinite loading spinner
- System resources wasted on dead connections
- Under load, this creates cascading timeout issues
- Debugging is difficult: worker logs show success, but client never receives it
1. Redis Pub/Sub with Simple Subscription:
// ❌ BROKEN: Subscribes after the worker may have finished
app.get('/status/:jobId', (req, res) => {
  // Worker might have already published here!
  const subscriber = redis.duplicate(); // dedicated ioredis connection for subscribe mode
  subscriber.subscribe('booking-events');
  subscriber.on('message', (channel, raw) => {
    const message = JSON.parse(raw);
    if (message.jobId === req.params.jobId) {
      res.write(`event: ${message.type}\n\n`);
    }
  });
  // What if the worker published before we subscribed?
  // → Client waits forever
});

2. Message Persistence (Workaround):
- Could persist all events to Redis/DB
- Client connects → queries history → subscribes
- But: Adds storage overhead, complexity, and latency
- Not suitable for high-frequency events like booking status updates
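For completeness, a rough sketch of what that workaround looks like (assuming ioredis and a per-job Redis Stream key such as booking-events:&lt;jobId&gt;, both illustrative choices):

```ts
import Redis from 'ioredis';

const redis = new Redis();

// Worker side: append every status change to a per-job stream (hypothetical key layout)
async function publishStatus(jobId: string, type: string, data: unknown) {
  await redis.xadd(`booking-events:${jobId}`, '*', 'type', type, 'data', JSON.stringify(data));
  await redis.expire(`booking-events:${jobId}`, 3600); // TTL to avoid unbounded growth
}

// API side: replay history first, so an early "completed" event is never missed
async function readHistory(jobId: string) {
  const entries = await redis.xrange(`booking-events:${jobId}`, '-', '+');
  return entries.map(([, fields]) => fields); // ['type', value, 'data', value, ...]
}
```

Every status change now costs an extra write plus a TTL, and the client must replay history before subscribing, which is exactly the storage, complexity, and latency overhead noted above.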
// ✅ ROBUST: Check state first, then subscribe only if needed
app.get('/status/:jobId', async (req, res) => {
  // 1. Check current job state IMMEDIATELY
  const job = await bookingQueue.getJob(req.params.jobId);
  if (!job) {
    res.writeHead(404);
    return res.end();
  }
  if (job.returnvalue) {
    // Worker already finished! Send result immediately
    res.write(`event: confirmed\ndata: ${JSON.stringify(job.returnvalue)}\n\n`);
    return res.end();
  }
  if (job.failedReason) {
    // Already failed! Send failure immediately
    res.write(`event: failed\ndata: ${JSON.stringify(job.failedReason)}\n\n`);
    return res.end();
  }
  // 2. ONLY subscribe if the job is still active
  const queueEvents = new QueueEvents('booking');
  queueEvents.on('completed', ({ jobId, returnvalue }) => {
    if (jobId === req.params.jobId) {
      res.write(`event: confirmed\ndata: ${JSON.stringify(returnvalue)}\n\n`);
      res.end();
    }
  });
  // Send current status
  res.write(`event: ${await job.getState()}\n\n`);
});

- ✅ Client always receives final status (even if late)
- ✅ No hanging connections
- ✅ No message persistence overhead
- ✅ Works reliably from 10ms to 10s processing times
- ✅ Handles network delays, slow clients, and retries
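On the client side, the pattern needs nothing special. A minimal browser sketch (the endpoint path and event names follow the server snippets above; the rest is illustrative):

```ts
// Subscribe to booking status over SSE after receiving 202 + jobId from POST /book
function watchBooking(jobId: string, apiBase = '/api/v1') {
  const source = new EventSource(`${apiBase}/bookings/status/${jobId}`);

  source.addEventListener('confirmed', (e) => {
    console.log('Booking confirmed:', JSON.parse((e as MessageEvent).data));
    source.close(); // terminal state: stop listening
  });

  source.addEventListener('failed', (e) => {
    console.error('Booking failed:', JSON.parse((e as MessageEvent).data));
    source.close();
  });

  // EventSource reconnects automatically on network errors, and the server-side
  // state check makes reconnects safe even if the job already finished.
  source.onerror = () => console.warn('SSE connection interrupted, retrying…');
}
```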
When you have multiple API instances behind a load balancer:
Load Balancer
├─→ API Instance 1 (holds Client A's SSE connection)
├─→ API Instance 2 (holds Client B's SSE connection)
└─→ API Instance 3 (holds Client C's SSE connection)
Worker completes job for Client A:
→ Publishes to Redis channel: "job:123 completed"
→ Which instance should forward to Client A?
→ How do instances know which connections they hold?
// ❌ COMPLEX: Each instance must track its own connections
const activeConnections = new Map(); // jobId → res
// Instance 1 (globalRedisSubscriber: an ioredis connection in subscriber mode)
globalRedisSubscriber.on('message', (channel, message) => {
  // Instances 1, 2, and 3 all receive this message!
  const event = JSON.parse(message);
  // But maybe only Instance 2 has the connection for this jobId
  const clientConnection = activeConnections.get(event.jobId);
  if (clientConnection) {
    // Instance 2 forwards to client
    clientConnection.write(`event: ${event.type}...`);
  }
  // Instances 1 & 3 ignore (no connection found)
});
// Problem 1: Connection tracking across instances is manual
// Problem 2: Race condition on instance restart (lost connections)
// Problem 3: Memory leaks if connections are not cleaned up
// Problem 4: Each instance processes messages for ALL jobs (inefficient)

1. Connection State Synchronization:
- Each API instance must maintain a map: `jobId → clientResponse`
- No built-in way to sync this state across instances
- If Instance 1 crashes, all its SSE connections are lost
- Clients must reconnect, but may hit different instance → state lost
2. Message Filtering Overhead:
- Every instance receives EVERY job completion event
- Instance 3 processes events for jobs it has no connection for
- Under 10K concurrent bookings → 10K events × 3 instances = 30K messages
- Wastes CPU and network bandwidth
3. Connection Cleanup Complexity:
// Need to handle:
- Client disconnects
- Instance crashes
- Network timeouts
- Process restarts
// All must clean up connection state, or memory leaks

4. Deployment Challenges:
- Rolling deployments: Old instances shutting down, new ones starting
- How to migrate SSE connections without dropping updates?
- Need custom connection migration logic
// ✅ SIMPLE: QueueEvents handles scaling automatically
import { QueueEvents } from 'bullmq';

// Each API instance creates its own QueueEvents listener
const queueEvents = new QueueEvents('booking');
queueEvents.on('completed', ({ jobId, returnvalue }) => {
  // This callback runs in EVERY instance
  // But we only have the connection object in ONE instance
  const clientConnection = activeConnections.get(jobId);
  if (clientConnection) {
    // Only the instance that holds the connection sends the response
    clientConnection.write(`event: confirmed\ndata: ${JSON.stringify(returnvalue)}\n\n`);
  }
  // Other instances just ignore (no connection found)
});

1. Built-in Pub/Sub:
- Uses Redis Streams (not Pub/Sub) for reliable event delivery
- Events persisted temporarily in Redis
- Instances can reconnect and catch up on missed events
- No manual channel management
2. Efficient Event Routing:
- Events are lightweight: `{ jobId, status, returnvalue }`
- No custom serialization needed
- BullMQ manages the Redis keys and channels automatically
3. Connection State Remains Local:
// ✅ SIMPLE: Each instance only tracks its own connections
const activeConnections = new Map(); // Only this instance's connections
// No need to sync with other instances
// No need for distributed state management
// Each instance is independent

4. Horizontal Scaling Benefits:
# Scale API to 5 instances
docker compose up -d --scale api=5
# Each instance:
# - Receives all events via QueueEvents
# - Checks local connection map
# - Forwards only if connection exists
# - No coordination needed between instances
# Result: Linear scaling, no shared state

Before (Raw Redis Pub/Sub):
- 3 API instances
- 10,000 concurrent booking requests
- Each job completion = 3 messages (one per instance)
- Total: 10,000 jobs × 3 instances = 30,000 messages
- Each instance processes 30,000 messages (most ignored)
→ 66% wasted CPU, complex connection tracking
After (BullMQ QueueEvents):
- 3 API instances
- 10,000 concurrent booking requests
- Each job completion = 1 lightweight event
- Each instance checks local map, only 1 instance forwards
→ 0% wasted CPU, simple local state
- Purpose-Built: Designed specifically for BullMQ job lifecycle events
- Type-Safe: Events typed as `{ jobId: string, status: string, ... }`
- Reliable: Uses Redis Streams, not Pub/Sub (persistent vs ephemeral)
- Efficient: Minimal overhead, no manual serialization
- Battle-Tested: Used in production at scale by many companies
- Documented: Official BullMQ documentation and examples
Raw Redis Pub/Sub is never the right choice for this use case: QueueEvents is superior in every way for job status notifications.
The only time to use raw Redis Pub/Sub is for non-BullMQ events (e.g., custom notifications, chat messages) where you need custom channel management.
- BullMQ: Actively maintained, TypeScript support, better performance
- Bull: Legacy, fewer features, slower development
- Decision: Use BullMQ for modern architecture
- BullMQ Requirement: BullMQ is built specifically on ioredis and requires it for connection handling
- node-redis: Officially recommended by Redis for new projects, but NOT compatible with BullMQ
- Decision: Use ioredis because BullMQ mandates it - no choice if we want BullMQ
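A minimal sketch of how the two fit together (the queue name 'booking' matches the snippets above; the env variable and processor body are illustrative). BullMQ expects maxRetriesPerRequest: null on connections used for its blocking commands:

```ts
import Redis from 'ioredis';
import { Queue, Worker } from 'bullmq';

// One ioredis connection config shared by producer and consumer
const connection = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379', {
  maxRetriesPerRequest: null, // required by BullMQ for blocking connections
});

// API side: enqueue booking jobs
export const bookingQueue = new Queue('booking', { connection });

// Worker side: process them (normally lives in apps/worker)
export const bookingWorker = new Worker(
  'booking',
  async (job) => {
    // ... optimistic-locking booking logic goes here ...
    return { bookingId: 'example' };
  },
  { connection },
);
```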
- Separation: API and Worker as separate apps
- Shared code: Database, types, utilities in packages
- Independent scaling: Deploy API and Worker separately
- Clear boundaries: Enforces clean architecture
- Decision: Reorganized from monolith to monorepo (Milestone 0)
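For orientation, the kind of layout this implies (apps/api, apps/worker, and packages/types appear elsewhere in this document; the remaining names are illustrative):

```
.
├── apps/
│   ├── api/        # HTTP layer: validates, enqueues, serves SSE status endpoints
│   └── worker/     # BullMQ consumers: optimistic-locking booking logic
├── packages/
│   ├── types/      # Zod schemas + shared TypeScript types (the API↔Worker contract)
│   └── database/   # shared data-access code (illustrative name)
└── docker-compose.yml
```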
- SSE: HTTP-based, simpler implementation, auto-reconnection
- WebSockets: Bidirectional (not needed), more complex
- Decision: SSE fits unidirectional server→client updates, scales better
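The server side of SSE is plain HTTP. A minimal Express-style sketch of the response setup and the disconnect cleanup assumed by the snippets above (the helper name and heartbeat interval are illustrative):

```ts
import type { Request, Response } from 'express';
import { QueueEvents } from 'bullmq';

function openSseStream(req: Request, res: Response, queueEvents: QueueEvents) {
  // Standard SSE headers: keep the HTTP response open and uncached
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });

  // Comment lines keep load balancers from closing an idle stream
  const heartbeat = setInterval(() => res.write(': ping\n\n'), 15_000);

  // Clean up when the client disconnects, otherwise timers and listeners leak
  req.on('close', () => {
    clearInterval(heartbeat);
    void queueEvents.close(); // assumes a per-request QueueEvents, as in the earlier snippet
    res.end();
  });
}
```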
- Optimistic: Higher throughput, scales better, simpler
- Distributed Locks: Redlock complexity, lower performance
- Decision: Optimistic locking sufficient for booking tickets
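A sketch of what optimistic locking means in the worker (table and column names are assumptions): update only if the version read earlier is still current, and treat zero affected rows as a conflict to retry.

```ts
// Any SQL client that reports affected rows works; this interface is illustrative.
interface Db {
  execute(sql: string, params: unknown[]): Promise<{ rowCount: number }>;
}

// Zero affected rows means another worker won the race (or the event sold out):
// re-read the row and retry instead of holding a lock.
async function reserveTickets(db: Db, eventId: string, quantity: number, expectedVersion: number): Promise<boolean> {
  const result = await db.execute(
    `UPDATE events
        SET available_tickets = available_tickets - $1,
            version = version + 1
      WHERE id = $2
        AND version = $3
        AND available_tickets >= $1`,
    [quantity, eventId, expectedVersion],
  );
  return result.rowCount === 1;
}
```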
- Runtime Validation: Enforces contract between API and Worker
- Type Safety: Single source of truth for job data structure
- Prevents Silent Failures: Invalid data caught at boundaries
- Decision: Zod schemas in `packages/types` validate producer and consumer
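A minimal sketch of such a shared schema (field names are assumptions; the point is that API and Worker both parse with the same definition exported from packages/types):

```ts
// packages/types/src/booking-job.ts (illustrative path)
import { z } from 'zod';

export const BookingJobSchema = z.object({
  eventId: z.string().uuid(),
  userId: z.string().uuid(),
  quantity: z.number().int().positive(),
});
export type BookingJob = z.infer<typeof BookingJobSchema>;

// API (producer): validate before enqueueing
export function toBookingJob(body: unknown): BookingJob {
  return BookingJobSchema.parse(body); // throws on contract violations
}

// Worker (consumer): validate again at the boundary,
// so a stale deploy on either side can't sneak bad data through
export function fromJobData(data: unknown): BookingJob {
  return BookingJobSchema.parse(data);
}
```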
- Max Attempts: 3 tries for version conflicts
- Backoff: Exponential (100ms → 200ms → 400ms)
- Rationale: Balance between success rate and latency
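If the retry policy is expressed through BullMQ job options (one reasonable way to implement it; the worker throws on a version conflict so BullMQ re-attempts the job), the configuration looks roughly like this:

```ts
import { Queue } from 'bullmq';

const bookingQueue = new Queue('booking', { connection: { host: 'localhost', port: 6379 } });

// 3 attempts with exponential backoff starting at 100ms → retries after ~100ms, 200ms, 400ms.
// WORKER_MAX_RETRIES mirrors the env variable referenced elsewhere in this document.
async function enqueueBooking(jobData: { eventId: string; userId: string; quantity: number }) {
  return bookingQueue.add('book', jobData, {
    attempts: Number(process.env.WORKER_MAX_RETRIES ?? 3),
    backoff: { type: 'exponential', delay: 100 },
  });
}
```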
- Decision: If Redis unavailable, return 503. No fallback to Level 2.
- Rationale:
- Level 2 synchronous transactions would overload database if Redis fails under load
- Fallback logic adds complexity and new failure modes
- Monitoring Redis health separately is more reliable
- Forces infrastructure reliability for Redis
- Client Handling: Return 503 → client retries with exponential backoff
- Alternative: Use circuit breaker to return 503 immediately, not hang
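A minimal sketch of that fail-fast behaviour at the API edge (the middleware and the use of ioredis's status field are assumptions; a dedicated circuit-breaker library could be swapped in):

```ts
import type { Request, Response, NextFunction } from 'express';
import Redis from 'ioredis';

const redis = new Redis({ maxRetriesPerRequest: null, enableOfflineQueue: false });

// Fail fast: if Redis is not ready, reject immediately with 503 instead of
// buffering commands in memory or falling back to synchronous Level 2 logic.
export function requireRedis(req: Request, res: Response, next: NextFunction) {
  if (redis.status !== 'ready') {
    res.setHeader('Retry-After', '5'); // hint for client exponential backoff
    return res.status(503).json({ error: 'Booking service temporarily unavailable' });
  }
  next();
}
```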
| Scenario | Worker Behavior | API Response | Client SSE Event |
|---|---|---|---|
| Valid booking | Success | 202 Accepted | confirmed |
| Event sold out | Fail (no retry) | 202 Accepted | failed (409) |
| Invalid eventId | Fail | 202 Accepted | failed (404) |
| Version conflict | Retry (max env.WORKER_MAX_RETRIES) | 202 Accepted | confirmed |
| Worker crash | Job re-queued | 202 Accepted | processing → confirmed |
| Rate limit exceeded | N/A | 429 Too Many Requests | N/A |
| Queue depth exceeded | N/A | 503 Queue Full | N/A |
| Redis down | Cannot queue job | 503 Service Unavailable (circuit open) | N/A |
IMPORTANT - No Graceful Degradation:
- If Redis is down, API returns 503 immediately via circuit breaker
- No fallback to Level 2 synchronous transactions
- Reason: Prevents database overload during Redis failures
- Client responsibility: Retry with exponential backoff
- Monitor Redis health separately
| Metric | Level 2 | Level 3 MVP | Level 3 Production |
|---|---|---|---|
| API Response | 800-1500ms | <100ms | <100ms |
| Throughput | 200-300 req/s | 2000-5000 req/s | 5000-10000 req/s |
| Timeout Rate | 1-2% | 0% | 0% |
| Scalability | Vertical only | Horizontal (workers) | Horizontal (API + workers) |
| Race Conditions | 0 (pessimistic) | 0 (optimistic) | 0 (optimistic) |
| Real-Time Updates | N/A | Polling | SSE |
| Rate Limiting | No | No | Yes (10 req/min) |
| Circuit Breaker | No | No | Yes |
| Monitoring | Logs only | Logs only | Dashboard + Logs |
| Metric | Level 2 | Level 3 Target | Validation Method |
|---|---|---|---|
| Throughput (req/s) | 200-300 | 5,000-10,000 | Load test (10K req) |
| API Response Time | 800-1500ms | <100ms | Response timing |
| Timeout Rate | 1-2% | 0% | Error counting |
| Race Conditions | 0 | 0 | Database verification |
| Worker Processing | N/A | 200-500ms avg | Job logs |
| Data Integrity | ✅ | ✅ | Booking count vs tickets |
| Scalability | Vertical | Horizontal | Multiple workers |
| Queue Depth | N/A | <50 avg | BullMQ dashboard |
| Job Failure Rate | N/A | <0.1% | BullMQ metrics |
| Version Conflicts | N/A | <5% | Retry count |
| SSE Reliability | N/A | 100% | State check tests |
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Redis becomes bottleneck | Medium | High | Add Redis clustering, increase workers |
| Worker crashes lose jobs | Low | Medium | BullMQ persistence, automatic retry |
| SSE connections overload | Low | Medium | Connection pooling, TTL on jobs |
| Version conflicts too high | Low | Medium | Tune retry strategy, measure conflict rate |
| Increased complexity | High | Medium | Comprehensive testing, documentation |
| Monorepo build issues | Low | Low | Use Turborepo, clear package boundaries |
| BullMQ dashboard security | Medium | High | Add auth middleware to /admin/queues |
| "Fast Worker" race condition | High | High | State check pattern (Milestone 7/8) |
| Type mismatches API→Worker | Medium | High | Zod schema validation (Milestone 3) |
| Redis failure causes downtime | Low | High | Hard fail (503) + monitoring + alerts |
- Skipping Milestone 0: Don't add Redis to monolithic structure. Restructure first.
- Wrong milestone order: Implement optimistic locking (M5) BEFORE API migration (M6)
- Starting workers before M5: Workers will fail to process jobs
- Migrating API before workers ready: Jobs created but not processed
- Skipping Zod validation: Type mismatches cause silent failures
- Not testing version conflicts: Optimistic locking bugs only appear under load
- Forgetting to remove Level 2 code: Confusion about which logic is active
- Raw Redis Pub/Sub: Use BullMQ QueueEvents for horizontal scaling
- No SSE state check: "Fast Worker" race condition causes lost updates
- Exposing dashboard publicly: Major security risk
- No rate limiting: Queue overflow and abuse vectors (see the sketch after this list)
- Implementing graceful degradation: Hard fail is safer and simpler for Level 3
- Not testing Redis failures: Circuit breaker untested until production incident
- Polling instead of SSE: Wastes resources, poor user experience
- Hardcoded limits: Production tuning requires code changes
- No cleanup on disconnect: Memory leaks from orphaned SSE connections
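To make the rate-limiting and hardcoded-limits points concrete, a sketch using express-rate-limit with env-driven values (the 10 req/min default matches the comparison table above; variable and handler names are illustrative):

```ts
import rateLimit from 'express-rate-limit';

// 10 requests per minute per client by default, overridable without a code change
export const bookingRateLimiter = rateLimit({
  windowMs: Number(process.env.RATE_LIMIT_WINDOW_MS ?? 60_000),
  max: Number(process.env.RATE_LIMIT_MAX ?? 10),
  standardHeaders: true, // send RateLimit-* headers so clients can back off
  legacyHeaders: false,
  message: { error: 'Too many booking attempts, please retry later' },
});

// Usage (hypothetical handler name):
// app.post('/book', bookingRateLimiter, createBookingHandler);
```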
Use this template when documenting Level 3 implementations:
/**
* Level 3 Implementation: Async Queue-Based Processing with SSE
*
* Architecture: /apps/api, /apps/worker, /packages/*
*
* How it works:
* 1. API receives POST /book, validates with Zod → returns 202 + jobId
* 2. BullMQ stores job in Redis, dashboard tracks status
* 3. Worker pulls job, validates with Zod, uses optimistic locking
* 4. Version conflict → retry (max 3, exponential backoff)
* 5. Success/failure published via QueueEvents → SSE notifies client
* 6. Fast worker handled: state check before subscribe
*
* Trade-offs:
* ✅ Pros: <100ms API, 10K+ req/s, horizontal scaling, zero timeouts
* ⚠️ Cons: Eventual consistency, added complexity, Redis dependency
*
* Why monorepo: Clean separation, independent scaling, shared packages
* Why Zod schemas: Runtime validation prevents API-Worker contract breaches
* Why QueueEvents: Built-in horizontal scaling, replaces raw Redis Pub/Sub
* Why no degradation: Hard fail safer than database overload fallback
* Why BullMQ dashboard: Real-time queue monitoring essential at scale
*
* Service Boundary: Zod schema is the contract. Always validate.
*/

When discussing this project in interviews, emphasize:
- Problem: High-concurrency ticket booking with race conditions
- Solution Evolution: Pessimistic locking (L2) → Async queue + Optimistic locking (L3)
- Trade-offs Understood: Latency vs throughput, consistency models, complexity vs scalability
- Monorepo: Separation of concerns, independent scaling
- BullMQ QueueEvents: Horizontal scaling without custom Pub/Sub complexity
- SSE over WebSockets: Simpler for unidirectional updates
- Hard Fail over Graceful Degradation: Predictable failure modes
- "Fast Worker" Race: State check before subscribe (shows attention to detail)
- Circuit Breaker: Fail fast, protect dependencies
- Rate Limiting: Prevent abuse and overload
- 10K Load Testing: Validated under extreme conditions
- Zero Data Integrity Issues: No overbookings across all tests
- Monitoring: Separate dashboard service with security considerations
- Documentation: Comprehensive plans and architectural context
This document is for learning and reference. For implementation, see:
- LEVEL_3_MVP_PLAN.md - Step-by-step MVP implementation
- LEVEL_3_PRODUCTION_PLAN.md - Production hardening steps
- LEVEL_3_COMPLETE_PLAN.md - Combined reference (full context)