Commit 1c91000

jwilger and claude authored
Add 9 critical architecture decision records (#93)
## Summary

- Adds 9 new ADRs documenting critical architectural decisions identified from analyzing the 91 GitHub issues
- These ADRs cover performance-critical decisions like dual-path architecture, ring buffer design, and tiered projections
- Also documents privacy, testing, caching, and monitoring architectures

## New ADRs Added

1. **ADR-0008: Dual-path Architecture** - Separates hot path (<5ms) from audit path for minimal latency
2. **ADR-0009: Ring Buffer Pattern** - Lock-free event handoff with <1μs overhead
3. **ADR-0010: Tiered Projection Strategy** - In-memory, PostgreSQL, and Elasticsearch tiers
4. **ADR-0011: Provider Abstraction and Routing** - URL-based routing for multiple LLM providers
5. **ADR-0012: Session Identification and Metadata** - Custom headers for session tracking
6. **ADR-0013: Test Execution Architecture** - Pluggable evaluation strategies
7. **ADR-0014: Privacy and Compliance Architecture** - GDPR, HIPAA, PII handling
8. **ADR-0015: Caching Strategy** - Optional response caching with minimal overhead
9. **ADR-0016: Performance Monitoring and Metrics** - Zero-allocation hot path metrics

## Context

After creating 91 GitHub issues from the ISSUES_DRAFT.md document, I analyzed the issues to identify architectural decisions that weren't yet documented. These 9 ADRs capture the most critical design decisions for Union Square's architecture.

## Test plan

- [x] All ADRs follow the established template format
- [x] Content is technically accurate and comprehensive
- [x] ADRs reference related decisions appropriately
- [x] Files are properly formatted

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
1 parent 03ca0d2 commit 1c91000

10 files changed: +1992 -1 lines changed

adr/0007-eventcore-as-central-audit-mechanism.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
# EventCore as Central Audit Mechanism
1+
# ADR-0007: EventCore as Central Audit Mechanism
2 2

3 3
- Status: proposed
4 4
- Deciders: John Wilger, Claude

adr/0008-dual-path-architecture.md

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
1+
# ADR-0008: Dual-path Architecture (Hot Path vs Audit Path)
2+
3+
## Status
4+
5+
Accepted
6+
7+
## Context
8+
9+
Union Square acts as a proxy between applications and LLM providers, requiring us to:
10+
1. Forward requests to LLM providers with minimal latency (<5ms overhead)
11+
2. Capture comprehensive audit data for every request/response
12+
3. Provide analytics, testing, and debugging capabilities
13+
4. Never become a single point of failure
14+
15+
These requirements create competing concerns:
16+
- Fast forwarding requires minimal processing
17+
- Comprehensive capture requires significant processing
18+
- Reliability requires the proxy to be optional
19+
20+
## Decision
21+
22+
We will implement a dual-path architecture that separates concerns:
23+
24+
1. **Hot Path (Request Forwarding)**
25+
- Receives incoming LLM API requests
26+
- Performs minimal validation
27+
- Immediately forwards to the appropriate provider
28+
- Captures raw request/response data in a ring buffer
29+
- Returns the provider's response to the caller
30+
- Target: <5ms overhead, <1μs for ring buffer write
31+
32+
2. **Audit Path (Async Processing)**
33+
- Reads from the ring buffer asynchronously
34+
- Processes captured data into EventCore events
35+
- Handles all non-critical operations:
36+
- Session tracking
37+
- Analytics calculation
38+
- Test case extraction
39+
- Privacy compliance (PII detection)
40+
- Cost tracking
41+
- Error analysis
42+
43+
### Data Flow
44+
45+
```
46+
Client Request → Hot Path → LLM Provider
47+
↓ ↓ ↓
48+
Ring Buffer ← Raw Data ← Provider Response
49+
↓ ↓
50+
Audit Path Client Response
51+
52+
EventCore Events
53+
```
54+
55+
### Key Design Principles
56+
57+
1. **Fire and Forget**: Hot path never waits for audit path
58+
2. **Graceful Degradation**: If ring buffer is full, drop audit data rather than block
59+
3. **Eventual Consistency**: Analytics and session data are eventually consistent
60+
4. **Bypass Capability**: Clients can fall back to direct provider connections
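
The first two principles can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: `HotPath`, `handle`, and `dropped` are hypothetical names, and a bounded `std::sync::mpsc` channel stands in for the ring buffer (ADR-0009 rejects channels for the real handoff, and the `clone` below allocates, which the real hot path would avoid). What it shows is the control flow: the response is returned whether or not the audit record was accepted.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Hypothetical hot-path handle. The bounded channel is a stand-in
/// for the ring buffer, used here only to show the control flow.
pub struct HotPath {
    audit_tx: SyncSender<Vec<u8>>,
    dropped: AtomicU64,
}

impl HotPath {
    /// Creates the hot path plus the receiving end for the audit path.
    pub fn new(capacity: usize) -> (Self, Receiver<Vec<u8>>) {
        let (audit_tx, audit_rx) = sync_channel(capacity);
        (Self { audit_tx, dropped: AtomicU64::new(0) }, audit_rx)
    }

    /// Forwards the request, then offers the response to the audit path.
    pub fn handle<F>(&self, request: &[u8], forward: F) -> Vec<u8>
    where
        F: FnOnce(&[u8]) -> Vec<u8>,
    {
        let response = forward(request);
        // Fire and forget: `try_send` never blocks. If the buffer is full
        // (or the audit side is gone), drop the record and count it.
        if self.audit_tx.try_send(response.clone()).is_err() {
            self.dropped.fetch_add(1, Ordering::Relaxed);
        }
        response
    }

    /// Number of audit records dropped so far.
    pub fn dropped(&self) -> u64 {
        self.dropped.load(Ordering::Relaxed)
    }
}
```

With a capacity of 1 and no consumer running, a second call to `handle` still returns its response; only the `dropped` counter changes.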
61+
62+
## Consequences
63+
64+
### Positive
65+
66+
- Minimal latency impact on LLM API calls
67+
- System remains responsive even under heavy audit load
68+
- Can scale hot path and audit path independently
69+
- Audit processing can be paused/resumed without affecting traffic
70+
- Supports complex processing without impacting response times
71+
72+
### Negative
73+
74+
- Audit data may be lost if ring buffer overflows
75+
- Debugging requires correlating across two paths
76+
- Real-time analytics have eventual consistency delays
77+
- Additional complexity in deployment and monitoring
78+
- Requires careful capacity planning for ring buffer
79+
80+
### Mitigation Strategies
81+
82+
1. **Ring Buffer Monitoring**: Alert on high watermarks before overflow
83+
2. **Backpressure**: Slow audit processing triggers capacity scaling
84+
3. **Correlation IDs**: Unique IDs link hot path and audit path data
85+
4. **Health Checks**: Monitor both paths independently
86+
5. **Replay Capability**: Store raw data for replay if audit processing fails
87+
88+
## Alternatives Considered
89+
90+
1. **Single Path Processing**
91+
- Process everything inline
92+
- Rejected: Would violate <5ms latency requirement
93+
94+
2. **Queue-based Separation**
95+
- Use message queue between paths
96+
- Rejected: Adds external dependency and latency
97+
98+
3. **Sidecar Pattern**
99+
- Run audit processing as separate process
100+
- Rejected: More complex deployment, harder to maintain <1μs handoff
101+
102+
4. **Database Write-through**
103+
- Write directly to database from hot path
104+
- Rejected: Database writes would exceed latency budget
105+
106+
## Related Decisions
107+
108+
- ADR-0007: EventCore as Central Audit Mechanism (audit path output)
109+
- ADR-0009: Ring Buffer Pattern for Event Recording (handoff mechanism)
110+
- ADR-0010: Tiered Projection Strategy (audit path processing)

adr/0009-ring-buffer-pattern.md

Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
1+
# ADR-0009: Ring Buffer Pattern for Event Recording
2+
3+
## Status
4+
5+
Accepted
6+
7+
## Context
8+
9+
The dual-path architecture (ADR-0008) requires a mechanism to hand off data from the hot path to the audit path with minimal overhead. Our requirements are:
10+
11+
1. <1 microsecond write latency from hot path
12+
2. No memory allocations in hot path
13+
3. No blocking operations
14+
4. Handle variable-sized LLM request/response payloads
15+
5. Graceful handling of buffer overflow
16+
6. Support concurrent writers (multiple request threads)
17+
18+
Traditional approaches like queues, channels, or direct database writes would violate our latency requirements.
19+
20+
## Decision
21+
22+
We will implement a lock-free ring buffer specifically designed for our use case:
23+
24+
### Design
25+
26+
1. **Fixed-size Ring Buffer**
27+
- Pre-allocated memory pool (configurable, default 1GB)
28+
- Divided into fixed-size slots (default 64KB per slot)
29+
- Power-of-2 slot count for efficient modulo operations
30+
31+
2. **Lock-free Multi-Producer Single-Consumer (MPSC)**
32+
- Multiple hot path threads write concurrently
33+
- Single audit path thread reads sequentially
34+
- Uses atomic operations for coordination
35+
36+
3. **Slot Structure**
37+
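```rust
use std::sync::atomic::{AtomicU32, AtomicU8};
use uuid::Uuid; // from the `uuid` crate

const SLOT_SIZE: usize = 64 * 1024; // assumption: the 64KB default slot size

struct Slot {
    state: AtomicU8,       // EMPTY, WRITING, READY, READING
    size: AtomicU32,       // Actual payload size
    timestamp: u64,        // Capture timestamp
    request_id: Uuid,      // Correlation ID
    data: [u8; SLOT_SIZE], // Raw payload
}
```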
46+
47+
4. **Write Algorithm**
48+
```
49+
1. Atomically claim next write position
50+
2. If slot not EMPTY, increment overflow counter and return
51+
3. CAS slot state to WRITING
52+
4. Copy data (with size limit)
53+
5. Store size and metadata
54+
6. CAS slot state to READY
55+
```
56+
57+
5. **Read Algorithm**
58+
```
59+
1. Check slot at read position
60+
2. If READY, CAS to READING
61+
3. Process data
62+
4. CAS to EMPTY
63+
5. Advance read position
64+
```
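
The write and read steps above can be sketched as follows. This is a minimal illustration, not the project's implementation: `RingBuffer`, `try_write`, `try_read`, and `overflow_count` are hypothetical names, the slot is shrunk to 64 bytes, and the timestamp and request ID fields are omitted to keep the example within the standard library. The final transition to READY (and back to EMPTY) uses a release store instead of a CAS, which is sufficient here because the thread already owns the slot at that point.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicU64, AtomicU8, AtomicUsize, Ordering};

const EMPTY: u8 = 0;
const WRITING: u8 = 1;
const READY: u8 = 2;
const READING: u8 = 3;
const SLOT_SIZE: usize = 64; // tiny for illustration; the ADR default is 64KB

struct Slot {
    state: AtomicU8,
    len: UnsafeCell<usize>,
    data: UnsafeCell<[u8; SLOT_SIZE]>,
}

pub struct RingBuffer {
    slots: Box<[Slot]>,
    mask: usize, // slot_count - 1; valid because slot_count is a power of 2
    write_pos: AtomicUsize,
    read_pos: AtomicUsize, // advanced only by the single consumer
    overflow: AtomicU64,
}

// SAFETY: a slot's `len` and `data` are touched only by the thread that
// moved its state to WRITING or READING with a successful CAS.
unsafe impl Sync for RingBuffer {}

impl RingBuffer {
    pub fn new(slot_count: usize) -> Self {
        assert!(slot_count.is_power_of_two());
        let slots = (0..slot_count)
            .map(|_| Slot {
                state: AtomicU8::new(EMPTY),
                len: UnsafeCell::new(0),
                data: UnsafeCell::new([0; SLOT_SIZE]),
            })
            .collect::<Vec<_>>()
            .into_boxed_slice();
        Self {
            slots,
            mask: slot_count - 1,
            write_pos: AtomicUsize::new(0),
            read_pos: AtomicUsize::new(0),
            overflow: AtomicU64::new(0),
        }
    }

    /// Producer side. Returns false if the record was dropped.
    pub fn try_write(&self, payload: &[u8]) -> bool {
        // Step 1: atomically claim the next write position.
        let pos = self.write_pos.fetch_add(1, Ordering::Relaxed);
        let slot = &self.slots[pos & self.mask];
        // Steps 2-3: EMPTY -> WRITING, or count an overflow and return.
        if slot
            .state
            .compare_exchange(EMPTY, WRITING, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            self.overflow.fetch_add(1, Ordering::Relaxed);
            return false;
        }
        // Steps 4-5: copy the payload (truncated to the slot) and its size.
        let n = payload.len().min(SLOT_SIZE);
        unsafe {
            (&mut *slot.data.get())[..n].copy_from_slice(&payload[..n]);
            *slot.len.get() = n;
        }
        // Step 6: publish the slot to the reader.
        slot.state.store(READY, Ordering::Release);
        true
    }

    /// Consumer side. Must be called from a single thread only.
    pub fn try_read(&self) -> Option<Vec<u8>> {
        let pos = self.read_pos.load(Ordering::Relaxed);
        let slot = &self.slots[pos & self.mask];
        // Steps 1-2: READY -> READING, otherwise nothing to read yet.
        if slot
            .state
            .compare_exchange(READY, READING, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            return None;
        }
        // Step 3: "process" the data (here: copy it out).
        let out = unsafe {
            let len = *slot.len.get();
            (&*slot.data.get())[..len].to_vec()
        };
        // Steps 4-5: release the slot and advance the read position.
        slot.state.store(EMPTY, Ordering::Release);
        self.read_pos.store(pos.wrapping_add(1), Ordering::Relaxed);
        Some(out)
    }

    pub fn overflow_count(&self) -> u64 {
        self.overflow.load(Ordering::Relaxed)
    }
}
```

One property of following the steps literally: because step 1 claims a position before step 2 checks the slot, a dropped write still advances the write position. After an overflow the reader may therefore wait on a slot that is refilled only once the writers wrap around, which is worth covering in the concurrency tests.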
65+
66+
### Overflow Handling
67+
68+
- Large payloads are truncated to slot size
69+
- Truncation flag is set in metadata
70+
- Full payloads can be requested via separate async path
71+
- Overflow metrics are tracked for capacity planning
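
As a small sketch of the truncation rule, assuming a hypothetical `CaptureMeta` record and `capture` function (neither name comes from the ADR), the copy into a slot might look like this. The original length is kept so the full payload can later be requested through the separate path.

```rust
/// Hypothetical metadata stored next to a captured payload.
#[derive(Debug, PartialEq, Eq)]
pub struct CaptureMeta {
    pub stored_len: usize,   // bytes actually written to the slot
    pub original_len: usize, // size of the payload before truncation
    pub truncated: bool,     // set when the payload did not fit
}

/// Copies as much of `payload` as fits into `slot` and reports what happened.
pub fn capture(payload: &[u8], slot: &mut [u8]) -> CaptureMeta {
    let stored_len = payload.len().min(slot.len());
    slot[..stored_len].copy_from_slice(&payload[..stored_len]);
    CaptureMeta {
        stored_len,
        original_len: payload.len(),
        truncated: payload.len() > slot.len(),
    }
}
```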
72+
73+
### Memory Layout
74+
75+
```
76+
[Header (4KB)]
77+
[Slot 0 (64KB)] [Slot 1 (64KB)] ... [Slot N (64KB)]
78+
```
79+
80+
Header contains:
81+
- Write position (atomic)
82+
- Read position (atomic)
83+
- Overflow counter (atomic)
84+
- Configuration parameters
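
The arithmetic behind this layout can be sketched with the default values. The constant and function names are hypothetical, and the sketch assumes the 1GB pool holds slots only (the 4KB header sits in front of it) and that each 64KB slot stride includes the slot's own metadata.

```rust
pub const HEADER_BYTES: usize = 4 * 1024; // 4KB header
pub const SLOT_BYTES: usize = 64 * 1024; // 64KB per slot (default)
pub const BUFFER_BYTES: usize = 1 << 30; // 1GB pool (default)

/// 1GB / 64KB = 16384 slots, which is 2^14.
pub const SLOT_COUNT: usize = BUFFER_BYTES / SLOT_BYTES;

// Compile-time check that the mask trick below is valid.
const _: () = assert!(SLOT_COUNT.is_power_of_two());

/// Maps an ever-increasing position to a slot index.
/// With a power-of-2 count, `pos % SLOT_COUNT` reduces to a bit mask.
pub const fn slot_index(position: usize) -> usize {
    position & (SLOT_COUNT - 1)
}

/// Byte offset of the slot for `position`, measured from the start of the header.
pub const fn slot_offset(position: usize) -> usize {
    HEADER_BYTES + slot_index(position) * SLOT_BYTES
}
```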
85+
86+
## Consequences
87+
88+
### Positive
89+
90+
- Predictable <1μs write latency
91+
- No memory allocations after initialization
92+
- No system calls in hot path
93+
- CPU cache-friendly sequential access
94+
- Graceful degradation under load
95+
- Simple crash recovery (can resume from read position)
96+
97+
### Negative
98+
99+
- Fixed memory overhead (1GB default)
100+
- Large requests/responses need truncation
101+
- Lost data on overflow (by design)
102+
- Single audit reader constraint
103+
- Complex testing of concurrent scenarios
104+
105+
### Mitigation Strategies
106+
107+
1. **Dynamic Sizing**: Monitor typical payload sizes and adjust slot size
108+
2. **Overflow Handling**: Track overflow rate and auto-scale buffer size
109+
3. **Chunking**: Split large payloads across multiple slots if needed
110+
4. **Backup Writer**: Overflow data can write to secondary storage
111+
5. **Monitoring**: Extensive metrics on buffer utilization
112+
113+
## Alternatives Considered
114+
115+
1. **LMAX Disruptor Pattern**
116+
- More complex, designed for different use case
117+
- Rejected: Overkill for our simple handoff needs
118+
119+
2. **Channel/Queue Libraries**
120+
- Standard MPSC channels
121+
- Rejected: Allocation overhead, unpredictable latency
122+
123+
3. **Memory-mapped Files**
124+
- Persistent buffer with mmap
125+
- Rejected: System call overhead, page fault risks
126+
127+
4. **Direct Audit Path Calls**
128+
- Call audit path directly with async handoff
129+
- Rejected: Thread pool overhead, allocation costs
130+
131+
5. **Shared Memory with Semaphores**
132+
- Traditional IPC approach
133+
- Rejected: System call overhead for synchronization
134+
135+
## Implementation Notes
136+
137+
- Use Rust's `std::sync::atomic` with `Ordering::Relaxed` for counters
138+
- Use `Ordering::AcqRel` for state transitions
139+
- Align slots to cache line boundaries (64 bytes)
140+
- Use `#[repr(C)]` for stable memory layout
141+
- Benchmark with real LLM payloads to tune parameters
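
A minimal sketch of how these notes might translate into code, assuming hypothetical `SlotHeader`, `PaddedCounter`, `count_overflow`, and `publish` names. `#[repr(C, align(64))]` fixes the field order and pads each value to a 64-byte cache line, counters use `Ordering::Relaxed`, and the state transition uses `Ordering::AcqRel` on success.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, AtomicU8, Ordering};

pub const WRITING: u8 = 1;
pub const READY: u8 = 2;

/// Per-slot metadata, laid out in declaration order and aligned to 64 bytes
/// so that two slot headers never share a cache line.
#[repr(C, align(64))]
pub struct SlotHeader {
    pub state: AtomicU8,
    pub size: AtomicU32,
    pub timestamp: u64,
}

/// A counter padded to its own cache line to avoid false sharing.
#[repr(C, align(64))]
pub struct PaddedCounter(pub AtomicU64);

/// Counters only need atomicity, not ordering, so Relaxed is enough.
/// Returns the new count.
pub fn count_overflow(counter: &PaddedCounter) -> u64 {
    counter.0.fetch_add(1, Ordering::Relaxed) + 1
}

/// State transition WRITING -> READY. AcqRel on success makes the payload
/// writes that precede this call visible to the thread that observes READY.
pub fn publish(header: &SlotHeader) -> bool {
    header
        .state
        .compare_exchange(WRITING, READY, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
}
```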
142+
143+
## Related Decisions
144+
145+
- ADR-0008: Dual-path Architecture (defines the need for this pattern)
146+
- ADR-0007: EventCore as Central Audit Mechanism (consumer of ring buffer data)
147+
- ADR-0016: Performance Monitoring (ring buffer metrics are critical)
