# ADR-016: Two-Tier Cache Architecture (Memory + Redis)

**Status:** Accepted
**Date:** 2025-11-14
**Authors:** cassing
**Supersedes:** N/A
**Superseded by:** N/A

## Context

During POC development, we identified significant performance issues with the caching layer. The cache backend stores values as JSON strings, requiring serialization/deserialization for complex Pydantic models.

**Current Situation:**

- Cache backend accepts only string values
- Reading cached Pydantic models requires `json.loads()` followed by `model_validate()`
- This deserialization overhead occurs on EVERY cache read
- Performance profiling showed JSON deserialization as the primary bottleneck

**Problems:**

- `json.loads()` is expensive and became a performance bottleneck
- Typical cache read latency: ~5ms (mostly deserialization)
- High CPU overhead from repeated deserialization of the same objects
- No built-in memory layer for frequently accessed data

**Requirements:**

- Maintain JSON serialization for Redis (debuggable, safe, DynamoDB-compatible)
- Minimize deserialization overhead for frequently accessed data
- Support graceful degradation when Redis is unavailable
- Keep memory overhead bounded and configurable
- Maintain compatibility with existing cache interface

## Decision

We implement a **two-tier caching system** combining in-memory LRU cache (Tier 1) with Redis/Valkey backend (Tier 2).

The two-tier architecture provides:

- Fast in-memory cache for hot data (Python objects, no serialization)
- Persistent Redis cache for shared state and cold data (JSON serialization)
- Automatic cache warming (Tier 2 populates Tier 1 on miss)
- Graceful degradation to memory-only mode when Redis fails

### Key Points

- **Tier 1 (Memory):** In-memory LRU cache using `cachetools.TTLCache` - stores Python objects directly
- **Tier 2 (Redis):** Redis/Valkey backend with JSON serialization - shared across workers
- **~99% memory hit rate** expected, yielding ~10µs responses (100x faster than a Redis round trip)
- **Graceful degradation** - application continues with memory-only cache if Redis unavailable
- **Bounded memory** - LRU eviction + TTL prevents unbounded growth

## Alternatives Considered

### Alternative 1: Pickle Serialization

Replace JSON with pickle for faster serialization/deserialization.

**Pros:**

- 3-5x faster than JSON for complex objects
- Native Python object serialization
- Preserves Python types without validation

**Cons:**

- **Security risk** - pickle can execute arbitrary code during deserialization
- **Not DynamoDB-compatible** - can't migrate to DynamoDB later
- **Not human-readable** - harder to debug cache contents
- **Version fragility** - pickle format can break between Python versions
- **Rejected:** Security and compatibility concerns outweigh performance gains
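
The security concern is concrete: any callable reachable at unpickle time can be executed. A minimal demonstration, using the harmless `eval` where an attacker could just as well substitute `os.system`:

```python
import pickle


class Evil:
    def __reduce__(self):
        # On unpickling, pickle will call eval("40 + 2") - an attacker
        # controls both the callable and its arguments here.
        return (eval, ("40 + 2",))


payload = pickle.dumps(Evil())
result = pickle.loads(payload)  # arbitrary code runs during deserialization
```

This is why untrusted or shared cache contents must never be unpickled; JSON has no equivalent code-execution path.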

### Alternative 2: MessagePack Serialization

Use MessagePack instead of JSON for faster serialization.

**Pros:**

- Faster than JSON
- Smaller payload size
- Binary format

**Cons:**

- **Awkward for DynamoDB** - binary attributes are base64-encoded on the wire and opaque to inspect, complicating a later migration
- **Not human-readable** - harder to debug
- **Marginal improvement** - only 20-30% faster than orjson
- **Rejected:** Incompatibility with future DynamoDB migration

### Alternative 3: Pydantic TypeAdapter

Use Pydantic's TypeAdapter for faster validation.

**Pros:**

- Slightly faster than model_validate()
- Official Pydantic optimization

**Cons:**

- **Marginal improvement** - only 10-15% faster
- **Doesn't solve root problem** - still requires JSON deserialization
- **Rejected:** Insufficient performance improvement
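
For reference, the rejected approach would look roughly like this (the `User` model is an illustrative stand-in for the cached models):

```python
from pydantic import BaseModel, TypeAdapter


class User(BaseModel):  # illustrative stand-in
    id: str
    name: str


# Build the adapter once and reuse it across reads; this avoids
# re-deriving the validation schema on every call.
adapter = TypeAdapter(User)
user = adapter.validate_json('{"id": "1", "name": "Ada"}')
# Still pays full JSON parsing on every read - the root problem remains.
```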

### Alternative 4: Two-Tier Cache (Selected Decision)

In-memory LRU cache (Tier 1) + Redis backend (Tier 2).

**Pros:**

- **100x performance improvement** for hot data (99% memory hits)
- **Minimal memory overhead** (~1-5 MB for 1000 items)
- **Keeps JSON serialization** - safe, debuggable, DynamoDB-compatible
- **Graceful degradation** - works without Redis
- **Transparent to callers** - no API changes required
- **Battle-tested pattern** - widely used in industry

**Cons:**

- **Memory usage** increases slightly (~1-5 MB per process)
- **Two sources of truth** - requires careful invalidation
- **TTL configuration** - needs tuning per use case
- **Accepted tradeoffs:** Memory overhead is negligible compared to performance gains

## Consequences

### Positive

- **Massive performance improvement:** 100x faster for frequently accessed data (99% memory hits = ~10µs vs 5ms)
- **Minimal memory footprint:** ~1-5 MB overhead for 1000 cached items (configurable)
- **Production resilience:** App continues working when Redis is unavailable (memory-only mode)
- **Zero API changes:** Completely transparent to existing code using cache
- **Bounded memory growth:** LRU eviction + TTL prevents memory leaks
- **Time-based expiration:** TTL ensures stale data is eventually refreshed

### Negative

- **Increased memory usage** per process (~1-5 MB)
- **Mitigation:** Configurable `cache_memory_max_size` allows tuning per deployment
- **Mitigation:** LRU eviction prevents unbounded growth

- **Cache coherence complexity** with two tiers (memory + Redis)
- **Mitigation:** Explicit `delete()` invalidates both tiers (memory first, then backend)
- **Mitigation:** TTL bounds staleness window
- **Mitigation:** Memory cache is per-process, Redis is shared - eventual consistency acceptable

- **Configuration overhead** - need to tune TTL per use case
- **Mitigation:** Sensible defaults (1000 items, 60s TTL)
- **Mitigation:** Clear documentation in config.py with examples

- **Testing complexity** - two cache tiers to verify
- **Mitigation:** Added `clear_memory_cache()` method for test isolation
- **Mitigation:** Comprehensive test suite covers both tiers

## Implementation Guidelines

### Configuration

```python
# config.py
cache_memory_max_size: int = Field(
    default=1000,
    description="In-memory cache max items (LRU eviction). Set to 0 to disable.",
)
cache_memory_ttl: int = Field(
    default=60,
    description="In-memory cache TTL in seconds (time-based expiration)",
)
```

### Two-Tier Lookup Pattern

```python
def get_obj(self, key: str, cls: type[T]) -> T | None:
    """Two-tier lookup: memory → Redis."""

    # Tier 1: Memory cache (99% hit - FAST!)
    if self._memory_cache is not None and key in self._memory_cache:
        return self._memory_cache[key]
    # Tier 2: Redis cache (JSON deserialize, warm memory)
    try:
        value = self.get(key)
        if value is None:
            return None

        data = self.deserializer(value)
        obj = cls.model_validate(data)

        # Warm memory cache for next access
        if self._memory_cache is not None:
            self._memory_cache[key] = obj

        return obj

    except (ConnectionError, TimeoutError) as e:
        logger.warning(f"Cache backend unavailable: {e}")
        return None  # Graceful degradation
```

### Two-Tier Write Pattern

```python
def set_obj(self, key: str, value: Any, ttl: int | None = None) -> None:
    """Write to both tiers."""

    # Tier 1: Memory (Python object, no serialization)
    if self._memory_cache is not None:
        self._memory_cache[key] = value

    # Tier 2: Redis (JSON serialization for persistence)
    try:
        serialized = self.serializer(value)
        self.set(key, serialized, ttl)
    except (ConnectionError, TimeoutError) as e:
        logger.warning(f"Cache backend unavailable: {e}")
```

### Cache Invalidation Pattern

```python
def delete(self, key: str) -> None:
    """Delete from both tiers (memory first, then backend)."""

    # Tier 1: Memory cache
    if self._memory_cache is not None:
        self._memory_cache.pop(key, None)

    # Tier 2: Redis backend
    self._delete_from_backend(key)
```

### Testing Support

```python
def clear_memory_cache(self) -> None:
    """Clear Tier 1 for testing (Tier 2 unaffected)."""
    if self._memory_cache is not None:
        self._memory_cache.clear()
```

### Usage Example

```python
# Automatic two-tier caching - transparent to callers
user = cache.get_obj("user:123", SlackUser)  # Memory hit = ~10µs
if not user:
    user = slack_api.get_user("123")
    cache.set_obj("user:123", user, ttl=900)  # Writes to both tiers
```

### Checklist

- [x] Add `cache_memory_max_size` and `cache_memory_ttl` to config.py
- [x] Update `CacheBackend` base class with two-tier logic
- [x] Update `RedisCacheBackend` to accept and pass memory settings
- [x] Implement `delete()` to invalidate both tiers
- [x] Add `clear_memory_cache()` for testing
- [x] Update all existing tests to work with two-tier cache
- [x] Verify graceful degradation when Redis unavailable
- [ ] Add metrics for memory hit rate (future work)
- [ ] Add alerts for degraded mode (future work)

## References

- **Related ADRs:**
- [ADR-015](ADR-015-cache-update-strategy.md) - Cache Update Instead of Invalidation
- [ADR-012](ADR-012-typed-models-over-dicts.md) - Fully Typed Pydantic Models

- **Implementation:**
- `qontract_api/cache/base.py` - CacheBackend base class with two-tier logic
- `qontract_api/cache/redis.py` - RedisCacheBackend implementation
- `qontract_api/config.py` - Memory cache configuration
- `qontract_api/main.py` - Dependency injection setup

- **External Libraries:**
- [cachetools](https://cachetools.readthedocs.io/) - In-memory LRU/TTL cache implementation
- [orjson](https://github.com/ijl/orjson) - Fast JSON serialization for Redis tier

---

## Notes

**Performance Characteristics:**

| Tier | Hit Rate | Latency | Overhead |
| --------------- | -------- | ------------- | -------------------- |
| Memory (Tier 1) | 99% | ~10µs | None (Python object) |
| Redis (Tier 2) | 1% | ~5ms | JSON deserialization |
| **Overall** | 100% | **~59µs avg** | **100x improvement** |
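
The blended figure follows from a weighted average of the two tiers:

```python
# Expected read latency given the hit rates in the table above
memory_latency_us = 10    # Tier 1 hit: ~10µs
redis_latency_us = 5_000  # Tier 2 hit: ~5ms

avg_us = 0.99 * memory_latency_us + 0.01 * redis_latency_us
# 0.99 * 10 + 0.01 * 5000 = 9.9 + 50 = 59.9, i.e. ~59µs average
```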

**Memory vs Redis Tradeoffs:**

- **Memory cache is per-process:** Each worker has its own memory cache (no sharing)
- **Redis cache is shared:** All workers share the same Redis cache (eventual consistency)
- **This is acceptable:** TTL bounds staleness window, explicit invalidation syncs both tiers

**Future Considerations:**

- Consider adding metrics for memory hit rate monitoring
- Consider adding alerts for prolonged degraded mode (Redis unavailable)
- Consider making TTL configurable per cache key pattern
- Consider adding cache warming strategies for predictable access patterns