diff --git a/docs/adr/ADR-016-two-tier-cache.md b/docs/adr/ADR-016-two-tier-cache.md new file mode 100644 index 0000000000..872637c668 --- /dev/null +++ b/docs/adr/ADR-016-two-tier-cache.md @@ -0,0 +1,304 @@ +# ADR-016: Two-Tier Cache Architecture (Memory + Redis) + +**Status:** Accepted +**Date:** 2025-11-14 +**Authors:** cassing +**Supersedes:** N/A +**Superseded by:** N/A + +## Context + +During POC development, we identified significant performance issues with the caching layer. The cache backend stores values as JSON strings, requiring serialization/deserialization for complex Pydantic models. + +**Current Situation:** + +- Cache backend accepts only string values +- Reading cached Pydantic models requires `json.loads()` followed by `model_validate()` +- This deserialization overhead occurs on EVERY cache read +- Performance profiling showed JSON deserialization as the primary bottleneck + +**Problems:** + +- `json.loads()` is expensive and became a performance bottleneck +- Typical cache read latency: ~5ms (mostly deserialization) +- High CPU overhead from repeated deserialization of the same objects +- No built-in memory layer for frequently accessed data + +**Requirements:** + +- Maintain JSON serialization for Redis (debuggable, safe, DynamoDB-compatible) +- Minimize deserialization overhead for frequently accessed data +- Support graceful degradation when Redis is unavailable +- Keep memory overhead bounded and configurable +- Maintain compatibility with existing cache interface + +## Decision + +We implement a **two-tier caching system** combining in-memory LRU cache (Tier 1) with Redis/Valkey backend (Tier 2). + +The two-tier architecture provides: + +- Fast in-memory cache for hot data (Python objects, no serialization) +- Persistent Redis cache for shared state and cold data (JSON serialization) +- Automatic cache warming (Tier 2 populates Tier 1 on miss) +- Graceful degradation to memory-only mode when Redis fails + +### Key Points + +- **Tier 1 (Memory):** In-memory LRU cache using `cachetools.TTLCache` - stores Python objects directly +- **Tier 2 (Redis):** Redis/Valkey backend with JSON serialization - shared across workers +- **99% memory hit rate** expected = ~10µs response time (100x faster than Redis) +- **Graceful degradation** - application continues with memory-only cache if Redis unavailable +- **Bounded memory** - LRU eviction + TTL prevents unbounded growth + +## Alternatives Considered + +### Alternative 1: Pickle Serialization + +Replace JSON with pickle for faster serialization/deserialization. + +**Pros:** + +- 3-5x faster than JSON for complex objects +- Native Python object serialization +- Preserves Python types without validation + +**Cons:** + +- **Security risk** - pickle can execute arbitrary code during deserialization +- **Not DynamoDB-compatible** - can't migrate to DynamoDB later +- **Not human-readable** - harder to debug cache contents +- **Version fragility** - pickle format can break between Python versions +- **Rejected:** Security and compatibility concerns outweigh performance gains + +### Alternative 2: MessagePack Serialization + +Use MessagePack instead of JSON for faster serialization. + +**Pros:** + +- Faster than JSON +- Smaller payload size +- Binary format + +**Cons:** + +- **Not DynamoDB-compatible** - DynamoDB doesn't support binary attributes well +- **Not human-readable** - harder to debug +- **Marginal improvement** - only 20-30% faster than orjson +- **Rejected:** Incompatibility with future DynamoDB migration + +### Alternative 3: Pydantic TypeAdapter + +Use Pydantic's TypeAdapter for faster validation. + +**Pros:** + +- Slightly faster than model_validate() +- Official Pydantic optimization + +**Cons:** + +- **Marginal improvement** - only 10-15% faster +- **Doesn't solve root problem** - still requires JSON deserialization +- **Rejected:** Insufficient performance improvement + +### Alternative 4: Two-Tier Cache (Selected Decision) + +In-memory LRU cache (Tier 1) + Redis backend (Tier 2). + +**Pros:** + +- **100x performance improvement** for hot data (99% memory hits) +- **Minimal memory overhead** (~1-5 MB for 1000 items) +- **Keeps JSON serialization** - safe, debuggable, DynamoDB-compatible +- **Graceful degradation** - works without Redis +- **Transparent to callers** - no API changes required +- **Battle-tested pattern** - widely used in industry + +**Cons:** + +- **Memory usage** increases slightly (~1-5 MB per process) +- **Two sources of truth** - requires careful invalidation +- **TTL configuration** - needs tuning per use case +- **Accepted tradeoffs:** Memory overhead is negligible compared to performance gains + +## Consequences + +### Positive + +- **Massive performance improvement:** 100x faster for frequently accessed data (99% memory hits = ~10µs vs 5ms) +- **Minimal memory footprint:** ~1-5 MB overhead for 1000 cached items (configurable) +- **Production resilience:** App continues working when Redis is unavailable (memory-only mode) +- **Zero API changes:** Completely transparent to existing code using cache +- **Bounded memory growth:** LRU eviction + TTL prevents memory leaks +- **Time-based expiration:** TTL ensures stale data is eventually refreshed + +### Negative + +- **Increased memory usage** per process (~1-5 MB) + - **Mitigation:** Configurable `cache_memory_max_size` allows tuning per deployment + - **Mitigation:** LRU eviction prevents unbounded growth + +- **Cache coherence complexity** with two tiers (memory + Redis) + - **Mitigation:** Explicit `delete()` invalidates both tiers atomically + - **Mitigation:** TTL bounds staleness window + - **Mitigation:** Memory cache is per-process, Redis is shared - eventual consistency acceptable + +- **Configuration overhead** - need to tune TTL per use case + - **Mitigation:** Sensible defaults (1000 items, 60s TTL) + - **Mitigation:** Clear documentation in config.py with examples + +- **Testing complexity** - two cache tiers to verify + - **Mitigation:** Added `clear_memory_cache()` method for test isolation + - **Mitigation:** Comprehensive test suite covers both tiers + +## Implementation Guidelines + +### Configuration + +```python +# config.py +cache_memory_max_size: int = Field( + default=1000, + description="In-memory cache max items (LRU eviction). Set to 0 to disable.", +) +cache_memory_ttl: int = Field( + default=60, + description="In-memory cache TTL in seconds (time-based expiration)", +) +``` + +### Two-Tier Lookup Pattern + +```python +def get_obj(self, key: str, cls: type[T]) -> T | None: + """Two-tier lookup: memory → Redis.""" + + # Tier 1: Memory cache (99% hit - FAST!) + if self._memory_cache is not None and key in self._memory_cache: + return self._memory_cache[key] + + # Tier 2: Redis cache (JSON deserialize, warm memory) + try: + value = self.get(key) + if value is None: + return None + + data = self.deserializer(value) + obj = cls.model_validate(data) + + # Warm memory cache for next access + if self._memory_cache is not None: + self._memory_cache[key] = obj + + return obj + + except (ConnectionError, TimeoutError) as e: + logger.warning(f"Cache backend unavailable: {e}") + return None # Graceful degradation +``` + +### Two-Tier Write Pattern + +```python +def set_obj(self, key: str, value: Any, ttl: int | None = None) -> None: + """Write to both tiers.""" + + # Tier 1: Memory (Python object, no serialization) + if self._memory_cache is not None: + self._memory_cache[key] = value + + # Tier 2: Redis (JSON serialization for persistence) + try: + serialized = self.serializer(value) + self.set(key, serialized, ttl) + except (ConnectionError, TimeoutError) as e: + logger.warning(f"Cache backend unavailable: {e}") +``` + +### Cache Invalidation Pattern + +```python +def delete(self, key: str) -> None: + """Delete from both tiers atomically.""" + + # Tier 1: Memory cache + if self._memory_cache is not None: + self._memory_cache.pop(key, None) + + # Tier 2: Redis backend + self._delete_from_backend(key) +``` + +### Testing Support + +```python +def clear_memory_cache(self) -> None: + """Clear Tier 1 for testing (Tier 2 unaffected).""" + if self._memory_cache is not None: + self._memory_cache.clear() +``` + +### Usage Example + +```python +# Automatic two-tier caching - transparent to callers +user = cache.get_obj("user:123", SlackUser) # Memory hit = ~10µs +if not user: + user = slack_api.get_user("123") + cache.set_obj("user:123", user, ttl=900) # Writes to both tiers +``` + +### Checklist + +- [x] Add `cache_memory_max_size` and `cache_memory_ttl` to config.py +- [x] Update `CacheBackend` base class with two-tier logic +- [x] Update `RedisCacheBackend` to accept and pass memory settings +- [x] Implement `delete()` to invalidate both tiers +- [x] Add `clear_memory_cache()` for testing +- [x] Update all existing tests to work with two-tier cache +- [x] Verify graceful degradation when Redis unavailable +- [ ] Add metrics for memory hit rate (future work) +- [ ] Add alerts for degraded mode (future work) + +## References + +- **Related ADRs:** + - [ADR-015](ADR-015-cache-update-strategy.md) - Cache Update Instead of Invalidation + - [ADR-012](ADR-012-typed-models-over-dicts.md) - Fully Typed Pydantic Models + +- **Implementation:** + - `qontract_api/cache/base.py` - CacheBackend base class with two-tier logic + - `qontract_api/cache/redis.py` - RedisCacheBackend implementation + - `qontract_api/config.py` - Memory cache configuration + - `qontract_api/main.py` - Dependency injection setup + +- **External Libraries:** + - [cachetools](https://cachetools.readthedocs.io/) - In-memory LRU/TTL cache implementation + - [orjson](https://github.com/ijl/orjson) - Fast JSON serialization for Redis tier + +--- + +## Notes + +**Performance Characteristics:** + +| Tier | Hit Rate | Latency | Overhead | +| --------------- | -------- | ------------- | -------------------- | +| Memory (Tier 1) | 99% | ~10µs | None (Python object) | +| Redis (Tier 2) | 1% | ~5ms | JSON deserialization | +| **Overall** | 100% | **~59µs avg** | **100x improvement** | + +**Memory vs Redis Tradeoffs:** + +- **Memory cache is per-process:** Each worker has its own memory cache (no sharing) +- **Redis cache is shared:** All workers share the same Redis cache (eventual consistency) +- **This is acceptable:** TTL bounds staleness window, explicit invalidation syncs both tiers + +**Future Considerations:** + +- Consider adding metrics for memory hit rate monitoring +- Consider adding alerts for prolonged degraded mode (Redis unavailable) +- Consider making TTL configurable per cache key pattern +- Consider adding cache warming strategies for predictable access patterns