-
Notifications
You must be signed in to change notification settings - Fork 99
📝 ADR-016: Two-Tier Cache Architecture #5332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
chassing
wants to merge
1
commit into
app-sre:master
Choose a base branch
from
chassing:APPSRE-12896/qontract-api-adr-16
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,304 @@ | ||
| # ADR-016: Two-Tier Cache Architecture (Memory + Redis) | ||
|
|
||
| **Status:** Accepted | ||
| **Date:** 2025-11-14 | ||
| **Authors:** cassing | ||
| **Supersedes:** N/A | ||
| **Superseded by:** N/A | ||
|
|
||
| ## Context | ||
|
|
||
| During POC development, we identified significant performance issues with the caching layer. The cache backend stores values as JSON strings, requiring serialization/deserialization for complex Pydantic models. | ||
|
|
||
| **Current Situation:** | ||
|
|
||
| - Cache backend accepts only string values | ||
| - Reading cached Pydantic models requires `json.loads()` followed by `model_validate()` | ||
| - This deserialization overhead occurs on EVERY cache read | ||
| - Performance profiling showed JSON deserialization as the primary bottleneck | ||
|
|
||
| **Problems:** | ||
|
|
||
| - `json.loads()` is expensive and became a performance bottleneck | ||
| - Typical cache read latency: ~5ms (mostly deserialization) | ||
| - High CPU overhead from repeated deserialization of the same objects | ||
| - No built-in memory layer for frequently accessed data | ||
|
|
||
| **Requirements:** | ||
|
|
||
| - Maintain JSON serialization for Redis (debuggable, safe, DynamoDB-compatible) | ||
| - Minimize deserialization overhead for frequently accessed data | ||
| - Support graceful degradation when Redis is unavailable | ||
| - Keep memory overhead bounded and configurable | ||
| - Maintain compatibility with existing cache interface | ||
|
|
||
| ## Decision | ||
|
|
||
| We implement a **two-tier caching system** combining in-memory LRU cache (Tier 1) with Redis/Valkey backend (Tier 2). | ||
|
|
||
| The two-tier architecture provides: | ||
|
|
||
| - Fast in-memory cache for hot data (Python objects, no serialization) | ||
| - Persistent Redis cache for shared state and cold data (JSON serialization) | ||
| - Automatic cache warming (Tier 2 populates Tier 1 on miss) | ||
| - Graceful degradation to memory-only mode when Redis fails | ||
|
|
||
| ### Key Points | ||
|
|
||
| - **Tier 1 (Memory):** In-memory LRU cache using `cachetools.TTLCache` - stores Python objects directly | ||
| - **Tier 2 (Redis):** Redis/Valkey backend with JSON serialization - shared across workers | ||
| - **99% memory hit rate** expected = ~10µs response time (100x faster than Redis) | ||
| - **Graceful degradation** - application continues with memory-only cache if Redis unavailable | ||
| - **Bounded memory** - LRU eviction + TTL prevents unbounded growth | ||
|
|
||
| ## Alternatives Considered | ||
|
|
||
| ### Alternative 1: Pickle Serialization | ||
|
|
||
| Replace JSON with pickle for faster serialization/deserialization. | ||
|
|
||
| **Pros:** | ||
|
|
||
| - 3-5x faster than JSON for complex objects | ||
| - Native Python object serialization | ||
| - Preserves Python types without validation | ||
|
|
||
| **Cons:** | ||
|
|
||
| - **Security risk** - pickle can execute arbitrary code during deserialization | ||
| - **Not DynamoDB-compatible** - can't migrate to DynamoDB later | ||
| - **Not human-readable** - harder to debug cache contents | ||
| - **Version fragility** - pickle format can break between Python versions | ||
| - **Rejected:** Security and compatibility concerns outweigh performance gains | ||
|
|
||
| ### Alternative 2: MessagePack Serialization | ||
|
|
||
| Use MessagePack instead of JSON for faster serialization. | ||
|
|
||
| **Pros:** | ||
|
|
||
| - Faster than JSON | ||
| - Smaller payload size | ||
| - Binary format | ||
|
|
||
| **Cons:** | ||
|
|
||
| - **Not DynamoDB-compatible** - DynamoDB doesn't support binary attributes well | ||
| - **Not human-readable** - harder to debug | ||
| - **Marginal improvement** - only 20-30% faster than orjson | ||
| - **Rejected:** Incompatibility with future DynamoDB migration | ||
|
|
||
| ### Alternative 3: Pydantic TypeAdapter | ||
|
|
||
| Use Pydantic's TypeAdapter for faster validation. | ||
|
|
||
| **Pros:** | ||
|
|
||
| - Slightly faster than model_validate() | ||
| - Official Pydantic optimization | ||
|
|
||
| **Cons:** | ||
|
|
||
| - **Marginal improvement** - only 10-15% faster | ||
| - **Doesn't solve root problem** - still requires JSON deserialization | ||
| - **Rejected:** Insufficient performance improvement | ||
|
|
||
| ### Alternative 4: Two-Tier Cache (Selected Decision) | ||
|
|
||
| In-memory LRU cache (Tier 1) + Redis backend (Tier 2). | ||
|
|
||
| **Pros:** | ||
|
|
||
| - **100x performance improvement** for hot data (99% memory hits) | ||
| - **Minimal memory overhead** (~1-5 MB for 1000 items) | ||
| - **Keeps JSON serialization** - safe, debuggable, DynamoDB-compatible | ||
| - **Graceful degradation** - works without Redis | ||
| - **Transparent to callers** - no API changes required | ||
| - **Battle-tested pattern** - widely used in industry | ||
|
|
||
| **Cons:** | ||
|
|
||
| - **Memory usage** increases slightly (~1-5 MB per process) | ||
| - **Two sources of truth** - requires careful invalidation | ||
| - **TTL configuration** - needs tuning per use case | ||
| - **Accepted tradeoffs:** Memory overhead is negligible compared to performance gains | ||
|
|
||
| ## Consequences | ||
|
|
||
| ### Positive | ||
|
|
||
| - **Massive performance improvement:** 100x faster for frequently accessed data (99% memory hits = ~10µs vs 5ms) | ||
| - **Minimal memory footprint:** ~1-5 MB overhead for 1000 cached items (configurable) | ||
| - **Production resilience:** App continues working when Redis is unavailable (memory-only mode) | ||
| - **Zero API changes:** Completely transparent to existing code using cache | ||
| - **Bounded memory growth:** LRU eviction + TTL prevents memory leaks | ||
| - **Time-based expiration:** TTL ensures stale data is eventually refreshed | ||
|
|
||
| ### Negative | ||
|
|
||
| - **Increased memory usage** per process (~1-5 MB) | ||
| - **Mitigation:** Configurable `cache_memory_max_size` allows tuning per deployment | ||
| - **Mitigation:** LRU eviction prevents unbounded growth | ||
|
|
||
| - **Cache coherence complexity** with two tiers (memory + Redis) | ||
| - **Mitigation:** Explicit `delete()` invalidates both tiers atomically | ||
| - **Mitigation:** TTL bounds staleness window | ||
| - **Mitigation:** Memory cache is per-process, Redis is shared - eventual consistency acceptable | ||
|
|
||
| - **Configuration overhead** - need to tune TTL per use case | ||
| - **Mitigation:** Sensible defaults (1000 items, 60s TTL) | ||
| - **Mitigation:** Clear documentation in config.py with examples | ||
|
|
||
| - **Testing complexity** - two cache tiers to verify | ||
| - **Mitigation:** Added `clear_memory_cache()` method for test isolation | ||
| - **Mitigation:** Comprehensive test suite covers both tiers | ||
|
|
||
| ## Implementation Guidelines | ||
|
|
||
| ### Configuration | ||
|
|
||
| ```python | ||
| # config.py | ||
| cache_memory_max_size: int = Field( | ||
| default=1000, | ||
| description="In-memory cache max items (LRU eviction). Set to 0 to disable.", | ||
| ) | ||
| cache_memory_ttl: int = Field( | ||
| default=60, | ||
| description="In-memory cache TTL in seconds (time-based expiration)", | ||
| ) | ||
| ``` | ||
|
|
||
| ### Two-Tier Lookup Pattern | ||
|
|
||
| ```python | ||
| def get_obj(self, key: str, cls: type[T]) -> T | None: | ||
| """Two-tier lookup: memory → Redis.""" | ||
|
|
||
| # Tier 1: Memory cache (99% hit - FAST!) | ||
| if self._memory_cache is not None and key in self._memory_cache: | ||
| return self._memory_cache[key] | ||
|
|
||
| # Tier 2: Redis cache (JSON deserialize, warm memory) | ||
| try: | ||
| value = self.get(key) | ||
| if value is None: | ||
| return None | ||
|
|
||
| data = self.deserializer(value) | ||
| obj = cls.model_validate(data) | ||
|
|
||
| # Warm memory cache for next access | ||
| if self._memory_cache is not None: | ||
| self._memory_cache[key] = obj | ||
|
|
||
| return obj | ||
|
|
||
| except (ConnectionError, TimeoutError) as e: | ||
| logger.warning(f"Cache backend unavailable: {e}") | ||
| return None # Graceful degradation | ||
| ``` | ||
|
|
||
| ### Two-Tier Write Pattern | ||
|
|
||
| ```python | ||
| def set_obj(self, key: str, value: Any, ttl: int | None = None) -> None: | ||
| """Write to both tiers.""" | ||
|
|
||
| # Tier 1: Memory (Python object, no serialization) | ||
| if self._memory_cache is not None: | ||
| self._memory_cache[key] = value | ||
|
|
||
| # Tier 2: Redis (JSON serialization for persistence) | ||
| try: | ||
| serialized = self.serializer(value) | ||
| self.set(key, serialized, ttl) | ||
| except (ConnectionError, TimeoutError) as e: | ||
| logger.warning(f"Cache backend unavailable: {e}") | ||
| ``` | ||
|
|
||
| ### Cache Invalidation Pattern | ||
|
|
||
| ```python | ||
| def delete(self, key: str) -> None: | ||
| """Delete from both tiers atomically.""" | ||
|
|
||
| # Tier 1: Memory cache | ||
| if self._memory_cache is not None: | ||
| self._memory_cache.pop(key, None) | ||
|
|
||
| # Tier 2: Redis backend | ||
| self._delete_from_backend(key) | ||
| ``` | ||
|
|
||
| ### Testing Support | ||
|
|
||
| ```python | ||
| def clear_memory_cache(self) -> None: | ||
| """Clear Tier 1 for testing (Tier 2 unaffected).""" | ||
| if self._memory_cache is not None: | ||
| self._memory_cache.clear() | ||
| ``` | ||
|
|
||
| ### Usage Example | ||
|
|
||
| ```python | ||
| # Automatic two-tier caching - transparent to callers | ||
| user = cache.get_obj("user:123", SlackUser) # Memory hit = ~10µs | ||
| if not user: | ||
| user = slack_api.get_user("123") | ||
| cache.set_obj("user:123", user, ttl=900) # Writes to both tiers | ||
| ``` | ||
|
|
||
| ### Checklist | ||
|
|
||
| - [x] Add `cache_memory_max_size` and `cache_memory_ttl` to config.py | ||
| - [x] Update `CacheBackend` base class with two-tier logic | ||
| - [x] Update `RedisCacheBackend` to accept and pass memory settings | ||
| - [x] Implement `delete()` to invalidate both tiers | ||
| - [x] Add `clear_memory_cache()` for testing | ||
| - [x] Update all existing tests to work with two-tier cache | ||
| - [x] Verify graceful degradation when Redis unavailable | ||
| - [ ] Add metrics for memory hit rate (future work) | ||
| - [ ] Add alerts for degraded mode (future work) | ||
|
|
||
| ## References | ||
|
|
||
| - **Related ADRs:** | ||
| - [ADR-015](ADR-015-cache-update-strategy.md) - Cache Update Instead of Invalidation | ||
| - [ADR-012](ADR-012-typed-models-over-dicts.md) - Fully Typed Pydantic Models | ||
|
|
||
| - **Implementation:** | ||
| - `qontract_api/cache/base.py` - CacheBackend base class with two-tier logic | ||
| - `qontract_api/cache/redis.py` - RedisCacheBackend implementation | ||
| - `qontract_api/config.py` - Memory cache configuration | ||
| - `qontract_api/main.py` - Dependency injection setup | ||
|
|
||
| - **External Libraries:** | ||
| - [cachetools](https://cachetools.readthedocs.io/) - In-memory LRU/TTL cache implementation | ||
| - [orjson](https://github.com/ijl/orjson) - Fast JSON serialization for Redis tier | ||
|
|
||
| --- | ||
|
|
||
| ## Notes | ||
|
|
||
| **Performance Characteristics:** | ||
|
|
||
| | Tier | Hit Rate | Latency | Overhead | | ||
| | --------------- | -------- | ------------- | -------------------- | | ||
| | Memory (Tier 1) | 99% | ~10µs | None (Python object) | | ||
| | Redis (Tier 2) | 1% | ~5ms | JSON deserialization | | ||
| | **Overall** | 100% | **~59µs avg** | **100x improvement** | | ||
|
|
||
| **Memory vs Redis Tradeoffs:** | ||
|
|
||
| - **Memory cache is per-process:** Each worker has its own memory cache (no sharing) | ||
| - **Redis cache is shared:** All workers share the same Redis cache (eventual consistency) | ||
| - **This is acceptable:** TTL bounds staleness window, explicit invalidation syncs both tiers | ||
|
|
||
| **Future Considerations:** | ||
|
|
||
| - Consider adding metrics for memory hit rate monitoring | ||
| - Consider adding alerts for prolonged degraded mode (Redis unavailable) | ||
| - Consider making TTL configurable per cache key pattern | ||
| - Consider adding cache warming strategies for predictable access patterns | ||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy from #5304 (comment)
This will only valid for immutable cache, if any remote cache key not change, but value changed, we will have stale data.
If the goal is to avoid deserialization and large string response, we can add etag to cache value. Either use 2 cache items key & key:etag, or use redis hash to store both etag and content in one item.
The get flow will be:
The set flow will be:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the memory cache has a very short TTL (max 5 seconds), it's just to avoid any Redis traffic and deserialization in a short time period. Eg. within one reconciliation call. For example,
SlackAPI.get_usersis called 3 times within a few seconds, and a simple memory cache speeds everything up.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then your use case is cache in the same lifecycle of request / task processing, that's a standard feature in ORM / web frameworks to optimize perf to just use point in time data during one request, we can definitely add it, just not use TTL but tie to execution context.
Even for a short TTL, stale data can lead to unexpected result for concurrent requests handled by different servers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "but tie to execution context"? I can't control the usage side of the cache. For sure, the memory cache supports the lifecycle of a request/task/process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, memory cache start and ends with request/task, when we inject cache service, should make memory cache as instance level so it can auto cleanup after request/task done. Same as https://docs.python.org/3/library/functools.html#functools.cached_property or instance level
cache/lru_cache.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I don't know. The current cache backend implementation is a singleton; this also hardly depends on how FastAPI handles the requests. asyncio vs process vs threads vs uvicorn workers ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can control it in our own code
Cacheis the only class that used by application. It includes all methods for model load & dump, as ORM layer with cached method. It is created as simple instance var.CacheBackendonly act as adaptor to different cache backend to handle raw response, and it can be singleton.By using
functools.cache, it's thread safe and we can also easily export metrics likeself.get.cache_info()https://docs.python.org/3/library/functools.html#functools.lru_cache