# ADR-016: Two-Tier Cache Architecture (Memory + Redis)

**Status:** Accepted
**Date:** 2025-11-14
**Authors:** cassing
**Supersedes:** N/A
**Superseded by:** N/A

## Context

During POC development, we identified significant performance issues with the caching layer. The cache backend stores values as JSON strings, requiring serialization/deserialization for complex Pydantic models.

**Current Situation:**

- Cache backend accepts only string values
- Reading cached Pydantic models requires `json.loads()` followed by `model_validate()`
- This deserialization overhead occurs on EVERY cache read
- Performance profiling showed JSON deserialization as the primary bottleneck

**Problems:**

- `json.loads()` is expensive and became a performance bottleneck
- Typical cache read latency: ~5ms (mostly deserialization)
- High CPU overhead from repeated deserialization of the same objects
- No built-in memory layer for frequently accessed data

**Requirements:**

- Maintain JSON serialization for Redis (debuggable, safe, DynamoDB-compatible)
- Minimize deserialization overhead for frequently accessed data
- Support graceful degradation when Redis is unavailable
- Keep memory overhead bounded and configurable
- Maintain compatibility with existing cache interface

## Decision

We implement a **two-tier caching system** combining in-memory LRU cache (Tier 1) with Redis/Valkey backend (Tier 2).

The two-tier architecture provides:

- Fast in-memory cache for hot data (Python objects, no serialization)
- Persistent Redis cache for shared state and cold data (JSON serialization)
- Automatic cache warming (Tier 2 populates Tier 1 on miss)
- Graceful degradation to memory-only mode when Redis fails

### Key Points

- **Tier 1 (Memory):** In-memory LRU cache using `cachetools.TTLCache` - stores Python objects directly
- **Tier 2 (Redis):** Redis/Valkey backend with JSON serialization - shared across workers
- **~99% memory hit rate** expected, yielding ~10µs responses (100x faster than a Redis round trip)
- **Graceful degradation** - application continues with memory-only cache if Redis unavailable
- **Bounded memory** - LRU eviction + TTL prevents unbounded growth

## Alternatives Considered

### Alternative 1: Pickle Serialization

Replace JSON with pickle for faster serialization/deserialization.

**Pros:**

- 3-5x faster than JSON for complex objects
- Native Python object serialization
- Preserves Python types without validation

**Cons:**

- **Security risk** - pickle can execute arbitrary code during deserialization
- **Not DynamoDB-compatible** - can't migrate to DynamoDB later
- **Not human-readable** - harder to debug cache contents
- **Version fragility** - pickle format can break between Python versions
- **Rejected:** Security and compatibility concerns outweigh performance gains
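
The security concern is concrete: any callable reachable at unpickle time can be executed. A minimal demonstration, using the harmless `eval` where an attacker could just as well substitute `os.system`:

```python
import pickle


class Evil:
    def __reduce__(self):
        # On unpickling, pickle will call eval("40 + 2") - an attacker
        # controls both the callable and its arguments here.
        return (eval, ("40 + 2",))


payload = pickle.dumps(Evil())
result = pickle.loads(payload)  # arbitrary code runs during deserialization
```

This is why untrusted or shared cache contents must never be unpickled; JSON has no equivalent code-execution path.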

### Alternative 2: MessagePack Serialization

Use MessagePack instead of JSON for faster serialization.

**Pros:**

- Faster than JSON
- Smaller payload size
- Binary format

**Cons:**

- **Awkward for DynamoDB** - binary attributes are base64-encoded on the wire and opaque to inspect, complicating a later migration
- **Not human-readable** - harder to debug
- **Marginal improvement** - only 20-30% faster than orjson
- **Rejected:** Incompatibility with future DynamoDB migration

### Alternative 3: Pydantic TypeAdapter

Use Pydantic's TypeAdapter for faster validation.

**Pros:**

- Slightly faster than model_validate()
- Official Pydantic optimization

**Cons:**

- **Marginal improvement** - only 10-15% faster
- **Doesn't solve root problem** - still requires JSON deserialization
- **Rejected:** Insufficient performance improvement
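
For reference, the rejected approach would look roughly like this (the `User` model is an illustrative stand-in for the cached models):

```python
from pydantic import BaseModel, TypeAdapter


class User(BaseModel):  # illustrative stand-in
    id: str
    name: str


# Build the adapter once and reuse it across reads; this avoids
# re-deriving the validation schema on every call.
adapter = TypeAdapter(User)
user = adapter.validate_json('{"id": "1", "name": "Ada"}')
# Still pays full JSON parsing on every read - the root problem remains.
```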

### Alternative 4: Two-Tier Cache (Selected Decision)

In-memory LRU cache (Tier 1) + Redis backend (Tier 2).

**Pros:**

- **100x performance improvement** for hot data (99% memory hits)
- **Minimal memory overhead** (~1-5 MB for 1000 items)
- **Keeps JSON serialization** - safe, debuggable, DynamoDB-compatible
- **Graceful degradation** - works without Redis
- **Transparent to callers** - no API changes required
- **Battle-tested pattern** - widely used in industry

**Cons:**

- **Memory usage** increases slightly (~1-5 MB per process)
- **Two sources of truth** - requires careful invalidation
- **TTL configuration** - needs tuning per use case
- **Accepted tradeoffs:** Memory overhead is negligible compared to performance gains

## Consequences

### Positive

- **Massive performance improvement:** 100x faster for frequently accessed data (99% memory hits = ~10µs vs 5ms)
- **Minimal memory footprint:** ~1-5 MB overhead for 1000 cached items (configurable)
- **Production resilience:** App continues working when Redis is unavailable (memory-only mode)
- **Zero API changes:** Completely transparent to existing code using cache
- **Bounded memory growth:** LRU eviction + TTL prevents memory leaks
- **Time-based expiration:** TTL ensures stale data is eventually refreshed

### Negative

- **Increased memory usage** per process (~1-5 MB)
- **Mitigation:** Configurable `cache_memory_max_size` allows tuning per deployment
- **Mitigation:** LRU eviction prevents unbounded growth

- **Cache coherence complexity** with two tiers (memory + Redis)
- **Mitigation:** Explicit `delete()` invalidates both tiers (memory first, then backend)
- **Mitigation:** TTL bounds staleness window
- **Mitigation:** Memory cache is per-process, Redis is shared - eventual consistency acceptable

- **Configuration overhead** - need to tune TTL per use case
- **Mitigation:** Sensible defaults (1000 items, 60s TTL)
- **Mitigation:** Clear documentation in config.py with examples

- **Testing complexity** - two cache tiers to verify
- **Mitigation:** Added `clear_memory_cache()` method for test isolation
- **Mitigation:** Comprehensive test suite covers both tiers

## Implementation Guidelines

### Configuration

```python
# config.py
cache_memory_max_size: int = Field(
    default=1000,
    description="In-memory cache max items (LRU eviction). Set to 0 to disable.",
)
cache_memory_ttl: int = Field(
    default=60,
    description="In-memory cache TTL in seconds (time-based expiration)",
)
```

### Two-Tier Lookup Pattern

```python
def get_obj(self, key: str, cls: type[T]) -> T | None:
    """Two-tier lookup: memory → Redis."""

    # Tier 1: Memory cache (99% hit - FAST!)
    if self._memory_cache is not None and key in self._memory_cache:
        return self._memory_cache[key]
    # Tier 2: Redis cache (JSON deserialize, warm memory)
    try:
        value = self.get(key)
        if value is None:
            return None

        data = self.deserializer(value)
        obj = cls.model_validate(data)

        # Warm memory cache for next access
        if self._memory_cache is not None:
            self._memory_cache[key] = obj

        return obj

    except (ConnectionError, TimeoutError) as e:
        logger.warning(f"Cache backend unavailable: {e}")
        return None  # Graceful degradation
```

### Two-Tier Write Pattern

```python
def set_obj(self, key: str, value: Any, ttl: int | None = None) -> None:
    """Write to both tiers."""

    # Tier 1: Memory (Python object, no serialization)
    if self._memory_cache is not None:
        self._memory_cache[key] = value

    # Tier 2: Redis (JSON serialization for persistence)
    try:
        serialized = self.serializer(value)
        self.set(key, serialized, ttl)
    except (ConnectionError, TimeoutError) as e:
        logger.warning(f"Cache backend unavailable: {e}")
```

### Cache Invalidation Pattern

```python
def delete(self, key: str) -> None:
    """Delete from both tiers (memory first, then backend)."""

    # Tier 1: Memory cache
    if self._memory_cache is not None:
        self._memory_cache.pop(key, None)

    # Tier 2: Redis backend
    self._delete_from_backend(key)
```

### Testing Support

```python
def clear_memory_cache(self) -> None:
    """Clear Tier 1 for testing (Tier 2 unaffected)."""
    if self._memory_cache is not None:
        self._memory_cache.clear()
```

### Usage Example

```python
# Automatic two-tier caching - transparent to callers
user = cache.get_obj("user:123", SlackUser)  # Memory hit = ~10µs
if not user:
    user = slack_api.get_user("123")
    cache.set_obj("user:123", user, ttl=900)  # Writes to both tiers
```

### Checklist

- [x] Add `cache_memory_max_size` and `cache_memory_ttl` to config.py
- [x] Update `CacheBackend` base class with two-tier logic
- [x] Update `RedisCacheBackend` to accept and pass memory settings
- [x] Implement `delete()` to invalidate both tiers
- [x] Add `clear_memory_cache()` for testing
- [x] Update all existing tests to work with two-tier cache
- [x] Verify graceful degradation when Redis unavailable
- [ ] Add metrics for memory hit rate (future work)
- [ ] Add alerts for degraded mode (future work)

## References

- **Related ADRs:**
- [ADR-015](ADR-015-cache-update-strategy.md) - Cache Update Instead of Invalidation
- [ADR-012](ADR-012-typed-models-over-dicts.md) - Fully Typed Pydantic Models

- **Implementation:**
- `qontract_api/cache/base.py` - CacheBackend base class with two-tier logic
- `qontract_api/cache/redis.py` - RedisCacheBackend implementation
- `qontract_api/config.py` - Memory cache configuration
- `qontract_api/main.py` - Dependency injection setup

- **External Libraries:**
- [cachetools](https://cachetools.readthedocs.io/) - In-memory LRU/TTL cache implementation
- [orjson](https://github.com/ijl/orjson) - Fast JSON serialization for Redis tier

---

## Notes

**Performance Characteristics:**

| Tier | Hit Rate | Latency | Overhead |
| --------------- | -------- | ------------- | -------------------- |
| Memory (Tier 1) | 99% | ~10µs | None (Python object) |
| Redis (Tier 2) | 1% | ~5ms | JSON deserialization |
| **Overall** | 100% | **~59µs avg** | **100x improvement** |
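
The blended figure follows from a weighted average of the two tiers:

```python
# Expected read latency given the hit rates in the table above
memory_latency_us = 10    # Tier 1 hit: ~10µs
redis_latency_us = 5_000  # Tier 2 hit: ~5ms

avg_us = 0.99 * memory_latency_us + 0.01 * redis_latency_us
# 0.99 * 10 + 0.01 * 5000 = 9.9 + 50 = 59.9, i.e. ~59µs average
```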

**Memory vs Redis Tradeoffs:**

- **Memory cache is per-process:** Each worker has its own memory cache (no sharing)
- **Redis cache is shared:** All workers share the same Redis cache (eventual consistency)
- **This is acceptable:** TTL bounds staleness window, explicit invalidation syncs both tiers

**Future Considerations:**

- Consider adding metrics for memory hit rate monitoring
- Consider adding alerts for prolonged degraded mode (Redis unavailable)
- Consider making TTL configurable per cache key pattern
- Consider adding cache warming strategies for predictable access patterns