🔧 Epic - Multi-Layer Caching System (Memory + Redis)
Title: Multi-Layer Caching System (Memory + Redis)
Goal: Implement dual-layer caching with fast local memory (L1) and distributed Redis (L2) to eliminate both database AND Redis network latency
Why now: Even Redis calls add 1-5ms network latency. Local memory cache provides sub-millisecond access while Redis enables sharing across instances.
🙋‍♂️ User Stories & Acceptance Criteria
Story 1 – Fast L1 Hit
As a service handler
I want frequently requested items to come from the local memory cache
So that response latency stays well below 1 ms.
✅ Acceptance Criteria
Scenario: L1 cache hit after warm-up
Given "tools:list:false" is already in the L1 cache
When the service calls get_or_set("tools:list:false")
Then the cache manager returns the value in < 1 ms
And the metrics counter l1_hits increments by 1
And no Redis GET operation occurs
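A test sketch for this scenario (hypothetical: `make_test_manager` is a placeholder fixture that wires the cache manager from the design sketch below with a fake Redis client, and `pytest-asyncio` is assumed to be installed):

```python
import pytest
from unittest.mock import AsyncMock

@pytest.mark.asyncio
async def test_l1_hit_skips_redis():
    fake_redis = AsyncMock()
    # make_test_manager is a hypothetical fixture that builds the cache manager
    # with the fake Redis client instead of a real connection
    manager = make_test_manager(redis=fake_redis)

    # Warm L1 directly, then request the same key
    manager.l1_cache["tools:list:false"] = ["tool-a"]
    value = await manager.get_or_set("tools:list:false", factory=AsyncMock())

    assert value == ["tool-a"]
    assert manager.metrics.l1_hits == 1
    fake_redis.get.assert_not_awaited()   # no Redis GET on an L1 hit
```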
Story 2 – L2 Redis Fallback & L1 Population
As a platform engineer
I want a miss in L1 to fall back to Redis and then fill L1
So that the next request is even faster.
✅ Acceptance Criteria
Scenario: L1 miss, L2 hit, L1 warmed
Given "servers:list:false" exists in Redis but not in L1
When the service calls get_or_set("servers:list:false")
Then the cache manager reads the value from Redis in < 5 ms
And writes the same key to L1 with a 45-second TTL
And returns the value to the caller
And the metrics counters l1_misses and l2_hits increment by 1
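A companion sketch for the L2 fallback path, under the same assumptions as the Story 1 example:

```python
import json
import pytest
from unittest.mock import AsyncMock

@pytest.mark.asyncio
async def test_l2_hit_warms_l1():
    fake_redis = AsyncMock()
    fake_redis.get.return_value = json.dumps(["server-a"])  # key present in Redis
    manager = make_test_manager(redis=fake_redis)           # hypothetical fixture

    value = await manager.get_or_set("servers:list:false", factory=AsyncMock())

    assert value == ["server-a"]
    assert manager.metrics.l2_hits == 1
    assert "servers:list:false" in manager.l1_cache   # L1 warmed for the next request
    fake_redis.get.assert_awaited_once()
```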
Story 3 – Cross-Instance Invalidation
As a cluster operator
I want cache invalidations in one instance to evict the same keys from every other instance's L1
So that no stale data is served after an update.
✅ Acceptance Criteria
Scenario: Broadcast invalidation pattern
Given two gateway instances A and B are running
And both have "mcp:tools:list:false" in their L1 caches
When instance A calls invalidate("mcp:tools:*")
Then instance A deletes the matching keys from its own L1
And instance A publishes "cache:invalidate" with the pattern
And instance B receives the Pub/Sub message
And instance B deletes the same keys from its L1 within 100 ms
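For illustration, the broadcast itself can be exercised directly against Redis with `redis.asyncio`; the `cache:invalidate` channel and the `mcp:tools:*` pattern come from the scenario, while the local Redis URL and demo function are assumptions:

```python
import asyncio
from redis.asyncio import Redis

async def demo_invalidation_roundtrip():
    redis = Redis.from_url("redis://localhost:6379/0")

    # "Instance B" side: subscribe the way the manager's background listener would
    pubsub = redis.pubsub()
    await pubsub.subscribe("cache:invalidate")

    # "Instance A" side: publish the pattern after clearing its own L1
    await redis.publish("cache:invalidate", "mcp:tools:*")

    # Instance B receives the pattern and would evict matching L1 keys
    async for message in pubsub.listen():
        if message["type"] == "message":
            print("received pattern:", message["data"].decode())
            break

asyncio.run(demo_invalidation_roundtrip())
```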
Story 4 – Memory-Pressure Eviction
As a reliability engineer
I want the L1 cache to evict items when memory exceeds 80% of the limit
So that the container is never OOM-killed.
✅ Acceptance Criteria
Scenario: L1 exceeds memory threshold
Given the configured L1 memory limit is 100 MB
And current L1 usage reaches 82 MB
When the memory-pressure monitor runs
Then at least 1,000 low-priority keys are evicted
And L1 usage drops below 80 MB
And the metrics counter l1_evictions increments accordingly
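The trigger is simply the usage-to-limit ratio crossing the threshold; a minimal sketch of that check using the scenario's numbers (the constant and function names are placeholders, not the final API):

```python
L1_MEMORY_LIMIT_MB = 100          # configured limit from the scenario
MEMORY_PRESSURE_THRESHOLD = 0.8   # evict once usage exceeds 80% of the limit

def needs_eviction(current_usage_mb: float) -> bool:
    """82 MB / 100 MB = 0.82 > 0.80, so the monitor starts evicting."""
    return (current_usage_mb / L1_MEMORY_LIMIT_MB) > MEMORY_PRESSURE_THRESHOLD

assert needs_eviction(82)        # scenario above: eviction kicks in
assert not needs_eviction(79)    # below 80 MB the monitor leaves L1 alone
```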
Story 5 – Redis Outage Resilience
As an SRE on-call
I want the gateway to continue serving requests from L1 and fall back to the DB if Redis is unavailable
So that users experience degraded but functional service.
✅ Acceptance Criteria
Scenario: Redis GET timeout
Given Redis is unreachable or times out
And the requested key is not in L1
When get_or_set("federation:roots") is called
Then the cache manager logs "Redis cache error"
And fetches the data from the database
And returns the data to the caller
And populates only L1 (skips L2)
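A possible test for the outage path, again assuming the manager accepts an injected (here failing) Redis client via a hypothetical fixture:

```python
import pytest
from unittest.mock import AsyncMock

@pytest.mark.asyncio
async def test_redis_outage_falls_back_to_db():
    fake_redis = AsyncMock()
    fake_redis.get.side_effect = TimeoutError("redis unreachable")
    fake_redis.setex.side_effect = TimeoutError("redis unreachable")
    manager = make_test_manager(redis=fake_redis)    # hypothetical fixture

    factory = AsyncMock(return_value={"roots": []})  # stands in for the DB query
    value = await manager.get_or_set("federation:roots", factory=factory)

    assert value == {"roots": []}                    # served from the DB fallback
    factory.assert_awaited_once()
    assert "federation:roots" in manager.l1_cache    # only L1 is populated
```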
Story 6 – Metrics & Observability
As a dashboard user
I want detailed cache metrics exposed to Prometheus
So that I can monitor hit ratios and latency per layer.
✅ Acceptance Criteria
Scenario: Metrics endpoint exposes cache stats
When the Prometheus scraper calls /metrics
Then the response contains:
- cache_l1_hits_total
- cache_l2_hits_total
- cache_misses_total
- cache_invalidation_total
- cache_l1_memory_bytes
And each metric has labels instance=<pod_name> and version=<git_sha>
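One way to define these metrics is with `prometheus_client` (a sketch: the metric names follow the list above, while the way the `instance`/`version` label values are resolved is an assumption about the deployment):

```python
import os
from prometheus_client import Counter, Gauge

LABELS = ["instance", "version"]

CACHE_L1_HITS = Counter("cache_l1_hits_total", "L1 memory cache hits", LABELS)
CACHE_L2_HITS = Counter("cache_l2_hits_total", "L2 Redis cache hits", LABELS)
CACHE_MISSES = Counter("cache_misses_total", "Full cache misses (DB fetch)", LABELS)
CACHE_INVALIDATIONS = Counter("cache_invalidation_total", "Invalidation events", LABELS)
CACHE_L1_MEMORY = Gauge("cache_l1_memory_bytes", "Approximate L1 memory usage", LABELS)

# Assumed label resolution: pod name and git SHA injected via the environment
labels = {"instance": os.getenv("HOSTNAME", "local"), "version": os.getenv("GIT_SHA", "dev")}
CACHE_L1_HITS.labels(**labels).inc()                   # e.g. on every L1 hit
CACHE_L1_MEMORY.labels(**labels).set(42 * 1024 * 1024)
```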
🖼️ Architecture
1 – Component & Data-flow
```mermaid
flowchart TD
    subgraph "Gateway Instance"
        API[Service / Handler] --> CM[MultiLayerCacheManager]
        CM --> L1[(L1 Memory Cache)]
        CM -- miss --> L2[(L2 Redis Cache)]
        L1 -. Pub/Sub invalidations .-> CM
    end
    L2 -->|miss| DB[(Primary Database)]
    CM -. "metrics → Prometheus" .-> MON[Monitoring]
    classDef store fill:#ffd1dc,stroke:#333;
    class DB,L2,L1 store
```
- A request hits the Service, which consults the CacheManager.
- L1 (per-container) is checked first; on miss the manager tries L2 Redis; final fallback is the DB.
- Redis Pub/Sub broadcasts invalidation patterns so every instance clears its own L1.
- Cache metrics flow to Prometheus/Grafana.
2 – get_or_set() Sequence
```mermaid
sequenceDiagram
    participant SVC as Service Code
    participant CM as CacheManager
    participant L1 as Memory Cache
    participant L2 as Redis Cache
    participant DB as Database

    SVC ->> CM: get_or_set(key)
    CM ->> L1: lookup(key)
    alt L1 HIT
        L1 -->> CM: value
        CM -->> SVC: return value
    else L1 MISS
        CM ->> L2: GET key
        alt L2 HIT
            L2 -->> CM: value
            CM ->> L1: put key,value
            CM -->> SVC: return value
        else L2 MISS
            CM ->> DB: SELECT ...
            DB -->> CM: value
            CM ->> L2: SETEX key,value,ttl
            CM ->> L1: put key,value
            CM -->> SVC: return value
        end
    end
```
- Fast path – L1 hit: ~0.1 ms.
- Typical path – L2 hit, then populate L1: ~2-5 ms.
- Cold path – miss in both layers, query the DB, then fill both caches.
📐 Enhanced Design Sketch
Dual-Layer Cache Architecture:
```python
# Imports assumed by this sketch; `settings` and `CacheMetrics` are existing
# project-level objects and are not redefined here.
import asyncio
import json
import logging
from fnmatch import fnmatch
from typing import Any, Callable

from cachetools import TTLCache
from redis.asyncio import Redis

logger = logging.getLogger(__name__)


class MultiLayerCacheManager:
    """L1 (Memory) + L2 (Redis) caching with intelligent invalidation"""

    def __init__(self):
        # L1: Fast local memory cache (per container)
        self.l1_cache = TTLCache(maxsize=10000, ttl=60)  # 10k items, 60s
        self.l1_lock = asyncio.Lock()
        # L2: Distributed Redis cache (shared across instances)
        self.redis = Redis.from_url(settings.redis_url)
        # Metrics tracking
        self.metrics = CacheMetrics()
        # Background tasks
        self._start_background_tasks()

    async def get_or_set(self, key: str, factory: Callable, ttl: int = 300) -> Any:
        """Multi-layer cache lookup with fallback chain"""
        # 🔍 L1 CHECK: Memory cache (fastest, ~0.1 ms)
        async with self.l1_lock:
            if key in self.l1_cache:
                self.metrics.l1_hits += 1
                logger.debug(f"L1 cache HIT: {key}")
                return self.l1_cache[key]

        # 🔍 L2 CHECK: Redis cache (fast, ~1-5 ms)
        try:
            cached = await self.redis.get(key)
            if cached:
                self.metrics.l2_hits += 1
                logger.debug(f"L2 cache HIT: {key}")
                data = json.loads(cached)
                # Populate L1 cache for the next request
                async with self.l1_lock:
                    self.l1_cache[key] = data
                return data
        except Exception as e:
            logger.warning(f"Redis cache error for {key}: {e}")

        # 💾 CACHE MISS: Fetch from the database (~50-200 ms)
        self.metrics.misses += 1
        logger.debug(f"Cache MISS: {key} - fetching from source")
        data = await factory()
        # Populate both cache layers
        await self._populate_caches(key, data, ttl)
        return data

    async def _populate_caches(self, key: str, data: Any, ttl: int):
        """Populate both L1 and L2 caches"""
        # L1: Memory cache (shorter TTL for memory efficiency). Note that
        # cachetools.TTLCache applies one cache-wide TTL, so l1_ttl here is the
        # intended per-key bound; enforcing it per key needs a custom cache.
        l1_ttl = min(ttl, settings.cache_l1_max_ttl)  # noqa: F841
        async with self.l1_lock:
            self.l1_cache[key] = data
        # L2: Redis cache (full TTL for persistence)
        try:
            await self.redis.setex(key, ttl, json.dumps(data, default=str))
        except Exception as e:
            logger.warning(f"Failed to populate Redis cache for {key}: {e}")

    async def invalidate(self, pattern: str):
        """Invalidate across both cache layers"""
        # L1: Clear matching keys from memory
        async with self.l1_lock:
            keys_to_remove = [k for k in self.l1_cache.keys() if fnmatch(k, pattern)]
            for key in keys_to_remove:
                del self.l1_cache[key]
                logger.debug(f"L1 invalidated: {key}")
        # L2: Clear from Redis and notify other instances
        redis_keys = await self.redis.keys(pattern)
        if redis_keys:
            await self.redis.delete(*redis_keys)
        # Publish invalidation event to other instances
        await self.redis.publish("cache:invalidate", pattern)
        logger.debug(f"L2 invalidated: {pattern} ({len(redis_keys)} keys)")

    def _start_background_tasks(self):
        """Start background maintenance tasks"""
        asyncio.create_task(self._listen_for_invalidations())
        asyncio.create_task(self._memory_pressure_monitor())
        asyncio.create_task(self._cache_stats_reporter())
```
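A service would then use the manager roughly like this (illustrative only; the module-level `cache` instance, `db.execute_tools_query`, and `tool_to_dict` are placeholders, not existing gateway code):

```python
cache = MultiLayerCacheManager()

async def list_tools(db, include_inactive: bool = False):
    key = f"mcp:tools:list:{include_inactive}"

    async def fetch_from_db():
        # Plain DB query, used only on a full cache miss
        rows = await db.execute_tools_query(include_inactive)
        return [tool_to_dict(row) for row in rows]

    # L1 -> L2 -> DB, populating both layers on the way back up
    return await cache.get_or_set(key, fetch_from_db, ttl=300)
```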
Performance Characteristics:
| Request Path | Latency | Use Case |
|---|---|---|
| L1 Memory Hit | ~0.1 ms | Frequently accessed data |
| L2 Redis Hit → L1 Population | ~2-5 ms | Recently accessed data |
| Cache Miss → DB → L1+L2 Populate | ~50-200 ms | First access or expired data |
🔧 Enhanced Configuration
```python
class CacheConfig:
    # === L1 (Memory) Cache Settings ===
    l1_cache_enabled: bool = True           # Enable memory caching
    l1_cache_size: int = 10000              # Max items in memory per instance
    l1_cache_ttl: int = 60                  # Default L1 TTL (seconds)
    l1_max_ttl: int = 300                   # Maximum L1 TTL (memory limit)
    l1_memory_limit_mb: int = 100           # Max memory usage for the L1 cache

    # === L2 (Redis) Cache Settings ===
    l2_cache_enabled: bool = True           # Enable Redis caching
    l2_connection_pool_size: int = 20       # Redis connection pool size
    l2_timeout_ms: int = 100                # Redis operation timeout

    # === Entity-specific TTLs ===
    tools_cache_ttl: int = 300              # 5 minutes (L2 TTL)
    tools_cache_l1_ttl: int = 60            # 1 minute (L1 TTL)
    servers_cache_ttl: int = 180            # 3 minutes (L2 TTL)
    servers_cache_l1_ttl: int = 45          # 45 seconds (L1 TTL)
    federation_cache_ttl: int = 300         # 5 minutes (L2 TTL)
    federation_cache_l1_ttl: int = 60       # 1 minute (L1 TTL)

    # === Cache Behavior ===
    cache_on_write: bool = True             # Populate on creates/updates
    l1_l2_sync_enabled: bool = True         # Sync L1 across instances via Redis
    memory_pressure_threshold: float = 0.8  # Evict L1 at 80% of the memory limit
```
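A small helper could map these entity-specific TTLs onto cache keys; the prefix-to-TTL mapping below is an assumption about how keys are named, not part of the config class:

```python
# Hypothetical lookup of (l1_ttl, l2_ttl) in seconds by key prefix,
# mirroring the entity-specific fields above
TTL_BY_PREFIX = {
    "mcp:tools:":      (60, 300),   # tools_cache_l1_ttl, tools_cache_ttl
    "mcp:servers:":    (45, 180),   # servers_cache_l1_ttl, servers_cache_ttl
    "mcp:federation:": (60, 300),   # federation_cache_l1_ttl, federation_cache_ttl
}

def ttls_for_key(key: str, default: tuple[int, int] = (60, 300)) -> tuple[int, int]:
    """Return (l1_ttl, l2_ttl) for a cache key, falling back to the defaults."""
    for prefix, ttls in TTL_BY_PREFIX.items():
        if key.startswith(prefix):
            return ttls
    return default

assert ttls_for_key("mcp:servers:list:false") == (45, 180)
```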
📈 Updated Performance Expectations
Before Caching:
GET /v1/tools (database query): 150ms avg
GET /v1/servers (with joins): 250ms avg
Federation call: 400ms avg
After L2 (Redis) Only:
GET /v1/tools (Redis hit): 5ms avg
GET /v1/servers (Redis hit): 7ms avg
Federation call (cached): 3ms avg
After L1+L2 (Memory+Redis):
GET /v1/tools (L1 hit): 0.2ms avg ⚡
GET /v1/servers (L1 hit): 0.3ms avg ⚡
Federation call (L1 hit): 0.1ms avg ⚡
Cache Hit Distribution (after warmup):
- L1 Memory hits: ~85% (most frequent data)
- L2 Redis hits: ~12% (less frequent but cached)
- Cache misses: ~3% (new/expired data)
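Taken together, these assumed ratios imply a steady-state average of roughly 0.85 × 0.2 ms + 0.12 × 3.5 ms + 0.03 × 150 ms ≈ 5.1 ms per request, dominated almost entirely by the ~3% of misses, which is about a 30x improvement over the ~150 ms uncached baseline.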
🔧 Updated Tasks
Phase 1: Enhanced Cache Infrastructure

- Implement MultiLayerCacheManager with both L1 and L2

```python
# mcpgateway/cache/multilayer.py
class MultiLayerCacheManager:
    def __init__(self, l1_config: L1Config, l2_config: L2Config): ...
    async def get_or_set(self, key, factory, l1_ttl, l2_ttl): ...
    async def invalidate_pattern(self, pattern): ...
    async def get_cache_stats(self) -> CacheStats: ...
```
- Add memory monitoring and pressure management

```python
async def _memory_pressure_monitor(self):
    """Monitor memory usage and evict L1 cache entries when needed"""
    while True:
        memory_usage = self._get_l1_memory_usage()
        if memory_usage > self.config.memory_pressure_threshold:
            await self._evict_l1_lru_items(count=1000)
        await asyncio.sleep(30)
```
- Implement cross-instance L1 invalidation via Redis pub/sub

```python
async def _listen_for_invalidations(self):
    """Listen for invalidation events from other instances"""
    pubsub = self.redis.pubsub()
    await pubsub.subscribe("cache:invalidate")
    async for message in pubsub.listen():
        if message["type"] == "message":
            pattern = message["data"].decode()
            await self._invalidate_l1_pattern(pattern)
```
Phase 2: Service Integration with Dual TTLs

- Update all services with L1+L2 aware caching

```python
@cached(
    key="mcp:tools:list:{include_inactive}",
    l1_ttl=60,    # 1 minute in memory
    l2_ttl=300,   # 5 minutes in Redis
)
async def list_tools(self, db: Session, include_inactive: bool = False):
    return await self._fetch_tools_from_db(db, include_inactive)
```
- Implement smart cache warming for frequently accessed data

```python
async def warm_l1_cache(self):
    """Pre-populate L1 cache with hot data on startup"""
    hot_keys = [
        "mcp:tools:list:false",
        "mcp:servers:list:false",
        "mcp:metrics:aggregated:1h",
    ]
    for key in hot_keys:
        await self.cache_manager.get_or_set(key, lambda: self._fetch_hot_data(key))
```
Phase 3: Memory Management & Optimization

- Implement intelligent L1 eviction policies

```python
from collections import defaultdict
from typing import List

class L1EvictionManager:
    def __init__(self, l1_cache):
        self.l1_cache = l1_cache
        self.access_tracker = defaultdict(int)
        self.size_tracker = defaultdict(int)

    async def evict_candidates(self, target_count: int) -> List[str]:
        """Return keys to evict based on LRU + size + access frequency"""
        # Prioritize infrequently accessed items; break ties by size, largest first
        candidates = sorted(
            self.l1_cache.keys(),
            key=lambda k: (
                self.access_tracker[k],   # less frequently accessed first
                -self.size_tracker[k],    # larger items first
            ),
        )
        return candidates[:target_count]
```
- Add L1 cache size monitoring and alerts
- Implement configurable L1 cache partitioning by entity type
Phase 4: Enhanced Monitoring

- Comprehensive cache metrics with layer breakdown

```python
class DetailedCacheMetrics:
    # L1 metrics
    l1_hits: int = 0
    l1_misses: int = 0
    l1_evictions: int = 0
    l1_memory_usage_mb: float = 0.0
    # L2 metrics
    l2_hits: int = 0
    l2_misses: int = 0
    l2_network_errors: int = 0
    # Performance metrics
    avg_l1_response_time_ms: float = 0.2
    avg_l2_response_time_ms: float = 3.5
    avg_db_response_time_ms: float = 150.0

    def overall_hit_ratio(self) -> float:
        total = self.l1_hits + self.l2_hits + self.l2_misses
        return (self.l1_hits + self.l2_hits) / total if total > 0 else 0.0
```
- Add cache layer visualization to Admin UI
- Implement cache performance alerting
🎯 Updated Success Criteria
- L1 cache hit ratio > 80% for frequently accessed endpoints after warmup
- Combined L1+L2 hit ratio > 95% overall
- Sub-millisecond response times for L1 cache hits
- L1 memory usage < 100MB per container instance
- Cross-instance cache invalidation < 100ms propagation time
- Cache system survives Redis outages (L1 continues working)
- Zero cache inconsistency during normal operations
🧩 Additional Notes
- 🚀 Performance: L1 cache provides 100-500x faster access than Redis for hot data
- 💾 Memory Efficiency: Smart eviction prevents memory bloat while maximizing hit rates
- 🔄 High Availability: L1 cache continues working even if Redis is down
- 📈 Scalability: Each container instance has its own L1 cache + shared L2 Redis
- 🔧 Operational: Can disable L1 or L2 independently for debugging
- 📊 Monitoring: Detailed metrics for both cache layers and performance optimization
- ⚡ Zero Network: L1 hits require zero network calls (Redis or database)