[Feature Request]: Multi-Layer Caching System (Memory + Redis) #289

@crivetimihai

Description

🧭 Epic - Multi-Layer Caching System (Memory + Redis)

Title: Multi-Layer Caching System (Memory + Redis)
Goal: Implement dual-layer caching with fast local memory (L1) and distributed Redis (L2) to eliminate both database AND Redis network latency
Why now: Even Redis calls add 1-5ms network latency. Local memory cache provides sub-millisecond access while Redis enables sharing across instances.


πŸ™‹β€β™‚οΈ User Stories & Acceptance Criteria

Story 1 – Fast L1 Hit

As a service handler
I want frequently requested items to come from the local memory cache
So that response latency stays well below 1 ms.

✅ Acceptance Criteria
Scenario: L1 cache hit after warm-up
  Given "tools:list:false" is already in the L1 cache
  When the service calls get_or_set("tools:list:false")
  Then the cache manager returns the value in < 1 ms
  And the metrics counter l1_hits increments by 1
  And no Redis GET operation occurs

Story 2 – L2 Redis Fallback & L1 Population

As a platform engineer
I want a miss in L1 to fall back to Redis and then fill L1
So that the next request is even faster.

✅ Acceptance Criteria
Scenario: L1 miss, L2 hit, L1 warmed
  Given "servers:list:false" exists in Redis but not in L1
  When the service calls get_or_set("servers:list:false")
  Then the cache manager reads the value from Redis in < 5 ms
  And writes the same key to L1 with a 45-second TTL
  And returns the value to the caller
  And the metrics counters l1_misses and l2_hits increment by 1
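
A pytest-style sketch of how this scenario might be automated, assuming pytest-asyncio plus hypothetical cache_manager and fake_redis fixtures wired to the manager sketched later in this issue (none of this test code exists yet):

import pytest

async def fail_if_called():
    raise AssertionError("factory must not run when L2 already holds the key")

@pytest.mark.asyncio
async def test_l1_miss_l2_hit_populates_l1(cache_manager, fake_redis):
    """Story 2: value served from Redis on an L1 miss, then copied into L1."""
    key = "servers:list:false"
    await fake_redis.set(key, '{"servers": []}')   # present in L2 only

    value = await cache_manager.get_or_set(key, fail_if_called)

    assert value == {"servers": []}
    assert key in cache_manager.l1_cache            # L1 was warmed
    assert cache_manager.metrics.l1_misses == 1     # counter names from the criteria above
    assert cache_manager.metrics.l2_hits == 1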

Story 3 – Cross-Instance Invalidation

As a cluster operator
I want cache invalidations in one instance to evict the same keys from every other instance's L1
So that no stale data is served after an update.

✅ Acceptance Criteria
Scenario: Broadcast invalidation pattern
  Given two gateway instances A and B are running
  And both have "mcp:tools:list:false" in their L1 caches
  When instance A calls invalidate("mcp:tools:*")
  Then instance A deletes the matching keys from its own L1
  And instance A publishes "cache:invalidate" with the pattern
  And instance B receives the Pub/Sub message
  And instance B deletes the same keys from its L1 within 100 ms

Story 4 – Memory-Pressure Eviction

As a reliability engineer
I want the L1 cache to evict items when memory usage exceeds 80% of the configured limit
So that the container is never OOM-killed.

✅ Acceptance Criteria
Scenario: L1 exceeds memory threshold
  Given the configured L1 memory limit is 100 MB
  And current L1 usage reaches 82 MB
  When the memory-pressure monitor runs
  Then at least 1,000 low-priority keys are evicted
  And L1 usage drops below 80 MB
  And the metrics counter l1_evictions increments accordingly

Story 5 – Redis Outage Resilience

As an on-call SRE
I want the gateway to continue serving requests from L1 and fall back to the DB if Redis is unavailable
So that users experience degraded but functional service.

✅ Acceptance Criteria
Scenario: Redis GET timeout
  Given Redis is unreachable or times out
  And the requested key is not in L1
  When get_or_set("federation:roots") is called
  Then the cache manager logs "Redis cache error"
  And fetches the data from the database
  And returns the data to the caller
  And populates only L1 (skips L2)

Story 6 – Metrics & Observability

As a dashboard user
I want detailed cache metrics exposed to Prometheus
So that I can monitor hit ratios and latency per layer.

✅ Acceptance Criteria
Scenario: Metrics endpoint exposes cache stats
  When the Prometheus scraper calls /metrics
  Then the response contains:
    - cache_l1_hits_total
    - cache_l2_hits_total
    - cache_misses_total
    - cache_invalidation_total
    - cache_l1_memory_bytes
  And each metric has labels instance=<pod_name> and version=<git_sha>

πŸ–ΌοΈ Architecture

1 – Component & Data-flow

flowchart TD
    subgraph "Gateway Instance"
        API[Service / Handler] --> CM[MultiLayerCacheManager]
        CM --> L1[(L1 Memory Cache)]
        CM -- miss --> L2[(L2 Redis Cache)]
        L1 -. Pub/Sub invalidations .-> CM
    end

    L2 -->|miss| DB[(Primary Database)]
    CM -. "metrics ➜ Prometheus" .-> MON[Monitoring]

    classDef store fill:#ffd1dc,stroke:#333;
    class DB,L2,L1 store
  • A request hits the Service, which consults the CacheManager.
  • L1 (per-container) is checked first; on miss the manager tries L2 Redis; final fallback is the DB.
  • Redis Pub/Sub broadcasts invalidation patterns so every instance clears its own L1.
  • Cache metrics flow to Prometheus/Grafana.

2 – get_or_set() Sequence

sequenceDiagram
    participant SVC  as Service Code
    participant CM   as CacheManager
    participant L1   as Memory Cache
    participant L2   as Redis Cache
    participant DB   as Database

    SVC ->> CM: get_or_set(key)
    CM  ->> L1: lookup(key)
    alt L1 HIT
        L1 -->> CM: value
        CM -->> SVC: return value
    else L1 MISS
        CM ->> L2: GET key
        alt L2 HIT
            L2 -->> CM: value
            CM ->> L1: put key,value
            CM -->> SVC: return value
        else L2 MISS
            CM ->> DB: SELECT …
            DB -->> CM: value
            CM ->> L2: SETEX key,value,ttl
            CM ->> L1: put key,value
            CM -->> SVC: return value
        end
    end
  • Fast path – L1 hit: ~0.1 ms.
  • Typical path – L2 hit then populate L1: 2-5 ms.
  • Cold path – miss in both layers, query the DB, then fill both caches.

πŸ“ Enhanced Design Sketch

Dual-Layer Cache Architecture:

import asyncio
import json
import logging
from fnmatch import fnmatch
from typing import Any, Callable

from cachetools import TTLCache
from redis.asyncio import Redis

# settings (app configuration) and CacheMetrics are project-local imports.

logger = logging.getLogger(__name__)


class MultiLayerCacheManager:
    """L1 (Memory) + L2 (Redis) caching with intelligent invalidation"""

    def __init__(self):
        # L1: Fast local memory cache (per container)
        self.l1_cache = TTLCache(maxsize=10000, ttl=60)  # 10k items, 60s default TTL
        self.l1_lock = asyncio.Lock()

        # L2: Distributed Redis cache (shared across instances)
        self.redis = Redis.from_url(settings.redis_url)
        
        # Metrics tracking
        self.metrics = CacheMetrics()
        
        # Background tasks
        self._start_background_tasks()
    
    async def get_or_set(self, key: str, factory: Callable, ttl: int = 300) -> Any:
        """Multi-layer cache lookup with fallback chain"""
        
        # 🚀 L1 CHECK: Memory cache (fastest ~0.1ms)
        async with self.l1_lock:
            if key in self.l1_cache:
                self.metrics.l1_hits += 1
                logger.debug(f"L1 cache HIT: {key}")
                return self.l1_cache[key]
        
        # 🔄 L2 CHECK: Redis cache (fast ~1-5ms)
        try:
            cached = await self.redis.get(key)
            if cached:
                self.metrics.l2_hits += 1
                logger.debug(f"L2 cache HIT: {key}")
                data = json.loads(cached)
                
                # Populate L1 cache for next request
                async with self.l1_lock:
                    self.l1_cache[key] = data
                    
                return data
        except Exception as e:
            logger.warning(f"Redis cache error for {key}: {e}")
        
        # 💾 CACHE MISS: Fetch from database (~50-200ms)
        self.metrics.misses += 1
        logger.debug(f"Cache MISS: {key} - fetching from source")
        
        data = await factory()
        
        # Populate both cache layers
        await self._populate_caches(key, data, ttl)
        
        return data
    
    async def _populate_caches(self, key: str, data: Any, ttl: int):
        """Populate both L1 and L2 caches"""
        # L1: Memory cache. Note: cachetools' TTLCache applies the single TTL
        # configured at construction time; honouring a shorter per-key TTL
        # (min(ttl, settings.cache_l1_max_ttl)) would need a custom cache structure.
        async with self.l1_lock:
            self.l1_cache[key] = data
        
        # L2: Redis cache (full TTL for persistence)
        try:
            await self.redis.setex(key, ttl, json.dumps(data, default=str))
        except Exception as e:
            logger.warning(f"Failed to populate Redis cache for {key}: {e}")
    
    async def invalidate(self, pattern: str):
        """Invalidate across both cache layers"""
        # L1: Clear matching keys from memory
        async with self.l1_lock:
            keys_to_remove = [k for k in self.l1_cache.keys() if fnmatch(k, pattern)]
            for key in keys_to_remove:
                del self.l1_cache[key]
                logger.debug(f"L1 invalidated: {key}")
        
        # L2: Clear from Redis and notify other instances.
        # NOTE: KEYS blocks Redis on large keyspaces; SCAN would be preferable in production.
        redis_keys = await self.redis.keys(pattern)
        if redis_keys:
            await self.redis.delete(*redis_keys)
            # Publish invalidation event to other instances
            await self.redis.publish("cache:invalidate", pattern)
            logger.debug(f"L2 invalidated: {pattern} ({len(redis_keys)} keys)")
    
    def _start_background_tasks(self):
        """Start background maintenance tasks. Must run inside a live event loop
        (e.g. application startup), since asyncio.create_task requires one."""
        asyncio.create_task(self._listen_for_invalidations())
        asyncio.create_task(self._memory_pressure_monitor())
        asyncio.create_task(self._cache_stats_reporter())
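
Illustrative call site for the manager above (a minimal sketch; db_list_tools and the key format are placeholders for whatever query the real service performs):

cache = MultiLayerCacheManager()

async def list_tools(db, include_inactive: bool = False):
    key = f"tools:list:{include_inactive}"

    async def factory():
        # Stand-in for the real database query the service would run.
        return await db_list_tools(db, include_inactive)

    # L1 -> L2 -> factory() fallback chain; both layers are populated on a miss.
    return await cache.get_or_set(key, factory, ttl=300)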

Performance Characteristics:

Request Path                     | Latency   | Use Case
─────────────────────────────────┼───────────┼──────────────────────────
L1 Memory Hit                    | ~0.1ms    | Frequently accessed data
L2 Redis Hit → L1 Population     | ~2-5ms    | Recently accessed data
Cache Miss → DB → L1+L2 Populate | ~50-200ms | First access or expired

🔧 Enhanced Configuration

class CacheConfig:
    # === L1 (Memory) Cache Settings ===
    l1_cache_enabled: bool = True           # Enable memory caching
    l1_cache_size: int = 10000             # Max items in memory per instance
    l1_cache_ttl: int = 60                 # Default L1 TTL (seconds)
    l1_max_ttl: int = 300                  # Cap on per-entry L1 TTL (seconds)
    l1_memory_limit_mb: int = 100          # Max memory usage for L1 cache
    
    # === L2 (Redis) Cache Settings ===
    l2_cache_enabled: bool = True           # Enable Redis caching
    l2_connection_pool_size: int = 20       # Redis connection pool
    l2_timeout_ms: int = 100               # Redis operation timeout
    
    # === Entity-specific TTLs ===
    tools_cache_ttl: int = 300             # 5 minutes (L2 TTL)
    tools_cache_l1_ttl: int = 60           # 1 minute (L1 TTL)
    
    servers_cache_ttl: int = 180           # 3 minutes (L2 TTL) 
    servers_cache_l1_ttl: int = 45         # 45 seconds (L1 TTL)
    
    federation_cache_ttl: int = 300        # 5 minutes (L2 TTL)
    federation_cache_l1_ttl: int = 60      # 1 minute (L1 TTL)
    
    # === Cache Behavior ===
    cache_on_write: bool = True            # Populate on creates/updates
    l1_l2_sync_enabled: bool = True        # Sync L1 across instances via Redis
    memory_pressure_threshold: float = 0.8 # Evict L1 at 80% memory limit
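
If this config is wired through pydantic-settings (an assumption; the gateway's existing settings mechanism may differ), each value could be overridden per deployment via environment variables:

from pydantic_settings import BaseSettings, SettingsConfigDict

class CacheSettings(BaseSettings):
    """Subset of CacheConfig, loadable from the environment (illustrative only)."""
    model_config = SettingsConfigDict(env_prefix="CACHE_")  # e.g. CACHE_L1_CACHE_SIZE=20000

    l1_cache_enabled: bool = True
    l1_cache_size: int = 10000
    l1_memory_limit_mb: int = 100
    l2_cache_enabled: bool = True
    l2_timeout_ms: int = 100
    memory_pressure_threshold: float = 0.8

settings = CacheSettings()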

🚀 Updated Performance Expectations

Before Caching:

GET /v1/tools (database query):     150ms avg
GET /v1/servers (with joins):       250ms avg  
Federation call:                    400ms avg

After L2 (Redis) Only:

GET /v1/tools (Redis hit):          5ms avg
GET /v1/servers (Redis hit):        7ms avg
Federation call (cached):           3ms avg

After L1+L2 (Memory+Redis):

GET /v1/tools (L1 hit):             0.2ms avg  ⚡
GET /v1/servers (L1 hit):           0.3ms avg  ⚡
Federation call (L1 hit):           0.1ms avg  ⚡

Cache Hit Distribution (after warmup):

  • L1 Memory hits: ~85% (most frequent data)
  • L2 Redis hits: ~12% (less frequent but cached)
  • Cache misses: ~3% (new/expired data)
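
Plugging that assumed distribution into the per-layer latencies above gives a rough expected average per request (illustrative arithmetic, not a measurement):

# Back-of-envelope expected latency from the assumed hit distribution.
l1_ms, l2_ms, db_ms = 0.2, 3.5, 150.0
p_l1, p_l2, p_miss = 0.85, 0.12, 0.03

expected_ms = p_l1 * l1_ms + p_l2 * l2_ms + p_miss * db_ms
print(f"{expected_ms:.2f} ms")  # 0.17 + 0.42 + 4.50 ≈ 5.09 ms on average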

🧭 Updated Tasks

  • Phase 1: Enhanced Cache Infrastructure

    • Implement MultiLayerCacheManager with both L1 and L2

      # mcpgateway/cache/multilayer.py (public interface outline)
      class MultiLayerCacheManager:
          def __init__(self, l1_config: L1Config, l2_config: L2Config): ...
          async def get_or_set(self, key, factory, l1_ttl, l2_ttl): ...
          async def invalidate_pattern(self, pattern): ...
          async def get_cache_stats(self) -> CacheStats: ...
    • Add memory monitoring and pressure management

      async def _memory_pressure_monitor(self):
          """Monitor L1 memory usage and evict entries when the threshold is crossed"""
          while True:
              usage_mb = self._get_l1_memory_usage()  # current L1 footprint in MB
              limit_mb = self.config.l1_memory_limit_mb
              # Compare the usage ratio against the fractional threshold (e.g. 0.8)
              if usage_mb / limit_mb > self.config.memory_pressure_threshold:
                  await self._evict_l1_lru_items(count=1000)
              await asyncio.sleep(30)
    • Implement cross-instance L1 invalidation via Redis pub/sub

      async def _listen_for_invalidations(self):
          """Listen for invalidation events from other instances"""
          pubsub = self.redis.pubsub()
          await pubsub.subscribe("cache:invalidate")
          
          async for message in pubsub.listen():
              if message["type"] == "message":
                  pattern = message["data"].decode()
                  await self._invalidate_l1_pattern(pattern)
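
      The _invalidate_l1_pattern helper called above is not shown in this issue; a minimal version would mirror the L1 half of invalidate() without re-publishing:

      async def _invalidate_l1_pattern(self, pattern: str):
          """Drop every L1 key matching a broadcast pattern (no Redis delete, no re-publish)."""
          async with self.l1_lock:
              for key in [k for k in self.l1_cache.keys() if fnmatch(k, pattern)]:
                  del self.l1_cache[key]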
  • Phase 2: Service Integration with Dual TTLs

    • Update all services with L1+L2 aware caching

      @cached(
          key="mcp:tools:list:{include_inactive}",
          l1_ttl=60,    # 1 minute in memory
          l2_ttl=300    # 5 minutes in Redis
      )
      async def list_tools(self, db: Session, include_inactive: bool = False):
          return await self._fetch_tools_from_db(db, include_inactive)
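
      The @cached decorator used above is not defined anywhere in this issue; a minimal sketch, assuming a module-level cache_manager and that the key template is filled from the call's keyword arguments:

      import functools

      def cached(key: str, l1_ttl: int = 60, l2_ttl: int = 300):
          """Route an async method's result through the L1/L2 manager."""
          def decorator(func):
              @functools.wraps(func)
              async def wrapper(self, *args, **kwargs):
                  cache_key = key.format(**kwargs)  # e.g. include_inactive=False
                  return await cache_manager.get_or_set(
                      cache_key,
                      lambda: func(self, *args, **kwargs),
                      ttl=l2_ttl,  # per-key l1_ttl needs the custom L1 noted earlier
                  )
              return wrapper
          return decorator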
    • Implement smart cache warming for frequently accessed data

      async def warm_l1_cache(self):
          """Pre-populate L1 cache with hot data on startup"""
          hot_keys = [
              "mcp:tools:list:false",
              "mcp:servers:list:false", 
              "mcp:metrics:aggregated:1h"
          ]
          for key in hot_keys:
              # Bind key via a default argument so each lambda carries its own key
              # even if invocation were deferred (defensive against late binding).
              await self.cache_manager.get_or_set(key, lambda k=key: self._fetch_hot_data(k))
  • Phase 3: Memory Management & Optimization

    • Implement intelligent L1 eviction policies

      class L1EvictionManager:
          def __init__(self, l1_cache):
              self.l1_cache = l1_cache                 # the TTLCache being managed
              self.access_tracker = defaultdict(int)   # key -> access count
              self.size_tracker = defaultdict(int)     # key -> approximate size in bytes

          async def evict_candidates(self, target_count: int) -> List[str]:
              """Return keys to evict based on access frequency and entry size"""
              # Prioritize large, infrequently accessed items
              candidates = sorted(
                  self.l1_cache.keys(),
                  key=lambda k: (
                      self.access_tracker[k],   # Less frequently accessed first (ascending)
                      -self.size_tracker[k]     # Larger items first (descending)
                  )
              )
              return candidates[:target_count]
    • Add L1 cache size monitoring and alerts

    • Implement configurable L1 cache partitioning by entity type

  • Phase 4: Enhanced Monitoring

    • Comprehensive cache metrics with layer breakdown

      class DetailedCacheMetrics:
          # L1 metrics
          l1_hits: int = 0
          l1_misses: int = 0
          l1_evictions: int = 0
          l1_memory_usage_mb: float = 0
          
          # L2 metrics  
          l2_hits: int = 0
          l2_misses: int = 0
          l2_network_errors: int = 0
          
          # Performance metrics
          avg_l1_response_time_ms: float = 0.2
          avg_l2_response_time_ms: float = 3.5
          avg_db_response_time_ms: float = 150.0
          
          def overall_hit_ratio(self) -> float:
              total = self.l1_hits + self.l2_hits + self.l2_misses
              return (self.l1_hits + self.l2_hits) / total if total > 0 else 0.0
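
      A possible exposition of these counters with the prometheus_client library, matching the metric names required in Story 6 (registry wiring and label sourcing here are assumptions, not existing gateway code):

      from prometheus_client import Counter, Gauge

      LABELS = ["instance", "version"]  # pod name and git SHA, filled in at startup

      # prometheus_client appends the _total suffix to Counters in the exposition
      # format, yielding the names listed in Story 6 (e.g. cache_l1_hits_total).
      CACHE_L1_HITS = Counter("cache_l1_hits", "L1 memory cache hits", LABELS)
      CACHE_L2_HITS = Counter("cache_l2_hits", "L2 Redis cache hits", LABELS)
      CACHE_MISSES = Counter("cache_misses", "Full cache misses served from the DB", LABELS)
      CACHE_INVALIDATIONS = Counter("cache_invalidation", "Cache invalidation events", LABELS)
      CACHE_L1_MEMORY = Gauge("cache_l1_memory_bytes", "Approximate L1 memory usage", LABELS)

      def record_l1_hit(instance: str, version: str) -> None:
          """Called from get_or_set on an L1 hit."""
          CACHE_L1_HITS.labels(instance=instance, version=version).inc()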
    • Add cache layer visualization to Admin UI

    • Implement cache performance alerting


🎯 Updated Success Criteria

  • L1 cache hit ratio > 80% for frequently accessed endpoints after warmup
  • Combined L1+L2 hit ratio > 95% overall
  • Sub-millisecond response times for L1 cache hits
  • L1 memory usage < 100MB per container instance
  • Cross-instance cache invalidation < 100ms propagation time
  • Cache system survives Redis outages (L1 continues working)
  • Zero cache inconsistency during normal operations

🧩 Additional Notes

  • 🚀 Performance: L1 cache provides 100-500x faster access than Redis for hot data
  • 💾 Memory Efficiency: Smart eviction prevents memory bloat while maximizing hit rates
  • 🔄 High Availability: L1 cache continues working even if Redis is down
  • 📊 Scalability: Each container instance has its own L1 cache + shared L2 Redis
  • 🔧 Operational: Can disable L1 or L2 independently for debugging
  • 📈 Monitoring: Detailed metrics for both cache layers and performance optimization
  • ⚡ Zero Network: L1 hits require zero network calls (Redis or database)

Metadata

Labels: enhancement (New feature or request), performance (Performance related items), python (Python / backend development, FastAPI), triage (Issues / Features awaiting triage)