This document describes how data is stored in Redis for the semantic cache system.
Redis is used to store:
- Cached query responses (with embeddings)
- Cached embeddings (to avoid regenerating them)
- Cache key index (set of all cache entry keys)
- Statistics counters (request counts, cache hits/misses, etc.)
Key Pattern: cache:{sha256_hash_of_query}
Example: cache:a1b2c3d4e5f6... (64-char hex hash)
Value: JSON string containing:
```json
{
  "query_text": "What's the weather in New York?",
  "embedding": [0.123, -0.456, 0.789, ...],  // array of floats
  "response": "The weather in New York is...",
  "created_at": "2024-01-17T12:34:56.789Z",
  "ttl_seconds": 3600
}
```
TTL: Set based on query type:
- Time-sensitive queries: `short_ttl_seconds` (default: 600 = 10 minutes)
- Evergreen queries: `long_ttl_seconds` (default: 86400 = 24 hours)
Storage Method: SET key value EX ttl_seconds
Location: semantic_cache.py:84
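As a rough sketch of this storage step (illustrative helper names, not the actual semantic_cache.py code; `client` stands for any redis-py-style object exposing `set`):

```python
import hashlib
import json
from datetime import datetime, timezone

def make_cache_key(query: str) -> str:
    # cache:{sha256_hash_of_query}, a 64-char hex digest
    return "cache:" + hashlib.sha256(query.encode("utf-8")).hexdigest()

def store_response(client, query, embedding, response, ttl_seconds):
    # Serialize the entry to a JSON string and store it with a TTL
    # (equivalent to SET key value EX ttl_seconds).
    entry = json.dumps({
        "query_text": query,
        "embedding": embedding,
        "response": response,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "ttl_seconds": ttl_seconds,
    })
    key = make_cache_key(query)
    client.set(key, entry, ex=ttl_seconds)
    return key
```

With redis-py the `ex=` keyword maps directly onto the `EX` option of `SET`, so the entry and its expiry are written in one command.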
Key Pattern: embed:{model_name}:{sha256_hash_of_normalized_query}
Example: embed:text-embedding-3-small:a1b2c3d4e5f6...
Value: JSON string containing an array of floats:
```json
[0.123, -0.456, 0.789, ...]
```
TTL: `embedding_cache_ttl_seconds` (configurable, typically longer than the response cache)
Storage Method: SET key value EX ttl_seconds
Purpose: Avoids regenerating embeddings for the same normalized query
Location: semantic_cache.py:41-44
Note: Query is normalized (lowercased, trimmed) before hashing to maximize cache hits.
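The normalization and key construction can be sketched as follows (illustrative helper names, assuming the lowercase-and-trim rule described above):

```python
import hashlib

def normalize_query(query: str) -> str:
    # Lowercase and trim whitespace so trivially different
    # phrasings of the same query share one cache key
    return query.strip().lower()

def make_embedding_key(model_name: str, query: str) -> str:
    # embed:{model_name}:{sha256_hash_of_normalized_query}
    digest = hashlib.sha256(normalize_query(query).encode("utf-8")).hexdigest()
    return f"embed:{model_name}:{digest}"
```

Because the hash is taken after normalization, "  Hello World " and "hello world" map to the same key and reuse one stored embedding.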
Key: cache_keys
Type: Redis Set
Value: Set of all cache entry keys (e.g., {"cache:abc123...", "cache:def456...", ...})
Purpose:
- Maintains a list of all cached responses for similarity search
- Used in `find_similar()` to iterate through all cached entries
- Automatically cleaned up when entries expire (see cleanup logic below)
Operations:
- Add: `SADD cache_keys cache:{hash}` when storing a new response
- Remove: `SREM cache_keys cache:{hash}` when an entry expires (cleanup)
- Read: `SMEMBERS cache_keys` to get all keys for similarity search
Location: semantic_cache.py:53, 85, 57
Cleanup: When `find_similar()` encounters an expired key (GET returns None), it removes that key from the set.
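The three index operations behave like this (a minimal in-memory stand-in for a Redis Set, covering only the commands used here; not real Redis):

```python
class FakeRedisSet:
    # Dict-of-sets stand-in for SADD / SREM / SMEMBERS
    def __init__(self):
        self.sets = {}

    def sadd(self, name, *values):
        # SADD: add members; duplicates are ignored
        self.sets.setdefault(name, set()).update(values)

    def srem(self, name, *values):
        # SREM: remove members if present
        self.sets.get(name, set()).difference_update(values)

    def smembers(self, name):
        # SMEMBERS: return all members
        return set(self.sets.get(name, set()))

r = FakeRedisSet()
r.sadd("cache_keys", "cache:abc123", "cache:def456")  # on store
r.srem("cache_keys", "cache:abc123")                  # on expiry cleanup
```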
All statistics are stored as integer values using Redis string operations.
Key: stat:requests
- Type: Integer (stored as string)
- Operation: `INCR stat:requests`
- Purpose: Total number of API requests
- Location: routes.py:102
Key: stat:cache_hits
- Type: Integer
- Operation: `INCR stat:cache_hits`
- Purpose: Number of cache hits
- Location: routes.py:116
Key: stat:cache_misses
- Type: Integer
- Operation: `INCR stat:cache_misses`
- Purpose: Number of cache misses
- Location: routes.py:122, 124
Key: stat:llm_fallbacks
- Type: Integer
- Operation: `INCR stat:llm_fallbacks`
- Purpose: Number of fallback responses (when the LLM call limit is reached)
- Location: routes.py:135
Key: llm_call_count
- Type: Integer
- Operation: `INCR llm_call_count` (only when actually making an LLM call)
- Purpose: Tracks total LLM API calls made (used for rate limiting)
- Location: openai_client.py:66, 72
- Note: Checked before incrementing to enforce the `max_llm_calls` limit
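A sketch of this counter logic, with a dict-backed stand-in for INCR/GET (names illustrative). Note that a check-then-increment like this is racy under concurrency, whereas a single Redis INCR is atomic:

```python
class Counters:
    # In-memory stand-in for Redis integer counters (INCR / GET)
    def __init__(self):
        self.values = {}

    def incr(self, key):
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def get(self, key):
        return self.values.get(key, 0)

def try_llm_call(counters, max_llm_calls):
    # Check llm_call_count before incrementing to enforce the limit;
    # when the limit is reached, count a fallback response instead.
    if counters.get("llm_call_count") >= max_llm_calls:
        counters.incr("stat:llm_fallbacks")
        return False
    counters.incr("llm_call_count")
    return True
```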
- Query comes in → normalized → hashed
- Check embedding cache: `embed:{model}:{hash}`
  - If it exists: use the cached embedding
  - If not: generate the embedding → store it with a TTL
- Search for similar cached responses (iterate the `cache_keys` set)
- If no match above threshold:
  - Call the LLM → get a response
  - Store the response: `SET cache:{hash} {json} EX ttl`
  - Add to the index: `SADD cache_keys cache:{hash}`
- Update stats: `INCR stat:requests`, `INCR stat:cache_misses`
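The get-or-generate embedding step in this flow might look like the following sketch, with a plain dict standing in for Redis and `embed_fn` an assumed embedding callable (both illustrative):

```python
import hashlib
import json

def get_or_create_embedding(db, model_name, normalized_query, embed_fn, ttl_seconds):
    # Check embed:{model}:{hash}; on a miss, generate and store the vector.
    key = ("embed:" + model_name + ":"
           + hashlib.sha256(normalized_query.encode("utf-8")).hexdigest())
    cached = db.get(key)
    if cached is not None:
        return json.loads(cached)          # reuse the cached embedding
    vector = embed_fn(normalized_query)    # e.g. an embeddings API call
    db[key] = json.dumps(vector)           # real code: SET key value EX ttl_seconds
    return vector
```

The second lookup for the same normalized query returns the stored vector without calling `embed_fn` again, which is the whole point of the embedding cache.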
- Query comes in → generate/get embedding
- Iterate the `cache_keys` set; for each key:
  - `GET {key}` → parse JSON
  - Compute cosine similarity with the query embedding
  - Track the best match
- If the best similarity ≥ threshold:
  - Return the cached response
  - Update stats: `INCR stat:cache_hits`
- If an entry has expired (GET returns None):
  - Clean up: `SREM cache_keys {key}`
Cleanup:
- Cache entries: expire automatically via Redis TTL
- Embedding cache: expires automatically via Redis TTL
- `cache_keys` set: entries removed manually when expired entries are detected
- Statistics: never expire (persist indefinitely)
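The hit path and its inline cleanup can be sketched like this (plain dict and set in place of Redis; an illustrative stand-in, not the actual `find_similar()` implementation):

```python
import json
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def find_similar(db, index, query_embedding, threshold):
    # Linear scan over the cache_keys index, pruning expired entries as we go.
    best_key, best_sim = None, -1.0
    for key in list(index):
        raw = db.get(key)        # GET {key}; None once the TTL has expired
        if raw is None:
            index.discard(key)   # SREM cache_keys {key}
            continue
        entry = json.loads(raw)
        sim = cosine_similarity(query_embedding, entry["embedding"])
        if sim > best_sim:
            best_key, best_sim = key, sim
    if best_key is not None and best_sim >= threshold:
        return json.loads(db[best_key])
    return None
```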
- Algorithm: SHA-256
- Input: Query text (or normalized query for embeddings)
- Output: 64-character hexadecimal string
- Location: semantic_cache.py:89-90
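The equivalent hashing in a couple of lines (helper name illustrative):

```python
import hashlib

def query_hash(text: str) -> str:
    # SHA-256 of the query text as a 64-character hex string
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Note that this hash is case- and whitespace-sensitive; only the embedding-cache path normalizes the query before hashing.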
- All complex data (cache entries, embeddings) stored as JSON strings
- Redis client configured with `decode_responses=True` for automatic string decoding
- Manual `json.loads()`/`json.dumps()` for serialization
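A round trip is just (values illustrative):

```python
import json

entry = {"query_text": "hi", "embedding": [0.123, -0.456], "ttl_seconds": 3600}
raw = json.dumps(entry)     # the string handed to SET
restored = json.loads(raw)  # the string GET returns is already str, thanks to decode_responses=True
```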
After processing a few queries, Redis might contain:
```
KEYS *
cache:a1b2c3d4e5f6...                   (string, TTL: 3600)
cache:def456789abc...                   (string, TTL: 600)
embed:text-embedding-3-small:abc123...  (string, TTL: 86400)
cache_keys          (set: {"cache:a1b2c3d4e5f6...", "cache:def456789abc..."})
stat:requests       (string: "5")
stat:cache_hits     (string: "2")
stat:cache_misses   (string: "3")
llm_call_count      (string: "3")
```
- Linear Scan: `find_similar()` scans all cache keys linearly, i.e. O(n) complexity
  - Impact: Slow for large cache sizes
  - Solution: Use Redis vector search or an external vector DB
- Set Cleanup: Expired keys are only removed when encountered during a search
  - Impact: The `cache_keys` set may contain stale references
  - Solution: Periodic cleanup job, or use Redis SCAN with TTL checks
- Statistics Persistence: Stats never expire and could grow indefinitely
  - Impact: Memory usage (minimal for counters)
  - Solution: Optional TTL or periodic reset
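One shape the suggested periodic cleanup job could take (an illustrative sketch with dict/set stand-ins, where `key in db` plays the role of an EXISTS check):

```python
def prune_index(db, index):
    # Drop index members whose cache entries have already expired.
    stale = [key for key in index if key not in db]  # real code: EXISTS {key}
    for key in stale:
        index.discard(key)                           # SREM cache_keys {key}
    return len(stale)
```

Run on a timer, this keeps `cache_keys` close to the set of live entries instead of waiting for a search to stumble over each stale reference.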
- Cache entries and the `cache_keys` set can become inconsistent if Redis crashes between operations
- Mitigated by the cleanup logic in `find_similar()`, which removes stale keys
```shell
# List all keys
docker exec <redis-container> redis-cli KEYS "*"

# Inspect a cache entry
docker exec <redis-container> redis-cli GET "cache:a1b2c3d4e5f6..."

# Inspect the key index
docker exec <redis-container> redis-cli SMEMBERS "cache_keys"

# Inspect statistics
docker exec <redis-container> redis-cli GET "stat:requests"
docker exec <redis-container> redis-cli GET "stat:cache_hits"
docker exec <redis-container> redis-cli GET "llm_call_count"

# Count keys
docker exec <redis-container> redis-cli DBSIZE

# Clear everything
./clear_redis.sh
# or
docker exec <redis-container> redis-cli FLUSHALL
```