Commit 92d2ec9

fix: stricter semantic cache defaults for reliability
- Increase default similarity_threshold from 0.95 to 0.98
- Increase default min_text_length from 50 to 128 chars
- Make min_text_length configurable via CacheConfig

These changes ensure semantic caching only activates for longer, substantive queries with near-identical content.
1 parent 6323a97 commit 92d2ec9

File tree

2 files changed: +13 −6 lines changed


CHANGELOG.md

Lines changed: 5 additions & 2 deletions
@@ -12,13 +12,16 @@ Fixed critical issues with the semantic cache that caused incorrect cache matches
 
 1. **System Prompt Hash Matching**: The semantic cache now includes a hash of the system prompt when matching cached responses. Previously, different LLM operations with similar user messages but different system prompts could incorrectly return cached responses from unrelated operations.
 
-2. **Short Text Exclusion**: Messages shorter than 50 characters are now excluded from semantic matching. Short questions like "what about X?" and "what is Y?" have misleadingly high semantic similarity scores which caused false cache hits. These short messages still benefit from exact hash matching.
+2. **Short Text Exclusion**: Messages shorter than 128 characters are now excluded from semantic matching (configurable via `min_text_length`). Short questions have misleadingly high semantic similarity scores which caused false cache hits. These short messages still benefit from exact hash matching.
+
+3. **Stricter Default Threshold**: Default similarity threshold increased from 0.95 to 0.98 for more reliable matching.
 
 ### Changes
 
 - Added `_extract_system_hash()` method to compute SHA256 hash of system prompt content
 - Modified `_semantic_search()` to require both semantic similarity AND system hash match
-- Added minimum text length check (50 chars) before semantic cache operations
+- Added configurable `min_text_length` parameter (default: 128 chars) before semantic cache operations
+- Changed default `similarity_threshold` from 0.95 to 0.98
 - Added `caching` parameter to `ChatCompletion.create/acreate` for per-call cache bypass
 
---
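The `_extract_system_hash()` change listed above could look roughly like the sketch below. This is an illustration, not the library's actual code: it assumes OpenAI-style message dicts with `role`/`content` keys, and the function name is copied from the changelog while everything else is hypothetical.

```python
import hashlib

def extract_system_hash(messages: list[dict]) -> str:
    # Sketch: concatenate all system-prompt content and hash it with SHA256,
    # so two requests can only share a cache entry if their system prompts
    # are byte-for-byte identical.
    system_text = "".join(
        m.get("content", "") for m in messages if m.get("role") == "system"
    )
    return hashlib.sha256(system_text.encode("utf-8")).hexdigest()
```

Hashing the system prompt separately (rather than mixing it into the embedded text) keeps the semantic index focused on the user message while still fencing off unrelated operations.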

onellm/cache.py

Lines changed: 8 additions & 4 deletions
@@ -40,29 +40,33 @@ class CacheConfig:
     def __init__(
         self,
         max_entries: int = 1000,
-        similarity_threshold: float = 0.95,
+        similarity_threshold: float = 0.98,
         hash_only: bool = False,
         stream_chunk_strategy: str = "words",
         stream_chunk_length: int = 8,
         ttl: int = 86400,
+        min_text_length: int = 128,
     ):
         """
         Initialize cache configuration.
 
         Args:
             max_entries: Maximum number of cache entries before LRU eviction (default: 1000)
-            similarity_threshold: Minimum similarity score for semantic cache hit (default: 0.95)
+            similarity_threshold: Minimum similarity score for semantic cache hit (default: 0.98)
             hash_only: Disable semantic matching, use only hash-based exact matches (default: False)
             stream_chunk_strategy: How to chunk cached streaming responses (default: "words")
             stream_chunk_length: Number of strategy units per chunk (default: 8)
             ttl: Time-to-live in seconds for cache entries (default: 86400, 1 day)
+            min_text_length: Minimum text length for semantic matching (default: 128).
+                Short texts have misleadingly high similarity and skip semantic cache.
         """
         self.max_entries = max_entries
         self.similarity_threshold = similarity_threshold
         self.hash_only = hash_only
         self.stream_chunk_strategy = stream_chunk_strategy
         self.stream_chunk_length = stream_chunk_length
         self.ttl = ttl
+        self.min_text_length = min_text_length
 
         # Validate strategy
         valid_strategies = {"words", "sentences", "paragraphs", "characters"}
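As a usage illustration, the constructor signature above can be mirrored with a compact stand-in. The class name below is hypothetical; only the field names, defaults, and the strategy validation come from the diff.

```python
from dataclasses import dataclass

@dataclass
class CacheConfigSketch:
    """Stand-in mirroring the CacheConfig fields shown above (illustration only)."""
    max_entries: int = 1000
    similarity_threshold: float = 0.98   # new stricter default (was 0.95)
    hash_only: bool = False
    stream_chunk_strategy: str = "words"
    stream_chunk_length: int = 8
    ttl: int = 86400                     # 1 day
    min_text_length: int = 128           # new: minimum length for semantic matching

    def __post_init__(self):
        # Same validation as the diff's valid_strategies check.
        valid = {"words", "sentences", "paragraphs", "characters"}
        if self.stream_chunk_strategy not in valid:
            raise ValueError(f"stream_chunk_strategy must be one of {valid}")
```

A caller wanting the old, looser behavior could still pass `similarity_threshold=0.95, min_text_length=50` explicitly; only the defaults changed.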
@@ -300,7 +304,7 @@ def get(self, model: str, messages: list[dict], **kwargs) -> dict | None:
         system_hash = self._extract_system_hash(messages)
         # Skip semantic search for short texts - they have misleadingly high similarity
         # Short questions like "what about X?" and "what is Y?" can match incorrectly
-        if text and len(text) >= 50:
+        if text and len(text) >= self.config.min_text_length:
             result = self._semantic_search(text, system_hash)
             if result is not None:
                 self.hits += 1
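The guard in `get()` reduces to a tiny predicate. A sketch, with the default mirroring the new `min_text_length` (the function name is illustrative):

```python
def should_try_semantic_search(text: str, min_text_length: int = 128) -> bool:
    # Mirrors the guard above: empty or short texts never reach the
    # semantic index, so they fall back to exact hash matching only.
    return bool(text) and len(text) >= min_text_length
```

For example, `"what about X?"` (13 chars) is skipped, while a 128-character query proceeds to `_semantic_search()`.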
@@ -353,7 +357,7 @@ def set(self, model: str, messages: list[dict], response: dict, **kwargs):
         text = self._extract_text(messages)
         system_hash = self._extract_system_hash(messages)
         # Skip semantic indexing for short texts - they cause false matches
-        if text and len(text) >= 50:
+        if text and len(text) >= self.config.min_text_length:
             try:
                 # Generate embedding
                 import numpy as np
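Putting the changes together, the hit condition that `_semantic_search()` now enforces amounts to something like the following sketch. In the library the similarity comes from embedding comparison; here it is just a float, and the function name is hypothetical.

```python
def is_semantic_hit(
    similarity: float,
    query_system_hash: str,
    entry_system_hash: str,
    threshold: float = 0.98,  # new stricter default
) -> bool:
    # A cached entry qualifies only when BOTH conditions hold:
    # near-identical content AND an identical system prompt.
    return similarity >= threshold and query_system_hash == entry_system_hash
```

Under the old 0.95 threshold, a 0.97-similar query would have been served from cache; under the new default it is a miss, trading some hit rate for reliability.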
