fix: stop returning expired in-memory cache hits #423
Signed-off-by: cryo <[email protected]>
@aeft PTAL, thanks
Pull Request Overview
This PR fixes a bug where the in-memory cache could return expired entries. The fix ensures that FindSimilar skips TTL-expired cache entries during the similarity search phase, preventing stale responses from being served.
Key changes:
- Refactored expiration checking logic into a reusable isExpired helper method
- Added expiration checks during similarity search to skip expired entries before they're considered
- Removed the read-only cleanup method that only logged expired entries without preventing their use
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
src/semantic-router/pkg/cache/inmemory_cache.go | Added expiration check in FindSimilar to skip expired entries, refactored expiration logic into isExpired helper method, removed cleanupExpiredEntriesReadOnly |
src/semantic-router/pkg/cache/cache_test.go | Added test case verifying expired entries are not returned from similarity search |
"ttl_seconds": c.ttlSeconds,
})
}
return now.Sub(entry.LastAccessAt) >= time.Duration(c.ttlSeconds)*time.Second
Copilot AI, Oct 14, 2025
The conversion time.Duration(c.ttlSeconds)*time.Second is computed on every call to isExpired. Since ttlSeconds is constant for the cache instance, consider precomputing this as a time.Duration field (e.g., ttlDuration) during cache initialization to avoid repeated multiplications.
ttlCache := cache.NewInMemoryCache(cache.InMemoryCacheOptions{
	Enabled:             true,
	SimilarityThreshold: 0.1,
	MaxEntries:          10,
	TTLSeconds:          1,
})
defer ttlCache.Close()

err := ttlCache.AddEntry("ttl-request-id", "ttl-model", "time-sensitive query", []byte("request"), []byte("response"))
Expect(err).NotTo(HaveOccurred())

time.Sleep(1100 * time.Millisecond)
Copilot AI, Oct 14, 2025
The hardcoded sleep duration of 1100ms (1.1 seconds) relies on a magic number. Consider defining this as a constant like ttlWaitDuration = 1100 * time.Millisecond or calculating it as time.Duration(ttlSeconds)*time.Second + 100*time.Millisecond to make the relationship to the TTL explicit.
Suggested change
- ttlCache := cache.NewInMemoryCache(cache.InMemoryCacheOptions{
- 	Enabled:             true,
- 	SimilarityThreshold: 0.1,
- 	MaxEntries:          10,
- 	TTLSeconds:          1,
- })
- defer ttlCache.Close()
- err := ttlCache.AddEntry("ttl-request-id", "ttl-model", "time-sensitive query", []byte("request"), []byte("response"))
- Expect(err).NotTo(HaveOccurred())
- time.Sleep(1100 * time.Millisecond)
+ ttlSeconds := 1
+ ttlCache := cache.NewInMemoryCache(cache.InMemoryCacheOptions{
+ 	Enabled:             true,
+ 	SimilarityThreshold: 0.1,
+ 	MaxEntries:          10,
+ 	TTLSeconds:          ttlSeconds,
+ })
+ defer ttlCache.Close()
+ err := ttlCache.AddEntry("ttl-request-id", "ttl-model", "time-sensitive query", []byte("request"), []byte("response"))
+ Expect(err).NotTo(HaveOccurred())
+ ttlWaitDuration := time.Duration(ttlSeconds)*time.Second + 100*time.Millisecond
+ time.Sleep(ttlWaitDuration)
Thanks! It looks good to me. cc @rootfs
What type of PR is this?
fix: stop returning expired in-memory cache hits
What this PR does / why we need it:
This PR skips TTL-expired entries while scanning for the best match so FindSimilar doesn't serve stale responses.
Which issue(s) this PR fixes:
Fixes #404
Release Notes: Yes/No