fix: avoid double counting cache hits #177
Signed-off-by: cryo <[email protected]>
✅ Deploy Preview for vllm-semantic-router ready!
👥 vLLM Semantic Team Notification: The following members have been identified for the changed files in this PR and have been automatically assigned:
@cryo-zd would it be ok if we keep the request handler code but remove the cache hit recording from inmemory and milvus? That way, when yet another storage backend is added, its cache hits are already counted.
Ack, keeping the cache hit metric in the request handler and letting each backend record only its own misses. One small thought: right now only hits would be counted in the handler, while misses are still counted in each backend. To keep the logic symmetric and easier to maintain, would it make sense to also move the miss recording at the lines referenced below into the handler?
semantic-router/src/semantic-router/pkg/cache/inmemory_cache.go Lines 276 to 278 in aa6c22f
semantic-router/src/semantic-router/pkg/cache/inmemory_cache.go Lines 302 to 304 in aa6c22f
I can update the PR in either direction. Do you think this adjustment would be preferable?
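The symmetric layout proposed above, where the request handler records both hits and misses and the backends only perform lookups, can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the repository's code: `CacheMetrics`, `CacheBackend`, `mapBackend`, and this `handleCaching` signature are hypothetical stand-ins, and the real `FindSimilar` signature in semantic-router may differ.

```go
package main

import "fmt"

// CacheMetrics is a hypothetical stand-in for the router's metrics package;
// it simply counts events in memory.
type CacheMetrics struct {
	Hits, Misses int
}

func (m *CacheMetrics) RecordCacheHit() { m.Hits++ }

func (m *CacheMetrics) RecordCacheMiss() { m.Misses++ }

// CacheBackend is a hypothetical interface: a backend (in-memory, Milvus, ...)
// only looks up entries and reports whether one was found. It never touches metrics.
type CacheBackend interface {
	FindSimilar(query string) (response string, found bool)
}

// mapBackend is a toy exact-match backend used only for illustration.
type mapBackend map[string]string

func (b mapBackend) FindSimilar(query string) (string, bool) {
	resp, ok := b[query]
	return resp, ok
}

// handleCaching records exactly one metric per lookup: a hit or a miss.
func handleCaching(backend CacheBackend, metrics *CacheMetrics, query string) (string, bool) {
	resp, found := backend.FindSimilar(query)
	if found {
		metrics.RecordCacheHit()
	} else {
		metrics.RecordCacheMiss()
	}
	return resp, found
}

func main() {
	metrics := &CacheMetrics{}
	backend := mapBackend{"hello": "cached answer"}
	handleCaching(backend, metrics, "hello")   // hit
	handleCaching(backend, metrics, "unknown") // miss
	fmt.Printf("hits=%d misses=%d\n", metrics.Hits, metrics.Misses)
}
```

Because the metric calls live only in `handleCaching`, a new storage backend that satisfies the interface is counted without any metrics code of its own, which is the maintainability point raised in the review.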
I see, that makes sense. I'll merge this one then.

What type of PR is this?
fix: avoid double counting cache hits
Currently, `handleCaching` calls `metrics.RecordCacheHit()` when a cache hit is found, while both `InMemoryCache.FindSimilar` and `MilvusCache.FindSimilar` already record cache hits internally. This causes a single cache hit to be counted twice. This PR removes the duplicate call in `handleCaching` so that each hit is counted only once.
[InMemoryCache]
semantic-router/src/semantic-router/pkg/cache/inmemory_cache.go
Lines 294 to 304 in fac50b1
[Milvus]
semantic-router/src/semantic-router/pkg/cache/milvus_cache.go
Lines 598 to 609 in fac50b1
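The double count described above can be reproduced with a toy model. In this minimal sketch, all names are hypothetical: `hitCounter` stands in for `metrics.RecordCacheHit()`, and `memoryCache` mimics a backend whose `FindSimilar` records hits internally. The pre-fix handler path counts one hit twice, while the path with the handler call removed counts it once.

```go
package main

import "fmt"

// hitCounter is a hypothetical stand-in for metrics.RecordCacheHit().
type hitCounter struct{ hits int }

func (c *hitCounter) RecordCacheHit() { c.hits++ }

// memoryCache mimics a backend whose FindSimilar records the hit itself.
type memoryCache struct {
	entries map[string]string
	metrics *hitCounter
}

func (c *memoryCache) FindSimilar(query string) (string, bool) {
	resp, ok := c.entries[query]
	if ok {
		c.metrics.RecordCacheHit() // backend-side recording
	}
	return resp, ok
}

// handleCachingBefore reproduces the bug: it records the same hit a second time.
func handleCachingBefore(c *memoryCache, query string) bool {
	_, found := c.FindSimilar(query)
	if found {
		c.metrics.RecordCacheHit() // duplicate recording
	}
	return found
}

// handleCachingAfter leaves hit recording entirely to the backend.
func handleCachingAfter(c *memoryCache, query string) bool {
	_, found := c.FindSimilar(query)
	return found
}

func main() {
	before := &memoryCache{entries: map[string]string{"q": "r"}, metrics: &hitCounter{}}
	handleCachingBefore(before, "q")
	after := &memoryCache{entries: map[string]string{"q": "r"}, metrics: &hitCounter{}}
	handleCachingAfter(after, "q")
	fmt.Printf("before=%d after=%d\n", before.metrics.hits, after.metrics.hits)
}
```

Running it prints `before=2 after=1`: the same single lookup is reported as two hits on the old path and one hit on the fixed path.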
What this PR does / why we need it:
Accurate cache hit metrics are important for monitoring and analysis; counting each hit twice inflates the reported hit count and any hit rate derived from it.
Which issue(s) this PR fixes:
Fixes #
Release Notes: Yes/No