Skip to content

Conversation

@cryo-zd
Copy link
Contributor

@cryo-zd cryo-zd commented Sep 19, 2025

What type of PR is this?

fix: avoid double counting cache hits

Currently, handleCaching calls metrics.RecordCacheHit() when a cache hit is found, while both InMemoryCache.FindSimilar and MilvusCache.FindSimilar already record cache hits internally. This causes a single cache hit to be counted twice.
This PR removes the duplicate call in handleCaching so that each hit is counted only once.
[InMemoryCache]

observability.Debugf("InMemoryCache.FindSimilar: CACHE HIT - similarity=%.4f >= threshold=%.4f, response_size=%d bytes",
results[0].Similarity, c.similarityThreshold, len(results[0].Entry.ResponseBody))
observability.LogEvent("cache_hit", map[string]interface{}{
"backend": "memory",
"similarity": results[0].Similarity,
"threshold": c.similarityThreshold,
"model": model,
})
metrics.RecordCacheOperation("memory", "find_similar", "hit", time.Since(start).Seconds())
metrics.RecordCacheHit()
return results[0].Entry.ResponseBody, true, nil

[Milvus]
observability.Debugf("MilvusCache.FindSimilar: CACHE HIT - similarity=%.4f >= threshold=%.4f, response_size=%d bytes",
bestSimilarity, c.similarityThreshold, len(bestResponse))
observability.LogEvent("cache_hit", map[string]interface{}{
"backend": "milvus",
"similarity": bestSimilarity,
"threshold": c.similarityThreshold,
"model": model,
"collection": c.collectionName,
})
metrics.RecordCacheOperation("milvus", "find_similar", "hit", time.Since(start).Seconds())
metrics.RecordCacheHit()
return []byte(bestResponse), true, nil

What this PR does / why we need it:
Accurate cache hit metrics are important for monitoring and analysis.

Which issue(s) this PR fixes:

Fixes #

Release Notes: Yes/No

@netlify
Copy link

netlify bot commented Sep 19, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit 23b07c5
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/68cd78813d3ac50008e360e0
😎 Deploy Preview https://deploy-preview-177--vllm-semantic-router.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@github-actions
Copy link

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/extproc/request_handler.go

vLLM

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@rootfs
Copy link
Collaborator

rootfs commented Sep 19, 2025

@cryo-zd would it ok if we keep the request handler code but remove the cache hit from inmemory and milvus? So when yet another storage is added, the cache hit is already counted.

@cryo-zd
Copy link
Contributor Author

cryo-zd commented Sep 19, 2025

@cryo-zd would it ok if we keep the request handler code but remove the cache hit from inmemory and milvus? So when yet another storage is added, the cache hit is already counted.

Ack, keeping the cache hit metric in the request handler and letting each backend only record its own RecordCacheOperation makes the design cleaner and more extensible.

One small thought: right now only hits would be counted in the handler, while misses are still counted in each backend. To keep the logic symmetric and easier to maintain, would it make sense to also move RecordCacheMiss into the handler (e.g., in the else branch)? That way:

  • handler is the single place for global hit/miss counters,
  • backends focus only on RecordCacheOperation(...) for per-backend stats.

metrics.RecordCacheOperation("memory", "find_similar", "miss", time.Since(start).Seconds())
metrics.RecordCacheMiss()
return nil, false, nil

metrics.RecordCacheOperation("memory", "find_similar", "hit", time.Since(start).Seconds())
metrics.RecordCacheHit()
return results[0].Entry.ResponseBody, true, nil

I can update the PR in either direction — do you think this adjustment would be preferable?

@rootfs
Copy link
Collaborator

rootfs commented Sep 19, 2025

I see, that makes sense. I'll merge this one then.

@rootfs rootfs merged commit 3f48e37 into vllm-project:main Sep 19, 2025
9 checks passed
@rootfs
Copy link
Collaborator

rootfs commented Sep 19, 2025

@cryo-zd do you like to come up with a doc PR about how to add yet another cache storage (e.g. chroma, redis)?

Also cc @aeft

@cryo-zd cryo-zd deleted the fix/cachehit branch September 20, 2025 16:06
yossiovadia pushed a commit to yossiovadia/semantic-router that referenced this pull request Sep 22, 2025
yossiovadia pushed a commit to yossiovadia/semantic-router that referenced this pull request Oct 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants