fix: avoid double counting cache hits #177

cryo-zd · 2025-09-19T15:36:29Z

What type of PR is this?

fix: avoid double counting cache hits

Currently, handleCaching calls metrics.RecordCacheHit() when a cache hit is found, while both InMemoryCache.FindSimilar and MilvusCache.FindSimilar already record cache hits internally. This causes a single cache hit to be counted twice.
This PR removes the duplicate call in handleCaching so that each hit is counted only once.
[InMemoryCache]

semantic-router/src/semantic-router/pkg/cache/inmemory_cache.go

Lines 294 to 304 in fac50b1

    
    observability.Debugf("InMemoryCache.FindSimilar: CACHE HIT - similarity=%.4f >= threshold=%.4f, response_size=%d bytes",
        results[0].Similarity, c.similarityThreshold, len(results[0].Entry.ResponseBody))
    observability.LogEvent("cache_hit", map[string]interface{}{
        "backend":    "memory",
        "similarity": results[0].Similarity,
        "threshold":  c.similarityThreshold,
        "model":      model,
    })
    metrics.RecordCacheOperation("memory", "find_similar", "hit", time.Since(start).Seconds())
    metrics.RecordCacheHit()
    return results[0].Entry.ResponseBody, true, nil

[Milvus]

semantic-router/src/semantic-router/pkg/cache/milvus_cache.go

Lines 598 to 609 in fac50b1

    
    observability.Debugf("MilvusCache.FindSimilar: CACHE HIT - similarity=%.4f >= threshold=%.4f, response_size=%d bytes",
        bestSimilarity, c.similarityThreshold, len(bestResponse))
    observability.LogEvent("cache_hit", map[string]interface{}{
        "backend":    "milvus",
        "similarity": bestSimilarity,
        "threshold":  c.similarityThreshold,
        "model":      model,
        "collection": c.collectionName,
    })
    metrics.RecordCacheOperation("milvus", "find_similar", "hit", time.Since(start).Seconds())
    metrics.RecordCacheHit()
    return []byte(bestResponse), true, nil

What this PR does / why we need it:
Accurate cache hit metrics are important for monitoring and analysis.
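
For illustration only, the double count described above can be reproduced with a minimal sketch. Everything here is hypothetical: `hitCounter` stands in for whatever backs `metrics.RecordCacheHit()`, and `backend`, `handleCachingBefore`, `handleCachingAfter`, and `countHits` are simplified stand-ins, not the project's actual types or signatures.

```go
package main

import "fmt"

// hitCounter is a hypothetical stand-in for the global cache-hit metric
// behind metrics.RecordCacheHit(); the real implementation is not shown in this PR.
type hitCounter struct{ hits int }

func (m *hitCounter) RecordCacheHit() { m.hits++ }

// backend mimics a cache backend whose FindSimilar records the hit internally,
// as the InMemoryCache and MilvusCache excerpts above do.
type backend struct{ m *hitCounter }

// FindSimilar always reports a hit in this sketch and counts it once.
func (b *backend) FindSimilar(query string) ([]byte, bool) {
	b.m.RecordCacheHit() // counted inside the backend
	return []byte("cached response"), true
}

// handleCachingBefore mirrors the old handler behavior: it records the hit again.
func handleCachingBefore(b *backend, m *hitCounter, query string) bool {
	_, found := b.FindSimilar(query)
	if found {
		m.RecordCacheHit() // duplicate: the backend already counted this hit
	}
	return found
}

// handleCachingAfter mirrors the fix: the handler relies on the backend's count.
func handleCachingAfter(b *backend, query string) bool {
	_, found := b.FindSimilar(query)
	return found
}

// countHits performs a single lookup and returns the resulting counter value.
func countHits(duplicate bool) int {
	m := &hitCounter{}
	b := &backend{m: m}
	if duplicate {
		handleCachingBefore(b, m, "q")
	} else {
		handleCachingAfter(b, "q")
	}
	return m.hits
}

func main() {
	fmt.Println("before fix:", countHits(true)) // prints "before fix: 2"
	fmt.Println("after fix:", countHits(false)) // prints "after fix: 1"
}
```

A single lookup yields a count of 2 with the extra handler call and 1 without it, which is the discrepancy this PR removes.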

Which issue(s) this PR fixes:

Fixes #

Release Notes: Yes/No

Signed-off-by: cryo <[email protected]>

netlify · 2025-09-19T15:36:35Z

✅ Deploy Preview for vllm-semantic-router ready!

Name	Link
🔨 Latest commit	`23b07c5`
🔍 Latest deploy log	https://app.netlify.com/projects/vllm-semantic-router/deploys/68cd78813d3ac50008e360e0
😎 Deploy Preview	https://deploy-preview-177--vllm-semantic-router.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

github-actions · 2025-09-19T15:36:43Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `src`

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

src/semantic-router/pkg/extproc/request_handler.go

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

rootfs · 2025-09-19T16:08:36Z

@cryo-zd would it be OK if we keep the request handler code but remove the cache hit recording from the in-memory and Milvus backends? That way, when another storage backend is added, the cache hit is already counted.

cryo-zd · 2025-09-19T16:37:25Z

> @cryo-zd would it be OK if we keep the request handler code but remove the cache hit recording from the in-memory and Milvus backends? That way, when another storage backend is added, the cache hit is already counted.

Ack, keeping the cache hit metric in the request handler and letting each backend only record its own RecordCacheOperation makes the design cleaner and more extensible.

One small thought: right now only hits would be counted in the handler, while misses are still counted in each backend. To keep the logic symmetric and easier to maintain, would it make sense to also move RecordCacheMiss into the handler (e.g., in the else branch)? That way:

handler is the single place for global hit/miss counters,
backends focus only on RecordCacheOperation(...) for per-backend stats.

semantic-router/src/semantic-router/pkg/cache/inmemory_cache.go

Lines 276 to 278 in aa6c22f

    
    metrics.RecordCacheOperation("memory", "find_similar", "miss", time.Since(start).Seconds())
    metrics.RecordCacheMiss()
    return nil, false, nil

semantic-router/src/semantic-router/pkg/cache/inmemory_cache.go

Lines 302 to 304 in aa6c22f

    
    metrics.RecordCacheOperation("memory", "find_similar", "hit", time.Since(start).Seconds())
    metrics.RecordCacheHit()
    return results[0].Entry.ResponseBody, true, nil
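
A rough sketch of the split floated in this comment (global hit/miss counters in the handler, per-backend operation stats in the backends) might look like the following. It illustrates the proposed alternative, not the change that was merged, and every name in it is an assumption: `cacheBackend` is a guessed minimal interface, `mapBackend` is a toy exact-match cache, and `globalMetrics` and `opMetrics` stand in for the `metrics` package.

```go
package main

import "fmt"

// globalMetrics is a hypothetical stand-in for the global counters behind
// metrics.RecordCacheHit() and metrics.RecordCacheMiss().
type globalMetrics struct{ hits, misses int }

// opMetrics is a hypothetical stand-in for metrics.RecordCacheOperation(...),
// keyed by "backend/operation/status".
type opMetrics map[string]int

func (o opMetrics) record(backend, op, status string) {
	o[backend+"/"+op+"/"+status]++
}

// cacheBackend is an assumed minimal interface; the project's real one may differ.
type cacheBackend interface {
	FindSimilar(model, query string) ([]byte, bool, error)
}

// mapBackend is a toy exact-match backend used only for illustration.
type mapBackend struct {
	name    string
	entries map[string][]byte
	ops     opMetrics
}

// FindSimilar records only per-backend operation stats, never global counters.
func (b *mapBackend) FindSimilar(model, query string) ([]byte, bool, error) {
	resp, ok := b.entries[model+"|"+query]
	if !ok {
		b.ops.record(b.name, "find_similar", "miss")
		return nil, false, nil
	}
	b.ops.record(b.name, "find_similar", "hit")
	return resp, true, nil
}

// handleCaching owns the global counters, so every backend is counted exactly once.
func handleCaching(c cacheBackend, g *globalMetrics, model, query string) ([]byte, bool) {
	resp, found, err := c.FindSimilar(model, query)
	if err != nil {
		return nil, false // in this sketch, errors count as neither hit nor miss
	}
	if found {
		g.hits++
	} else {
		g.misses++
	}
	return resp, found
}

// runDemo issues three lookups across two backends and returns the counters.
func runDemo() (globalMetrics, opMetrics) {
	g := globalMetrics{}
	ops := opMetrics{}
	entries := map[string][]byte{"m|hello": []byte("hi")}
	memory := &mapBackend{name: "memory", entries: entries, ops: ops}
	other := &mapBackend{name: "other", entries: entries, ops: ops} // a newly added backend
	handleCaching(memory, &g, "m", "hello") // hit
	handleCaching(memory, &g, "m", "bye")   // miss
	handleCaching(other, &g, "m", "hello")  // hit, counted without backend changes
	return g, ops
}

func main() {
	g, _ := runDemo()
	fmt.Printf("hits=%d misses=%d\n", g.hits, g.misses) // prints "hits=2 misses=1"
}
```

With this layout, a new backend only needs to report its own operation stats; the handler keeps the global totals consistent regardless of which backend served the lookup.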

I can update the PR in either direction. Which approach would you prefer?

rootfs · 2025-09-19T16:43:55Z

I see, that makes sense. I'll merge this one then.

rootfs · 2025-09-19T16:44:58Z

@cryo-zd would you like to put together a doc PR on how to add another cache storage backend (e.g. Chroma, Redis)?

Also cc @aeft


fix: avoid double counting cache hits

23b07c5

Signed-off-by: cryo <[email protected]>

cryo-zd requested review from Xunzhuo, rootfs and wangchen615 as code owners September 19, 2025 15:36

github-actions bot assigned rootfs, wangchen615 and Xunzhuo Sep 19, 2025

rootfs merged commit 3f48e37 into vllm-project:main Sep 19, 2025
9 checks passed

cryo-zd deleted the fix/cachehit branch September 20, 2025 16:06

yossiovadia pushed a commit to yossiovadia/semantic-router that referenced this pull request Sep 22, 2025

fix: avoid double counting cache hits (vllm-project#177)

afd9cb8

Signed-off-by: cryo <[email protected]>

yossiovadia pushed a commit to yossiovadia/semantic-router that referenced this pull request Oct 8, 2025

fix: avoid double counting cache hits (vllm-project#177)

6e6b313

Signed-off-by: cryo <[email protected]>
