
Conversation

@cryo-zd (Contributor) commented Oct 8, 2025

What type of PR is this?

fix: keep memory cache metrics accurate

What this PR does / why we need it:

Context:
(c *InMemoryCache).UpdateWithResponse runs c.cleanupExpiredEntries(), which can silently drop TTL-expired cache entries, but it does not call metrics.UpdateCacheEntries("memory", len(c.entries)) to refresh CacheEntriesTotal afterward. This left the gauge stale until another write path ((c *InMemoryCache).AddEntry or (c *InMemoryCache).AddPendingRequest) happened to run.
Fix:

  • Centralized the cleanup so it now logs TTL removals and updates the gauge immediately, keeping metrics in sync whenever expired entries are purged.
  • Made Close explicitly zero the gauge after cleaning memory so shutdown reports the correct size.

// UpdateWithResponse completes a pending request by adding the response
func (c *InMemoryCache) UpdateWithResponse(requestID string, responseBody []byte) error {
	start := time.Now()
	if !c.enabled {
		return nil
	}

	c.mu.Lock()
	defer c.mu.Unlock()

	// Clean up expired entries during the update
	c.cleanupExpiredEntries()

	// Locate the pending request and complete it
	for i, entry := range c.entries {
		if entry.RequestID == requestID && entry.ResponseBody == nil {
			// Complete the cache entry with the response
			c.entries[i].ResponseBody = responseBody
			c.entries[i].Timestamp = time.Now()
			c.entries[i].LastAccessAt = time.Now()
			observability.Debugf("InMemoryCache.UpdateWithResponse: updated entry with response (response_size: %d bytes)",
				len(responseBody))

			// Record successful completion
			metrics.RecordCacheOperation("memory", "update_response", "success", time.Since(start).Seconds())
			return nil
		}
	}

	// No matching pending request found
	metrics.RecordCacheOperation("memory", "update_response", "error", time.Since(start).Seconds())
	return fmt.Errorf("no pending request found for request ID: %s", requestID)
}

// UpdateCacheEntries updates the current number of cache entries for a backend
func UpdateCacheEntries(backend string, count int) {
	CacheEntriesTotal.WithLabelValues(backend).Set(float64(count))
}

Which issue(s) this PR fixes:

Fixes #

Release Notes: Yes/No


netlify bot commented Oct 8, 2025

Deploy Preview for vllm-semantic-router ready!

🔨 Latest commit: b081e77
🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/68e6963d24934b0008333241
😎 Deploy Preview: https://deploy-preview-372--vllm-semantic-router.netlify.app


github-actions bot commented Oct 8, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/go.mod
  • src/semantic-router/pkg/cache/cache_test.go
  • src/semantic-router/pkg/cache/inmemory_cache.go

vLLM

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@rootfs (Collaborator) commented Oct 8, 2025

@cryo-zd can you add a unit test for this case?

@cryo-zd (Contributor, PR author) commented Oct 8, 2025

> @cryo-zd can you add a unit test for this case?

My oversight, I will add a test for it.

@rootfs rootfs merged commit 14f34cd into vllm-project:main Oct 8, 2025
9 checks passed
@cryo-zd cryo-zd deleted the fix/metrics branch October 8, 2025 21:21