
Conversation

@cryo-zd (Contributor) commented Oct 8, 2025

What type of PR is this?

fix: keep memory cache metrics accurate

What this PR does / why we need it:

Context:
(c *InMemoryCache).UpdateWithResponse runs c.cleanupExpiredEntries(), which can silently drop TTL-expired cache entries, but it does not call metrics.UpdateCacheEntries("memory", len(c.entries)) to refresh CacheEntriesTotal afterward. This left the gauge stale until another write path ((c *InMemoryCache).AddEntry or (c *InMemoryCache).AddPendingRequest) happened to run.
Fix:

  • Centralized the cleanup so it now logs TTL removals and updates the gauge immediately, keeping metrics in sync whenever expired entries are purged.
  • Made Close explicitly zero the gauge after cleaning memory so shutdown reports the correct size.

// UpdateWithResponse completes a pending request by adding the response
func (c *InMemoryCache) UpdateWithResponse(requestID string, responseBody []byte) error {
	start := time.Now()
	if !c.enabled {
		return nil
	}

	c.mu.Lock()
	defer c.mu.Unlock()

	// Clean up expired entries during the update
	c.cleanupExpiredEntries()

	// Locate the pending request and complete it
	for i, entry := range c.entries {
		if entry.RequestID == requestID && entry.ResponseBody == nil {
			// Complete the cache entry with the response
			c.entries[i].ResponseBody = responseBody
			c.entries[i].Timestamp = time.Now()
			c.entries[i].LastAccessAt = time.Now()
			observability.Debugf("InMemoryCache.UpdateWithResponse: updated entry with response (response_size: %d bytes)",
				len(responseBody))

			// Record successful completion
			metrics.RecordCacheOperation("memory", "update_response", "success", time.Since(start).Seconds())
			return nil
		}
	}

	// No matching pending request found
	metrics.RecordCacheOperation("memory", "update_response", "error", time.Since(start).Seconds())
	return fmt.Errorf("no pending request found for request ID: %s", requestID)
}

// UpdateCacheEntries updates the current number of cache entries for a backend
func UpdateCacheEntries(backend string, count int) {
	CacheEntriesTotal.WithLabelValues(backend).Set(float64(count))
}

Which issue(s) this PR fixes:

Fixes #

Release Notes: Yes/No


netlify bot commented Oct 8, 2025

Deploy Preview for vllm-semantic-router ready!

🔨 Latest commit: b081e77
🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/68e6963d24934b0008333241
😎 Deploy Preview: https://deploy-preview-372--vllm-semantic-router.netlify.app


github-actions bot commented Oct 8, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/go.mod
  • src/semantic-router/pkg/cache/cache_test.go
  • src/semantic-router/pkg/cache/inmemory_cache.go

vLLM

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@rootfs (Collaborator) commented Oct 8, 2025

@cryo-zd can you add a unit test for this case?

@cryo-zd (Contributor, PR author) commented Oct 8, 2025

> @cryo-zd can you add a unit test for this case?

My oversight, I will add a test for it.

@rootfs rootfs merged commit 14f34cd into vllm-project:main Oct 8, 2025
9 checks passed
@cryo-zd cryo-zd deleted the fix/metrics branch October 8, 2025 21:21