Skip to content

Conversation

cryo-zd
Copy link
Contributor

@cryo-zd cryo-zd commented Oct 14, 2025

What type of PR is this?

fix: stop returning expired in-memory cache hits

What this PR does / why we need it:
This PR skips TTL-expired entries while scanning for the best match so FindSimilar doesn't serves stale responses.

Which issue(s) this PR fixes:

Fixes #404

Release Notes: Yes/No

Copy link

github-actions bot commented Oct 14, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/cache/cache_test.go
  • src/semantic-router/pkg/cache/inmemory_cache.go

vLLM

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

Copy link

netlify bot commented Oct 14, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit e8f3ca5
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/68eea5a8e4a16a00074555cb
😎 Deploy Preview https://deploy-preview-423--vllm-semantic-router.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@rootfs
Copy link
Collaborator

rootfs commented Oct 14, 2025

@aeft PTAL, thanks

@rootfs rootfs requested a review from Copilot October 14, 2025 12:52
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a bug where the in-memory cache could return expired entries. The fix ensures that FindSimilar skips TTL-expired cache entries during the similarity search phase, preventing stale responses from being served.

Key changes:

  • Refactored expiration checking logic into a reusable isExpired helper method
  • Added expiration checks during similarity search to skip expired entries before they're considered
  • Removed the read-only cleanup method that only logged expired entries without preventing their use

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/semantic-router/pkg/cache/inmemory_cache.go Added expiration check in FindSimilar to skip expired entries, refactored expiration logic into isExpired helper method, removed cleanupExpiredEntriesReadOnly
src/semantic-router/pkg/cache/cache_test.go Added test case verifying expired entries are not returned from similarity search

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

"ttl_seconds": c.ttlSeconds,
})
}
return now.Sub(entry.LastAccessAt) >= time.Duration(c.ttlSeconds)*time.Second
Copy link

Copilot AI Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conversion time.Duration(c.ttlSeconds)*time.Second is computed on every call to isExpired. Since ttlSeconds is constant for the cache instance, consider precomputing this as a time.Duration field (e.g., ttlDuration) during cache initialization to avoid repeated multiplications.

Copilot uses AI. Check for mistakes.

Comment on lines +580 to +591
ttlCache := cache.NewInMemoryCache(cache.InMemoryCacheOptions{
Enabled: true,
SimilarityThreshold: 0.1,
MaxEntries: 10,
TTLSeconds: 1,
})
defer ttlCache.Close()

err := ttlCache.AddEntry("ttl-request-id", "ttl-model", "time-sensitive query", []byte("request"), []byte("response"))
Expect(err).NotTo(HaveOccurred())

time.Sleep(1100 * time.Millisecond)
Copy link

Copilot AI Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hardcoded sleep duration of 1100ms (1.1 seconds) relies on a magic number. Consider defining this as a constant like ttlWaitDuration = 1100 * time.Millisecond or calculating it as time.Duration(ttlSeconds)*time.Second + 100*time.Millisecond to make the relationship to the TTL explicit.

Suggested change
ttlCache := cache.NewInMemoryCache(cache.InMemoryCacheOptions{
Enabled: true,
SimilarityThreshold: 0.1,
MaxEntries: 10,
TTLSeconds: 1,
})
defer ttlCache.Close()
err := ttlCache.AddEntry("ttl-request-id", "ttl-model", "time-sensitive query", []byte("request"), []byte("response"))
Expect(err).NotTo(HaveOccurred())
time.Sleep(1100 * time.Millisecond)
ttlSeconds := 1
ttlCache := cache.NewInMemoryCache(cache.InMemoryCacheOptions{
Enabled: true,
SimilarityThreshold: 0.1,
MaxEntries: 10,
TTLSeconds: ttlSeconds,
})
defer ttlCache.Close()
err := ttlCache.AddEntry("ttl-request-id", "ttl-model", "time-sensitive query", []byte("request"), []byte("response"))
Expect(err).NotTo(HaveOccurred())
ttlWaitDuration := time.Duration(ttlSeconds)*time.Second + 100*time.Millisecond
time.Sleep(ttlWaitDuration)

Copilot uses AI. Check for mistakes.

@aeft
Copy link
Contributor

aeft commented Oct 14, 2025

Thanks! It looks good to me. cc @rootfs

@rootfs rootfs merged commit 8b58623 into vllm-project:main Oct 14, 2025
16 checks passed
@cryo-zd cryo-zd deleted the fix/expiry branch October 14, 2025 22:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TTL expiry ignored when in-memory cache hits

5 participants