fix: stop returning expired in-memory cache hits #423

cryo-zd · 2025-10-14T10:10:13Z

What type of PR is this?

fix: stop returning expired in-memory cache hits

What this PR does / why we need it:
This PR skips TTL-expired entries while scanning for the best match, so FindSimilar no longer serves stale responses.

Which issue(s) this PR fixes:

Fixes #404

Release Notes: Yes/No

Signed-off-by: cryo <[email protected]>

github-actions · 2025-10-14T10:10:25Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `src`

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

src/semantic-router/pkg/cache/cache_test.go
src/semantic-router/pkg/cache/inmemory_cache.go

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

netlify · 2025-10-14T10:10:34Z

✅ Deploy Preview for vllm-semantic-router ready!

Name	Link
🔨 Latest commit	`e8f3ca5`
🔍 Latest deploy log	https://app.netlify.com/projects/vllm-semantic-router/deploys/68eea5a8e4a16a00074555cb
😎 Deploy Preview	https://deploy-preview-423--vllm-semantic-router.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

rootfs · 2025-10-14T12:52:01Z

@aeft PTAL, thanks

Copilot

Pull Request Overview

This PR fixes a bug where the in-memory cache could return expired entries. The fix ensures that FindSimilar skips TTL-expired cache entries during the similarity search phase, preventing stale responses from being served.

Key changes:

- Refactored expiration checking logic into a reusable isExpired helper method
- Added expiration checks during similarity search to skip expired entries before they're considered
- Removed the read-only cleanup method that only logged expired entries without preventing their use
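
To illustrate the key changes above, here is a minimal sketch of an expiry-aware best-match scan. It is not the PR's actual code: the `entry` and `cache` types, the `findBest` and `runExample` functions, and the precomputed `Similarity` field are hypothetical simplifications. Only the `isExpired` comparison follows the diff line quoted in this review. Treating a non-positive TTL as "never expires" is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"time"
)

// entry is a hypothetical, simplified cache entry; the real struct has more fields.
type entry struct {
	Query        string
	Similarity   float32 // stand-in for a similarity score computed from embeddings
	LastAccessAt time.Time
}

// cache is a hypothetical stand-in for the in-memory cache.
type cache struct {
	entries    []entry
	ttlSeconds int
}

// isExpired reports whether the entry has outlived the TTL at time now.
// Assumption of this sketch: a non-positive TTL disables expiry.
func (c *cache) isExpired(e entry, now time.Time) bool {
	if c.ttlSeconds <= 0 {
		return false
	}
	return now.Sub(e.LastAccessAt) >= time.Duration(c.ttlSeconds)*time.Second
}

// findBest returns the highest-similarity entry at or above threshold,
// skipping expired entries before they can become the best match.
func (c *cache) findBest(threshold float32, now time.Time) (entry, bool) {
	var best entry
	found := false
	for _, e := range c.entries {
		if c.isExpired(e, now) {
			continue // an expired entry must never win, however similar it is
		}
		if e.Similarity < threshold {
			continue
		}
		if !found || e.Similarity > best.Similarity {
			best, found = e, true
		}
	}
	return best, found
}

// runExample builds a cache where the most similar entry is already expired.
func runExample() (string, bool) {
	now := time.Now()
	c := &cache{
		ttlSeconds: 1,
		entries: []entry{
			{Query: "stale", Similarity: 0.99, LastAccessAt: now.Add(-2 * time.Second)},
			{Query: "fresh", Similarity: 0.80, LastAccessAt: now},
		},
	}
	best, ok := c.findBest(0.5, now)
	return best.Query, ok
}

func main() {
	q, ok := runExample()
	fmt.Println(q, ok) // prints "fresh true"
}
```

The point of checking expiry inside the loop, rather than after picking a winner, is that a stale entry with the highest score would otherwise shadow a valid, slightly less similar entry.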

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
src/semantic-router/pkg/cache/inmemory_cache.go	Added expiration check in FindSimilar to skip expired entries, refactored expiration logic into isExpired helper method, removed cleanupExpiredEntriesReadOnly
src/semantic-router/pkg/cache/cache_test.go	Added test case verifying expired entries are not returned from similarity search


Copilot · 2025-10-14T12:52:57Z

src/semantic-router/pkg/cache/inmemory_cache.go

-			"ttl_seconds":   c.ttlSeconds,
-		})
-	}
+	return now.Sub(entry.LastAccessAt) >= time.Duration(c.ttlSeconds)*time.Second


The conversion time.Duration(c.ttlSeconds)*time.Second is computed on every call to isExpired. Since ttlSeconds is constant for the cache instance, consider precomputing this as a time.Duration field (e.g., ttlDuration) during cache initialization to avoid repeated multiplications.
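
A rough sketch of what this suggestion could look like, assuming a hypothetical `ttlCache` type with a `ttlDuration` field set in a hypothetical `newTTLCache` constructor. None of these names come from the PR, and `expiredAfter` exists only to exercise the sketch.

```go
package main

import (
	"fmt"
	"time"
)

// ttlCache is a hypothetical cache that stores its TTL as a time.Duration.
type ttlCache struct {
	ttlDuration time.Duration // computed once at construction
}

// newTTLCache converts the configured seconds into a Duration a single time.
func newTTLCache(ttlSeconds int) *ttlCache {
	return &ttlCache{ttlDuration: time.Duration(ttlSeconds) * time.Second}
}

// isExpired compares the entry's age against the precomputed TTL.
func (c *ttlCache) isExpired(lastAccessAt, now time.Time) bool {
	return now.Sub(lastAccessAt) >= c.ttlDuration
}

// expiredAfter reports whether an entry last accessed ageMillis ago is expired.
func expiredAfter(ttlSeconds, ageMillis int) bool {
	c := newTTLCache(ttlSeconds)
	now := time.Now()
	lastAccessAt := now.Add(-time.Duration(ageMillis) * time.Millisecond)
	return c.isExpired(lastAccessAt, now)
}

func main() {
	fmt.Println(expiredAfter(1, 500), expiredAfter(1, 1100)) // prints "false true"
}
```

The saved work is a single integer multiplication per call, so the benefit is probably more about readability than measurable speed.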

Copilot · 2025-10-14T12:52:58Z

src/semantic-router/pkg/cache/cache_test.go

+			ttlCache := cache.NewInMemoryCache(cache.InMemoryCacheOptions{
+				Enabled:             true,
+				SimilarityThreshold: 0.1,
+				MaxEntries:          10,
+				TTLSeconds:          1,
+			})
+			defer ttlCache.Close()
+
+			err := ttlCache.AddEntry("ttl-request-id", "ttl-model", "time-sensitive query", []byte("request"), []byte("response"))
+			Expect(err).NotTo(HaveOccurred())
+
+			time.Sleep(1100 * time.Millisecond)


The hardcoded sleep duration of 1100ms (1.1 seconds) relies on a magic number. Consider defining this as a constant like ttlWaitDuration = 1100 * time.Millisecond or calculating it as time.Duration(ttlSeconds)*time.Second + 100*time.Millisecond to make the relationship to the TTL explicit.

Suggested change

-			ttlCache := cache.NewInMemoryCache(cache.InMemoryCacheOptions{
-				Enabled:             true,
-				SimilarityThreshold: 0.1,
-				MaxEntries:          10,
-				TTLSeconds:          1,
-			})
-			defer ttlCache.Close()
-			err := ttlCache.AddEntry("ttl-request-id", "ttl-model", "time-sensitive query", []byte("request"), []byte("response"))
-			Expect(err).NotTo(HaveOccurred())
-			time.Sleep(1100 * time.Millisecond)
+			ttlSeconds := 1
+			ttlCache := cache.NewInMemoryCache(cache.InMemoryCacheOptions{
+				Enabled:             true,
+				SimilarityThreshold: 0.1,
+				MaxEntries:          10,
+				TTLSeconds:          ttlSeconds,
+			})
+			defer ttlCache.Close()
+			err := ttlCache.AddEntry("ttl-request-id", "ttl-model", "time-sensitive query", []byte("request"), []byte("response"))
+			Expect(err).NotTo(HaveOccurred())
+			ttlWaitDuration := time.Duration(ttlSeconds)*time.Second + 100*time.Millisecond
+			time.Sleep(ttlWaitDuration)

aeft · 2025-10-14T22:24:24Z

Thanks! It looks good to me. cc @rootfs

fix: stop returning expired in-memory cache hits

3906c5b

Signed-off-by: cryo <[email protected]>

cryo-zd requested review from Xunzhuo, rootfs and wangchen615 as code owners October 14, 2025 10:10

github-actions bot assigned rootfs, wangchen615 and Xunzhuo Oct 14, 2025

rootfs requested a review from Copilot October 14, 2025 12:52

Copilot AI reviewed Oct 14, 2025

View reviewed changes

rootfs added 2 commits October 14, 2025 13:57

Merge branch 'main' into fix/expiry

c0bdb6d

Merge branch 'main' into fix/expiry

e8f3ca5

rootfs merged commit 8b58623 into vllm-project:main Oct 14, 2025
16 checks passed

cryo-zd deleted the fix/expiry branch October 14, 2025 22:43
