-
Notifications
You must be signed in to change notification settings - Fork 238
fix(cache): cleanup expired cache entries during update operations #16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ Deploy Preview for vllm-semantic-router ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
@QIN2DIM plz sign the commit, and repush that, thanks! |
63f2772
to
0f03714
Compare
👥 vLLM Semantic Team NotificationThe following members have been identified for the changed files in this PR and have been automatically assigned: 📁
|
docker/README.md
Outdated
git clone <repository-url> | ||
cd semantic_router | ||
git clone https://github.com/vllm-project/semantic-router.git | ||
cd semantic-router |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u separate this to another PR, this is not related this the issue. Thanks!
@QIN2DIM can you sign?
|
28e098a
to
f2422b1
Compare
Signed-off-by: QIN2DIM <[email protected]>
Signed-off-by: QIN2DIM <[email protected]>
f2422b1
to
8e90e37
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! Would you like to fix another typo in docs found by you in another PR?
What type of PR is this?
fix(cache): cleanup expired cache entries during update operations
What this PR does / why we need it:
This PR fixes a memory leak in the semantic cache by ensuring expired entries are cleaned up during
UpdateWithResponse
operations. Previously, in high cache hit rate scenarios, expired entries could accumulate because:FindSimilar
(read operations) only logs expired entries but doesn't clean themAddPendingRequest
is called less frequently when cache hits are highUpdateWithResponse
was the primary write operation but didn't perform cleanupThe fix adds
cleanupExpiredEntries()
call toUpdateWithResponse
method, making it consistent with other write operations (AddPendingRequest
andAddEntry
) that already perform cleanup.Which issue(s) this PR fixes:
Release Notes: No