feat: add milvus persistent storage support #105
CI failed because no Milvus instance is running in the CI environment. For now, these tests are simply skipped on CI.
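
One way to skip such tests instead of failing them is to probe for a listening server before the suite runs. The following is only a sketch in Python, not the approach taken in this PR: `milvus_reachable` and `MilvusCacheIntegrationTest` are hypothetical names, and the probe assumes Milvus listens on its default port 19530 on `localhost`.

```python
import socket
import unittest


def milvus_reachable(host: str = "localhost", port: int = 19530,
                     timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


# Hypothetical integration test: reported as skipped (not failed)
# when no Milvus server is listening at the assumed address.
@unittest.skipUnless(milvus_reachable(), "Milvus is not running; skipping")
class MilvusCacheIntegrationTest(unittest.TestCase):
    def test_server_is_up(self):
        self.assertTrue(milvus_reachable())
```

A TCP probe only shows that something is listening on the port. It does not verify that the service is a healthy Milvus instance, so a real suite may want a stronger check.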
    
@Xunzhuo This PR contains no doc changes. I'll add docs on how to set up Milvus and in-memory caching in a follow-up PR.
    
- Create CacheBackend interface with pluggable architecture
- Refactor existing in-memory cache to implement new interface
- Add cache factory pattern for backend selection
- Support configurable similarity thresholds and TTL
- Add comprehensive cache metrics and observability

Addresses vllm-project#94

Signed-off-by: Huamin Chen <[email protected]>

- Implement MilvusCache backend with persistent storage
- Add Milvus configuration file and connection management
- Support vector similarity search with configurable indexing
- Add TTL support and collection lifecycle management
- Include Milvus dependencies and build configuration

Addresses vllm-project#95

Signed-off-by: Huamin Chen <[email protected]>
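
The pluggable design described in these commits (a backend interface, an in-memory implementation, a factory for backend selection, and a configurable similarity threshold and TTL) can be illustrated with a small sketch. It is written in Python for brevity and is not the PR's code: apart from the name `CacheBackend`, every class, method, parameter, and registry key below is an assumption made for illustration.

```python
import math
import time
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional, Tuple


class CacheBackend(ABC):
    """Hypothetical interface; the PR's actual method set may differ."""

    @abstractmethod
    def add(self, embedding: List[float], response: str) -> None: ...

    @abstractmethod
    def find_similar(self, embedding: List[float]) -> Optional[str]: ...


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryCache(CacheBackend):
    def __init__(self, similarity_threshold: float = 0.8,
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.similarity_threshold = similarity_threshold
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: List[Tuple[List[float], str, float]] = []
        self.hits = 0    # simple counters standing in for cache metrics
        self.misses = 0

    def add(self, embedding: List[float], response: str) -> None:
        self._entries.append((list(embedding), response, self._clock()))

    def find_similar(self, embedding: List[float]) -> Optional[str]:
        now = self._clock()
        # Drop entries older than the TTL before searching.
        self._entries = [e for e in self._entries
                         if now - e[2] < self.ttl_seconds]
        best_score, best_response = -1.0, None
        for vec, response, _ in self._entries:
            score = cosine(embedding, vec)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.similarity_threshold:
            self.hits += 1
            return best_response
        self.misses += 1
        return None


# Registry of backend constructors; the "memory" key is an assumed name.
_BACKENDS: Dict[str, Callable[..., CacheBackend]] = {"memory": InMemoryCache}


def new_cache_backend(backend_type: str, **options) -> CacheBackend:
    """Factory: select a backend by name and pass options to its constructor."""
    try:
        factory = _BACKENDS[backend_type]
    except KeyError:
        raise ValueError(f"unsupported cache backend: {backend_type!r}") from None
    return factory(**options)
```

In this sketch, a persistent backend such as a Milvus-backed cache would implement the same two methods and be registered under its own key, so callers only ever depend on `CacheBackend`.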

What type of PR is this?
This is a work-in-progress (WIP) PR to support more persistent storage backends for the semantic cache.
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #94 #95
Release Notes: Yes/No