Docs: Add integration proposal for PS and SR #418
Signed-off-by: Rui Zhang <[email protected]>
✅ Deploy Preview for vllm-semantic-router ready!
**Tasks:**

1. **Distributed Tracing (OpenTelemetry):**
OTel tracing is enabled in vsr, and an e2e tracing demo is in #329. It would be great to have tracing across the full path: app -> envoy -> vsr -> PS -> vllm.
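To make the end-to-end idea concrete, here is a minimal sketch of how one trace could stay intact across the app -> envoy -> vsr -> PS -> vllm path. It hand-builds W3C Trace Context `traceparent` headers rather than using the OpenTelemetry SDK, and the helper names (`new_traceparent`, `child_traceparent`, `propagate`) are hypothetical. It illustrates the propagation concept only and does not show how vsr or PS actually implement tracing.

```python
import secrets

# Hop names taken from the comment above; the list itself is illustrative.
HOPS = ["app", "envoy", "vsr", "PS", "vllm"]


def new_traceparent() -> str:
    """Start a trace: version 00, random 16-byte trace id, 8-byte span id, sampled flag."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def child_traceparent(parent: str) -> str:
    """Keep the incoming trace id and flags, but mint a new span id for this hop."""
    version, trace_id, _parent_span_id, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def propagate(hops):
    """Return the traceparent header each hop would emit, in order."""
    headers = {}
    current = new_traceparent()
    for hop in hops:
        headers[hop] = current
        current = child_traceparent(current)
    return headers


trace = propagate(HOPS)
for hop, header in trace.items():
    print(f"{hop:>5}: {header}")
```

Every hop shares the same trace id while emitting its own span id, which is what lets a tracing backend stitch the five services into a single trace.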
this is cool
Pull Request Overview
This proposal document outlines the integration strategy between vLLM Semantic Router and vLLM Production Stack to create a unified inference system with semantic intelligence and infrastructure optimization capabilities.
- Introduces a comprehensive integration framework combining semantic routing capabilities with production-scale infrastructure
- Details a four-layer system architecture with gateway, semantic intelligence, infrastructure optimization, and execution layers
- Proposes a phased implementation plan with foundation, observability, and production hardening stages
#### Semantic Router – Request Intelligence Layer

* Understands the user’s intent via multi‑signal classification, combining keyword matching, embedding similarity and classification.
Copilot AI, Oct 14, 2025
The phrase 'embedding similarity and classification' is redundant since classification is mentioned twice. Consider revising to 'combining keyword matching, embedding similarity, and LLM-based classification' or similar to clarify the different classification methods.
Suggested change:
- * Understands the user’s intent via multi‑signal classification, combining keyword matching, embedding similarity and classification.
+ * Understands the user’s intent via multi‑signal classification, combining keyword matching, embedding similarity, and LLM-based classification.
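For readers unfamiliar with the term, multi-signal classification can be sketched roughly as below. This is a toy illustration under stated assumptions, not the Semantic Router's implementation: the categories, keyword lists, reference prompts, and weights are all hypothetical, the "embedding similarity" is a bag-of-words cosine standing in for a real embedding model, and `classifier_signal` is a placeholder where a model-based classifier would go.

```python
import math
from collections import Counter

# Hypothetical categories, keywords, and reference prompts (illustration only).
KEYWORDS = {
    "math": {"integral", "derivative", "equation"},
    "code": {"python", "function", "bug"},
}
EXAMPLES = {
    "math": "solve the equation and compute the derivative",
    "code": "fix the bug in this python function",
}


def tokens(text):
    return text.lower().split()


def keyword_signal(text):
    """1.0 for a category if any of its keywords appears in the text, else 0.0."""
    words = set(tokens(text))
    return {cat: float(bool(words & kws)) for cat, kws in KEYWORDS.items()}


def bow_cosine(a, b):
    """Cosine similarity of bag-of-words counts (stand-in for embedding similarity)."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def similarity_signal(text):
    return {cat: bow_cosine(text, example) for cat, example in EXAMPLES.items()}


def classifier_signal(text):
    """Placeholder for a model-based classifier: returns a uniform score."""
    return {cat: 1.0 / len(KEYWORDS) for cat in KEYWORDS}


def classify(text, weights=(0.4, 0.4, 0.2)):
    """Combine the three signals with a weighted sum and pick the top category."""
    signals = (keyword_signal(text), similarity_signal(text), classifier_signal(text))
    scores = {cat: sum(w * s[cat] for w, s in zip(weights, signals)) for cat in KEYWORDS}
    return max(scores, key=scores.get), scores


label, scores = classify("why does my python function raise a bug")
print(label, scores)
```

A weighted sum is only one way to fuse the signals; a cascade that falls back to the heavier classifier when the cheap signals disagree would be an equally plausible reading of the proposal's wording.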
@zerofishnoodles can you add this doc to the sidebar?
🚀🚀🚀
* doc: add integration proposal for PS and SR
* change codespell
* Add sidebar
* modify

Signed-off-by: Rui Zhang <[email protected]>
Signed-off-by: samzong <[email protected]>
What type of PR is this?
docs: Add integration proposal for PS and SR
What this PR does / why we need it:
Add integration proposal for PS and SR
Which issue(s) this PR fixes:
Related to #295
Release Notes: No