docs: add NVIDIA Dynamo integration proposal #373
✅ Deploy Preview for vllm-semantic-router ready!
0364191 to 3761959
Signed-off-by: bitliu <[email protected]>
be24cd5 to 5fabc12
dynamics, competitive landscape, and stakeholder interests in your recommendations.
```

#### 2.2.2 Fusion Routing Strategy
Does this integration depend on the prompt classification improvement, or can that improvement follow later?
Nope, I think it is not a blocker here.

Semantic Router implements a **multi-signal fusion routing** approach that combines three complementary routing methods (as detailed in the [Prompt Classification Routing proposal](./prompt-classification-routing.md)):

**1. Keyword-Based Routing (Fast Path)**
This looks like a potential source of subtasks for the integration.
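For readers following the thread, a keyword fast path of the kind named in the heading above could be sketched as follows. This is only an illustration, not the implementation from the proposal: the rule table, the regular expressions, and the model names (`math-model`, `code-model`) are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical rule table: (regex over the prompt, target model name).
# Neither the patterns nor the model names come from the proposal.
KEYWORD_RULES = [
    (re.compile(r"\b(integral|derivative|theorem)\b", re.IGNORECASE), "math-model"),
    (re.compile(r"\b(traceback|segfault|compile)\b", re.IGNORECASE), "code-model"),
]

def keyword_route(prompt: str) -> Optional[str]:
    """Return a model name on the first keyword hit, else None."""
    for pattern, model in KEYWORD_RULES:
        if pattern.search(prompt):
            return model
    # No keyword matched: the caller would fall through to the other routing signals.
    return None
```

Returning `None` on a miss keeps the fast path cheap and leaves the final decision to the remaining signals, which is one way the subtasks mentioned above could be split.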
| Dimension | Semantic Router Alone | Dynamo Router Alone | **Integrated System** |
|-----------|----------------------|---------------------|----------------------|
| **Model Selection** | ✅ Semantic accuracy (14 categories) | ❌ No model awareness | ✅ Best model for task |
| **Worker Selection** | ❌ No worker awareness | ✅ KV cache optimization | ✅ Optimal worker for model | |
I think #227 can help with both model and worker selection.
|-----------|----------------------|---------------------|----------------------|
| **Model Selection** | ✅ Semantic accuracy (14 categories) | ❌ No model awareness | ✅ Best model for task |
| **Worker Selection** | ❌ No worker awareness | ✅ KV cache optimization | ✅ Optimal worker for model |
| **Prompt Engineering** | ✅ Domain-aware system prompts | ❌ No prompt optimization | ✅ Optimized CoT & MoE matching | |
The system prompt injection could potentially impact the prefix cache; we should also monitor that.
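To make the concern concrete, here is a minimal sketch that compares how many leading tokens two requests share with and without a domain-specific system prompt. It is not Dynamo's or vLLM's cache logic: whitespace splitting stands in for a real tokenizer, and the three system prompts are invented for illustration.

```python
def shared_prefix_len(a: list, b: list) -> int:
    """Count the leading tokens two requests have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def build_tokens(system_prompt: str, user_prompt: str) -> list:
    # Whitespace split is a stand-in for a real tokenizer (assumption).
    return (system_prompt + " " + user_prompt).split()

# Hypothetical system prompts, not taken from the proposal.
BASE = "You are a helpful assistant."
MATH = "You are a math expert. Reason step by step."
LAW = "You are a legal expert. Cite relevant statutes."

# One shared system prompt: the whole system prompt is a reusable prefix.
same = shared_prefix_len(
    build_tokens(BASE, "Solve x^2 = 4"),
    build_tokens(BASE, "Is this contract valid?"),
)

# Per-domain injection: the prefixes diverge inside the system prompt.
injected = shared_prefix_len(
    build_tokens(MATH, "Solve x^2 = 4"),
    build_tokens(LAW, "Is this contract valid?"),
)
```

In this toy setup the shared prefix shrinks from 5 tokens to 3 once prompts are injected per domain. A metric along these lines, tracked per category, is one possible way to monitor the effect.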
LGTM, left some ideas for GitHub issues.
What type of PR is this?
docs: add NVIDIA Dynamo integration proposal