Labels: area/docs, area/environment, area/user-experience, priority/P1 (Important / Should-Have)
Description
The vLLM Semantic Router can classify requests and route them to a backend. At the same time, users deploy the vLLM Production Stack, which already provides features like LMCache integration, PD/KV-aware/prefix routing, LoRA adapter management, and observability. However, there is no straightforward way to connect the Semantic Router with the Production Stack so that semantic classification can feed directly into production-grade execution.
Solution
Support a native integration path with the vLLM Production Stack.
- Keep Envoy + extProc–based classification in the Semantic Router (model routing, jailbreak/PII detection).
- After classification, modify the request with routing metadata (e.g., target_model).
- Forward the enriched request to the Production Stack router service, which then applies its existing features.
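The enrichment step above could look roughly like the following sketch. This is illustrative only: the function name, the header names, and the category field are assumptions, not actual Semantic Router or Production Stack APIs.

```python
def enrich_request(body: dict, headers: dict,
                   target_model: str, category: str) -> tuple[dict, dict]:
    """Rewrite an OpenAI-style request so the Production Stack router
    sees the model chosen by semantic classification.

    All names here are hypothetical; the real integration would define
    its own metadata conventions.
    """
    # Override the model with the classifier's decision,
    # leaving the rest of the request untouched.
    enriched_body = {**body, "model": target_model}

    # Attach routing metadata as headers (illustrative names) so the
    # downstream router can apply KV-aware / prefix routing per model.
    enriched_headers = {
        **headers,
        "x-semantic-target-model": target_model,
        "x-semantic-category": category,
    }
    return enriched_body, enriched_headers


# Example: a chat request classified as "math" is retargeted.
body = {"model": "auto",
        "messages": [{"role": "user", "content": "What is 2+2?"}]}
new_body, new_headers = enrich_request(body, {}, "deepseek-math", "math")
```

The enriched body and headers would then be forwarded to the Production Stack router endpoint instead of a backend directly.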
Describe alternatives you've considered
Treat the Semantic Router as an additional routing plugin inside the Production Stack, plugging the semantic-aware routing logic in directly. This would require changes to decouple Envoy from the actual semantic routing logic.
Xunzhuo