
[Feature/Integration] Add Integration Path with vLLM Production Stack #295

@zerofishnoodles

Description

The vLLM Semantic Router can classify requests and route them to a backend. At the same time, many users deploy the vLLM Production Stack, which already provides features like LMCache integration, PD/KV-aware/prefix routing, LoRA adapter management, and observability. Today there is no straightforward way to connect the Semantic Router to the Production Stack so that semantic classification can feed directly into production-grade execution.

Solution
Support a native integration path with the vLLM Production Stack.

  • Keep Envoy + extProc–based classification in the Semantic Router (model routing, jailbreak/PII detection).
  • After classification, modify the request with routing metadata (e.g., target_model).
  • Forward the enriched request to the Production Stack router service, which then applies its existing features (a minimal sketch of this hand-off follows below).
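The hand-off could look like the following minimal sketch. It assumes the Production Stack router exposes an OpenAI-compatible endpoint and that the routing decision is carried in the `model` field plus a custom header; the URL, header name, and function are illustrative assumptions, not confirmed APIs of either project.

```python
import json
import urllib.request

# Minimal sketch of the post-classification hand-off. The URL, header name,
# and function below are illustrative assumptions, not confirmed APIs of
# either the Semantic Router or the Production Stack.
PRODUCTION_STACK_URL = "http://vllm-router.vllm-system.svc:80/v1/chat/completions"

def forward_to_production_stack(request_body: dict, target_model: str) -> bytes:
    # Overwrite `model` with the classifier's decision so the Production
    # Stack router resolves it against its own backends (LoRA adapters,
    # KV-aware/prefix routing, etc.).
    enriched = dict(request_body)
    enriched["model"] = target_model

    req = urllib.request.Request(
        PRODUCTION_STACK_URL,
        data=json.dumps(enriched).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed header: carries the routing decision for observability.
            "x-semantic-target-model": target_model,
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Rewriting `model` rather than adding a new field means the enriched request can flow through the Production Stack router's existing code path, since that router already keys its routing features off the requested model name.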


Describe alternatives you've considered
Treat the semantic router as an additional routing policy inside the Production Stack router, i.e., plug semantic-aware routing in directly. This would require changes to decouple the Envoy integration from the actual semantic routing logic (see the sketch below).
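A hypothetical sketch of this alternative, assuming the Production Stack router accepts pluggable routing policies with a `route_request`-style method; the `Endpoint` shape, `classify()` helper, and class name are invented for illustration, not the Production Stack's actual plugin API:

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class Endpoint:
    url: str
    model: str

def classify(prompt: str) -> str:
    # Stand-in for the Semantic Router's classifier once it is decoupled
    # from the Envoy extProc path; a real implementation would call the
    # classification model instead of this toy heuristic.
    return "math-expert" if any(ch.isdigit() for ch in prompt) else "general"

class SemanticRoutingLogic:
    """Hypothetical routing policy: pick the backend whose model matches
    the semantic category of the request."""

    def route_request(self, endpoints: Sequence[Endpoint], request_body: dict) -> Endpoint:
        prompt = request_body["messages"][-1]["content"]
        category = classify(prompt)
        for ep in endpoints:
            if ep.model == category:
                return ep
        # Fall back to the first backend when no category matches.
        return endpoints[0]
```

The trade-off is that classification then runs in-process with the Production Stack router, while the Envoy-side features (jailbreak/PII detection via extProc) would still need their own decoupling.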
