Labels: area/docs, area/environment, area/user-experience, priority/P1 (Important / Should-Have)
Description
The vLLM Semantic Router can classify requests and route them to a backend. At the same time, users deploy the vLLM Production Stack, which already provides features like LMCache integration, PD/KV-aware/prefix routing, LoRA adapter management, and observability. However, there is no straightforward way to connect the Semantic Router with the Production Stack so that semantic classification can feed directly into production-grade execution.
Solution
Support a native integration path with the vLLM Production Stack.
- Keep Envoy + extProc–based classification in the Semantic Router (model routing, jailbreak/PII detection).
- After classification, modify the request with routing metadata (e.g., target_model).
- Forward the enriched request to the Production Stack router service, which then applies its existing features.
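The enrichment step above could look roughly like the following sketch. This is illustrative only: the function name, the header names, and the category field are assumptions, not actual Semantic Router or Production Stack APIs.

```python
def enrich_request(body: dict, headers: dict,
                   target_model: str, category: str) -> tuple[dict, dict]:
    """Rewrite an OpenAI-style request so the Production Stack router
    sees the model chosen by semantic classification.

    All names here are hypothetical; the real integration would define
    its own metadata conventions.
    """
    # Override the model with the classifier's decision,
    # leaving the rest of the request untouched.
    enriched_body = {**body, "model": target_model}

    # Attach routing metadata as headers (illustrative names) so the
    # downstream router can apply KV-aware / prefix routing per model.
    enriched_headers = {
        **headers,
        "x-semantic-target-model": target_model,
        "x-semantic-category": category,
    }
    return enriched_body, enriched_headers


# Example: a chat request classified as "math" is retargeted.
body = {"model": "auto",
        "messages": [{"role": "user", "content": "What is 2+2?"}]}
new_body, new_headers = enrich_request(body, {}, "deepseek-math", "math")
```

The enriched body and headers would then be forwarded to the Production Stack router endpoint instead of a backend directly.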
Describe alternatives you've considered
Treat the Semantic Router as an additional routing plugin inside the Production Stack, plugging the semantic-aware routing logic in directly. This would require changes to decouple Envoy from the actual semantic routing logic.
Xunzhuo