---
id: gateway-integrations
title: Gateway Integrations
sidebar_label: Gateway Integrations
description: How the Semantic Router plugs into Envoy AI Gateway, Istio, AIBrix, LLM-D, and the vLLM Production Stack, plus what each integration adds.
---

The Semantic Router ships with multiple gateway profiles. This page shows **which gateways SR plugs into**, **what SR adds**, and **what's already validated**.

## High-level topology

import ZoomableMermaid from '@site/src/components/ZoomableMermaid';

<ZoomableMermaid title="System Architecture Overview" defaultZoom={5.5}>
{`
flowchart LR
  C[Client / SDK]
  GW["Gateway<br/>(Envoy | Istio | AIBrix | LLM-D | Prod Stack)"]
  SR["Semantic Router<br/>(ExtProc gRPC)"]
  SC["Semantic Cache<br/>(Milvus)"]
  OBS["Telemetry<br/>(OTel → Prom/Grafana)"]
  B1["Cloud LLMs<br/>(OpenAI, Anthropic, ...)"]
  B2["Self-hosted<br/>vLLM workers"]

  C --> GW
  GW -- ExtProc <br/> Inference Extension --> SR
  SR -->|headers: model, safety| GW
  SR --> SC
  SR --> OBS
  GW --> B1
  GW --> B2
  B1 --> OBS
  B2 --> OBS

  style SR fill:#1f2937,stroke:#0ea5e9,stroke-width:2,color:#e5e7eb
  style GW fill:#0f172a,stroke:#a855f7,stroke-width:2,color:#e5e7eb
`}
</ZoomableMermaid>
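
The key contract in this topology is header-based: SR receives each request over ExtProc, classifies it, and hands the gateway a set of header mutations (selected model, safety/policy flags) that drive upstream selection. The sketch below is a minimal, illustrative Python stand-in for that decision step, not SR's actual handler; the function, header names, category labels, and model IDs are all placeholders.

```python
# Illustrative sketch of the header-based routing contract. Header names,
# categories, and model IDs are hypothetical; the real classification runs
# inside the Semantic Router's ExtProc service.
from typing import Any, Dict


def route_decision(chat_request: Dict[str, Any]) -> Dict[str, str]:
    """Return the header mutations the gateway would apply before forwarding."""
    prompt = " ".join(
        m["content"]
        for m in chat_request.get("messages", [])
        if isinstance(m.get("content"), str)
    )

    # Toy stand-ins for SR's category and safety classifiers.
    category = "code" if "def " in prompt or "traceback" in prompt.lower() else "chat"
    jailbreak = "ignore previous instructions" in prompt.lower()

    headers = {"x-selected-model": "qwen-coder" if category == "code" else "llama-chat"}
    if jailbreak:
        headers["x-safety-flag"] = "jailbreak"
    return headers


if __name__ == "__main__":
    req = {"messages": [{"role": "user", "content": "Why does def foo() raise a TypeError?"}]}
    print(route_decision(req))  # {'x-selected-model': 'qwen-coder'}
```

In the real deployment this logic lives in SR's ExtProc service, and the gateway applies the returned mutations before picking a backend, as in the diagram above.
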

## Supported Profiles

| Gateway profile | Integration path | SR adds | CI status | Manifests / config |
| --- | --- | --- | --- | --- |
| **Envoy AI Gateway** | ExtProc gRPC (Envoy AI Gateway → SR) | Classification → model header, PII/jailbreak, semantic cache, observability headers | [integration-test-k8s](https://github.com/vllm-project/semantic-router/actions/workflows/integration-test-k8s.yml) <br /> [integration-test-helm](https://github.com/vllm-project/semantic-router/actions/workflows/integration-test-helm.yml) | [`deploy/kubernetes/ai-gateway`](https://github.com/vllm-project/semantic-router/tree/main/deploy/kubernetes/ai-gateway) |
| **Istio Gateway** | Gateway API Inference Extension + ExtProc | Same as above; demo with dual vLLM backends | Manual guide | [`deploy/kubernetes/istio`](https://github.com/vllm-project/semantic-router/tree/main/deploy/kubernetes/istio) |
| **AIBrix Gateway** | Envoy Gateway API resources + ExtProc | SR intelligence in front of AIBrix autoscaler and distributed KV | Helm + AIBrix manifests; <br /> follows Envoy ExtProc; <br /> planned E2E | [`deploy/kubernetes/aibrix`](https://github.com/vllm-project/semantic-router/tree/main/deploy/kubernetes/aibrix) |
| **LLM-D Gateway** | Istio Gateway + LLM-D schedulers + ExtProc | Semantic routing feeds pool selection in LLM-D | Covered by Istio flow; <br /> planned E2E | [`deploy/kubernetes/llmd-base`](https://github.com/vllm-project/semantic-router/tree/main/deploy/kubernetes/llmd-base) |

> **Reading map**: pick your gateway, open the install guide, then jump to the manifests to see the exact resources the diagram refers to.
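
Whichever profile you deploy, clients keep speaking the OpenAI-compatible API to the gateway; only the wiring behind it changes. Below is a minimal client sketch, assuming a port-forwarded gateway on `localhost:8080` and a placeholder model name — substitute the address and models from your own deployment.

```python
# Minimal client sketch. The base_url, api_key, and model values are
# placeholders for whatever your gateway profile exposes.
# Requires the `openai` Python package.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # port-forwarded gateway (assumption)
    api_key="not-used-by-the-gateway",    # many gateway setups ignore the key
)

resp = client.chat.completions.create(
    model="auto",  # placeholder; SR can rewrite the model via routing headers
    messages=[{"role": "user", "content": "Summarize what a semantic router does."}],
)
print(resp.choices[0].message.content)
```
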
## Request Flow

<ZoomableMermaid title="Request Flow" defaultZoom={5.5}>
{`
sequenceDiagram
  autonumber
  participant Client
  participant Gateway
  participant SR as Semantic Router
  participant Cache as Semantic Cache
  participant Upstream as LLM Backends

  Client->>Gateway: OpenAI-compatible request
  Gateway->>SR: ExtProc gRPC (headers/body)
  SR->>SR: PII / jailbreak / category classification
  SR->>Cache: Semantic lookup
  alt cache hit
    SR-->>Gateway: Headers + cached response
  else miss
    SR-->>Gateway: Route headers (model, policy flags)
    Gateway->>Upstream: Forward to chosen backend
    Upstream-->>Gateway: LLM response
    Gateway-->>SR: Response headers/body (optional)
    SR->>Cache: Write entry
  end
  Gateway-->>Client: Final response
`}
</ZoomableMermaid>
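
To observe this flow end to end, send the same prompt twice and compare latency and response headers: with a warm semantic cache, the second call can be answered without reaching an upstream model. The sketch below is illustrative; the gateway address and the `model` value are placeholders, and it prints all `x-*` response headers rather than assuming specific header names.

```python
# Illustrative sketch: replay one prompt through the gateway twice and dump
# routing-related response headers. URL and model are placeholders; cache
# behavior depends on your SR configuration. Requires the `requests` package.
import requests

GATEWAY = "http://localhost:8080/v1/chat/completions"  # placeholder address
payload = {
    "model": "auto",  # placeholder; SR may rewrite the model via headers
    "messages": [{"role": "user", "content": "What is semantic caching?"}],
}

for attempt in ("first", "second"):
    r = requests.post(GATEWAY, json=payload, timeout=60)
    r.raise_for_status()
    # On the second attempt a warm cache typically answers faster; confirm via
    # latency and whatever routing/cache headers your deployment emits.
    print(f"{attempt} call: {r.status_code}, {r.elapsed.total_seconds():.2f}s")
    for key, value in r.headers.items():
        if key.lower().startswith("x-"):
            print(f"  {key}: {value}")
```
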

## Where to go next

- **Envoy AI Gateway install**: [installation/k8s/ai-gateway](../../installation/k8s/ai-gateway)
- **Istio Gateway install**: [installation/k8s/istio](../../installation/k8s/istio)
- **AIBrix Gateway install**: [installation/k8s/aibrix](../../installation/k8s/aibrix)
- **LLM-D Gateway install**: [installation/k8s/llm-d](../../installation/k8s/llm-d)