
Commit f6fcb8a

📝 docs(architecture): add gateway integrations overview and sidebar entries
Signed-off-by: samzong <[email protected]>
1 parent c3ce62e commit f6fcb8a

2 files changed, 86 additions and 0 deletions

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
---
id: gateway-integrations
title: Gateway Integrations
sidebar_label: Gateway Integrations
description: How the Semantic Router plugs into Envoy AI Gateway, Istio, AIBrix, LLM-D, and the vLLM Production Stack, plus what each integration adds.
---

The Semantic Router (SR) ships with multiple gateway profiles. This page shows **which gateway plugs in**, **what SR adds**, and **what’s already validated**.

## High-level topology

import ZoomableMermaid from '@site/src/components/ZoomableMermaid';

<ZoomableMermaid title="System Architecture Overview" defaultZoom={5.5}>
{`
flowchart LR
C[Client / SDK]
GW["Gateway<br/>(Envoy | Istio | AIBrix | LLM-D | Prod Stack)"]
SR["Semantic Router<br/>(ExtProc gRPC)"]
SC["Semantic Cache<br/>(Milvus)"]
OBS["Telemetry<br/>(OTel → Prom/Grafana)"]
B1["Cloud LLMs<br/>(OpenAI, Anthropic, ...)"]
B2["Self-hosted<br/>vLLM workers"]

C --> GW
GW -- ExtProc <br/> Inference Extension --> SR
SR -->|headers: model, safety| GW
SR --> SC
SR --> OBS
GW --> B1
GW --> B2
B1 --> OBS
B2 --> OBS

style SR fill:#1f2937,stroke:#0ea5e9,stroke-width:2,color:#e5e7eb
style GW fill:#0f172a,stroke:#a855f7,stroke-width:2,color:#e5e7eb
`}
</ZoomableMermaid>

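The return edge in the diagram (`SR -->|headers: model, safety| GW`) can be pictured as a small mapping from a classification result to response headers. The sketch below is illustrative only: the `Classification` type, the category-to-model table, and the header names `x-example-model` and `x-example-safety` are assumptions made for this example, not the router's actual API or header names.

```typescript
// Illustrative sketch; every name here is hypothetical, not the router's real API.
type Classification = {
  category: string;
  piiDetected: boolean;
  jailbreakDetected: boolean;
};

// Hypothetical category → model table.
const MODEL_BY_CATEGORY: Record<string, string> = {
  math: "model-a",
  code: "model-b",
};
const DEFAULT_MODEL = "model-default";

// Build the headers the gateway would read to pick a backend and apply policy.
function routingHeaders(c: Classification): Record<string, string> {
  const blocked = c.piiDetected || c.jailbreakDetected;
  return {
    "x-example-model": MODEL_BY_CATEGORY[c.category] ?? DEFAULT_MODEL,
    "x-example-safety": blocked ? "blocked" : "ok",
  };
}
```

In this sketch the gateway stays the only component that talks to backends; the router only annotates the request, which matches the direction of the arrows above.
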
## Supported Profiles

| Gateway profile | Integration path | SR adds | CI status | Manifests / config |
| --- | --- | --- | --- | --- |
| **Envoy AI Gateway** | ExtProc gRPC (Envoy AI Gateway → SR) | Classification → model header, PII/jailbreak, semantic cache, observability headers | [![integration-test-k8s](https://github.com/vllm-project/semantic-router/actions/workflows/integration-test-k8s.yml/badge.svg)](https://github.com/vllm-project/semantic-router/actions/workflows/integration-test-k8s.yml) <br /> [![integration-test-helm](https://github.com/vllm-project/semantic-router/actions/workflows/integration-test-helm.yml/badge.svg)](https://github.com/vllm-project/semantic-router/actions/workflows/integration-test-helm.yml) | [`deploy/kubernetes/ai-gateway`](https://github.com/vllm-project/semantic-router/tree/main/deploy/kubernetes/ai-gateway) |
| **Istio Gateway** | Gateway API Inference Extension + ExtProc | Same as above; demo with dual vLLM backends | Manual guide | [`deploy/kubernetes/istio`](https://github.com/vllm-project/semantic-router/tree/main/deploy/kubernetes/istio) |
| **AIBrix Gateway** | Envoy Gateway API resources + ExtProc | SR intelligence in front of AIBrix autoscaler and distributed KV | Helm + AIBrix manifests; <br /> follows Envoy ExtProc; <br /> Planned E2E | [`deploy/kubernetes/aibrix`](https://github.com/vllm-project/semantic-router/tree/main/deploy/kubernetes/aibrix) |
| **LLM-D Gateway** | Istio Gateway + LLM-D schedulers + ExtProc | Semantic routing feeds pool selection in LLM-D | Covered by Istio flow; <br /> Planned E2E | [`deploy/kubernetes/llmd-base`](https://github.com/vllm-project/semantic-router/tree/main/deploy/kubernetes/llmd-base) |

> **Reading map**: pick your gateway, open the install guide, then jump to the manifests to see the exact resources the diagram refers to.

## Request Flow

<ZoomableMermaid title="Request Flow" defaultZoom={5.5}>
{`
sequenceDiagram
autonumber
participant Client
participant Gateway
participant SR as Semantic Router
participant Cache as Semantic Cache
participant Upstream as LLM Backends

Client->>Gateway: OpenAI-compatible request
Gateway->>SR: ExtProc gRPC (headers/body)
SR->>SR: PII / jailbreak / category classification
SR->>Cache: Semantic lookup
alt cache hit
SR-->>Gateway: Headers + cached response
else miss
SR-->>Gateway: Route headers (model, policy flags)
Gateway->>Upstream: Forward to chosen backend
Upstream-->>Gateway: LLM response
Gateway-->>SR: Response headers/body (optional)
SR->>Cache: Write entry
end
Gateway-->>Client: Final response
`}
</ZoomableMermaid>

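The hit/miss branch in the sequence diagram can be sketched as two handlers: one for the request phase and one for the optional response phase. This is a minimal sketch under stated assumptions: `ToyCache`, `onRequest`, `onResponse`, and the `x-example-model` header are hypothetical names, and the toy cache matches on normalized text only, whereas a semantic lookup would match on meaning rather than exact wording.

```typescript
// Illustrative sketch of the cache hit / miss branch; all names are hypothetical.
type RouterDecision =
  | { kind: "cached"; response: string }
  | { kind: "route"; headers: Record<string, string> };

// Normalize prompts so trivially different spellings share one cache key.
function normalize(prompt: string): string {
  return prompt.trim().toLowerCase();
}

// Toy stand-in for the semantic cache (exact match on normalized text).
class ToyCache {
  private entries = new Map<string, string>();

  lookup(prompt: string): string | undefined {
    return this.entries.get(normalize(prompt));
  }

  write(prompt: string, response: string): void {
    this.entries.set(normalize(prompt), response);
  }
}

// Request phase: answer from cache on a hit, otherwise return route headers.
function onRequest(cache: ToyCache, prompt: string, model: string): RouterDecision {
  const hit = cache.lookup(prompt);
  if (hit !== undefined) {
    return { kind: "cached", response: hit };
  }
  return { kind: "route", headers: { "x-example-model": model } };
}

// Response phase (optional in the diagram): store the backend's answer.
function onResponse(cache: ToyCache, prompt: string, response: string): void {
  cache.write(prompt, response);
}
```

A first call to `onRequest` for a new prompt returns a `route` decision; after `onResponse` stores the backend's answer, a repeat of the same prompt returns a `cached` decision and the gateway can skip the upstream call.
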
## Where to go next

- **Envoy AI Gateway install**: [installation/k8s/ai-gateway](../../installation/k8s/ai-gateway)
- **Istio Gateway install**: [installation/k8s/istio](../../installation/k8s/istio)
- **AIBrix Gateway install**: [installation/k8s/aibrix](../../installation/k8s/aibrix)
- **LLM-D Gateway install**: [installation/k8s/llm-d](../../installation/k8s/llm-d)

website/sidebars.ts

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ const sidebars: SidebarsConfig = {
 label: 'Architecture',
 items: [
 'overview/architecture/system-architecture',
+'overview/architecture/gateway-integrations',
 'overview/architecture/envoy-extproc',
 'overview/architecture/router-implementation',
 ],
