
Commit 04ee42d

Merge branch 'main' into festive-sutherland

2 parents: 5916c21 + 228e567

2 files changed (+94, -67 lines)

docs/source/use_cases/semantic-router-integration.rst

Lines changed: 26 additions & 11 deletions
@@ -74,21 +74,29 @@ Identify the ClusterIP and port of your router Service:
 Step 2: Deploy vLLM Semantic Router
 ------------------------------------
 
-Follow the official `Install in Kubernetes <https://vllm-semantic-router.com/docs/installation/kubernetes>`_ guide with the updated configuration.
+Follow the official `Install in Kubernetes <https://vllm-semantic-router.com/docs/installation/k8s/ai-gateway>`_ guide with the updated configuration.
+
+Deploy vLLM Semantic Router using Helm:
 
 .. code-block:: bash
 
-   # Deploy vLLM Semantic Router manifests
-   kubectl apply -k deploy/kubernetes/ai-gateway/semantic-router
+   # Deploy vLLM Semantic Router with custom values from GHCR OCI registry
+   # (Optional) If you use a registry mirror/proxy, append: --set global.imageRegistry=<your-registry>
+   helm install semantic-router oci://ghcr.io/vllm-project/charts/semantic-router \
+     --version v0.0.0-latest \
+     --namespace vllm-semantic-router-system \
+     --create-namespace \
+     -f https://raw.githubusercontent.com/vllm-project/semantic-router/refs/heads/main/deploy/kubernetes/ai-gateway/semantic-router-values/values.yaml
+
    kubectl wait --for=condition=Available deployment/semantic-router \
     -n vllm-semantic-router-system --timeout=600s
 
    # Install Envoy Gateway
-   helm upgrade -i eg oci://docker.io/envoyproxy/gateway-helm \
-     --version v0.0.0-latest \
-     --namespace envoy-gateway-system \
-     --create-namespace \
-     -f https://raw.githubusercontent.com/envoyproxy/ai-gateway/main/manifests/envoy-gateway-values.yaml
+   helm upgrade -i eg oci://docker.io/envoyproxy/gateway-helm \
+     --version v0.0.0-latest \
+     --namespace envoy-gateway-system \
+     --create-namespace \
+     -f https://raw.githubusercontent.com/envoyproxy/ai-gateway/main/manifests/envoy-gateway-values.yaml
 
    # Install Envoy AI Gateway
    helm upgrade -i aieg oci://docker.io/envoyproxy/ai-gateway-helm \
@@ -97,20 +105,27 @@ Follow the official `Install in Kubernetes <https://vllm-semantic-router.com/doc
     --create-namespace
 
    # Install Envoy AI Gateway CRDs
-   helm upgrade -i aieg-crd oci://docker.io/envoyproxy/ai-gateway-crds-helm --version v0.0.0-latest --namespace envoy-ai-gateway-system
+   helm upgrade -i aieg-crd oci://docker.io/envoyproxy/ai-gateway-crds-helm \
+     --version v0.0.0-latest \
+     --namespace envoy-ai-gateway-system
 
    # Wait for AI Gateway to be ready
    kubectl wait --timeout=300s -n envoy-ai-gateway-system \
     deployment/ai-gateway-controller --for=condition=Available
 
+.. note::
+
+   The values file contains the configuration for the semantic router including domain classification, LoRA routing, and plugin settings. You can download and customize it from the `semantic-router-values <https://raw.githubusercontent.com/vllm-project/semantic-router/refs/heads/main/deploy/kubernetes/ai-gateway/semantic-router-values/values.yaml>`_ to match your vLLM Production Stack setup.
+
 Create LLM Demo Backends and AI Gateway Routes:
 
 .. code-block:: bash
 
    # Apply LLM demo backends
-   kubectl apply -f deploy/kubernetes/ai-gateway/aigw-resources/base-model.yaml
+   kubectl apply -f https://raw.githubusercontent.com/vllm-project/semantic-router/refs/heads/main/deploy/kubernetes/ai-gateway/aigw-resources/base-model.yaml
+
    # Apply AI Gateway routes
-   kubectl apply -f deploy/kubernetes/ai-gateway/aigw-resources/gwapi-resources.yaml
+   kubectl apply -f https://raw.githubusercontent.com/vllm-project/semantic-router/refs/heads/main/deploy/kubernetes/ai-gateway/aigw-resources/gwapi-resources.yaml
 
 Step 3: Test the Deployment
 ----------------------------
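The note in the diff above tells readers to customize the linked values file to match their vLLM Production Stack setup. As a purely illustrative sketch, an override might look like the following — the key names are an assumption modeled on the `vllm_endpoints` block the tutorial previously showed for `config.yaml`, not the chart's confirmed schema; consult the linked values.yaml before using:

```yaml
# Hypothetical override values -- key names are an assumption, not the
# chart's confirmed schema; check the linked values.yaml for the real layout.
config:
  vllm_endpoints:
    - name: "endpoint1"
      address: "10.96.12.34"   # placeholder: your vLLM router Service ClusterIP
      port: 80                 # placeholder: your router Service port
      weight: 1
```

With Helm, a later `-f my-overrides.yaml` takes precedence over an earlier one, so such a file can be layered after the published values file on the `helm install` command line.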

tutorials/24-semantic-router-integration.md

Lines changed: 68 additions & 56 deletions
@@ -100,65 +100,53 @@ kubectl get svc vllm-router-service
 
 ## Step 2: Deploy vLLM Semantic Router and point it at your vLLM router Service
 
-Follow the official guide from the official website with **the updated config file as the following**: [Install in Kubernetes](https://vllm-semantic-router.com/docs/installation/kubernetes).
+Follow the official guide with **the updated config file as the following**: [Install in Kubernetes](https://vllm-semantic-router.com/docs/installation/k8s/ai-gateway).
 
-Remember to update the semantic router config to include your vLLM router service as an endpoint. Edit `deploy/kubernetes/config.yaml` and set `vllm_endpoints` like this (replace the IP/port with your router Service ClusterIP/port from step 1):
-
-```yaml
-vllm_endpoints:
-  - name: "endpoint1"
-    address: <YOUR ROUTER SERVICE CLUSTERIP>
-    port: <YOUR ROUTER SERVICE PORT>
-    weight: 1
-```
-
-Minimal sequence (same as the guide):
+Deploy using Helm with custom values:
 
 ```bash
-# Deploy vLLM Semantic Router manifests
-kubectl apply -k deploy/kubernetes/
-kubectl wait --for=condition=Available deployment/semantic-router \
-  -n vllm-semantic-router-system --timeout=600s
-
-# Install Envoy Gateway
-helm upgrade -i eg oci://docker.io/envoyproxy/gateway-helm \
-  --version v0.0.0-latest \
-  --namespace envoy-gateway-system \
-  --create-namespace
-kubectl wait --timeout=300s -n envoy-gateway-system \
-  deployment/envoy-gateway --for=condition=Available
-
-# Install Envoy AI Gateway
-helm upgrade -i aieg oci://docker.io/envoyproxy/ai-gateway-helm \
-  --version v0.0.0-latest \
-  --namespace envoy-ai-gateway-system \
-  --create-namespace
-kubectl wait --timeout=300s -n envoy-ai-gateway-system \
-  deployment/ai-gateway-controller --for=condition=Available
-
-# Install Gateway API Inference Extension CRDs
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v1.0.1/manifests.yaml
-kubectl get crd | grep inference
+# Deploy vLLM Semantic Router with custom values from GHCR OCI registry
+# (Optional) If you use a registry mirror/proxy, append: --set global.imageRegistry=<your-registry>
+helm install semantic-router oci://ghcr.io/vllm-project/charts/semantic-router \
+  --version v0.0.0-latest \
+  --namespace vllm-semantic-router-system \
+  --create-namespace \
+  -f https://raw.githubusercontent.com/vllm-project/semantic-router/refs/heads/main/deploy/kubernetes/ai-gateway/semantic-router-values/values.yaml
+
+kubectl wait --for=condition=Available deployment/semantic-router \
+  -n vllm-semantic-router-system --timeout=600s
+
+# Install Envoy Gateway
+helm upgrade -i eg oci://docker.io/envoyproxy/gateway-helm \
+  --version v0.0.0-latest \
+  --namespace envoy-gateway-system \
+  --create-namespace \
+  -f https://raw.githubusercontent.com/envoyproxy/ai-gateway/main/manifests/envoy-gateway-values.yaml
+
+# Install Envoy AI Gateway
+helm upgrade -i aieg oci://docker.io/envoyproxy/ai-gateway-helm \
+  --version v0.0.0-latest \
+  --namespace envoy-ai-gateway-system \
+  --create-namespace
+
+# Install Envoy AI Gateway CRDs
+helm upgrade -i aieg-crd oci://docker.io/envoyproxy/ai-gateway-crds-helm \
+  --version v0.0.0-latest \
+  --namespace envoy-ai-gateway-system
+
+kubectl wait --timeout=300s -n envoy-ai-gateway-system \
+  deployment/ai-gateway-controller --for=condition=Available
 ```
 
-Apply AI Gateway configuration and create the inference pool per the guide:
-
-```bash
-# Apply AI Gateway configuration
-kubectl apply -f deploy/kubernetes/ai-gateway/configuration
-
-# Restart controllers to pick up new config
-kubectl rollout restart -n envoy-gateway-system deployment/envoy-gateway
-kubectl rollout restart -n envoy-ai-gateway-system deployment/ai-gateway-controller
-kubectl wait --timeout=120s -n envoy-gateway-system deployment/envoy-gateway --for=condition=Available
-kubectl wait --timeout=120s -n envoy-ai-gateway-system deployment/ai-gateway-controller --for=condition=Available
+**Note**: The values file contains the configuration for the semantic router. You can download and customize it from [values.yaml](https://raw.githubusercontent.com/vllm-project/semantic-router/refs/heads/main/deploy/kubernetes/ai-gateway/semantic-router-values/values.yaml) to match your vLLM Production Stack setup.
 
-# Create inference pool
-kubectl apply -f deploy/kubernetes/ai-gateway/inference-pool
-sleep 30
+Create LLM Demo Backends and AI Gateway Routes:
 
-# Verify inference pool
-kubectl get inferencepool vllm-semantic-router -n vllm-semantic-router-system -o yaml
+```bash
+# Apply LLM demo backends
+kubectl apply -f https://raw.githubusercontent.com/vllm-project/semantic-router/refs/heads/main/deploy/kubernetes/ai-gateway/aigw-resources/base-model.yaml
+# Apply AI Gateway routes
+kubectl apply -f https://raw.githubusercontent.com/vllm-project/semantic-router/refs/heads/main/deploy/kubernetes/ai-gateway/aigw-resources/gwapi-resources.yaml
 ```
 
 ---
@@ -168,10 +156,10 @@ kubectl get inferencepool vllm-semantic-router -n vllm-semantic-router-system -o
 Port-forward to the Envoy service and send a test request, following the guide:
 
 ```bash
-export GATEWAY_IP="localhost:8080"
 export ENVOY_SERVICE=$(kubectl get svc -n envoy-gateway-system \
-  --selector=gateway.envoyproxy.io/owning-gateway-namespace=vllm-semantic-router-system,gateway.envoyproxy.io/owning-gateway-name=vllm-semantic-router \
+  --selector=gateway.envoyproxy.io/owning-gateway-namespace=default,gateway.envoyproxy.io/owning-gateway-name=semantic-router \
   -o jsonpath='{.items[0].metadata.name}')
+
 kubectl port-forward -n envoy-gateway-system svc/$ENVOY_SERVICE 8080:80
 ```
 
@@ -190,9 +178,33 @@ curl -i -X POST http://localhost:8080/v1/chat/completions \
 
 ---
 
+## Cleanup
+
+To remove the entire deployment:
+
+```bash
+# Remove Gateway API resources and Demo LLM
+kubectl delete -f https://raw.githubusercontent.com/vllm-project/semantic-router/refs/heads/main/deploy/kubernetes/ai-gateway/aigw-resources/gwapi-resources.yaml
+kubectl delete -f https://raw.githubusercontent.com/vllm-project/semantic-router/refs/heads/main/deploy/kubernetes/ai-gateway/aigw-resources/base-model.yaml
+
+# Remove semantic router
+helm uninstall semantic-router -n vllm-semantic-router-system
+
+# Remove AI gateway
+helm uninstall aieg -n envoy-ai-gateway-system
+helm uninstall aieg-crd -n envoy-ai-gateway-system
+
+# Remove Envoy gateway
+helm uninstall eg -n envoy-gateway-system
+
+# Remove vLLM Production Stack
+helm uninstall vllm-stack
+```
+
+---
+
 ## Troubleshooting
 
 - If the gateway is not accessible, check the Gateway and Envoy service per the guide.
-- If the inference pool is not ready, `kubectl describe` the `InferencePool` and check controller logs.
-- If the semantic router is not responding, check its pod status and logs.
+- If the semantic router is not responding, check its pod status and logs with `kubectl logs -n vllm-semantic-router-system deployment/semantic-router`.
 - If it is returning an error code, check the production stack router log.
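The final hunk's header references the tutorial's test request (`curl -i -X POST http://localhost:8080/v1/chat/completions`). For readers following along, here is a sketch of the JSON body such an OpenAI-compatible request carries — the model name and prompt are placeholders, not taken from the commit:

```json
{
  "model": "your-model-name",
  "messages": [
    {"role": "user", "content": "What is Kubernetes?"}
  ]
}
```

Sent with `-H "Content-Type: application/json"` through the port-forwarded gateway, the semantic router classifies the prompt and routes it to one of the configured backends.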
