Commit 7a2ffef

Merge branch 'main' into fix-score-payload
2 parents e12eabd + 228e567 commit 7a2ffef

25 files changed, +1256 -181 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@

 We host **bi-weekly** community meetings at the following timeslot:

-- Every other Tuesdays at 5:30 PM PT – [Add to Calendar](https://drive.usercontent.google.com/u/0/uc?id=1I3WuivUVAq1vZ2XSW4rmqgD5c0bQcxE0&export=download)
+- Every other Tuesdays at 5:30 PM PT – [Add to Calendar](https://drive.google.com/uc?export=download&id=1D4SqQiqzdSx_xsEwS0QTd592zd3Xourh)

 All are welcome to join!

community/community-event.md

Lines changed: 1 addition & 1 deletion
@@ -8,4 +8,4 @@ Info can be found in the [Google Doc](https://docs.google.com/document/d/1SCye2q

 Time: Bi-weekly

-**Every other Tuesday 5:30 - 6:00 PM PT**[Add to Calendar](https://drive.usercontent.google.com/u/0/uc?id=1I3WuivUVAq1vZ2XSW4rmqgD5c0bQcxE0&export=download)
+**Every other Tuesday 5:30 - 6:00 PM PT**[Add to Calendar](https://drive.google.com/uc?export=download&id=1D4SqQiqzdSx_xsEwS0QTd592zd3Xourh)

docs/source/community/meetings.rst

Lines changed: 1 addition & 1 deletion
@@ -6,6 +6,6 @@ Community Events

 We host bi-weekly community meetings at the following timeslot:

-**Every other Tuesday at 5:30 PM PT** – `Add to Calendar <https://drive.usercontent.google.com/u/0/uc?id=1I3WuivUVAq1vZ2XSW4rmqgD5c0bQcxE0&export=download>`_
+**Every other Tuesday at 5:30 PM PT** – `Add to Calendar <https://drive.google.com/uc?export=download&id=1D4SqQiqzdSx_xsEwS0QTd592zd3Xourh>`_

 All are welcome to join!

docs/source/use_cases/semantic-router-integration.rst

Lines changed: 26 additions & 11 deletions
@@ -74,21 +74,29 @@ Identify the ClusterIP and port of your router Service:
 Step 2: Deploy vLLM Semantic Router
 ------------------------------------

-Follow the official `Install in Kubernetes <https://vllm-semantic-router.com/docs/installation/kubernetes>`_ guide with the updated configuration.
+Follow the official `Install in Kubernetes <https://vllm-semantic-router.com/docs/installation/k8s/ai-gateway>`_ guide with the updated configuration.
+
+Deploy vLLM Semantic Router using Helm:

 .. code-block:: bash

-# Deploy vLLM Semantic Router manifests
-kubectl apply -k deploy/kubernetes/ai-gateway/semantic-router
+# Deploy vLLM Semantic Router with custom values from GHCR OCI registry
+# (Optional) If you use a registry mirror/proxy, append: --set global.imageRegistry=<your-registry>
+helm install semantic-router oci://ghcr.io/vllm-project/charts/semantic-router \
+--version v0.0.0-latest \
+--namespace vllm-semantic-router-system \
+--create-namespace \
+-f https://raw.githubusercontent.com/vllm-project/semantic-router/refs/heads/main/deploy/kubernetes/ai-gateway/semantic-router-values/values.yaml
+
 kubectl wait --for=condition=Available deployment/semantic-router \
 -n vllm-semantic-router-system --timeout=600s

 # Install Envoy Gateway
-helm upgrade -i eg oci://docker.io/envoyproxy/gateway-helm \
---version v0.0.0-latest \
---namespace envoy-gateway-system \
---create-namespace \
--f https://raw.githubusercontent.com/envoyproxy/ai-gateway/main/manifests/envoy-gateway-values.yaml
+helm upgrade -i eg oci://docker.io/envoyproxy/gateway-helm \
+--version v0.0.0-latest \
+--namespace envoy-gateway-system \
+--create-namespace \
+-f https://raw.githubusercontent.com/envoyproxy/ai-gateway/main/manifests/envoy-gateway-values.yaml

 # Install Envoy AI Gateway
 helm upgrade -i aieg oci://docker.io/envoyproxy/ai-gateway-helm \
@@ -97,20 +105,27 @@ Follow the official `Install in Kubernetes <https://vllm-semantic-router.com/doc
 --create-namespace

 # Install Envoy AI Gateway CRDs
-helm upgrade -i aieg-crd oci://docker.io/envoyproxy/ai-gateway-crds-helm --version v0.0.0-latest --namespace envoy-ai-gateway-system
+helm upgrade -i aieg-crd oci://docker.io/envoyproxy/ai-gateway-crds-helm \
+--version v0.0.0-latest \
+--namespace envoy-ai-gateway-system

 # Wait for AI Gateway to be ready
 kubectl wait --timeout=300s -n envoy-ai-gateway-system \
 deployment/ai-gateway-controller --for=condition=Available

+.. note::
+
+The values file contains the configuration for the semantic router including domain classification, LoRA routing, and plugin settings. You can download and customize it from the `semantic-router-values <https://raw.githubusercontent.com/vllm-project/semantic-router/refs/heads/main/deploy/kubernetes/ai-gateway/semantic-router-values/values.yaml>`_ to match your vLLM Production Stack setup.
+
 Create LLM Demo Backends and AI Gateway Routes:

 .. code-block:: bash

 # Apply LLM demo backends
-kubectl apply -f deploy/kubernetes/ai-gateway/aigw-resources/base-model.yaml
+kubectl apply -f https://raw.githubusercontent.com/vllm-project/semantic-router/refs/heads/main/deploy/kubernetes/ai-gateway/aigw-resources/base-model.yaml
+
 # Apply AI Gateway routes
-kubectl apply -f deploy/kubernetes/ai-gateway/aigw-resources/gwapi-resources.yaml
+kubectl apply -f https://raw.githubusercontent.com/vllm-project/semantic-router/refs/heads/main/deploy/kubernetes/ai-gateway/aigw-resources/gwapi-resources.yaml

 Step 3: Test the Deployment
 ----------------------------

helm/README.md

Lines changed: 8 additions & 0 deletions
@@ -201,6 +201,14 @@ This table documents all available configuration values for the Production Stack
 | `routerSpec.readinessProbe.failureThreshold` | integer |`3`| Failure threshold for router's readiness probe |
 | `routerSpec.readinessProbe.httpGet.path` | string |`"/health"`| Endpoint that the router's readiness probe will be testing |

+#### Router OpenTelemetry Configuration
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `routerSpec.otel.endpoint` | string | `""` | OTLP endpoint for tracing (e.g., "otel-collector:4317"). Tracing is enabled when this is set. |
+| `routerSpec.otel.serviceName` | string | `"vllm-router"` | Service name for OpenTelemetry traces |
+| `routerSpec.otel.secure` | boolean | `false` | Use secure (TLS) connection for OTLP exporter |
+
 #### Router Ingress Configuration

 | Field | Type | Default | Description |

helm/templates/deployment-router.yaml

Lines changed: 9 additions & 0 deletions
@@ -136,6 +136,15 @@ spec:
 - "--lmcache-controller-port"
 - "{{ .Values.routerSpec.lmcacheControllerPort }}"
 {{- end }}
+{{- if .Values.routerSpec.otel.endpoint }}
+- "--otel-endpoint"
+- "{{ .Values.routerSpec.otel.endpoint }}"
+- "--otel-service-name"
+- "{{ .Values.routerSpec.otel.serviceName | default "vllm-router" }}"
+{{- if .Values.routerSpec.otel.secure }}
+- "--otel-secure"
+{{- end }}
+{{- end }}
 {{- if .Values.routerSpec.resources }}
 resources:
 {{- if .Values.routerSpec.resources.requests }}
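
Read as plain logic, the added template block emits the OTel flags only when an endpoint is set, falls back to `vllm-router` for the service name, and adds `--otel-secure` only when `secure` is true. The following is a minimal Python sketch of that behavior. The helper name `otel_args` is hypothetical and not part of the chart; it only mirrors the conditionals shown in the hunk above.

```python
from __future__ import annotations


def otel_args(otel: dict) -> list[str]:
    """Hypothetical helper mirroring the template's conditional logic."""
    endpoint = otel.get("endpoint")
    if not endpoint:
        # Empty or missing endpoint: the outer `if` is false, so no OTel args.
        return []
    # Helm's `default` treats an empty string as unset, which `or` imitates.
    service_name = otel.get("serviceName") or "vllm-router"
    args = ["--otel-endpoint", endpoint, "--otel-service-name", service_name]
    if otel.get("secure"):
        # The flag is a bare switch; it takes no value.
        args.append("--otel-secure")
    return args


if __name__ == "__main__":
    print(otel_args({"endpoint": "otel-collector:4317", "secure": True}))
```

The cases exercised by `helm/tests/routerOtel_test.yaml` (no endpoint, default service name, custom service name, secure flag) map directly onto calls to this function.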

helm/tests/routerOtel_test.yaml

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+suite: test router OpenTelemetry configuration
+templates:
+  - deployment-router.yaml
+tests:
+  - it: should not include otel args when endpoint is not set
+    set:
+      routerSpec:
+        enableRouter: true
+        otel:
+          endpoint: ""
+    asserts:
+      - template: deployment-router.yaml
+        notContains:
+          path: spec.template.spec.containers[0].args
+          content: "--otel-endpoint"
+
+  - it: should include otel args when endpoint is set
+    set:
+      routerSpec:
+        enableRouter: true
+        otel:
+          endpoint: "otel-collector:4317"
+          serviceName: "vllm-router"
+          secure: false
+    asserts:
+      - template: deployment-router.yaml
+        contains:
+          path: spec.template.spec.containers[0].args
+          content: "--otel-endpoint"
+      - template: deployment-router.yaml
+        contains:
+          path: spec.template.spec.containers[0].args
+          content: "otel-collector:4317"
+      - template: deployment-router.yaml
+        contains:
+          path: spec.template.spec.containers[0].args
+          content: "--otel-service-name"
+      - template: deployment-router.yaml
+        contains:
+          path: spec.template.spec.containers[0].args
+          content: "vllm-router"
+      - template: deployment-router.yaml
+        notContains:
+          path: spec.template.spec.containers[0].args
+          content: "--otel-secure"
+
+  - it: should use custom service name when specified
+    set:
+      routerSpec:
+        enableRouter: true
+        otel:
+          endpoint: "jaeger:4317"
+          serviceName: "my-custom-router"
+          secure: false
+    asserts:
+      - template: deployment-router.yaml
+        contains:
+          path: spec.template.spec.containers[0].args
+          content: "my-custom-router"
+
+  - it: should include otel-secure flag when secure is true
+    set:
+      routerSpec:
+        enableRouter: true
+        otel:
+          endpoint: "otel-collector:4317"
+          serviceName: "vllm-router"
+          secure: true
+    asserts:
+      - template: deployment-router.yaml
+        contains:
+          path: spec.template.spec.containers[0].args
+          content: "--otel-endpoint"
+      - template: deployment-router.yaml
+        contains:
+          path: spec.template.spec.containers[0].args
+          content: "--otel-secure"

helm/values.schema.json

Lines changed: 20 additions & 0 deletions
@@ -580,6 +580,26 @@
 "additionalProperties": {
 "type": "string"
 }
+},
+"otel": {
+"type": "object",
+"description": "OpenTelemetry tracing configuration for the router",
+"properties": {
+"endpoint": {
+"type": "string",
+"description": "OTLP endpoint for tracing (e.g., 'otel-collector:4317'). Tracing is enabled when this is set."
+},
+"serviceName": {
+"type": "string",
+"description": "Service name for OpenTelemetry traces",
+"default": "vllm-router"
+},
+"secure": {
+"type": "boolean",
+"description": "Use secure (TLS) connection for OTLP exporter",
+"default": false
+}
+}
 }
 }
 }
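
The schema fragment above constrains only the types of the three `otel` properties: two strings and one boolean. As a rough illustration of what that means for a values override, here is a standard-library Python sketch. The function `check_otel` is hypothetical and only approximates the `type` keywords shown in the hunk; it is not how Helm validates values, and it assumes unknown keys are allowed because the fragment declares no `additionalProperties` restriction.

```python
from __future__ import annotations

# Expected Python types for each property, taken from the "type" keywords
# in the schema hunk above (string -> str, boolean -> bool).
EXPECTED_TYPES = {"endpoint": str, "serviceName": str, "secure": bool}


def check_otel(otel: dict) -> list[str]:
    """Hypothetical checker: return one message per type mismatch."""
    errors = []
    for key, expected in EXPECTED_TYPES.items():
        if key in otel and not isinstance(otel[key], expected):
            errors.append(
                f"routerSpec.otel.{key}: expected {expected.__name__}, "
                f"got {type(otel[key]).__name__}"
            )
    return errors


if __name__ == "__main__":
    print(check_otel({"endpoint": "otel-collector:4317", "secure": "yes"}))
```

A quoted `"yes"` for `secure` is the kind of mistake this catches: it is a string, not a boolean, so the schema's `"type": "boolean"` would reject it.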

helm/values.yaml

Lines changed: 10 additions & 0 deletions
@@ -377,6 +377,16 @@ routerSpec:
   # -- Window size in seconds to calculate the request statistics
   requestStatsWindow: 60

+  # -- OpenTelemetry tracing configuration
+  # When otelEndpoint is set, tracing is automatically enabled
+  otel:
+    # -- OTLP endpoint for tracing (e.g., "localhost:4317" or "otel-collector:4317")
+    endpoint: ""
+    # -- Service name for traces (default: "vllm-router")
+    serviceName: "vllm-router"
+    # -- Use secure (TLS) connection for OTLP exporter (default: false, i.e., insecure)
+    secure: false
+
   # -- deployment strategy
   strategy: {}

operator/api/v1alpha1/vllmruntime_types.go

Lines changed: 6 additions & 0 deletions
@@ -30,6 +30,9 @@ type DeploymentConfig struct {
 	// +kubebuilder:default=1
 	Replicas int32 `json:"replicas,omitempty"`

+	// Node selector
+	NodeSelectorTerms []corev1.NodeSelectorTerm `json:"nodeSelectorTerms,omitempty"`
+
 	// Deploy strategy
 	// +kubebuilder:validation:Enum=RollingUpdate;Recreate
 	// +kubebuilder:default=RollingUpdate
@@ -122,6 +125,9 @@ type ModelSpec struct {

 	// Maximum number of sequences
 	MaxNumSeqs int32 `json:"maxNumSeqs,omitempty"`
+
+	// Chat template
+	ChatTemplate string `json:"chatTemplate,omitempty"`
 }

 // LMCacheConfig defines the LM Cache configuration
