echo "Updating ${EPP}, ${EPP_HELM}, ${BBR_HELM}, and ${CONFORMANCE_MANIFESTS} ..."
+echo "Updating ${EPP_HELM}, ${BBR_HELM}, and ${CONFORMANCE_MANIFESTS} ..."

 # Update the container tag.
-sed -i.bak -E "s|(us-central1-docker\.pkg\.dev/k8s-staging-images/gateway-api-inference-extension/epp:)[^\"[:space:]]+|\1${RELEASE_TAG}|g" "$EPP"
 sed -i.bak -E "s|(tag: )[^\"[:space:]]+|\1${RELEASE_TAG}|g" "$EPP_HELM"
 sed -i.bak -E "s|(tag: )[^\"[:space:]]+|\1${RELEASE_TAG}|g" "$BBR_HELM"
 sed -i.bak -E "s|(us-central1-docker\.pkg\.dev/k8s-staging-images/gateway-api-inference-extension/epp:)[^\"[:space:]]+|\1${RELEASE_TAG}|g" "$CONFORMANCE_MANIFESTS"

 # Update the container image pull policy.
-sed -i.bak '/us-central1-docker.pkg.dev\/k8s-staging-images\/gateway-api-inference-extension\/epp/{n;s/Always/IfNotPresent/;}' "$EPP"
 sed -i.bak '/us-central1-docker.pkg.dev\/k8s-staging-images\/gateway-api-inference-extension\/epp/{n;s/Always/IfNotPresent/;}' "$CONFORMANCE_MANIFESTS"

 # Update the container registry.
-sed -i.bak -E "s|us-central1-docker\.pkg\.dev/k8s-staging-images|registry.k8s.io|g" "$EPP"
 sed -i.bak -E "s|us-central1-docker\.pkg\.dev/k8s-staging-images|registry.k8s.io|g" "$EPP_HELM"
 sed -i.bak -E "s|us-central1-docker\.pkg\.dev/k8s-staging-images|registry.k8s.io|g" "$BBR_HELM"
 sed -i.bak -E "s|us-central1-docker\.pkg\.dev/k8s-staging-images|registry.k8s.io|g" "$CONFORMANCE_MANIFESTS"
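For reviewers who want to check what the three remaining `sed` passes do to a manifest, here is a minimal sketch that runs them in the same order (tag, pull policy, registry) against a throwaway file. The release tag `v9.9.9`, the temporary path, and the sample manifest content are assumptions made up for illustration; they are not taken from the repository.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical values, for illustration only.
RELEASE_TAG="v9.9.9"
workdir="$(mktemp -d)"
manifest="${workdir}/sample-manifest.yaml"

# A made-up manifest fragment shaped like a container spec.
cat > "${manifest}" <<'EOF'
containers:
- name: epp
  image: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/epp:main
  imagePullPolicy: Always
EOF

# 1. Replace whatever follows "epp:" with the release tag.
sed -i.bak -E "s|(us-central1-docker\.pkg\.dev/k8s-staging-images/gateway-api-inference-extension/epp:)[^\"[:space:]]+|\1${RELEASE_TAG}|g" "${manifest}"

# 2. On the line after the image reference, switch Always to IfNotPresent.
sed -i.bak '/us-central1-docker.pkg.dev\/k8s-staging-images\/gateway-api-inference-extension\/epp/{n;s/Always/IfNotPresent/;}' "${manifest}"

# 3. Swap the staging registry for the production registry.
sed -i.bak -E "s|us-central1-docker\.pkg\.dev/k8s-staging-images|registry.k8s.io|g" "${manifest}"

cat "${manifest}"
```

The order matters in this sketch: the pull-policy pass matches on the staging registry path, so it has to run before the registry pass rewrites that path. Each `-i.bak` call also leaves a `.bak` copy next to the edited file.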
@@ -139,8 +135,8 @@ sed -i.bak -E "s|us-central1-docker\.pkg\.dev/k8s-staging-images|registry.k8s.io
site-src/implementations/model-servers.md
+17 -22 (17 additions & 22 deletions)
@@ -19,34 +19,29 @@ vLLM is configured as the default in the [endpoint picker extension](https://git

 Triton specific metric names need to be specified when starting the EPP.

-### Option 1: Use Helm
+Use `--set inferencePool.modelServerType=triton-tensorrt-llm` to install the `inferencepool` via helm. See the [`inferencepool` helm guide](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config/charts/inferencepool/README.md) for more details.

-Use `--set inferencePool.modelServerType=triton-tensorrt-llm` to install the [`inferencepool` via helm](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/42eb5ff1c5af1275df43ac384df0ddf20da95134/config/charts/inferencepool). See the [`inferencepool` helm guide](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/42eb5ff1c5af1275df43ac384df0ddf20da95134/config/charts/inferencepool/README.md) for more details.
+Add the following to the `flags` in the helm chart as [flags to EPP](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/29ea29028496a638b162ff287c62c0087211bbe5/config/charts/inferencepool/values.yaml#L36)

-### Option 2: Edit EPP deployment yaml
-
-Add the following to the `args` of the [EPP deployment](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/42eb5ff1c5af1275df43ac384df0ddf20da95134/config/manifests/inferencepool-resources.yaml#L32)
value="" # Set an empty metric to disable LoRA metric scraping as they are not supported by Triton yet.
 ```

 ## SGLang

-### Edit EPP deployment yaml
+Add the following `flags` while deploying using helm charts in the [EPP deployment](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/29ea29028496a638b162ff287c62c0087211bbe5/config/charts/inferencepool/values.yaml#L36)

-Add the following to the `args` of the [EPP deployment](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/42eb5ff1c5af1275df43ac384df0ddf20da95134/config/manifests/inferencepool-resources.yaml#L32)

 ```
-- --totalQueuedRequestsMetric
-- "sglang:num_queue_reqs"
-- --kvCacheUsagePercentageMetric
-- "sglang:token_usage"
-- --lora-info-metric
-- "" # Set an empty metric to disable LoRA metric scraping as they are not supported by SGLang yet.
-```
+- name=total-queued-requests-metric
+value="sglang:num_queue_reqs"
+- name=kv-cache-usage-percentage-metric
+value="sglang:token_usage"
+- name=lora-info-metric
+value="" # Set an empty metric to disable LoRA metric scraping as they are not supported by SGLang yet.