Commit 2e069d4

Deprecate inferencepool-resources.yaml (#1586)
* Update docs to use helm charts
* Update formatting
* Remove inferencepool resources yaml
1 parent ba05c43 commit 2e069d4

3 files changed, +20 -215 lines changed

config/manifests/inferencepool-resources.yaml

Lines changed: 0 additions & 186 deletions
This file was deleted.

hack/release-quickstart.sh

Lines changed: 3 additions & 7 deletions
@@ -74,25 +74,21 @@ sed -i.bak "s|kubectl apply -k https://github.com/kubernetes-sigs/gateway-api-in
 # -----------------------------------------------------------------------------
 # Update image references
 # -----------------------------------------------------------------------------
-EPP="config/manifests/inferencepool-resources.yaml"
 #TODO: Put all helm values files into an array to loop over
 EPP_HELM="config/charts/inferencepool/values.yaml"
 BBR_HELM="config/charts/body-based-routing/values.yaml"
 CONFORMANCE_MANIFESTS="conformance/resources/base.yaml"
-echo "Updating ${EPP}, ${EPP_HELM}, ${BBR_HELM}, and ${CONFORMANCE_MANIFESTS} ..."
+echo "Updating ${EPP_HELM}, ${BBR_HELM}, and ${CONFORMANCE_MANIFESTS} ..."
 
 # Update the container tag.
-sed -i.bak -E "s|(us-central1-docker\.pkg\.dev/k8s-staging-images/gateway-api-inference-extension/epp:)[^\"[:space:]]+|\1${RELEASE_TAG}|g" "$EPP"
 sed -i.bak -E "s|(tag: )[^\"[:space:]]+|\1${RELEASE_TAG}|g" "$EPP_HELM"
 sed -i.bak -E "s|(tag: )[^\"[:space:]]+|\1${RELEASE_TAG}|g" "$BBR_HELM"
 sed -i.bak -E "s|(us-central1-docker\.pkg\.dev/k8s-staging-images/gateway-api-inference-extension/epp:)[^\"[:space:]]+|\1${RELEASE_TAG}|g" "$CONFORMANCE_MANIFESTS"
 
 # Update the container image pull policy.
-sed -i.bak '/us-central1-docker.pkg.dev\/k8s-staging-images\/gateway-api-inference-extension\/epp/{n;s/Always/IfNotPresent/;}' "$EPP"
 sed -i.bak '/us-central1-docker.pkg.dev\/k8s-staging-images\/gateway-api-inference-extension\/epp/{n;s/Always/IfNotPresent/;}' "$CONFORMANCE_MANIFESTS"
 
 # Update the container registry.
-sed -i.bak -E "s|us-central1-docker\.pkg\.dev/k8s-staging-images|registry.k8s.io|g" "$EPP"
 sed -i.bak -E "s|us-central1-docker\.pkg\.dev/k8s-staging-images|registry.k8s.io|g" "$EPP_HELM"
 sed -i.bak -E "s|us-central1-docker\.pkg\.dev/k8s-staging-images|registry.k8s.io|g" "$BBR_HELM"
 sed -i.bak -E "s|us-central1-docker\.pkg\.dev/k8s-staging-images|registry.k8s.io|g" "$CONFORMANCE_MANIFESTS"
@@ -139,8 +135,8 @@ sed -i.bak -E "s|us-central1-docker\.pkg\.dev/k8s-staging-images|registry.k8s.io
 # -----------------------------------------------------------------------------
 # Stage the changes
 # -----------------------------------------------------------------------------
-echo "Staging $VERSION_FILE $UPDATED_CRD $README $EPP $EPP_HELM $BBR_HELM $CONFORMANCE_MANIFESTS $VLLM_GPU_DEPLOY $VLLM_CPU_DEPLOY $VLLM_SIM_DEPLOY files..."
-git add $VERSION_FILE $UPDATED_CRD $README $EPP $EPP_HELM $BBR_HELM $CONFORMANCE_MANIFESTS $VLLM_GPU_DEPLOY $VLLM_CPU_DEPLOY $VLLM_SIM_DEPLOY
+echo "Staging $VERSION_FILE $UPDATED_CRD $README $EPP_HELM $BBR_HELM $CONFORMANCE_MANIFESTS $VLLM_GPU_DEPLOY $VLLM_CPU_DEPLOY $VLLM_SIM_DEPLOY files..."
+git add $VERSION_FILE $UPDATED_CRD $README $EPP_HELM $BBR_HELM $CONFORMANCE_MANIFESTS $VLLM_GPU_DEPLOY $VLLM_CPU_DEPLOY $VLLM_SIM_DEPLOY
 
 # -----------------------------------------------------------------------------
 # Cleanup backup files and finish
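
Reviewer note: the `#TODO` in the first hunk asks for the helm values files to be collected into an array and looped over. One way that could look is sketched below. This is not part of the commit: the `HELM_VALUES_FILES` name, the example release tag, and the stand-in YAML (including the `hub` key) are assumptions for illustration, and the sketch works on temporary copies rather than the real chart files. It needs bash, since it uses arrays.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical release tag, for illustration only.
RELEASE_TAG="v0.0.0-example"

# Work on throwaway copies so the sketch never touches a real checkout.
WORKDIR="$(mktemp -d)"
EPP_HELM="${WORKDIR}/inferencepool-values.yaml"
BBR_HELM="${WORKDIR}/body-based-routing-values.yaml"

# Stand-in values files; the real charts contain far more than this,
# and the key names here are assumed, not copied from the charts.
for f in "$EPP_HELM" "$BBR_HELM"; do
  cat > "$f" <<'EOF'
image:
  hub: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension
  tag: main
EOF
done

# The array the TODO asks for (name is an assumption).
HELM_VALUES_FILES=("$EPP_HELM" "$BBR_HELM")

for values_file in "${HELM_VALUES_FILES[@]}"; do
  # Same substitutions the script applies to each helm values file today.
  sed -i.bak -E "s|(tag: )[^\"[:space:]]+|\1${RELEASE_TAG}|g" "$values_file"
  sed -i.bak -E "s|us-central1-docker\.pkg\.dev/k8s-staging-images|registry.k8s.io|g" "$values_file"
  # Remove the backup that -i.bak leaves behind.
  rm -f "${values_file}.bak"
done

cat "$EPP_HELM"
```

With this shape, adding another chart means appending one path to the array instead of duplicating each `sed` line. The conformance manifest would stay outside the loop, because its tag and pull-policy patterns differ from the helm ones.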

site-src/implementations/model-servers.md

Lines changed: 17 additions & 22 deletions
@@ -19,34 +19,29 @@ vLLM is configured as the default in the [endpoint picker extension](https://git
 
 Triton specific metric names need to be specified when starting the EPP.
 
-### Option 1: Use Helm
+Use `--set inferencePool.modelServerType=triton-tensorrt-llm` to install the `inferencepool` via helm. See the [`inferencepool` helm guide](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config/charts/inferencepool/README.md) for more details.
 
-Use `--set inferencePool.modelServerType=triton-tensorrt-llm` to install the [`inferencepool` via helm](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/42eb5ff1c5af1275df43ac384df0ddf20da95134/config/charts/inferencepool). See the [`inferencepool` helm guide](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/42eb5ff1c5af1275df43ac384df0ddf20da95134/config/charts/inferencepool/README.md) for more details.
+Add the following to the `flags` in the helm chart as [flags to EPP](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/29ea29028496a638b162ff287c62c0087211bbe5/config/charts/inferencepool/values.yaml#L36)
 
-### Option 2: Edit EPP deployment yaml
-
-Add the following to the `args` of the [EPP deployment](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/42eb5ff1c5af1275df43ac384df0ddf20da95134/config/manifests/inferencepool-resources.yaml#L32)
-
-```
-- --total-queued-requests-metric
-- "nv_trt_llm_request_metrics{request_type=waiting}"
-- --kv-cache-usage-percentage-metric
-- "nv_trt_llm_kv_cache_block_metrics{kv_cache_block_type=fraction}"
-- --lora-info-metric
-- "" # Set an empty metric to disable LoRA metric scraping as they are not supported by Triton yet.
+```
+- name=total-queued-requests-metric
+value="nv_trt_llm_request_metrics{request_type=waiting}"
+- name=kv-cache-usage-percentage-metric
+value="nv_trt_llm_kv_cache_block_metrics{kv_cache_block_type=fraction}"
+- name=lora-info-metric
+value="" # Set an empty metric to disable LoRA metric scraping as they are not supported by Triton yet.
 ```
 
 ## SGLang
 
-### Edit EPP deployment yaml
+Add the following `flags` while deploying using helm charts in the [EPP deployment](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/29ea29028496a638b162ff287c62c0087211bbe5/config/charts/inferencepool/values.yaml#L36)
 
-Add the following to the `args` of the [EPP deployment](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/42eb5ff1c5af1275df43ac384df0ddf20da95134/config/manifests/inferencepool-resources.yaml#L32)
 
 ```
-- --totalQueuedRequestsMetric
-- "sglang:num_queue_reqs"
-- --kvCacheUsagePercentageMetric
-- "sglang:token_usage"
-- --lora-info-metric
-- "" # Set an empty metric to disable LoRA metric scraping as they are not supported by SGLang yet.
-```
+- name=total-queued-requests-metric
+value="sglang:num_queue_reqs"
+- name=kv-cache-usage-percentage-metric
+value="sglang:token_usage"
+- name=lora-info-metric
+value="" # Set an empty metric to disable LoRA metric scraping as they are not supported by SGLang yet.
+```
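
Reviewer note: for anyone who wants to try the Helm path that the updated Triton section describes, a minimal sketch follows. Only the `--set inferencePool.modelServerType=triton-tensorrt-llm` value comes from the doc change. The release name `example-pool`, the use of the chart directory from a local checkout, and the `run` dry-run helper are assumptions for illustration. The chart may need further values that are not shown here, so the linked helm guide remains the reference.

```shell
#!/bin/sh
# Dry-run by default so the sketch can be executed without a cluster.
DRY_RUN="${DRY_RUN:-true}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "true" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

RELEASE_NAME="example-pool"              # hypothetical release name
CHART_DIR="config/charts/inferencepool"  # chart path inside a repo checkout (assumed)

run helm install "$RELEASE_NAME" "$CHART_DIR" \
  --set inferencePool.modelServerType=triton-tensorrt-llm
```

Setting `DRY_RUN=false` would execute the install against the current kube context. The metric-name flags from the doc are left out on purpose, since their exact shape in `values.yaml` should be taken from the chart version in use.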
