Commit 89d1a9d

Add the option to specify epp env vars in helm chart (#924)
1 parent 8e112db commit 89d1a9d

3 files changed: 37 additions and 0 deletions

config/charts/inferencepool/README.md

Lines changed: 28 additions & 0 deletions
@@ -22,6 +22,33 @@ $ helm install vllm-llama3-8b-instruct \
 
 Note that the provider name is needed to deploy provider-specific resources. If no provider is specified, then only the InferencePool object and the EPP are deployed.
 
+### Install with Custom Environment Variables
+
+To set custom environment variables for the EndpointPicker deployment:
+
+```txt
+$ helm install vllm-llama3-8b-instruct \
+  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
+  --set provider.name=[none|gke] \
+  --set inferenceExtension.env.FEATURE_FLAG_ENABLED=true \
+  oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts/inferencepool --version v0
+```
+
+Alternatively, you can define environment variables in a values file:
+
+```yaml
+# values.yaml
+inferenceExtension:
+  env:
+    FEATURE_FLAG_ENABLED: "true"
+```
+
+And apply it with:
+
+```txt
+$ helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f values.yaml
+```
+
 ### Install for Triton TensorRT-LLM
 
 Use `--set inferencePool.modelServerType=triton-tensorrt-llm` to install for Triton TensorRT-LLM, e.g.,
@@ -57,6 +84,7 @@ The following table list the configurable parameters of the chart.
 | `inferenceExtension.image.tag` | Image tag of the endpoint picker. |
 | `inferenceExtension.image.pullPolicy` | Image pull policy for the container. Possible values: `Always`, `IfNotPresent`, or `Never`. Defaults to `Always`. |
 | `inferenceExtension.extProcPort` | Port where the endpoint picker service is served for external processing. Defaults to `9002`. |
+| `inferenceExtension.env` | Map of environment variables to set in the endpoint picker container. Defaults to `{}`. |
 | `provider.name` | Name of the Inference Gateway implementation being used. Possible values: `gke`. Defaults to `none`. |
 
 ## Notes

config/charts/inferencepool/templates/epp-deployment.yaml

Lines changed: 5 additions & 0 deletions
@@ -62,3 +62,8 @@ spec:
             service: inference-extension
           initialDelaySeconds: 5
           periodSeconds: 10
+        env:
+        {{- range $key, $value := .Values.inferenceExtension.env }}
+        - name: {{ $key }}
+          value: {{ $value | quote }}
+        {{- end }}
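
As a rough illustration of the template change above (a sketch, not output captured from this commit): assuming `inferenceExtension.env` is set to the two variables used as examples in the README and `values.yaml`, the `range` loop should emit one `name`/`value` entry per map key. Go templates visit map keys in sorted order and `quote` wraps each value in double quotes, so the container spec would be expected to contain something like:

```yaml
# Sketch of the rendered env fragment of epp-deployment.yaml.
# The indentation relative to the enclosing container is omitted here.
env:
- name: FEATURE_FLAG_ENABLED
  value: "true"
- name: KV_CACHE_SCORE_WEIGHT
  value: "1"
```

Because every value passes through `quote`, the rendered `value` is always a string, which is the type the Kubernetes `env` field expects.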

config/charts/inferencepool/values.yaml

Lines changed: 4 additions & 0 deletions
@@ -6,6 +6,10 @@ inferenceExtension:
     tag: main
     pullPolicy: Always
   extProcPort: 9002
+  env: {}
+  # Example environment variables:
+  # env:
+  #   KV_CACHE_SCORE_WEIGHT: "1"
 
 inferencePool:
   targetPortNumber: 8000
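
A values file can set several variables at once. The sketch below reuses the two variable names that appear in this commit; `EXAMPLE_MAX_QUEUE_SIZE` is a hypothetical name added only for illustration and is not defined by the chart or the endpoint picker. Quoting each value keeps it a YAML string, which may avoid surprises such as large unquoted integers being rendered in scientific notation after Helm parses them as floats.

```yaml
# values.yaml (illustrative override file)
inferenceExtension:
  env:
    FEATURE_FLAG_ENABLED: "true"
    KV_CACHE_SCORE_WEIGHT: "1"
    # Hypothetical variable, shown only to illustrate quoting a large number.
    EXAMPLE_MAX_QUEUE_SIZE: "1000000"
```

The file would be passed to `helm install` with `-f values.yaml`, in the same way as the README example in this commit.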
