Add the option to specify epp env vars in helm chart (#924)

liu-cong · web-flow · commit 89d1a9d59a18 · 2025-06-04T14:20:37.000-07:00
diff --git a/config/charts/inferencepool/README.md b/config/charts/inferencepool/README.md
@@ -22,6 +22,33 @@ $ helm install vllm-llama3-8b-instruct \
 
 Note that the provider name is needed to deploy provider-specific resources. If no provider is specified, then only the InferencePool object and the EPP are deployed.
 
+### Install with Custom Environment Variables
+
+To set custom environment variables on the endpoint picker (EPP) container:
+
+```txt
+$ helm install vllm-llama3-8b-instruct \
+  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
+  --set provider.name=[none|gke] \
+  --set inferenceExtension.env.FEATURE_FLAG_ENABLED=true \
+  oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts/inferencepool --version v0
+```
+
+Alternatively, you can define environment variables in a values file:
+
+```yaml
+# values.yaml
+inferenceExtension:
+  env:
+    FEATURE_FLAG_ENABLED: "true"
+```
+
+Then install the chart with that values file:
+
+```txt
+$ helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f values.yaml
+```
+
 ### Install for Triton TensorRT-LLM
 
 Use `--set inferencePool.modelServerType=triton-tensorrt-llm` to install for Triton TensorRT-LLM, e.g.,
@@ -57,6 +84,7 @@ The following table list the configurable parameters of the chart.
 | `inferenceExtension.image.tag`              | Image tag of the endpoint picker.                                                                                      |
 | `inferenceExtension.image.pullPolicy`       | Image pull policy for the container. Possible values: `Always`, `IfNotPresent`, or `Never`. Defaults to `Always`.      |
 | `inferenceExtension.extProcPort`            | Port where the endpoint picker service is served for external processing. Defaults to `9002`.                          |
+| `inferenceExtension.env`                    | Map of environment variables to set in the endpoint picker container. Defaults to `{}`.                                |
 | `provider.name`                             | Name of the Inference Gateway implementation being used. Possible values: `gke`. Defaults to `none`.                   |
 
 ## Notes
diff --git a/config/charts/inferencepool/templates/epp-deployment.yaml b/config/charts/inferencepool/templates/epp-deployment.yaml
@@ -62,3 +62,8 @@ spec:
             service: inference-extension
           initialDelaySeconds: 5
           periodSeconds: 10
+        env:
+        {{- range $key, $value := .Values.inferenceExtension.env }}
+        - name: {{ $key }}
+          value: {{ $value | quote }}
+        {{- end }}
diff --git a/config/charts/inferencepool/values.yaml b/config/charts/inferencepool/values.yaml
@@ -6,6 +6,10 @@ inferenceExtension:
     tag: main
     pullPolicy: Always
   extProcPort: 9002
+  env: {}
+  # Example environment variables:
+  # env:
+  #   KV_CACHE_SCORE_WEIGHT: "1"
 
 inferencePool:
   targetPortNumber: 8000
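
Note on the rendered output: as a rough sketch, the `range` loop added to `epp-deployment.yaml` should expand the README's example value (`FEATURE_FLAG_ENABLED: "true"`) into the container fragment below. This assumes no other keys are set under `inferenceExtension.env`; indentation is flattened here for readability and is not the exact output of `helm template`.

```yaml
# Illustrative rendering of the env block for the README example values.
# Derived by hand from the template in this diff, not captured from a real run.
env:
- name: FEATURE_FLAG_ENABLED
  value: "true"
```

Because each value is piped through `quote`, passing `--set inferenceExtension.env.FEATURE_FLAG_ENABLED=true` (which Helm parses as a boolean) should still produce the string `"true"`, which is the type Kubernetes expects for an env `value`. Go templates iterate maps in sorted key order, so the rendered list order stays stable between installs. With the default `env: {}`, the loop emits no entries and the `env:` key is left empty.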