
Commit c561234

- added epp-deployment container ports configurability (with service exposure if needed) (#1211)
- made epp-deployment env configuration flexible
- made epp logging verbosity configurable

Signed-off-by: Maroon Ayoub <[email protected]>

1 parent 9d810b2

File tree: 4 files changed, +48 −17 lines

config/charts/inferencepool/README.md

Lines changed: 33 additions & 12 deletions

````diff
@@ -24,26 +24,44 @@ Note that the provider name is needed to deploy provider-specific resources. If
 
 ### Install with Custom Environment Variables
 
-To set custom environment variables for the EndpointPicker deployment:
+To set custom environment variables for the EndpointPicker deployment, you can define them as free-form YAML in the `values.yaml` file:
+
+```yaml
+inferenceExtension:
+  env:
+    - name: FEATURE_FLAG_ENABLED
+      value: "true"
+    - name: CUSTOM_ENV_VAR
+      value: "custom_value"
+    - name: POD_IP
+      valueFrom:
+        fieldRef:
+          fieldPath: status.podIP
+```
+
+Then apply it with:
 
 ```txt
-$ helm install vllm-llama3-8b-instruct \
-  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
-  --set provider.name=[none|gke] \
-  --set inferenceExtension.env.FEATURE_FLAG_ENABLED=true \
-  oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts/inferencepool --version v0
+$ helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f values.yaml
 ```
 
-Alternatively, you can define environment variables in a values file:
+### Install with Additional Ports
+
+To expose additional ports (e.g., for ZMQ), you can define them in the `values.yaml` file:
 
 ```yaml
-# values.yaml
 inferenceExtension:
-  env:
-    FEATURE_FLAG_ENABLED: "true"
+  extraContainerPorts:
+    - name: zmq
+      containerPort: 5557
+      protocol: TCP
+  extraServicePorts: # if need to expose the port for external communication
+    - name: zmq
+      port: 5557
+      protocol: TCP
 ```
 
-And apply it with:
+Then apply it with:
 
 ```txt
 $ helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f values.yaml
 ```
@@ -84,7 +102,10 @@ The following table list the configurable parameters of the chart.
 | `inferenceExtension.image.tag` | Image tag of the endpoint picker. |
 | `inferenceExtension.image.pullPolicy` | Image pull policy for the container. Possible values: `Always`, `IfNotPresent`, or `Never`. Defaults to `Always`. |
 | `inferenceExtension.extProcPort` | Port where the endpoint picker service is served for external processing. Defaults to `9002`. |
-| `inferenceExtension.env` | Map of environment variables to set in the endpoint picker container. Defaults to `{}`. |
+| `inferenceExtension.env` | List of environment variables to set in the endpoint picker container as free-form YAML. Defaults to `[]`. |
+| `inferenceExtension.extraContainerPorts` | List of additional container ports to expose. Defaults to `[]`. |
+| `inferenceExtension.extraServicePorts` | List of additional service ports to expose. Defaults to `[]`. |
+| `inferenceExtension.logVerbosity` | Logging verbosity level for the endpoint picker. Defaults to `"3"`. |
 | `provider.name` | Name of the Inference Gateway implementation being used. Possible values: `gke`. Defaults to `none`. |
 
 ## Notes
````

config/charts/inferencepool/templates/epp-deployment.yaml

Lines changed: 6 additions & 4 deletions

```diff
@@ -28,7 +28,7 @@ spec:
         - --pool-namespace
         - {{ .Release.Namespace }}
         - --v
-        - "3"
+        - "{{ .Values.inferenceExtension.logVerbosity | default "3" }}"
         - --grpc-port
         - "9002"
         - --grpc-health-port
@@ -54,6 +54,9 @@ spec:
           containerPort: 9003
         - name: metrics
           containerPort: 9090
+        {{- with .Values.inferenceExtension.extraContainerPorts }}
+        {{- toYaml . | nindent 8 }}
+        {{- end }}
         livenessProbe:
           grpc:
             port: 9003
@@ -66,10 +69,9 @@ spec:
             service: inference-extension
           initialDelaySeconds: 5
           periodSeconds: 10
+        {{- with .Values.inferenceExtension.env }}
         env:
-        {{- range $key, $value := .Values.inferenceExtension.env }}
-        - name: {{ $key }}
-          value: {{ $value | quote }}
+        {{- toYaml . | nindent 8 }}
         {{- end }}
         volumeMounts:
         - name: plugins-config-volume
```
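To make the new `with`/`toYaml` blocks concrete, here is a sketch of roughly what the container spec could render to when `extraContainerPorts` and `env` are set as in the README example. This is an illustrative approximation under assumed default chart values, not captured `helm template` output:

```yaml
# Approximate rendered container fragment (hypothetical):
ports:
- name: grpc
  containerPort: 9002
- name: grpc-health
  containerPort: 9003
- name: metrics
  containerPort: 9090
- name: zmq              # appended verbatim by the extraContainerPorts block
  containerPort: 5557
  protocol: TCP
env:                     # the surrounding `with` omits env: entirely when the value is empty
- name: FEATURE_FLAG_ENABLED
  value: "true"
```

Note the design change: because `env` is now passed through `toYaml` as a free-form list, values can use the full Kubernetes `EnvVar` schema (e.g. `valueFrom.fieldRef`), which the old `name`/`value` map could not express.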

config/charts/inferencepool/templates/epp-service.yaml

Lines changed: 3 additions & 0 deletions

```diff
@@ -15,4 +15,7 @@ spec:
     - name: http-metrics
       protocol: TCP
       port: {{ .Values.inferenceExtension.metricsPort | default 9090 }}
+    {{- with .Values.inferenceExtension.extraServicePorts }}
+    {{- toYaml . | nindent 4 }}
+    {{- end }}
   type: ClusterIP
```

config/charts/inferencepool/values.yaml

Lines changed: 6 additions & 1 deletion

```diff
@@ -6,7 +6,7 @@ inferenceExtension:
     tag: main
     pullPolicy: Always
   extProcPort: 9002
-  env: {}
+  env: []
   enablePprof: true # Enable pprof handlers for profiling and debugging
   # This is the plugins configuration file.
   pluginsConfigFile: "default-plugins.yaml"
@@ -32,6 +32,11 @@ inferenceExtension:
   #   env:
   #     KV_CACHE_SCORE_WEIGHT: "1"
 
+  # Define additional container ports
+  extraContainerPorts: []
+  # Define additional service ports
+  extraServicePorts: []
+
 inferencePool:
   targetPortNumber: 8000
   modelServerType: vllm # vllm, triton-tensorrt-llm
```
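Putting the new values together, a minimal override file exercising all four new knobs might look like the following. This is a sketch: the file name `my-values.yaml`, the `zmq` port, and the env entry are placeholders taken from the README example, and `logVerbosity: "4"` is an arbitrary illustrative level:

```yaml
# my-values.yaml (hypothetical override file)
inferenceExtension:
  logVerbosity: "4"             # fed into the EPP --v flag; chart default is "3"
  env:                          # free-form list of Kubernetes EnvVar entries
    - name: FEATURE_FLAG_ENABLED
      value: "true"
  extraContainerPorts:          # extra ports on the EPP container
    - name: zmq
      containerPort: 5557
      protocol: TCP
  extraServicePorts:            # only needed if the port must be reachable via the Service
    - name: zmq
      port: 5557
      protocol: TCP
```

Applied as in the README: `helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f my-values.yaml`.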
