Commit 6e6823a

use replicas field in helm to decide if EPP should run in HA mode (#1628)
Signed-off-by: Nir Rozenbaum <[email protected]>
1 parent 32970c0 commit 6e6823a

3 files changed (+10, -14 lines)

config/charts/inferencepool/README.md

Lines changed: 5 additions & 5 deletions

````diff
@@ -101,16 +101,16 @@ $ helm install triton-llama3-8b-instruct \
 
 ### Install with High Availability (HA)
 
-To deploy the EndpointPicker in a high-availability (HA) active-passive configuration, you can enable leader election. When enabled, the EPP deployment will have multiple replicas, but only one "leader" replica will be active and ready to process traffic at any given time. If the leader pod fails, another pod will be elected as the new leader, ensuring service continuity.
+To deploy the EndpointPicker in a high-availability (HA) active-passive configuration set replicas to be greater than one. In such a setup, only one "leader" replica will be active and ready to process traffic at any given time. If the leader pod fails, another pod will be elected as the new leader, ensuring service continuity.
 
-To enable HA, set `inferenceExtension.enableLeaderElection` to `true`.
+To enable HA, set `inferenceExtension.replicas` to a number greater than 1.
 
 * Via `--set` flag:
 
 ```txt
 helm install vllm-llama3-8b-instruct \
   --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
-  --set inferenceExtension.enableLeaderElection=true \
+  --set inferenceExtension.replicas=3 \
   --set provider=[none|gke] \
   oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts/inferencepool --version v0
 ```
@@ -119,7 +119,7 @@ To enable HA, set `inferenceExtension.enableLeaderElection` to `true`.
 
 ```yaml
 inferenceExtension:
-  enableLeaderElection: true
+  replicas: 3
 ```
 
 Then apply it with:
@@ -172,7 +172,7 @@ The following table list the configurable parameters of the chart.
 | `inferencePool.targetPortNumber` | Target port number for the vllm backends, will be used to scrape metrics by the inference extension. Defaults to 8000. |
 | `inferencePool.modelServerType` | Type of the model servers in the pool, valid options are [vllm, triton-tensorrt-llm], default is vllm. |
 | `inferencePool.modelServers.matchLabels` | Label selector to match vllm backends managed by the inference pool. |
-| `inferenceExtension.replicas` | Number of replicas for the endpoint picker extension service. Defaults to `1`. |
+| `inferenceExtension.replicas` | Number of replicas for the endpoint picker extension service. If More than one replica is used, EPP will run in HA active-passive mode. Defaults to `1`. |
 | `inferenceExtension.image.name` | Name of the container image used for the endpoint picker. |
 | `inferenceExtension.image.hub` | Registry URL where the endpoint picker image is hosted. |
 | `inferenceExtension.image.tag` | Image tag of the endpoint picker. |
````
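The README describes active-passive HA: every replica runs, but only the elected leader is ready to take traffic, and another replica takes over if the leader fails. The Go sketch below models only that readiness rule. `leaderGate`, `OnStartedLeading`, `OnStoppedLeading`, and `readyCount` are hypothetical names; the callback names loosely echo client-go's leader-election callbacks, but this is neither that API nor EPP's implementation, and a real deployment would drive them from a Kubernetes leader-election loop.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// leaderGate is a hypothetical per-replica readiness switch for an
// active-passive setup: the replica reports ready only while it leads.
type leaderGate struct {
	leader atomic.Bool
}

// OnStartedLeading would be called when this replica wins the election.
func (g *leaderGate) OnStartedLeading() { g.leader.Store(true) }

// OnStoppedLeading would be called when this replica loses leadership.
func (g *leaderGate) OnStoppedLeading() { g.leader.Store(false) }

// Ready is what a readiness check would consult. Kubernetes removes
// not-ready pods from Service endpoints, so followers receive no traffic.
func (g *leaderGate) Ready() bool { return g.leader.Load() }

// readyCount reports how many replicas would currently pass readiness.
func readyCount(pods []*leaderGate) int {
	n := 0
	for _, p := range pods {
		if p.Ready() {
			n++
		}
	}
	return n
}

func main() {
	// Three replicas, as with inferenceExtension.replicas=3.
	pods := []*leaderGate{{}, {}, {}}

	pods[0].OnStartedLeading()
	fmt.Println("ready replicas:", readyCount(pods))

	// Simulate the leader failing and replica 1 being elected instead.
	pods[0].OnStoppedLeading()
	pods[1].OnStartedLeading()
	fmt.Println("ready replicas:", readyCount(pods), "leader is replica 1:", pods[1].Ready())
}
```

Gating readiness on leadership keeps exactly one active replica at a time, which the deployment template's comment ties to the stateful prefix-cache-aware scorer.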

config/charts/inferencepool/templates/epp-deployment.yaml

Lines changed: 4 additions & 8 deletions

```diff
@@ -6,11 +6,7 @@ metadata:
   labels:
     {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
 spec:
-  {{- if .Values.inferenceExtension.enableLeaderElection }}
-  replicas: 3
-  {{- else }}
-  replicas: 1
-  {{- end }}
+  replicas: {{ .Values.inferenceExtension.replicas | default 1 }}
   strategy:
     # The current recommended EPP deployment pattern is to have a single active replica. This ensures
     # optimal performance of the stateful operations such prefix cache aware scorer.
@@ -53,7 +49,7 @@ spec:
         - --lora-info-metric
         - "" # Set an empty metric to disable LoRA metric scraping as they are not supported by Triton yet.
         {{- end }}
-        {{- if .Values.inferenceExtension.enableLeaderElection }}
+        {{- if gt .Values.inferenceExtension.replicas 1 }}
         - --ha-enable-leader-election
         {{- end }}
         # Pass additional flags via the inferenceExtension.flags field in values.yaml.
@@ -72,7 +68,7 @@ spec:
         {{- toYaml .Values.inferenceExtension.extraContainerPorts | nindent 8 }}
         {{- end }}
         livenessProbe:
-        {{- if .Values.inferenceExtension.enableLeaderElection }}
+        {{- if gt .Values.inferenceExtension.replicas 1 }}
           grpc:
             port: 9003
             service: liveness
@@ -84,7 +80,7 @@ spec:
           initialDelaySeconds: 5
           periodSeconds: 10
         readinessProbe:
-        {{- if .Values.inferenceExtension.enableLeaderElection }}
+        {{- if gt .Values.inferenceExtension.replicas 1 }}
           grpc:
             port: 9003
             service: readiness
```

config/charts/inferencepool/templates/leader-election-rbac.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-{{- if .Values.inferenceExtension.enableLeaderElection }}
+{{- if gt .Values.inferenceExtension.replicas 1 }}
 ---
 kind: Role
 apiVersion: rbac.authorization.k8s.io/v1
```
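Taken together, the three files now key every HA-related piece off a single value. The Go sketch below is a hypothetical summary of that decision, not code from the chart: `haSettings` and `settingsFor` are illustrative names. The zero case mirrors Sprig's `default` treating 0 as empty, so an integer `replicas: 0` renders one replica while `gt 0 1` stays false.

```go
package main

import "fmt"

// haSettings is a hypothetical summary of what the chart renders for a
// given inferenceExtension.replicas value after this commit.
type haSettings struct {
	Replicas       int  // Deployment spec.replicas
	LeaderElection bool // --ha-enable-leader-election passed to EPP
	GRPCProbes     bool // liveness/readiness take the grpc (port 9003) branch
	LeaderRBAC     bool // leader-election-rbac.yaml renders its Role
}

// settingsFor mirrors the template conditions: spec.replicas uses
// `replicas | default 1`, and every HA piece uses `gt replicas 1`.
func settingsFor(replicas int) haSettings {
	n := replicas
	if n == 0 {
		n = 1 // Sprig's default treats 0 as empty
	}
	ha := replicas > 1
	return haSettings{Replicas: n, LeaderElection: ha, GRPCProbes: ha, LeaderRBAC: ha}
}

func main() {
	for _, r := range []int{0, 1, 3} {
		fmt.Printf("replicas=%d -> %+v\n", r, settingsFor(r))
	}
}
```

The practical effect is that HA is no longer toggled separately from scaling. The old template hard-coded `replicas: 3` under `enableLeaderElection` and `replicas: 1` otherwise, so the documented `inferenceExtension.replicas` value had no effect on the Deployment; now `--set inferenceExtension.replicas=3` both scales the Deployment and turns on leader election, the gRPC probes, and the RBAC Role.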
