Commit a0bb0f9

docs: Add leader election section in helm chart README
1 parent a166927 commit a0bb0f9

3 files changed

+26
-4
lines changed

config/charts/inferencepool/README.md

Lines changed: 20 additions & 0 deletions
@@ -79,6 +79,24 @@ $ helm install triton-llama3-8b-instruct \
 oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts/inferencepool --version v0
 ```
 
+### Install with High Availability (HA)
+
+To deploy the EndpointPicker in a high-availability (HA) active-passive configuration, you can enable leader election. When enabled, the EPP deployment will have multiple replicas, but only one "leader" replica will be active and ready to process traffic at any given time. If the leader pod fails, another pod will be elected as the new leader, ensuring service continuity.
+
+To enable HA, set `inferenceExtension.enableLeaderElection` to `true` and increase the number of replicas in your `values.yaml` file:
+
+```yaml
+inferenceExtension:
+  replicas: 3
+  enableLeaderElection: true
+```
+
+Then apply it with:
+
+```txt
+helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f values.yaml \
+```
+
 ## Uninstall
 
 Run the following command to uninstall the chart:
@@ -107,6 +125,8 @@ The following table list the configurable parameters of the chart.
 | `inferenceExtension.extraServicePorts` | List of additional service ports to expose. Defaults to `[]`. |
 | `inferenceExtension.logVerbosity` | Logging verbosity level for the endpoint picker. Defaults to `"3"`. |
 | `provider.name` | Name of the Inference Gateway implementation being used. Possible values: `gke`. Defaults to `none`. |
+| `inferenceExtension.enableLeaderElection` | Enable leader election for high availability. When enabled, only one EPP pod (the leader) will be ready to serve traffic. It is recommended to set `inferenceExtension.replicas` to a value greater than 1 when this is set to `true`. Defaults to `false`. |
+
 
 ## Notes
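
Reviewer note: the active-passive behavior described in the new README section can be pictured with a small model. The sketch below is only an illustration under stated assumptions: `Replica`, `Elect`, and `ReadyNames` are hypothetical names that do not come from the chart or the EPP code, and "first healthy replica wins" stands in for whatever lease-based election the EPP actually performs.

```go
package main

import "fmt"

// Replica is a hypothetical stand-in for one EPP pod.
// It is not a type from the chart or from the EPP source.
type Replica struct {
	Name    string
	Healthy bool
}

// Elect returns the index of the first healthy replica, or -1 if none is healthy.
// Assumption: real leader election relies on a lease; "first healthy wins" is a
// simplification chosen to keep the example deterministic.
func Elect(replicas []Replica) int {
	for i, r := range replicas {
		if r.Healthy {
			return i
		}
	}
	return -1
}

// ReadyNames lists the replicas that would report ready in this model.
// Only the elected leader is ready, so the result has at most one entry.
func ReadyNames(replicas []Replica) []string {
	leader := Elect(replicas)
	if leader < 0 {
		return nil
	}
	return []string{replicas[leader].Name}
}

func main() {
	pods := []Replica{{"epp-0", true}, {"epp-1", true}, {"epp-2", true}}
	fmt.Println(ReadyNames(pods)) // [epp-0]

	pods[0].Healthy = false       // the leader fails
	fmt.Println(ReadyNames(pods)) // [epp-1]
}
```

In this model, three replicas never yield more than one ready pod, which is why the README recommends raising `inferenceExtension.replicas` above 1: the extra pods are standbys, not additional serving capacity.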

config/charts/inferencepool/values.yaml

Lines changed: 2 additions & 0 deletions
@@ -34,6 +34,8 @@ inferenceExtension:
   extraContainerPorts: []
   # Define additional service ports
   extraServicePorts: []
+  # Enable leader election for high availability. When enabled, it is recommended to set replicas > 1.
+  # Only the leader pod will be ready to serve traffic.
   enableLeaderElection: false
 
 inferencePool:

test/e2e/epp/e2e_test.go

Lines changed: 4 additions & 4 deletions
@@ -72,7 +72,7 @@ var _ = ginkgo.Describe("InferencePool", func() {
 
 	ginkgo.When("The Inference Extension is running", func() {
 		ginkgo.It("Should route traffic to target model servers", func() {
-			verifyTrafficRouting(infObjective)
+			verifyTrafficRouting()
 		})
 
 		ginkgo.It("Should expose EPP metrics after generating traffic", func() {
@@ -113,7 +113,7 @@ var _ = ginkgo.Describe("InferencePool", func() {
 			}
 
 			ginkgo.By("STEP 1: Verifying initial leader is working correctly before failover")
-			verifyTrafficRouting(infObjective)
+			verifyTrafficRouting()
 			verifyMetrics()
 
 			ginkgo.By("STEP 2: Finding and deleting the current leader pod")
@@ -156,7 +156,7 @@ var _ = ginkgo.Describe("InferencePool", func() {
 			ginkgo.By("Found new leader pod: " + newLeaderPod.Name)
 
 			ginkgo.By("STEP 5: Verifying the new leader is working correctly after failover")
-			verifyTrafficRouting(infObjective)
+			verifyTrafficRouting()
 			verifyMetrics()
 		})
 	})
@@ -171,7 +171,7 @@ func newInferenceObjective(ns string) *v1alpha2.InferenceObjective {
 }
 
 // verifyTrafficRouting contains the logic for the "Should route traffic to target model servers" test.
-func verifyTrafficRouting(infObjective *v1alpha2.InferenceObjective) {
+func verifyTrafficRouting() {
 	ginkgo.By("Verifying traffic routing")
 	for _, t := range []struct {
 		api string
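
Reviewer note: the failover test deletes the current leader and later reports a new one, but the steps in between fall outside the hunks shown. One way to picture that wait is a bounded polling loop. The sketch below is a hypothetical illustration, not the test's actual code: `waitForNewLeader` and the `getLeader` callback are invented names, and the cluster is simulated with a fixed list of observations instead of Kubernetes API calls.

```go
package main

import (
	"errors"
	"fmt"
)

// waitForNewLeader polls getLeader up to attempts times and returns the first
// non-empty leader name that differs from oldLeader.
// Hypothetical helper: a real test would also pause between attempts.
func waitForNewLeader(getLeader func() string, oldLeader string, attempts int) (string, error) {
	for i := 0; i < attempts; i++ {
		if name := getLeader(); name != "" && name != oldLeader {
			return name, nil
		}
	}
	return "", errors.New("no new leader elected within the attempt budget")
}

func main() {
	// Simulated observations: the old leader is still reported twice,
	// then no leader is visible, then a new leader appears.
	observations := []string{"epp-0", "epp-0", "", "epp-1"}
	i := 0
	getLeader := func() string {
		name := observations[i%len(observations)]
		i++
		return name
	}

	leader, err := waitForNewLeader(getLeader, "epp-0", 10)
	fmt.Println(leader, err) // epp-1 <nil>
}
```

Bounding the number of attempts keeps a failed election from hanging the suite; the error path is what a test would turn into an assertion failure.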
