-To deploy the EndpointPicker in a high-availability (HA) active-passive configuration, you can enable leader election. When enabled, the EPP deployment will have multiple replicas, but only one "leader" replica will be active and ready to process traffic at any given time. If the leader pod fails, another pod will be elected as the new leader, ensuring service continuity.
+To deploy the EndpointPicker in a high-availability (HA) active-passive configuration, set the number of replicas to greater than one. In such a setup, only one "leader" replica is active and ready to process traffic at any given time. If the leader pod fails, another pod is elected as the new leader, ensuring service continuity.

-To enable HA, set `inferenceExtension.enableLeaderElection` to `true`.
+To enable HA, set `inferenceExtension.replicas` to a number greater than 1.
@@ -99,7 +99,7 @@ To enable HA, set `inferenceExtension.enableLeaderElection` to `true`.

 ```yaml
 inferenceExtension:
-  enableLeaderElection: true
+  replicas: 3
 ```

 Then apply it with:
@@ -152,7 +152,7 @@ The following table list the configurable parameters of the chart.
 | `inferencePool.targetPortNumber` | Target port number for the vllm backends, will be used to scrape metrics by the inference extension. Defaults to 8000. |
 | `inferencePool.modelServerType` | Type of the model servers in the pool, valid options are [vllm, triton-tensorrt-llm], default is vllm. |
 | `inferencePool.modelServers.matchLabels` | Label selector to match vllm backends managed by the inference pool. |
-| `inferenceExtension.replicas` | Number of replicas for the endpoint picker extension service. Defaults to `1`. |
+| `inferenceExtension.replicas` | Number of replicas for the endpoint picker extension service. If more than one replica is used, EPP will run in HA active-passive mode. Defaults to `1`. |
 | `inferenceExtension.image.name` | Name of the container image used for the endpoint picker. |
 | `inferenceExtension.image.hub` | Registry URL where the endpoint picker image is hosted. |
 | `inferenceExtension.image.tag` | Image tag of the endpoint picker. |
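
For orientation, a values file that combines the HA setting with a few of the parameters listed in the table might look like the sketch below. This is only an illustration: the key names come from the table, while the label value is a hypothetical placeholder, and the port and server type simply restate the documented defaults.

```yaml
# Hypothetical values file; only the key names are taken from the parameter table.
inferenceExtension:
  # Any value greater than 1 runs EPP in HA active-passive mode.
  replicas: 3

inferencePool:
  # Documented defaults, restated here for clarity.
  targetPortNumber: 8000
  modelServerType: vllm
  modelServers:
    matchLabels:
      app: my-vllm-backend  # placeholder label, adjust to match your model server pods
```

With `replicas: 1` (the default), the same file would deploy a single EPP instance without the active-passive behavior.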