cmd/epp/runner/runner.go
31 additions & 7 deletions
@@ -25,6 +25,7 @@ import (
 	"net/http"
 	"net/http/pprof"
 	"os"
+	"sync/atomic"
 
 	"github.com/go-logr/logr"
 	"github.com/prometheus/client_golang/prometheus"
@@ -151,6 +152,10 @@ var (
 	modelServerMetricsPath = flag.String("model-server-metrics-path", "/metrics", "Path to scrape metrics from pods")
 	modelServerMetricsScheme = flag.String("model-server-metrics-scheme", "http", "Scheme to scrape metrics from pods")
 	modelServerMetricsHttpsInsecureSkipVerify = flag.Bool("model-server-metrics-https-insecure-skip-verify", true, "When using 'https' scheme for 'model-server-metrics-scheme', configure 'InsecureSkipVerify' (default to true)")
+	haEnableLeaderElection = flag.Bool(
+		"ha-enable-leader-election",
+		false,
+		"Enables leader election for high availability. When enabled, readiness probes will only pass on the leader.")
 
 	setupLog = ctrl.Log.WithName("setup")
 )
@@ -190,8 +195,9 @@ func bindEnvToFlags() {
 		"POOL_NAME": "pool-name",
 		"POOL_NAMESPACE": "pool-namespace",
 		// durations & bools work too; flag.Set expects the *string* form
+To deploy the EndpointPicker in a high-availability (HA) active-passive configuration, you can enable leader election. When enabled, the EPP deployment will have multiple replicas, but only one "leader" replica will be active and ready to process traffic at any given time. If the leader pod fails, another pod will be elected as the new leader, ensuring service continuity.
+
+To enable HA, set `inferenceExtension.enableLeaderElection` to `true` and increase the number of replicas in your `values.yaml` file:
@@ -107,6 +125,8 @@ The following table list the configurable parameters of the chart.
 | `inferenceExtension.extraServicePorts` | List of additional service ports to expose. Defaults to `[]`. |
 | `inferenceExtension.logVerbosity` | Logging verbosity level for the endpoint picker. Defaults to `"3"`. |
 | `provider.name` | Name of the Inference Gateway implementation being used. Possible values: `gke`. Defaults to `none`. |
+| `inferenceExtension.enableLeaderElection` | Enable leader election for high availability. When enabled, only one EPP pod (the leader) will be ready to serve traffic. It is recommended to set `inferenceExtension.replicas` to a value greater than 1 when this is set to `true`. Defaults to `false`. |