Commit 12a97e1

Cleanup helm flags which have default values from values yaml (#1429)

* Cleanup flags with default values in values yaml
* Update README.md

1 parent f1141f9 commit 12a97e1

File tree

2 files changed (+5, −48 lines)


config/charts/inferencepool/README.md

5 additions, 6 deletions
@@ -83,18 +83,18 @@ $ helm install triton-llama3-8b-instruct \
 
 To deploy the EndpointPicker in a high-availability (HA) active-passive configuration, you can enable leader election. When enabled, the EPP deployment will have multiple replicas, but only one "leader" replica will be active and ready to process traffic at any given time. If the leader pod fails, another pod will be elected as the new leader, ensuring service continuity.
 
-To enable HA, set `inferenceExtension.enableLeaderElection` to `true` and increase the number of replicas in your `values.yaml` file:
+To enable HA, set `inferenceExtension.flags.has-enable-leader-election` to `true` and increase the number of replicas in your `values.yaml` file:
 
 ```yaml
 inferenceExtension:
   replicas: 3
-  enableLeaderElection: true
+  has-enable-leader-election: true
 ```
 
 Then apply it with:
 
 ```txt
-helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f values.yaml \
+helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f values.yaml
 ```
 
 ## Uninstall

@@ -122,10 +122,9 @@ The following table list the configurable parameters of the chart.
 | `inferenceExtension.env` | List of environment variables to set in the endpoint picker container as free-form YAML. Defaults to `[]`. |
 | `inferenceExtension.extraContainerPorts` | List of additional container ports to expose. Defaults to `[]`. |
 | `inferenceExtension.extraServicePorts` | List of additional service ports to expose. Defaults to `[]`. |
-| `inferenceExtension.flags` | List of flags which are passed through to endpoint picker. |
+| `inferenceExtension.flags` | List of flags which are passed through to the endpoint picker, e.g. `enable-pprof`, `grpc-port`. Refer to [runner.go](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/cmd/epp/runner/runner.go) for the complete list. |
+| `inferenceExtension.flags.has-enable-leader-election` | Enable leader election for high availability. When enabled, only one EPP pod (the leader) will be ready to serve traffic. |
 | `provider.name` | Name of the Inference Gateway implementation being used. Possible values: `gke`. Defaults to `none`. |
-| `inferenceExtension.enableLeaderElection` | Enable leader election for high availability. When enabled, only one EPP pod (the leader) will be ready to serve traffic. It is recommended to set `inferenceExtension.replicas` to a value greater than 1 when this is set to `true`. Defaults to `false`. |
-
 
 ## Notes

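After this change, leader election is configured through the pass-through `flags` list rather than a dedicated chart value. A minimal `values.yaml` sketch combining the README's HA example with the chart's `flags` convention (the flag names come from this commit; the three-replica count mirrors the README, and any other values shown are illustrative only):

```yaml
inferenceExtension:
  replicas: 3                # run standby replicas so a new leader can take over
  flags:
    # flags here are forwarded verbatim to the endpoint picker binary
    - name: has-enable-leader-election
      value: "true"
```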
config/charts/inferencepool/values.yaml

0 additions, 42 deletions
@@ -32,51 +32,9 @@ inferenceExtension:
   # ENABLE_EXPERIMENTAL_FEATURE: "true"
 
   flags:
-    - name: grpc-port
-      value: 9002
-    - name: grpc-health-port
-      value: 9003
-    - name: metrics-port
-      value: 9090
-    - name: enable-pprof
-      value: "true" # Enable pprof handlers for profiling and debugging
-    - name: pool-group
-      value: "inference.networking.k8s.io"
     # Log verbosity
     - name: v
      value: 1
-    - name: secure-serving
-      value: "true"
-    - name: health-checking
-      value: "false"
-    - name: cert-path
-      value: ""
-    - name: total-queued-requests-metric
-      value: "vllm:num_requests_waiting"
-    - name: kv-cache-usage-percentage-metric
-      value: "vllm:gpu_cache_usage_perc"
-    - name: lora-info-metric
-      value: "vllm:lora_requests_info"
-    - name: refresh-metrics-interval
-      value: "50ms"
-    - name: refresh-prometheus-metrics-interval
-      value: "5s"
-    - name: metrics-staleness-threshold
-      value: "2s"
-    - name: config-file
-      value: ""
-    - name: config-text
-      value: ""
-    - name: model-server-metrics-port
-      value: 0
-    - name: model-server-metrics-path
-      value: "/metrics"
-    - name: model-server-metrics-scheme
-      value: "http"
-    - name: model-server-metrics-https-insecure-skip-verify
-      value: "true"
-    - name: has-enable-leader-election
-      value: false
 
 inferencePool:
   targetPorts:

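With these entries removed from `values.yaml`, the endpoint picker falls back to the defaults compiled into its binary, and the chart only pins the log verbosity. A user who still needs a non-default value can re-add an entry to the `flags` list in their own values file; a sketch (the flag name and its old default come from the lines deleted above, while the override value is purely illustrative):

```yaml
inferenceExtension:
  flags:
    # re-add an entry only when you need something other than the binary's default
    - name: metrics-port
      value: 9091   # illustrative override; the removed chart default was 9090
```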