Commit d9c5c27

docs: update to use kebab-cased flags changed at #1177 (#1193)
1 parent 9ac2a4d commit d9c5c27

5 files changed: +38 -38 lines changed

pkg/epp/server/runserver.go

Lines changed: 16 additions & 16 deletions
@@ -61,24 +61,24 @@ type ExtProcServerRunner struct {

  // Default values for CLI flags in main
  const (
- DefaultGrpcPort = 9002 // default for --grpcPort
- DefaultGrpcHealthPort = 9003 // default for --grpcHealthPort
- DefaultMetricsPort = 9090 // default for --metricsPort
+ DefaultGrpcPort = 9002 // default for --grpc-port
+ DefaultGrpcHealthPort = 9003 // default for --grpc-health-port
+ DefaultMetricsPort = 9090 // default for --metrics-port
  DefaultDestinationEndpointHintMetadataNamespace = "envoy.lb" // default for --destinationEndpointHintMetadataNamespace
- DefaultDestinationEndpointHintKey = "x-gateway-destination-endpoint" // default for --destinationEndpointHintKey
+ DefaultDestinationEndpointHintKey = "x-gateway-destination-endpoint" // default for --destination-endpoint-hint-key
  DefaultPoolName = "" // required but no default
- DefaultPoolNamespace = "default" // default for --poolNamespace
- DefaultRefreshMetricsInterval = 50 * time.Millisecond // default for --refreshMetricsInterval
- DefaultRefreshPrometheusMetricsInterval = 5 * time.Second // default for --refreshPrometheusMetricsInterval
- DefaultSecureServing = true // default for --secureServing
- DefaultHealthChecking = false // default for --healthChecking
- DefaultEnablePprof = true // default for --enablePprof
- DefaultTotalQueuedRequestsMetric = "vllm:num_requests_waiting" // default for --totalQueuedRequestsMetric
- DefaultKvCacheUsagePercentageMetric = "vllm:gpu_cache_usage_perc" // default for --kvCacheUsagePercentageMetric
- DefaultLoraInfoMetric = "vllm:lora_requests_info" // default for --loraInfoMetric
- DefaultCertPath = "" // default for --certPath
- DefaultConfigFile = "" // default for --configFile
- DefaultConfigText = "" // default for --configText
+ DefaultPoolNamespace = "default" // default for --pool-namespace
+ DefaultRefreshMetricsInterval = 50 * time.Millisecond // default for --refresh-metrics-interval
+ DefaultRefreshPrometheusMetricsInterval = 5 * time.Second // default for --refresh-prometheus-metrics-interval
+ DefaultSecureServing = true // default for --secure-serving
+ DefaultHealthChecking = false // default for --health-checking
+ DefaultEnablePprof = true // default for --enable-pprof
+ DefaultTotalQueuedRequestsMetric = "vllm:num_requests_waiting" // default for --total-queued-requests-metric
+ DefaultKvCacheUsagePercentageMetric = "vllm:gpu_cache_usage_perc" // default for --kv-cache-usage-percentage-metric
+ DefaultLoraInfoMetric = "vllm:lora_requests_info" // default for --lora-info-metric
+ DefaultCertPath = "" // default for --cert-path
+ DefaultConfigFile = "" // default for --config-file
+ DefaultConfigText = "" // default for --config-text
  )

  // NewDefaultExtProcServerRunner creates a runner with default values.

site-src/guides/epp-configuration/config-text.md

Lines changed: 9 additions & 9 deletions
@@ -1,6 +1,6 @@
  # Configuring Plugins via text

- The set of lifecycle hooks (plugins) that are used by the Inference Gateway (IGW) is determined by how
+ The set of lifecycle hooks (plugins) that are used by the Inference Gateway (IGW) is determined by how
  it is configured. The IGW can be configured in several ways, either by code or via text.

  If configured by code either a set of predetermined environment variables must be used or one must
@@ -95,7 +95,7 @@ schedulingProfiles:
  weight: 50
  ```

- If the configuration is in a file, the EPP command line argument `--configFile`
+ If the configuration is in a file, the EPP command line argument `--config-file`
  should be used to specify the full path of the file in question. For example:

  ```yaml
@@ -115,14 +115,14 @@ spec:
  image: ghcr.io/llm-d/llm-d-inference-scheduler:latest
  imagePullPolicy: IfNotPresent
  args:
- - -poolName
+ - --pool-name
  - "${POOL_NAME}"
  ...
- - --configFile
+ - --config-file
  - "/etc/epp/epp-config.yaml"
  ```

- If the configuration is passed as in-line text the EPP command line argument `--configText`
+ If the configuration is passed as in-line text the EPP command line argument `--config-text`
  should be used. For example:

  ```yaml
@@ -142,10 +142,10 @@ spec:
  image: ghcr.io/llm-d/llm-d-inference-scheduler:latest
  imagePullPolicy: IfNotPresent
  args:
- - -poolName
+ - --pool-name
  - "${POOL_NAME}"
  ...
- - --configText
+ - --config-text
  - |
  apiVersion: inference.networking.x-k8s.io/v1alpha1
  kind: EndpointPickerConfig
@@ -194,7 +194,7 @@ number of pods, and finds the pods that fall into the first range.

  #### **LoraAffinityFilter**

- Implements a pod selection strategy that when the use of a LoRA adapter is requested, prioritizes pods
+ Implements a pod selection strategy that when the use of a LoRA adapter is requested, prioritizes pods
  that are believed to have the specific LoRA adapter loaded. It also allows for load balancing through
  some randomization.

@@ -252,4 +252,4 @@ waiting queue size the pod has, the higher the score it will get (since it's mor
  available to serve new request).

  - *Type*: queue-scorer
- - *Parameters*: none
+ - *Parameters*: none

site-src/guides/index.md

Lines changed: 1 addition & 1 deletion
@@ -164,7 +164,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
  ./istioctl install --set tag=$TAG --set hub=gcr.io/istio-testing
  ```

- 3. If you run the Endpoint Picker (EPP) with the `--secureServing` flag set to `true` (the default mode), it is currently using a self-signed certificate. As a security measure, Istio does not trust self-signed certificates by default. As a temporary workaround, you can apply the destination rule to bypass TLS verification for EPP. A more secure TLS implementation in EPP is being discussed in [Issue 582](https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/582).
+ 3. If you run the Endpoint Picker (EPP) with the `--secure-serving` flag set to `true` (the default mode), it is currently using a self-signed certificate. As a security measure, Istio does not trust self-signed certificates by default. As a temporary workaround, you can apply the destination rule to bypass TLS verification for EPP. A more secure TLS implementation in EPP is being discussed in [Issue 582](https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/582).

  ```bash
  kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/istio/destination-rule.yaml

site-src/guides/inferencepool-rollout.md

Lines changed: 7 additions & 7 deletions
@@ -253,19 +253,19 @@ spec:
  image: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/epp:main
  imagePullPolicy: Always
  args:
- - -poolName
+ - --pool-name
  - "vllm-llama3-8b-instruct-new"
- - -poolNamespace
+ - --pool-namespace
  - "default"
- - -v
+ - --v
  - "4"
  - --zap-encoder
  - "json"
- - -grpcPort
+ - --grpc-port
  - "9002"
- - -grpcHealthPort
+ - --grpc-health-port
  - "9003"
- - -configFile
+ - --config-file
  - "/config/default-plugins.yaml"
  ports:
  - containerPort: 9002
@@ -468,4 +468,4 @@ kubectl delete Deployment vllm-llama3-8b-instruct-epp --ignore-not-found
  kubectl delete Service vllm-llama3-8b-instruct-epp --ignore-not-found
  ```

- With this, all requests should be served by the new Inference Pool.
+ With this, all requests should be served by the new Inference Pool.

site-src/implementations/model-servers.md

Lines changed: 5 additions & 5 deletions
@@ -27,12 +27,12 @@ Use `--set inferencePool.modelServerType=triton-tensorrt-llm` to install the [`i
  ### Option 2: Edit EPP deployment yaml

  Add the following to the `args` of the [EPP deployment](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/42eb5ff1c5af1275df43ac384df0ddf20da95134/config/manifests/inferencepool-resources.yaml#L32)
-
+
  ```
- - -totalQueuedRequestsMetric
+ - --total-queued-requests-metric
  - "nv_trt_llm_request_metrics{request_type=waiting}"
- - -kvCacheUsagePercentageMetric
+ - --kv-cache-usage-percentage-metric
  - "nv_trt_llm_kv_cache_block_metrics{kv_cache_block_type=fraction}"
- - -loraInfoMetric
+ - --lora-info-metric
  - "" # Set an empty metric to disable LoRA metric scraping as they are not supported by Triton yet.
- ```
+ ```
