
Commit 5c92877

addressing coderabbit comments

1 parent: d54f51d

1 file changed: +6 -7 lines changed

modules/configuring-metric-based-autoscaling.adoc

Lines changed: 6 additions & 7 deletions
@@ -1,12 +1,12 @@
 :_module-type: PROCEDURE
 
-[id="configuring-metric-based-autoscaling_{context}"]
-= Configuring metric-based autoscaling
+[id="configuring-metrics-based-autoscaling_{context}"]
+= Configuring metrics-based autoscaling
 
 [role="_abstract"]
 While Knative-based autoscaling features are not available in standard deployment modes, you can enable metrics-based autoscaling for an inference service in these deployments. This capability helps you efficiently manage accelerator resources, lower operational costs, and ensure that your inference services meet performance requirements.
 
-To setup autoscaling for your inference service in standard deployments, you must install and configure the OpenShift Custom Metrics Autoscaler (CMA), which is based on Kubernetes Event-driven Autoscaling (KEDA). You can then utilize various model runtime metrics available in OpenShift Monitoring, such as KVCache utilization, Time to First Token (TTFT), and concurrency, to trigger autoscaling of your inference service.
+To set up autoscaling for your inference service in standard deployments, you must install and configure the OpenShift Custom Metrics Autoscaler (CMA), which is based on Kubernetes Event-driven Autoscaling (KEDA). You can then utilize various model runtime metrics available in OpenShift Monitoring, such as KVCache utilization, Time to First Token (TTFT), and concurrency, to trigger autoscaling of your inference service.
 
 .Prerequisites
 * You have cluster administrator privileges for your {openshift-platform} cluster.
@@ -21,7 +21,7 @@ The `odh-controller` automatically creates the `TriggerAuthentication`, `Service
 
 .Procedure
 
-. Log in to the OpenShift console as a cluster administrator.
+. Log in to the {openshift-platform} console as a cluster administrator.
 . In the *Administrator* perspective, click *Home* -> *Search*.
 . Select the project where you have deployed your model.
 . From the *Resources* dropdown menu, select *InferenceService*.
@@ -35,13 +35,13 @@ spec:
   # …
   minReplicas: 1
   maxReplicas: 5
-  autoScaling:
+  autoscaling:
     metrics:
     - type: External
       external:
         metric:
           backend: "prometheus"
-          serverAddress: "http://<thanos-service>.<monitoring-namespace>.svc.cluster.local:9092"
+          serverAddress: "https://<thanos-service>.<monitoring-namespace>.svc.cluster.local:9092"
           query: vllm:num_requests_waiting
         authenticationRef:
           name: openshift-monitoring-metrics-auth
@@ -55,4 +55,3 @@ The example configures the inference service to autoscale between 1-5 replicas b
 
 //[role="_additional-resources"]
 //.Additional resources
-// link:https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/monitoring/index[Monitoring]
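
For reference, here is a minimal sketch of how the full `InferenceService` manifest might read after this commit, with the renamed `autoscaling` key and the HTTPS Thanos endpoint in place. The metadata name and the placement of the scaling fields under `spec.predictor` are assumptions for illustration; the diff itself only shows the fragment under `spec`:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: <inference-service-name>   # hypothetical placeholder
spec:
  predictor:                       # assumed parent of the fields shown in the diff
    # … model and runtime configuration elided, as in the original example
    minReplicas: 1
    maxReplicas: 5
    autoscaling:
      metrics:
      - type: External
        external:
          metric:
            backend: "prometheus"
            serverAddress: "https://<thanos-service>.<monitoring-namespace>.svc.cluster.local:9092"
            query: vllm:num_requests_waiting
          authenticationRef:
            name: openshift-monitoring-metrics-auth

With this configuration, as the hunk context above notes, CMA scales the service between 1 and 5 replicas based on the `vllm:num_requests_waiting` metric reported by OpenShift Monitoring.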
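The second hunk's context also notes that the `odh-controller` automatically creates the `TriggerAuthentication` and related resources, so none of them need to be created by hand. Purely as an illustration of what a KEDA `TriggerAuthentication` for bearer-token access to OpenShift Monitoring typically looks like (the secret name is hypothetical, and the controller manages the real resource):

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: openshift-monitoring-metrics-auth
  namespace: <project>                    # the project where the model is deployed
spec:
  secretTargetRef:
  - parameter: bearerToken                # token used to query the Thanos endpoint
    name: <metrics-reader-token-secret>   # hypothetical secret name
    key: token
  - parameter: ca                         # CA bundle for the HTTPS endpoint
    name: <metrics-reader-token-secret>
    key: ca.crt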
