
Commit d54f51d

committed
addressing coderabbit comments
1 parent 0a7915c commit d54f51d

File tree

1 file changed: +7 −7 lines changed


modules/configuring-metric-based-autoscaling.adoc

Lines changed: 7 additions & 7 deletions
@@ -4,13 +4,13 @@
 = Configuring metric-based autoscaling
 
 [role="_abstract"]
-While knative-based autoscaling features are not available in standard deployment modes, you can enable metrics-based autoscaling for an inference service in these deployments. This capability helps you efficiently manage accelerator resources, lower operational costs, and ensure that your inference services meet performance requirements.
+While Knative-based autoscaling features are not available in standard deployment modes, you can enable metrics-based autoscaling for an inference service in these deployments. This capability helps you efficiently manage accelerator resources, lower operational costs, and ensure that your inference services meet performance requirements.
 
-To setup autoscaling for your inference service in standard deployments, you must install and configure the Openshift Custom Metrics Autoscaler (CMA), which is based on Kubernetes Event-driven Autoscaling (KEDA). You can then utilize various model runtime metrics available in OpenShift Monitoring, such as KVCache utilization, Time to First Token (TTFT), and concurrency, to trigger autoscaling of your inference service.
+To setup autoscaling for your inference service in standard deployments, you must install and configure the OpenShift Custom Metrics Autoscaler (CMA), which is based on Kubernetes Event-driven Autoscaling (KEDA). You can then utilize various model runtime metrics available in OpenShift Monitoring, such as KVCache utilization, Time to First Token (TTFT), and concurrency, to trigger autoscaling of your inference service.
 
 .Prerequisites
 * You have cluster administrator privileges for your {openshift-platform} cluster.
-* You have installed the CMA operator on your cluster. For more informatipn, see link:https://docs.redhat.com/en/documentation/openshift_container_platform/{ocp-latest-version}/html/nodes/automatically-scaling-pods-with-the-custom-metrics-autoscaler-operator#nodes-cma-autoscaling-custom-install[Installing the custom metrics autoscaler].
+* You have installed the CMA operator on your cluster. For more information, see link:https://docs.redhat.com/en/documentation/openshift_container_platform/{ocp-latest-version}/html/nodes/automatically-scaling-pods-with-the-custom-metrics-autoscaler-operator#nodes-cma-autoscaling-custom-install[Installing the custom metrics autoscaler].
 +
 [NOTE]
 ====
@@ -28,7 +28,7 @@ The `odh-controller` automatically creates the `TriggerAuthentication`, `Service
 . Click the `InferenceService` for your deployed model and then click *YAML*.
 . Under `spec.predictor`, define a metric-based autoscaling policy similar to the following example:
 +
-[source]
+[source,yaml]
 ----
 spec:
   predictor:
@@ -42,16 +42,16 @@ spec:
         metric:
           backend: "prometheus"
           serverAddress: "http://<thanos-service>.<monitoring-namespace>.svc.cluster.local:9092"
-          query: vllm:num_requests_waiting
+          query: vllm:num_requests_waiting
         authenticationRef:
           name: openshift-monitoring-metrics-auth
         target:
           type: Value
-          value: "2"
+          value: 2
 ----
 +
 The example configures the inference service to autoscale between 1-5 replicas based on the number of requests waiting to be processed, as determined by the `vllm:num_requests_waiting` metric.
-. Click *Save*
+. Click *Save*.
 
 //[role="_additional-resources"]
 //.Additional resources
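For context, the full predictor block that this diff edits might look like the following sketch. Only the `metric`, `authenticationRef`, and `target` fields (and `spec`/`predictor`) are visible in the diff; the surrounding field names (`minReplicas`, `maxReplicas`, `autoScaling`, `metrics`, `type: External`, `external`) are assumptions based on KServe's external-metrics autoscaling API, not part of this commit:

```yaml
spec:
  predictor:
    minReplicas: 1
    maxReplicas: 5
    autoScaling:
      metrics:
        - type: External
          external:
            metric:
              backend: "prometheus"
              serverAddress: "http://<thanos-service>.<monitoring-namespace>.svc.cluster.local:9092"
              query: vllm:num_requests_waiting
            authenticationRef:
              name: openshift-monitoring-metrics-auth
            target:
              type: Value
              value: 2
```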

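The `target.type: Value` trigger in the diff follows the standard Kubernetes HPA proportional-scaling rule for Value-type external metrics: desired replicas is roughly ceil(currentReplicas × metricValue / targetValue), clamped to the configured minimum and maximum. A minimal sketch of that arithmetic (the function is illustrative, not an API from CMA or KEDA):

```python
import math

def desired_replicas(current: int, metric_value: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 5) -> int:
    """Approximate the HPA decision for a Value-type external metric."""
    # The ratio of observed metric to target drives proportional scaling.
    desired = math.ceil(current * metric_value / target)
    # Clamp to the replica bounds from the autoscaling policy.
    return max(min_replicas, min(max_replicas, desired))

# With 2 replicas, 6 waiting requests, and a target of 2, the raw
# desired count is ceil(2 * 6 / 2) = 6, clamped to the max of 5.
print(desired_replicas(2, 6, 2))  # → 5
```

So with a target of `2` waiting requests, the service scales out as the queue grows and back in as it drains, never leaving the 1-5 replica range.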