While Knative-based autoscaling features are not available in standard deployment modes, you can enable metrics-based autoscaling for an inference service in these deployments. This capability helps you efficiently manage accelerator resources, lower operational costs, and ensure that your inference services meet performance requirements.
To set up autoscaling for your inference service in standard deployments, you must install and configure the OpenShift Custom Metrics Autoscaler (CMA), which is based on Kubernetes Event-driven Autoscaling (KEDA). You can then use various model runtime metrics available in OpenShift Monitoring, such as KV cache utilization, Time to First Token (TTFT), and concurrency, to trigger autoscaling of your inference service.
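
Under the hood, CMA/KEDA scales the deployment that backs your inference service by evaluating a Prometheus query against OpenShift Monitoring. The following `ScaledObject` is a minimal sketch of that mechanism, not the exact resource generated for your service: the resource names, namespace, metric query, threshold, and `TriggerAuthentication` reference are all placeholder assumptions.

[source,yaml]
----
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-model-scaler               # hypothetical name
  namespace: my-project               # project where the model is deployed
spec:
  scaleTargetRef:
    name: my-model-predictor          # deployment backing the inference service (assumed name)
  minReplicaCount: 1
  maxReplicaCount: 4
  triggers:
  - type: prometheus
    metadata:
      # Thanos Querier tenancy endpoint for OpenShift Monitoring (assumed address)
      serverAddress: https://thanos-querier.openshift-monitoring.svc.cluster.local:9092
      namespace: my-project
      # Example model runtime metric; substitute the metric you want to scale on
      query: vllm:num_requests_waiting
      threshold: "5"
      authModes: bearer
    authenticationRef:
      # References the TriggerAuthentication that grants KEDA access to the metrics
      name: inference-prometheus-auth  # assumed name
----
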
.Prerequisites
* You have cluster administrator privileges for your {openshift-platform} cluster.

The `odh-controller` automatically creates the `TriggerAuthentication`, `Service`...

.Procedure
. Log in to the {openshift-platform} console as a cluster administrator.
. In the *Administrator* perspective, click *Home* -> *Search*.
. Select the project where you have deployed your model.
. From the *Resources* dropdown menu, select *InferenceService*.
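
Alternatively, if you prefer the command line, you can list the same resources with `oc`, assuming the KServe CRDs are installed and you are logged in to the cluster:

[source,terminal]
----
$ oc get inferenceservice -n <project_name>
----

Replace `<project_name>` with the project where you deployed your model.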