Commit 6518322

Merge pull request #903 from syaseen-rh/RHOAIENG-25115_rev1
RHOAIENG-25515_rev1: Adding TP note
2 parents 106e1e2 + 94c39e5 commit 6518322

1 file changed: +18 −0

modules/configuring-metric-based-autoscaling.adoc

Lines changed: 18 additions & 0 deletions
@@ -4,6 +4,24 @@
 = Configuring metrics-based autoscaling
 
 [role="_abstract"]
+
+ifndef::upstream[]
+[IMPORTANT]
+====
+ifdef::self-managed[]
+Metrics-based autoscaling is currently available in {productname-long} {vernum} as a Technology Preview feature.
+endif::[]
+ifdef::cloud-service[]
+Metrics-based autoscaling is currently available in {productname-long} as a Technology Preview feature.
+endif::[]
+Technology Preview features are not supported with {org-name} production service level agreements (SLAs) and might not be functionally complete.
+{org-name} does not recommend using them in production.
+These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
+
+For more information about the support scope of {org-name} Technology Preview features, see link:https://access.redhat.com/support/offerings/techpreview/[Technology Preview Features Support Scope].
+====
+endif::[]
+
 Knative-based autoscaling is not available in standard deployment mode. However, you can enable metrics-based autoscaling for an inference service in standard deployment mode. Metrics-based autoscaling helps you efficiently manage accelerator resources, lower operational costs, and ensure that your inference services meet performance requirements.
 
 To set up autoscaling for your inference service in standard deployments, install and configure the OpenShift Custom Metrics Autoscaler (CMA), which is based on Kubernetes Event-driven Autoscaling (KEDA). You can then use various model runtime metrics available in OpenShift Monitoring to trigger autoscaling of your inference service, such as KVCache utilization, Time to First Token (TTFT), and Concurrency.
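For context on what the setup described in that last paragraph can look like, here is a minimal sketch of a CMA (KEDA) ScaledObject that scales an inference service Deployment on a Prometheus query. It is not part of this commit: the resource names, namespace, metric query, server address, and threshold are illustrative assumptions, and the TriggerAuthentication that OpenShift Monitoring requires for authenticated queries is omitted.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: vllm-inference-scaler             # hypothetical name
  namespace: my-model-project             # hypothetical namespace
spec:
  scaleTargetRef:
    name: my-inference-service-predictor  # Deployment backing the inference service (illustrative)
  minReplicaCount: 1
  maxReplicaCount: 4
  triggers:
    - type: prometheus
      metadata:
        # Illustrative OpenShift Monitoring endpoint; a real cluster also needs
        # a TriggerAuthentication with a bearer token (omitted here).
        serverAddress: https://thanos-querier.openshift-monitoring.svc.cluster.local:9092
        # Hypothetical concurrency-style metric; substitute whichever runtime
        # metric (KV cache utilization, TTFT, concurrency) you want to scale on.
        query: vllm:num_requests_running
        threshold: "10"

With a resource like this applied, CMA drives a standard Kubernetes HorizontalPodAutoscaler for the target Deployment, scaling it between the replica bounds as the query crosses the threshold.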
