Commit 6518322

Merge pull request #903 from syaseen-rh/RHOAIENG-25115_rev1
RHOAIENG-25515_rev1: Adding TP note
2 parents 106e1e2 + 94c39e5 commit 6518322

1 file changed: +18 −0

modules/configuring-metric-based-autoscaling.adoc

Lines changed: 18 additions & 0 deletions
@@ -4,6 +4,24 @@
 = Configuring metrics-based autoscaling
 
 [role="_abstract"]
+
+ifndef::upstream[]
+[IMPORTANT]
+====
+ifdef::self-managed[]
+Metrics-based autoscaling is currently available in {productname-long} {vernum} as a Technology Preview feature.
+endif::[]
+ifdef::cloud-service[]
+Metrics-based autoscaling is currently available in {productname-long} as a Technology Preview feature.
+endif::[]
+Technology Preview features are not supported with {org-name} production service level agreements (SLAs) and might not be functionally complete.
+{org-name} does not recommend using them in production.
+These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
+
+For more information about the support scope of {org-name} Technology Preview features, see link:https://access.redhat.com/support/offerings/techpreview/[Technology Preview Features Support Scope].
+====
+endif::[]
+
 Knative-based autoscaling is not available in standard deployment mode. However, you can enable metrics-based autoscaling for an inference service in standard deployment mode. Metrics-based autoscaling helps you efficiently manage accelerator resources, lower operational costs, and ensure that your inference services meet performance requirements.
 
 To set up autoscaling for your inference service in standard deployments, install and configure the OpenShift Custom Metrics Autoscaler (CMA), which is based on Kubernetes Event-driven Autoscaling (KEDA). You can then use various model runtime metrics available in OpenShift Monitoring to trigger autoscaling of your inference service, such as KVCache utilization, Time to First Token (TTFT), and Concurrency.
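For context on what the setup described in that last paragraph can look like, here is a minimal sketch of a CMA (KEDA) ScaledObject that scales an inference service Deployment on a Prometheus query. It is not part of this commit: the resource names, namespace, metric query, server address, and threshold are illustrative assumptions, and the TriggerAuthentication that OpenShift Monitoring requires for authenticated queries is omitted.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: vllm-inference-scaler             # hypothetical name
  namespace: my-model-project             # hypothetical namespace
spec:
  scaleTargetRef:
    name: my-inference-service-predictor  # Deployment backing the inference service (illustrative)
  minReplicaCount: 1
  maxReplicaCount: 4
  triggers:
    - type: prometheus
      metadata:
        # Illustrative OpenShift Monitoring endpoint; a real cluster also needs
        # a TriggerAuthentication with a bearer token (omitted here).
        serverAddress: https://thanos-querier.openshift-monitoring.svc.cluster.local:9092
        # Hypothetical concurrency-style metric; substitute whichever runtime
        # metric (KV cache utilization, TTFT, concurrency) you want to scale on.
        query: vllm:num_requests_running
        threshold: "10"

With a resource like this applied, CMA drives a standard Kubernetes HorizontalPodAutoscaler for the target Deployment, scaling it between the replica bounds as the query crosses the threshold.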
