diff --git a/docs/reference/inference/inference-apis.asciidoc b/docs/reference/inference/inference-apis.asciidoc
index 037d7abeb2a36..c7b779a994a05 100644
--- a/docs/reference/inference/inference-apis.asciidoc
+++ b/docs/reference/inference/inference-apis.asciidoc
@@ -35,6 +35,19 @@ Elastic –, then create an {infer} endpoint by the <<put-inference-api>>.
 Now use <<semantic-search-semantic-text, semantic text>> to perform
 <<semantic-search, semantic search>> on your data.
 
+[discrete]
+[[adaptive-allocations]]
+=== Adaptive allocations
+
+Adaptive allocations allow {infer} services to dynamically adjust the number of model allocations based on the current load.
+
+When adaptive allocations are enabled:
+
+* The number of allocations scales up automatically when the load increases.
+* Allocations scale down to a minimum of 0 when the load decreases, saving resources.
+
+For more information about adaptive allocations and resources, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] documentation.
+
 //[discrete]
 //[[default-enpoints]]
 //=== Default {infer} endpoints
diff --git a/docs/reference/inference/put-inference.asciidoc b/docs/reference/inference/put-inference.asciidoc
index e7e25ec98b49d..ed93c290b6ad4 100644
--- a/docs/reference/inference/put-inference.asciidoc
+++ b/docs/reference/inference/put-inference.asciidoc
@@ -67,4 +67,17 @@ Click the links to review the configuration details of the services:
 * <<infer-service-watsonx-ai>> (`text_embedding`)
 
 The {es} and ELSER services run on a {ml} node in your {es} cluster. The rest of
-the services connect to external providers.
\ No newline at end of file
+the services connect to external providers.
+
+[discrete]
+[[adaptive-allocations-put-inference]]
+==== Adaptive allocations
+
+Adaptive allocations allow {infer} services to dynamically adjust the number of model allocations based on the current load.
+
+When adaptive allocations are enabled:
+
+* The number of allocations scales up automatically when the load increases.
+* Allocations scale down to a minimum of 0 when the load decreases, saving resources.
+
+For more information about adaptive allocations and resources, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] documentation.
\ No newline at end of file