diff --git a/docs/reference/inference/inference-apis.asciidoc b/docs/reference/inference/inference-apis.asciidoc index 037d7abeb2a36..c7b779a994a05 100644 --- a/docs/reference/inference/inference-apis.asciidoc +++ b/docs/reference/inference/inference-apis.asciidoc @@ -35,6 +35,19 @@ Elastic –, then create an {infer} endpoint by the <>. Now use <> to perform <> on your data. +[discrete] +[[adaptive-allocations]] +=== Adaptive allocations + +Adaptive allocations allow inference services to dynamically adjust the number of model allocations based on the current load. + +When adaptive allocations are enabled: + +* The number of allocations scales up automatically when the load increases. +- Allocations scale down to a minimum of 0 when the load decreases, saving resources. + +For more information about adaptive allocations and resources, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] documentation. + //[discrete] //[[default-enpoints]] //=== Default {infer} endpoints diff --git a/docs/reference/inference/put-inference.asciidoc b/docs/reference/inference/put-inference.asciidoc index e7e25ec98b49d..ed93c290b6ad4 100644 --- a/docs/reference/inference/put-inference.asciidoc +++ b/docs/reference/inference/put-inference.asciidoc @@ -67,4 +67,17 @@ Click the links to review the configuration details of the services: * <> (`text_embedding`) The {es} and ELSER services run on a {ml} node in your {es} cluster. The rest of -the services connect to external providers. \ No newline at end of file +the services connect to external providers. + +[discrete] +[[adaptive-allocations-put-inference]] +==== Adaptive allocations + +Adaptive allocations allow inference services to dynamically adjust the number of model allocations based on the current load. + +When adaptive allocations are enabled: + +- The number of allocations scales up automatically when the load increases. +- Allocations scale down to a minimum of 0 when the load decreases, saving resources. + +For more information about adaptive allocations and resources, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] documentation. \ No newline at end of file