
Commit a1a8be8

[DOCS] Adds default inference endpoints information (elastic#118463)
* Adds default inference endpoints information

* Update docs/reference/inference/inference-apis.asciidoc

Co-authored-by: Liam Thompson <[email protected]>

---------

Co-authored-by: Liam Thompson <[email protected]>

(cherry picked from commit b299837)

# Conflicts:
#	docs/reference/inference/inference-apis.asciidoc
1 parent 3680bd9 commit a1a8be8

File tree

1 file changed: +25 -12 lines changed

docs/reference/inference/inference-apis.asciidoc

Lines changed: 25 additions & 12 deletions
@@ -41,21 +41,34 @@ Elastic –, then create an {infer} endpoint by the <<put-inference-api>>.
 Now use <<semantic-search-semantic-text, semantic text>> to perform
 <<semantic-search, semantic search>> on your data.
 
-//[discrete]
-//[[default-enpoints]]
-//=== Default {infer} endpoints
+[discrete]
+[[adaptive-allocations]]
+=== Adaptive allocations
+
+Adaptive allocations allow inference services to dynamically adjust the number of model allocations based on the current load.
+
+When adaptive allocations are enabled:
+
+* The number of allocations scales up automatically when the load increases.
+* Allocations scale down to a minimum of 0 when the load decreases, saving resources.
+
+For more information about adaptive allocations and resources, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] documentation.
+
+[discrete]
+[[default-enpoints]]
+=== Default {infer} endpoints
 
-//Your {es} deployment contains some preconfigured {infer} endpoints that makes it easier for you to use them when defining `semantic_text` fields or {infer} processors.
-//The following list contains the default {infer} endpoints listed by `inference_id`:
+Your {es} deployment contains preconfigured {infer} endpoints, which makes it easier to use them when defining `semantic_text` fields or using {infer} processors.
+The following list contains the default {infer} endpoints listed by `inference_id`:
 
-//* `.elser-2-elasticsearch`: uses the {ml-docs}/ml-nlp-elser.html[ELSER] built-in trained model for `sparse_embedding` tasks (recommended for English language texts)
-//* `.multilingual-e5-small-elasticsearch`: uses the {ml-docs}/ml-nlp-e5.html[E5] built-in trained model for `text_embedding` tasks (recommended for non-English language texts)
+* `.elser-2-elasticsearch`: uses the {ml-docs}/ml-nlp-elser.html[ELSER] built-in trained model for `sparse_embedding` tasks (recommended for English language texts)
+* `.multilingual-e5-small-elasticsearch`: uses the {ml-docs}/ml-nlp-e5.html[E5] built-in trained model for `text_embedding` tasks (recommended for non-English language texts)
 
-//Use the `inference_id` of the endpoint in a <<semantic-text,`semantic_text`>> field definition or when creating an <<inference-processor,{infer} processor>>.
-//The API call will automatically download and deploy the model which might take a couple of minutes.
-//Default {infer} enpoints have {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations] enabled.
-//For these models, the minimum number of allocations is `0`.
-//If there is no {infer} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes.
+Use the `inference_id` of the endpoint in a <<semantic-text,`semantic_text`>> field definition or when creating an <<inference-processor,{infer} processor>>.
+The API call will automatically download and deploy the model, which might take a couple of minutes.
+Default {infer} endpoints have {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations] enabled.
+For these models, the minimum number of allocations is `0`.
+If there is no {infer} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes.
 
 
 [discrete]
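For context on the adaptive allocations behavior documented above, the following is a minimal sketch of how an {infer} endpoint for the `elasticsearch` service can opt in to adaptive allocations through the <<put-inference-api>>. The endpoint name `my-elser-endpoint` and the allocation bounds are illustrative assumptions, not values taken from this commit.

[source,console]
----
PUT _inference/sparse_embedding/my-elser-endpoint <1>
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_threads": 1,
    "adaptive_allocations": { <2>
      "enabled": true,
      "min_number_of_allocations": 0, <3>
      "max_number_of_allocations": 4
    }
  }
}
----
<1> `my-elser-endpoint` is a placeholder endpoint name chosen for this example.
<2> `adaptive_allocations` lets the number of model allocations scale with the current load.
<3> A minimum of `0` allows the endpoint to scale down completely when it receives no traffic.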

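To make the default endpoint usage concrete, here is a sketch of a `semantic_text` field that references the `.elser-2-elasticsearch` default endpoint by its `inference_id`. The index name `my-index` and field name `content` are assumptions for illustration.

[source,console]
----
PUT my-index
{
  "mappings": {
    "properties": {
      "content": { <1>
        "type": "semantic_text",
        "inference_id": ".elser-2-elasticsearch" <2>
      }
    }
  }
}
----
<1> `my-index` and `content` are placeholder names.
<2> The `inference_id` of a preconfigured default endpoint; the model is downloaded and deployed automatically on first use, which might take a couple of minutes.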
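Similarly, a hedged sketch of referencing the same default endpoint from an {infer} processor in an ingest pipeline, assuming the processor's `model_id` parameter is given the endpoint's `inference_id`. The pipeline and field names are placeholders.

[source,console]
----
PUT _ingest/pipeline/my-elser-pipeline <1>
{
  "processors": [
    {
      "inference": {
        "model_id": ".elser-2-elasticsearch", <2>
        "input_output": [
          {
            "input_field": "content", <3>
            "output_field": "content_embedding"
          }
        ]
      }
    }
  ]
}
----
<1> `my-elser-pipeline` is a placeholder pipeline name.
<2> The `inference_id` of the default ELSER endpoint, passed as the processor's `model_id`.
<3> `content` and `content_embedding` are placeholder field names.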