
Commit a1a8be8

[DOCS] Adds default inference endpoints information (elastic#118463)
* Adds default inference endpoints information

* Update docs/reference/inference/inference-apis.asciidoc

Co-authored-by: Liam Thompson <[email protected]>

---------

Co-authored-by: Liam Thompson <[email protected]>

(cherry picked from commit b299837)

# Conflicts:
#	docs/reference/inference/inference-apis.asciidoc
1 parent 3680bd9 commit a1a8be8

File tree

1 file changed: +25 -12 lines changed

docs/reference/inference/inference-apis.asciidoc

Lines changed: 25 additions & 12 deletions
@@ -41,21 +41,34 @@ Elastic –, then create an {infer} endpoint by the <<put-inference-api>>.
 Now use <<semantic-search-semantic-text, semantic text>> to perform
 <<semantic-search, semantic search>> on your data.
 
-//[discrete]
-//[[default-enpoints]]
-//=== Default {infer} endpoints
+[discrete]
+[[adaptive-allocations]]
+=== Adaptive allocations
+
+Adaptive allocations allow inference services to dynamically adjust the number of model allocations based on the current load.
+
+When adaptive allocations are enabled:
+
+* The number of allocations scales up automatically when the load increases.
+* Allocations scale down to a minimum of 0 when the load decreases, saving resources.
+
+For more information about adaptive allocations and resources, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] documentation.
+
+[discrete]
+[[default-enpoints]]
+=== Default {infer} endpoints
 
-//Your {es} deployment contains some preconfigured {infer} endpoints that makes it easier for you to use them when defining `semantic_text` fields or {infer} processors.
-//The following list contains the default {infer} endpoints listed by `inference_id`:
+Your {es} deployment contains preconfigured {infer} endpoints, which makes it easier to use them when defining `semantic_text` fields or using {infer} processors.
+The following list contains the default {infer} endpoints listed by `inference_id`:
 
-//* `.elser-2-elasticsearch`: uses the {ml-docs}/ml-nlp-elser.html[ELSER] built-in trained model for `sparse_embedding` tasks (recommended for English language texts)
-//* `.multilingual-e5-small-elasticsearch`: uses the {ml-docs}/ml-nlp-e5.html[E5] built-in trained model for `text_embedding` tasks (recommended for non-English language texts)
+* `.elser-2-elasticsearch`: uses the {ml-docs}/ml-nlp-elser.html[ELSER] built-in trained model for `sparse_embedding` tasks (recommended for English language texts)
+* `.multilingual-e5-small-elasticsearch`: uses the {ml-docs}/ml-nlp-e5.html[E5] built-in trained model for `text_embedding` tasks (recommended for non-English language texts)
 
-//Use the `inference_id` of the endpoint in a <<semantic-text,`semantic_text`>> field definition or when creating an <<inference-processor,{infer} processor>>.
-//The API call will automatically download and deploy the model which might take a couple of minutes.
-//Default {infer} enpoints have {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations] enabled.
-//For these models, the minimum number of allocations is `0`.
-//If there is no {infer} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes.
+Use the `inference_id` of the endpoint in a <<semantic-text,`semantic_text`>> field definition or when creating an <<inference-processor,{infer} processor>>.
+The API call will automatically download and deploy the model, which might take a couple of minutes.
+Default {infer} endpoints have {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations] enabled.
+For these models, the minimum number of allocations is `0`.
+If there is no {infer} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes.
 
 
 [discrete]
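For context on the adaptive allocations behavior documented above, the following is a minimal sketch of how an {infer} endpoint for the `elasticsearch` service can opt in to adaptive allocations through the <<put-inference-api>>. The endpoint name `my-elser-endpoint` and the allocation bounds are illustrative assumptions, not values taken from this commit.

[source,console]
----
PUT _inference/sparse_embedding/my-elser-endpoint <1>
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_threads": 1,
    "adaptive_allocations": { <2>
      "enabled": true,
      "min_number_of_allocations": 0, <3>
      "max_number_of_allocations": 4
    }
  }
}
----
<1> `my-elser-endpoint` is a placeholder endpoint name chosen for this example.
<2> `adaptive_allocations` lets the number of model allocations scale with the current load.
<3> A minimum of `0` allows the endpoint to scale down completely when it receives no traffic.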

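To make the default endpoint usage concrete, here is a sketch of a `semantic_text` field that references the `.elser-2-elasticsearch` default endpoint by its `inference_id`. The index name `my-index` and field name `content` are assumptions for illustration.

[source,console]
----
PUT my-index
{
  "mappings": {
    "properties": {
      "content": { <1>
        "type": "semantic_text",
        "inference_id": ".elser-2-elasticsearch" <2>
      }
    }
  }
}
----
<1> `my-index` and `content` are placeholder names.
<2> The `inference_id` of a preconfigured default endpoint; the model is downloaded and deployed automatically on first use, which might take a couple of minutes.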
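Similarly, a hedged sketch of referencing the same default endpoint from an {infer} processor in an ingest pipeline, assuming the processor's `model_id` parameter is given the endpoint's `inference_id`. The pipeline and field names are placeholders.

[source,console]
----
PUT _ingest/pipeline/my-elser-pipeline <1>
{
  "processors": [
    {
      "inference": {
        "model_id": ".elser-2-elasticsearch", <2>
        "input_output": [
          {
            "input_field": "content", <3>
            "output_field": "content_embedding"
          }
        ]
      }
    }
  ]
}
----
<1> `my-elser-pipeline` is a placeholder pipeline name.
<2> The `inference_id` of the default ELSER endpoint, passed as the processor's `model_id`.
<3> `content` and `content_embedding` are placeholder field names.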