
Commit d46ec49

kosabogi authored and leemthompson committed
[DOCS] Adds adaptive allocations information to Inference APIs (#117546)
* Adds adaptive allocations information to Inference APIs

* Update docs/reference/inference/inference-apis.asciidoc
Co-authored-by: Liam Thompson <[email protected]>

* Update docs/reference/inference/put-inference.asciidoc
Co-authored-by: Liam Thompson <[email protected]>

* Update docs/reference/inference/inference-apis.asciidoc
Co-authored-by: Liam Thompson <[email protected]>

---------

Co-authored-by: Liam Thompson <[email protected]>
1 parent 299e4c7 commit d46ec49

2 files changed (+27, -1 lines)

docs/reference/inference/inference-apis.asciidoc

Lines changed: 13 additions & 0 deletions
@@ -35,6 +35,19 @@ Elastic –, then create an {infer} endpoint by the <<put-inference-api>>.
 Now use <<semantic-search-semantic-text, semantic text>> to perform
 <<semantic-search, semantic search>> on your data.
 
+[discrete]
+[[adaptive-allocations]]
+=== Adaptive allocations
+
+Adaptive allocations allow inference services to dynamically adjust the number of model allocations based on the current load.
+
+When adaptive allocations are enabled:
+
+* The number of allocations scales up automatically when the load increases.
+* Allocations scale down to a minimum of 0 when the load decreases, saving resources.
+
+For more information about adaptive allocations and resources, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] documentation.
+
 //[discrete]
 //[[default-enpoints]]
 //=== Default {infer} endpoints
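The added section describes the behavior only. For context, here is a minimal sketch of what enabling adaptive allocations looks like when creating an {infer} endpoint with the PUT inference API; the endpoint name, task type, model, and allocation limits are illustrative and not part of this commit.

[source,console]
----
PUT _inference/text_embedding/my-e5-endpoint <1>
{
  "service": "elasticsearch",
  "service_settings": {
    "adaptive_allocations": { <2>
      "enabled": true,
      "min_number_of_allocations": 0,
      "max_number_of_allocations": 4
    },
    "num_threads": 1,
    "model_id": ".multilingual-e5-small"
  }
}
----
<1> Hypothetical endpoint name, task type, and model, chosen for illustration.
<2> With `enabled: true`, the number of allocations scales between `min_number_of_allocations` (0 here, so the model can scale down to zero when idle) and `max_number_of_allocations` as the load changes, instead of using a fixed `num_allocations`.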

docs/reference/inference/put-inference.asciidoc

Lines changed: 14 additions & 1 deletion
@@ -67,4 +67,17 @@ Click the links to review the configuration details of the services:
 * <<infer-service-watsonx-ai>> (`text_embedding`)
 
 The {es} and ELSER services run on a {ml} node in your {es} cluster. The rest of
-the services connect to external providers.
+the services connect to external providers.
+
+[discrete]
+[[adaptive-allocations-put-inference]]
+==== Adaptive allocations
+
+Adaptive allocations allow inference services to dynamically adjust the number of model allocations based on the current load.
+
+When adaptive allocations are enabled:
+
+* The number of allocations scales up automatically when the load increases.
+* Allocations scale down to a minimum of 0 when the load decreases, saving resources.
+
+For more information about adaptive allocations and resources, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] documentation.
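As with the previous file, the added text is descriptive. A companion sketch using the `elser` service shows the same setting with a nonzero minimum, which keeps at least one allocation ready; again, the endpoint name and limits are illustrative, not part of this commit.

[source,console]
----
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elser",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 1, <1>
      "max_number_of_allocations": 10
    },
    "num_threads": 1
  }
}
----
<1> A nonzero minimum avoids cold starts but keeps resources allocated even when idle; setting it to 0 allows scale-to-zero, as described above.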
