Commit 7d10307

[DOCS] Adds adaptive_allocations to inference and trained model API docs (#111476) (#111508)
1 parent ade5b13 commit 7d10307

File tree: 5 files changed, +225 −23 lines

docs/reference/inference/service-elasticsearch.asciidoc

Lines changed: 47 additions & 1 deletion
@@ -51,6 +51,22 @@ include::inference-shared.asciidoc[tag=service-settings]
 These settings are specific to the `elasticsearch` service.
 --
 
+`adaptive_allocations`:::
+(Optional, object)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
+
+`enabled`::::
+(Optional, Boolean)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]
+
+`max_number_of_allocations`::::
+(Optional, integer)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-max-number]
+
+`min_number_of_allocations`::::
+(Optional, integer)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-min-number]
+
 `model_id`:::
 (Required, string)
 The name of the model to use for the {infer} task.
@@ -59,7 +75,9 @@ It can be the ID of either a built-in model (for example, `.multilingual-e5-smal
 
 `num_allocations`:::
 (Required, integer)
-The total number of allocations this model is assigned across machine learning nodes. Increasing this value generally increases the throughput.
+The total number of allocations this model is assigned across machine learning nodes.
+Increasing this value generally increases the throughput.
+If `adaptive_allocations` is enabled, do not set this value, because it's automatically set.
 
 `num_threads`:::
 (Required, integer)
@@ -137,3 +155,31 @@ PUT _inference/text_embedding/my-msmarco-minilm-model <1>
 <1> Provide a unique identifier for the inference endpoint. The `inference_id` must be unique and must not match the `model_id`.
 <2> The `model_id` must be the ID of a text embedding model which has already been
 {ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland].
+
+[discrete]
+[[inference-example-adaptive-allocation]]
+==== Setting adaptive allocation for E5 via the `elasticsearch` service
+
+The following example shows how to create an {infer} endpoint called
+`my-e5-model` to perform a `text_embedding` task type and configure adaptive
+allocations.
+
+The API request below will automatically download the E5 model if it isn't
+already downloaded and then deploy the model.
+
+[source,console]
+------------------------------------------------------------
+PUT _inference/text_embedding/my-e5-model
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "adaptive_allocations": {
+      "enabled": true,
+      "min_number_of_allocations": 3,
+      "max_number_of_allocations": 10
+    },
+    "model_id": ".multilingual-e5-small"
+  }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
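For illustration, the request body above can be assembled and sanity-checked in client code before sending. The following Python sketch is hypothetical (the `validate_service_settings` helper is not part of any Elasticsearch client); it simply encodes the constraints documented on this page: when `adaptive_allocations` is enabled, `num_allocations` must not be set, and the minimum/maximum bounds must be consistent.

```python
import json

def validate_service_settings(settings: dict) -> dict:
    """Hypothetical helper: sanity-check service_settings per the docs above."""
    adaptive = settings.get("adaptive_allocations", {})
    if adaptive.get("enabled"):
        # Docs: do not set num_allocations when adaptive allocations are enabled.
        if "num_allocations" in settings:
            raise ValueError("num_allocations must not be set when adaptive_allocations is enabled")
        lo = adaptive.get("min_number_of_allocations")
        hi = adaptive.get("max_number_of_allocations")
        if lo is not None and lo < 1:
            raise ValueError("min_number_of_allocations must be >= 1")
        if lo is not None and hi is not None and hi < lo:
            raise ValueError("max_number_of_allocations must be >= min_number_of_allocations")
    return settings

# Mirrors the PUT _inference/text_embedding/my-e5-model body shown above.
body = {
    "service": "elasticsearch",
    "service_settings": validate_service_settings({
        "adaptive_allocations": {
            "enabled": True,
            "min_number_of_allocations": 3,
            "max_number_of_allocations": 10,
        },
        "model_id": ".multilingual-e5-small",
    }),
}
payload = json.dumps(body)  # ready to send as the PUT request body
```
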

docs/reference/inference/service-elser.asciidoc

Lines changed: 46 additions & 1 deletion
@@ -48,9 +48,27 @@ include::inference-shared.asciidoc[tag=service-settings]
 These settings are specific to the `elser` service.
 --
 
+`adaptive_allocations`:::
+(Optional, object)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
+
+`enabled`::::
+(Optional, Boolean)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]
+
+`max_number_of_allocations`::::
+(Optional, integer)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-max-number]
+
+`min_number_of_allocations`::::
+(Optional, integer)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-min-number]
+
 `num_allocations`:::
 (Required, integer)
-The total number of allocations this model is assigned across machine learning nodes. Increasing this value generally increases the throughput.
+The total number of allocations this model is assigned across machine learning nodes.
+Increasing this value generally increases the throughput.
+If `adaptive_allocations` is enabled, do not set this value, because it's automatically set.
 
 `num_threads`:::
 (Required, integer)
@@ -107,3 +125,30 @@ This error usually just reflects a timeout, while the model downloads in the bac
 You can check the download progress in the {ml-app} UI.
 If using the Python client, you can set the `timeout` parameter to a higher value.
 ====
+
+[discrete]
+[[inference-example-elser-adaptive-allocation]]
+==== Setting adaptive allocation for the ELSER service
+
+The following example shows how to create an {infer} endpoint called
+`my-elser-model` to perform a `sparse_embedding` task type and configure
+adaptive allocations.
+
+The request below will automatically download the ELSER model if it isn't
+already downloaded and then deploy the model.
+
+[source,console]
+------------------------------------------------------------
+PUT _inference/sparse_embedding/my-elser-model
+{
+  "service": "elser",
+  "service_settings": {
+    "adaptive_allocations": {
+      "enabled": true,
+      "min_number_of_allocations": 3,
+      "max_number_of_allocations": 10
+    }
+  }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]

docs/reference/ml/ml-shared.asciidoc

Lines changed: 24 additions & 0 deletions
@@ -1,3 +1,27 @@
+tag::adaptive-allocation[]
+Adaptive allocations configuration object.
+If enabled, the number of allocations of the model is set based on the current load the process gets.
+When the load is high, a new model allocation is automatically created (respecting the value of `max_number_of_allocations` if it's set).
+When the load is low, a model allocation is automatically removed (respecting the value of `min_number_of_allocations` if it's set).
+The number of model allocations cannot be scaled down to less than `1` this way.
+If `adaptive_allocations` is enabled, do not set the number of allocations manually.
+end::adaptive-allocation[]
+
+tag::adaptive-allocation-enabled[]
+If `true`, `adaptive_allocations` is enabled.
+Defaults to `false`.
+end::adaptive-allocation-enabled[]
+
+tag::adaptive-allocation-max-number[]
+Specifies the maximum number of allocations to scale to.
+If set, it must be greater than or equal to `min_number_of_allocations`.
+end::adaptive-allocation-max-number[]
+
+tag::adaptive-allocation-min-number[]
+Specifies the minimum number of allocations to scale to.
+If set, it must be greater than or equal to `1`.
+end::adaptive-allocation-min-number[]
+
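The scaling rule these shared descriptions spell out can be modeled as a small pure function. This Python sketch is purely illustrative (the `load` string is a simplified stand-in for the real load signal, and the actual Elasticsearch autoscaler is internal and more nuanced): add one allocation under high load up to `max_number_of_allocations`, remove one under low load down to `min_number_of_allocations`, and never drop below 1.

```python
from typing import Optional

def next_allocation_count(current: int, load: str,
                          min_alloc: Optional[int] = None,
                          max_alloc: Optional[int] = None) -> int:
    """Illustrative model of the adaptive-allocations rule described above."""
    if load == "high":
        # High load: create one more allocation, respecting max_number_of_allocations.
        target = current + 1
        if max_alloc is not None:
            target = min(target, max_alloc)
    elif load == "low":
        # Low load: remove one allocation, respecting min_number_of_allocations,
        # and never scale below 1.
        target = current - 1
        if min_alloc is not None:
            target = max(target, min_alloc)
        target = max(target, 1)
    else:
        target = current
    return target
```

For example, with a minimum of 3 and a maximum of 10, a deployment already at 10 allocations stays at 10 under high load, and one at 3 stays at 3 under low load.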
 tag::aggregations[]
 If set, the {dfeed} performs aggregation searches. Support for aggregations is
 limited and should be used only with low cardinality data. For more information,

docs/reference/ml/trained-models/apis/start-trained-model-deployment.asciidoc

Lines changed: 67 additions & 19 deletions
@@ -30,7 +30,10 @@ must be unique and should not match any other deployment ID or model ID, unless
 it is the same as the ID of the model being deployed. If `deployment_id` is not
 set, it defaults to the `model_id`.
 
-Scaling inference performance can be achieved by setting the parameters
+You can enable adaptive allocations to automatically scale model allocations up
+and down based on the actual resource requirement of the processes.
+
+Manually scaling inference performance can be achieved by setting the parameters
 `number_of_allocations` and `threads_per_allocation`.
 
 Increasing `threads_per_allocation` means more threads are used when an
@@ -58,22 +61,58 @@ include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=model-id]
 [[start-trained-model-deployment-query-params]]
 == {api-query-parms-title}
 
+`deployment_id`::
+(Optional, string)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=deployment-id]
++
+--
+Defaults to `model_id`.
+--
+
+`timeout`::
+(Optional, time)
+Controls the amount of time to wait for the model to deploy. Defaults to 30
+seconds.
+
+`wait_for`::
+(Optional, string)
+Specifies the allocation status to wait for before returning. Defaults to
+`started`. The value `starting` indicates deployment is starting but not yet on
+any node. The value `started` indicates the model has started on at least one
+node. The value `fully_allocated` indicates the deployment has started on all
+valid nodes.
+
+[[start-trained-model-deployment-request-body]]
+== {api-request-body-title}
+
+`adaptive_allocations`::
+(Optional, object)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
+
+`enabled`:::
+(Optional, Boolean)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]
+
+`max_number_of_allocations`:::
+(Optional, integer)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-max-number]
+
+`min_number_of_allocations`:::
+(Optional, integer)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-min-number]
+
 `cache_size`::
 (Optional, <<byte-units,byte value>>)
 The inference cache size (in memory outside the JVM heap) per node for the
 model. In serverless, the cache is disabled by default. Otherwise, the default value is the size of the model as reported by the
 `model_size_bytes` field in the <<get-trained-models-stats>>. To disable the
 cache, `0b` can be provided.
 
-`deployment_id`::
-(Optional, string)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=deployment-id]
-Defaults to `model_id`.
-
 `number_of_allocations`::
 (Optional, integer)
 The total number of allocations this model is assigned across {ml} nodes.
-Increasing this value generally increases the throughput. Defaults to 1.
+Increasing this value generally increases the throughput. Defaults to `1`.
+If `adaptive_allocations` is enabled, do not set this value, because it's automatically set.
 
 `priority`::
 (Optional, string)
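The `wait_for` values documented in this API form a progression from weakest to strongest (`starting`, then `started`, then `fully_allocated`). As a sketch, a caller could decide whether a reported allocation state satisfies a requested `wait_for` level like this (hypothetical helper, not part of any Elasticsearch client):

```python
# Documented progression of allocation states, weakest to strongest.
ALLOCATION_STATES = ["starting", "started", "fully_allocated"]

def satisfies_wait_for(current_state: str, wait_for: str = "started") -> bool:
    """Hypothetical helper: True if current_state is at least wait_for."""
    return ALLOCATION_STATES.index(current_state) >= ALLOCATION_STATES.index(wait_for)
```
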
@@ -110,18 +149,6 @@ compute-bound process; `threads_per_allocations` must not exceed the number of
 available allocated processors per node. Defaults to 1. Must be a power of 2.
 Max allowed value is 32.
 
-`timeout`::
-(Optional, time)
-Controls the amount of time to wait for the model to deploy. Defaults to 30
-seconds.
-
-`wait_for`::
-(Optional, string)
-Specifies the allocation status to wait for before returning. Defaults to
-`started`. The value `starting` indicates deployment is starting but not yet on
-any node. The value `started` indicates the model has started on at least one
-node. The value `fully_allocated` indicates the deployment has started on all
-valid nodes.
 
 [[start-trained-model-deployment-example]]
 == {api-examples-title}
@@ -182,3 +209,24 @@ The `my_model` trained model can be deployed again with a different ID:
 POST _ml/trained_models/my_model/deployment/_start?deployment_id=my_model_for_search
 --------------------------------------------------
 // TEST[skip:TBD]
+
+
+[[start-trained-model-deployment-adaptive-allocation-example]]
+=== Setting adaptive allocations
+
+The following example starts a new deployment of the `my_model` trained model
+with the ID `my_model_for_search` and enables adaptive allocations with a
+minimum of 3 and a maximum of 10 allocations.
+
+[source,console]
+--------------------------------------------------
+POST _ml/trained_models/my_model/deployment/_start?deployment_id=my_model_for_search
+{
+  "adaptive_allocations": {
+    "enabled": true,
+    "min_number_of_allocations": 3,
+    "max_number_of_allocations": 10
+  }
+}
+--------------------------------------------------
+// TEST[skip:TBD]
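A request like the one in this example combines a path, an optional `deployment_id` query parameter, and a JSON body. The following Python sketch composes those pieces; the endpoint path and parameter names come from this page, while the `build_start_request` helper itself is hypothetical:

```python
import json
from urllib.parse import urlencode

def build_start_request(model_id, deployment_id=None, adaptive=None):
    """Hypothetical helper: compose path, query string, and body for _start."""
    path = f"_ml/trained_models/{model_id}/deployment/_start"
    if deployment_id:
        path += "?" + urlencode({"deployment_id": deployment_id})
    body = json.dumps({"adaptive_allocations": adaptive}) if adaptive else None
    return path, body

# Reproduces the example request above.
path, body = build_start_request(
    "my_model",
    deployment_id="my_model_for_search",
    adaptive={"enabled": True,
              "min_number_of_allocations": 3,
              "max_number_of_allocations": 10},
)
```
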

docs/reference/ml/trained-models/apis/update-trained-model-deployment.asciidoc

Lines changed: 41 additions & 2 deletions
@@ -25,7 +25,11 @@ Requires the `manage_ml` cluster privilege. This privilege is included in the
 == {api-description-title}
 
 You can update a trained model deployment whose `assignment_state` is `started`.
-You can either increase or decrease the number of allocations of such a deployment.
+You can enable adaptive allocations to automatically scale model allocations up
+and down based on the actual resource requirement of the processes.
+Alternatively, you can manually increase or decrease the number of allocations
+of a model deployment.
+
 
 [[update-trained-model-deployments-path-parms]]
 == {api-path-parms-title}
@@ -37,17 +41,34 @@ include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=deployment-id]
 [[update-trained-model-deployment-request-body]]
 == {api-request-body-title}
 
+`adaptive_allocations`::
+(Optional, object)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
+
+`enabled`:::
+(Optional, Boolean)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]
+
+`max_number_of_allocations`:::
+(Optional, integer)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-max-number]
+
+`min_number_of_allocations`:::
+(Optional, integer)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-min-number]
+
 `number_of_allocations`::
 (Optional, integer)
 The total number of allocations this model is assigned across {ml} nodes.
 Increasing this value generally increases the throughput.
+If `adaptive_allocations` is enabled, do not set this value, because it's automatically set.
 
 
 [[update-trained-model-deployment-example]]
 == {api-examples-title}
 
 The following example updates the deployment of the
-`elastic__distilbert-base-uncased-finetuned-conll03-english` trained model to have 4 allocations:
+`elastic__distilbert-base-uncased-finetuned-conll03-english` trained model to have 4 allocations:
 
 [source,console]
 --------------------------------------------------
@@ -84,3 +105,21 @@ The API returns the following results:
 }
 }
 ----
+
+The following example updates the deployment of the
+`elastic__distilbert-base-uncased-finetuned-conll03-english` trained model to
+enable adaptive allocations with a minimum of 3 and a maximum of 10
+allocations:
+
+[source,console]
+--------------------------------------------------
+POST _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_update
+{
+  "adaptive_allocations": {
+    "enabled": true,
+    "min_number_of_allocations": 3,
+    "max_number_of_allocations": 10
+  }
+}
+--------------------------------------------------
+// TEST[skip:TBD]
