
Commit dde00b6

Revert "[DOCS] Adds adaptive_allocations to inference and trained model API d…" (#111551)
This reverts commit 7d10307.
1 parent 1c02690 commit dde00b6

5 files changed: +23 additions, −225 deletions


docs/reference/inference/service-elasticsearch.asciidoc

Lines changed: 1 addition & 47 deletions
@@ -51,22 +51,6 @@ include::inference-shared.asciidoc[tag=service-settings]
 These settings are specific to the `elasticsearch` service.
 --
 
-`adaptive_allocations`:::
-(Optional, object)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
-
-`enabled`::::
-(Optional, Boolean)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]
-
-`max_number_of_allocations`::::
-(Optional, integer)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-max-number]
-
-`min_number_of_allocations`::::
-(Optional, integer)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-min-number]
-
 `model_id`:::
 (Required, string)
 The name of the model to use for the {infer} task.
@@ -75,9 +59,7 @@ It can be the ID of either a built-in model (for example, `.multilingual-e5-smal
 
 `num_allocations`:::
 (Required, integer)
-The total number of allocations this model is assigned across machine learning nodes.
-Increasing this value generally increases the throughput.
-If `adaptive_allocations` is enabled, do not set this value, because it's automatically set.
+The total number of allocations this model is assigned across machine learning nodes. Increasing this value generally increases the throughput.
 
 `num_threads`:::
 (Required, integer)
@@ -155,31 +137,3 @@ PUT _inference/text_embedding/my-msmarco-minilm-model <1>
 <1> Provide an unique identifier for the inference endpoint. The `inference_id` must be unique and must not match the `model_id`.
 <2> The `model_id` must be the ID of a text embedding model which has already been
 {ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland].
-
-[discrete]
-[[inference-example-adaptive-allocation]]
-==== Setting adaptive allocation for E5 via the `elasticsearch` service
-
-The following example shows how to create an {infer} endpoint called
-`my-e5-model` to perform a `text_embedding` task type and configure adaptive
-allocations.
-
-The API request below will automatically download the E5 model if it isn't
-already downloaded and then deploy the model.
-
-[source,console]
-------------------------------------------------------------
-PUT _inference/text_embedding/my-e5-model
-{
-  "service": "elasticsearch",
-  "service_settings": {
-    "adaptive_allocations": {
-      "enabled": true,
-      "min_number_of_allocations": 3,
-      "max_number_of_allocations": 10
-    },
-    "model_id": ".multilingual-e5-small"
-  }
-}
-------------------------------------------------------------
-// TEST[skip:TBD]
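
With the adaptive-allocations example reverted, the manual sizing path documented above (`model_id`, `num_allocations`, `num_threads`) is the one that remains. A minimal sketch of such a request, not part of this diff: the endpoint name reuses `my-e5-model` from the removed example, and the numeric values are illustrative rather than recommended.

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/my-e5-model
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".multilingual-e5-small", <1>
    "num_allocations": 1, <2>
    "num_threads": 1
  }
}
------------------------------------------------------------
<1> The built-in E5 model referenced elsewhere in this file.
<2> A fixed allocation count; after the revert this is the only sizing control the page documents.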

docs/reference/inference/service-elser.asciidoc

Lines changed: 1 addition & 46 deletions
@@ -48,27 +48,9 @@ include::inference-shared.asciidoc[tag=service-settings]
 These settings are specific to the `elser` service.
 --
 
-`adaptive_allocations`:::
-(Optional, object)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
-
-`enabled`::::
-(Optional, Boolean)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]
-
-`max_number_of_allocations`::::
-(Optional, integer)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-max-number]
-
-`min_number_of_allocations`::::
-(Optional, integer)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-min-number]
-
 `num_allocations`:::
 (Required, integer)
-The total number of allocations this model is assigned across machine learning nodes.
-Increasing this value generally increases the throughput.
-If `adaptive_allocations` is enabled, do not set this value, because it's automatically set.
+The total number of allocations this model is assigned across machine learning nodes. Increasing this value generally increases the throughput.
 
 `num_threads`:::
 (Required, integer)
@@ -125,30 +107,3 @@ This error usually just reflects a timeout, while the model downloads in the bac
 You can check the download progress in the {ml-app} UI.
 If using the Python client, you can set the `timeout` parameter to a higher value.
 ====
-
-[discrete]
-[[inference-example-elser-adaptive-allocation]]
-==== Setting adaptive allocation for the ELSER service
-
-The following example shows how to create an {infer} endpoint called
-`my-elser-model` to perform a `sparse_embedding` task type and configure
-adaptive allocations.
-
-The request below will automatically download the ELSER model if it isn't
-already downloaded and then deploy the model.
-
-[source,console]
-------------------------------------------------------------
-PUT _inference/sparse_embedding/my-elser-model
-{
-  "service": "elser",
-  "service_settings": {
-    "adaptive_allocations": {
-      "enabled": true,
-      "min_number_of_allocations": 3,
-      "max_number_of_allocations": 10
-    }
-  }
-}
-------------------------------------------------------------
-// TEST[skip:TBD]
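
For contrast, the fixed-allocation form that remains documented for the `elser` service would look roughly like the following. This is a sketch rather than part of the diff; the endpoint name reuses `my-elser-model` from the removed example and the `num_allocations`/`num_threads` values are illustrative.

[source,console]
------------------------------------------------------------
PUT _inference/sparse_embedding/my-elser-model
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1, <1>
    "num_threads": 1
  }
}
------------------------------------------------------------
<1> Both settings are documented above as required integers for the `elser` service.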

docs/reference/ml/ml-shared.asciidoc

Lines changed: 0 additions & 24 deletions
@@ -1,27 +1,3 @@
-tag::adaptive-allocation[]
-Adaptive allocations configuration object.
-If enabled, the number of allocations of the model is set based on the current load the process gets.
-When the load is high, a new model allocation is automatically created (respecting the value of `max_number_of_allocations` if it's set).
-When the load is low, a model allocation is automatically removed (respecting the value of `min_number_of_allocations` if it's set).
-The number of model allocations cannot be scaled down to less than `1` this way.
-If `adaptive_allocations` is enabled, do not set the number of allocations manually.
-end::adaptive-allocation[]
-
-tag::adaptive-allocation-enabled[]
-If `true`, `adaptive_allocations` is enabled.
-Defaults to `false`.
-end::adaptive-allocation-enabled[]
-
-tag::adaptive-allocation-max-number[]
-Specifies the maximum number of allocations to scale to.
-If set, it must be greater than or equal to `min_number_of_allocations`.
-end::adaptive-allocation-max-number[]
-
-tag::adaptive-allocation-min-number[]
-Specifies the minimum number of allocations to scale to.
-If set, it must be greater than or equal to `1`.
-end::adaptive-allocation-min-number[]
-
 tag::aggregations[]
 If set, the {dfeed} performs aggregation searches. Support for aggregations is
 limited and should be used only with low cardinality data. For more information,

docs/reference/ml/trained-models/apis/start-trained-model-deployment.asciidoc

Lines changed: 19 additions & 67 deletions
@@ -30,10 +30,7 @@ must be unique and should not match any other deployment ID or model ID, unless
 it is the same as the ID of the model being deployed. If `deployment_id` is not
 set, it defaults to the `model_id`.
 
-You can enable adaptive allocations to automatically scale model allocations up
-and down based on the actual resource requirement of the processes.
-
-Manually scaling inference performance can be achieved by setting the parameters
+Scaling inference performance can be achieved by setting the parameters
 `number_of_allocations` and `threads_per_allocation`.
 
 Increasing `threads_per_allocation` means more threads are used when an
@@ -61,58 +58,22 @@ include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=model-id]
 [[start-trained-model-deployment-query-params]]
 == {api-query-parms-title}
 
-`deployment_id`::
-(Optional, string)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=deployment-id]
-+
---
-Defaults to `model_id`.
---
-
-`timeout`::
-(Optional, time)
-Controls the amount of time to wait for the model to deploy. Defaults to 30
-seconds.
-
-`wait_for`::
-(Optional, string)
-Specifies the allocation status to wait for before returning. Defaults to
-`started`. The value `starting` indicates deployment is starting but not yet on
-any node. The value `started` indicates the model has started on at least one
-node. The value `fully_allocated` indicates the deployment has started on all
-valid nodes.
-
-[[start-trained-model-deployment-request-body]]
-== {api-request-body-title}
-
-`adaptive_allocations`::
-(Optional, object)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
-
-`enabled`:::
-(Optional, Boolean)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]
-
-`max_number_of_allocations`:::
-(Optional, integer)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-max-number]
-
-`min_number_of_allocations`:::
-(Optional, integer)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-min-number]
-
 `cache_size`::
 (Optional, <<byte-units,byte value>>)
 The inference cache size (in memory outside the JVM heap) per node for the
 model. In serverless, the cache is disabled by default. Otherwise, the default value is the size of the model as reported by the
 `model_size_bytes` field in the <<get-trained-models-stats>>. To disable the
 cache, `0b` can be provided.
 
+`deployment_id`::
+(Optional, string)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=deployment-id]
+Defaults to `model_id`.
+
 `number_of_allocations`::
 (Optional, integer)
 The total number of allocations this model is assigned across {ml} nodes.
-Increasing this value generally increases the throughput. Defaults to `1`.
-If `adaptive_allocations` is enabled, do not set this value, because it's automatically set.
+Increasing this value generally increases the throughput. Defaults to 1.
 
 `priority`::
 (Optional, string)
@@ -149,6 +110,18 @@ compute-bound process; `threads_per_allocations` must not exceed the number of
 available allocated processors per node. Defaults to 1. Must be a power of 2.
 Max allowed value is 32.
 
+`timeout`::
+(Optional, time)
+Controls the amount of time to wait for the model to deploy. Defaults to 30
+seconds.
+
+`wait_for`::
+(Optional, string)
+Specifies the allocation status to wait for before returning. Defaults to
+`started`. The value `starting` indicates deployment is starting but not yet on
+any node. The value `started` indicates the model has started on at least one
+node. The value `fully_allocated` indicates the deployment has started on all
+valid nodes.
 
 [[start-trained-model-deployment-example]]
 == {api-examples-title}
@@ -209,24 +182,3 @@ The `my_model` trained model can be deployed again with a different ID:
 POST _ml/trained_models/my_model/deployment/_start?deployment_id=my_model_for_search
 --------------------------------------------------
 // TEST[skip:TBD]
-
-
-[[start-trained-model-deployment-adaptive-allocation-example]]
-=== Setting adaptive allocations
-
-The following example starts a new deployment of the `my_model` trained model
-with the ID `my_model_for_search` and enables adaptive allocations with the
-minimum number of 3 allocations and the maximum number of 10.
-
-[source,console]
---------------------------------------------------
-POST _ml/trained_models/my_model/deployment/_start?deployment_id=my_model_for_search
-{
-  "adaptive_allocations": {
-    "enabled": true,
-    "min_number_of_allocations": 3,
-    "max_number_of_allocations": 10
-  }
-}
---------------------------------------------------
-// TEST[skip:TBD]
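
After this revert, scaling a new deployment is documented only through the fixed `number_of_allocations` and `threads_per_allocation` request-body parameters. A rough sketch of that form, reusing `my_model` and `my_model_for_search` from the example above with illustrative values:

[source,console]
--------------------------------------------------
POST _ml/trained_models/my_model/deployment/_start?deployment_id=my_model_for_search
{
  "number_of_allocations": 2, <1>
  "threads_per_allocation": 1
}
--------------------------------------------------
<1> Fixed allocation count; without adaptive allocations this does not change unless updated manually.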

docs/reference/ml/trained-models/apis/update-trained-model-deployment.asciidoc

Lines changed: 2 additions & 41 deletions
@@ -25,11 +25,7 @@ Requires the `manage_ml` cluster privilege. This privilege is included in the
 == {api-description-title}
 
 You can update a trained model deployment whose `assignment_state` is `started`.
-You can enable adaptive allocations to automatically scale model allocations up
-and down based on the actual resource requirement of the processes.
-Or you can manually increase or decrease the number of allocations of a model
-deployment.
-
+You can either increase or decrease the number of allocations of such a deployment.
 
 [[update-trained-model-deployments-path-parms]]
 == {api-path-parms-title}
@@ -41,34 +37,17 @@ include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=deployment-id]
 [[update-trained-model-deployment-request-body]]
 == {api-request-body-title}
 
-`adaptive_allocations`::
-(Optional, object)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
-
-`enabled`:::
-(Optional, Boolean)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]
-
-`max_number_of_allocations`:::
-(Optional, integer)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-max-number]
-
-`min_number_of_allocations`:::
-(Optional, integer)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-min-number]
-
 `number_of_allocations`::
 (Optional, integer)
 The total number of allocations this model is assigned across {ml} nodes.
 Increasing this value generally increases the throughput.
-If `adaptive_allocations` is enabled, do not set this value, because it's automatically set.
 
 
 [[update-trained-model-deployment-example]]
 == {api-examples-title}
 
 The following example updates the deployment for a
-`elastic__distilbert-base-uncased-finetuned-conll03-english` trained model to have 4 allocations:
+`elastic__distilbert-base-uncased-finetuned-conll03-english` trained model to have 4 allocations:
 
 [source,console]
 --------------------------------------------------
@@ -105,21 +84,3 @@ The API returns the following results:
 }
 }
 ----
-
-The following example updates the deployment for a
-`elastic__distilbert-base-uncased-finetuned-conll03-english` trained model to
-enable adaptive allocations with the minimum number of 3 allocations and the
-maximum number of 10:
-
-[source,console]
---------------------------------------------------
-POST _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_update
-{
-  "adaptive_allocations": {
-    "enabled": true,
-    "min_number_of_allocations": 3,
-    "max_number_of_allocations": 10
-  }
-}
---------------------------------------------------
-// TEST[skip:TBD]
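
The "4 allocations" example retained in the file is not shown in full in this diff. A plausible body for it, offered only as a sketch based on the `number_of_allocations` field documented above (the exact request in the file is not visible here):

[source,console]
--------------------------------------------------
POST _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_update
{
  "number_of_allocations": 4 <1>
}
--------------------------------------------------
<1> Manually sets the deployment to four allocations; after this revert there is no documented automatic alternative on this page.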
