Commit 4abef8e

[DOCS] : swap allocation sections
1 parent af99654 commit 4abef8e

File tree

1 file changed: +27 -28 lines changed

docs/reference/inference/service-elser.asciidoc

Lines changed: 27 additions & 28 deletions
@@ -96,6 +96,33 @@ If `adaptive_allocations` is enabled, do not set this value, because it's automa
 Sets the number of threads used by each model allocation during inference. This generally increases the speed per inference request. The inference process is a compute-bound process; `threads_per_allocations` must not exceed the number of available allocated processors per node.
 Must be a power of 2. Max allowed value is 32.
 
+[discrete]
+[[inference-example-elser-adaptive-allocation]]
+==== Setting adaptive allocations for the ELSER service
+
+NOTE: For more information on how to optimize your ELSER endpoints, refer to {ml-docs}/ml-nlp-elser.html#elser-recommendations[the ELSER recommendations] section in the model documentation.
+To learn more about model autoscaling, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] page.
+
+The following example shows how to create an {infer} endpoint called `my-elser-model` to perform a `sparse_embedding` task type and configure adaptive allocations.
+
+The request below will automatically download the ELSER model if it isn't already downloaded and then deploy the model.
+
+[source,console]
+------------------------------------------------------------
+PUT _inference/sparse_embedding/my-elser-model
+{
+  "service": "elser",
+  "service_settings": {
+    "adaptive_allocations": {
+      "enabled": true,
+      "min_number_of_allocations": 3,
+      "max_number_of_allocations": 10
+    },
+    "num_threads": 1
+  }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
 
 [discrete]
 [[inference-example-elser]]
@@ -146,31 +173,3 @@ This error usually just reflects a timeout, while the model downloads in the bac
 You can check the download progress in the {ml-app} UI.
 If using the Python client, you can set the `timeout` parameter to a higher value.
 ====
-
-[discrete]
-[[inference-example-elser-adaptive-allocation]]
-==== Setting adaptive allocations for the ELSER service
-
-NOTE: For more information on how to optimize your ELSER endpoints, refer to {ml-docs}/ml-nlp-elser.html#elser-recommendations[the ELSER recommendations] section in the model documentation.
-To learn more about model autoscaling, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] page.
-
-The following example shows how to create an {infer} endpoint called `my-elser-model` to perform a `sparse_embedding` task type and configure adaptive allocations.
-
-The request below will automatically download the ELSER model if it isn't already downloaded and then deploy the model.
-
-[source,console]
-------------------------------------------------------------
-PUT _inference/sparse_embedding/my-elser-model
-{
-  "service": "elser",
-  "service_settings": {
-    "adaptive_allocations": {
-      "enabled": true,
-      "min_number_of_allocations": 3,
-      "max_number_of_allocations": 10
-    },
-    "num_threads": 1
-  }
-}
-------------------------------------------------------------
-// TEST[skip:TBD]

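The moved section centers on one `service_settings` payload with `adaptive_allocations`. As a rough sketch (not part of this commit, and not an official client API), the helper below only assembles and sanity-checks that JSON body before it would be sent to `PUT _inference/sparse_embedding/<endpoint-id>`; the function name and validation rules beyond "power of 2, max 32" are assumptions for illustration, and no request to Elasticsearch is made:

```python
import json


def adaptive_allocations_body(min_allocations: int,
                              max_allocations: int,
                              num_threads: int = 1) -> dict:
    """Build the request body shown in the diff for
    PUT _inference/sparse_embedding/<endpoint-id> (hypothetical helper)."""
    if min_allocations > max_allocations:
        raise ValueError("min_number_of_allocations must not exceed max")
    # Per the docs above: num_threads must be a power of 2, max allowed 32.
    if num_threads < 1 or num_threads > 32 or num_threads & (num_threads - 1):
        raise ValueError("num_threads must be a power of 2, at most 32")
    return {
        "service": "elser",
        "service_settings": {
            "adaptive_allocations": {
                "enabled": True,
                "min_number_of_allocations": min_allocations,
                "max_number_of_allocations": max_allocations,
            },
            "num_threads": num_threads,
        },
    }


# Reproduce the body from the example in the diff (min 3, max 10, 1 thread).
body = adaptive_allocations_body(3, 10)
print(json.dumps(body, indent=2))
```

With `adaptive_allocations_body(3, 10)` this emits exactly the JSON from the moved example; an out-of-range `num_threads` (say 3 or 64) fails fast instead of being rejected server-side.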