Commit a3bb779

[DOCS] : swap allocation sections (#116518)
Co-authored-by: Liam Thompson <[email protected]>
1 parent bdebe39 commit a3bb779


docs/reference/inference/service-elser.asciidoc

Lines changed: 31 additions & 30 deletions
@@ -102,10 +102,39 @@ If `adaptive_allocations` is enabled, do not set this value, because it's automa
 Sets the number of threads used by each model allocation during inference. This generally increases the speed per inference request. The inference process is a compute-bound process; `threads_per_allocation` must not exceed the number of available allocated processors per node.
 Must be a power of 2. Max allowed value is 32.
 
+[discrete]
+[[inference-example-elser-adaptive-allocation]]
+==== ELSER service example with adaptive allocations
+
+When adaptive allocations are enabled, the number of allocations of the model is set automatically based on the current load.
+
+NOTE: For more information on how to optimize your ELSER endpoints, refer to {ml-docs}/ml-nlp-elser.html#elser-recommendations[the ELSER recommendations] section in the model documentation.
+To learn more about model autoscaling, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] page.
+
+The following example shows how to create an {infer} endpoint called `my-elser-model` to perform a `sparse_embedding` task type and configure adaptive allocations.
+
+The request below will automatically download the ELSER model if it isn't already downloaded and then deploy the model.
+
+[source,console]
+------------------------------------------------------------
+PUT _inference/sparse_embedding/my-elser-model
+{
+  "service": "elser",
+  "service_settings": {
+    "adaptive_allocations": {
+      "enabled": true,
+      "min_number_of_allocations": 3,
+      "max_number_of_allocations": 10
+    },
+    "num_threads": 1
+  }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
 
 [discrete]
 [[inference-example-elser]]
-==== ELSER service example
+==== ELSER service example without adaptive allocations
 
 The following example shows how to create an {infer} endpoint called `my-elser-model` to perform a `sparse_embedding` task type.
 Refer to the {ml-docs}/ml-nlp-elser.html[ELSER model documentation] for more info.
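
Once the endpoint in the added block exists, it can be exercised through the `_inference` API to confirm the model responds and that allocations scale with load. The sketch below uses the Python client and is not part of this change; the host, API key, and input text are placeholders, and the low-level `perform_request` call is used so no version-specific inference helper is assumed.

[source,python]
----
# Minimal sketch (assumes elasticsearch-py 8.x and an existing "my-elser-model" endpoint).
from elasticsearch import Elasticsearch

client = Elasticsearch("https://localhost:9200", api_key="<api-key>")  # placeholders

# POST _inference/sparse_embedding/my-elser-model with some input text.
resp = client.perform_request(
    "POST",
    "/_inference/sparse_embedding/my-elser-model",
    headers={"accept": "application/json", "content-type": "application/json"},
    body={"input": "These are not the droids you are looking for."},
)
print(resp)  # ELSER returns token/weight pairs for the input text
----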
@@ -151,32 +180,4 @@ You might see a 502 bad gateway error in the response when using the {kib} Conso
 This error usually just reflects a timeout, while the model downloads in the background.
 You can check the download progress in the {ml-app} UI.
 If using the Python client, you can set the `timeout` parameter to a higher value.
-====
-
-[discrete]
-[[inference-example-elser-adaptive-allocation]]
-==== Setting adaptive allocations for the ELSER service
-
-NOTE: For more information on how to optimize your ELSER endpoints, refer to {ml-docs}/ml-nlp-elser.html#elser-recommendations[the ELSER recommendations] section in the model documentation.
-To learn more about model autoscaling, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] page.
-
-The following example shows how to create an {infer} endpoint called `my-elser-model` to perform a `sparse_embedding` task type and configure adaptive allocations.
-
-The request below will automatically download the ELSER model if it isn't already downloaded and then deploy the model.
-
-[source,console]
-------------------------------------------------------------
-PUT _inference/sparse_embedding/my-elser-model
-{
-  "service": "elser",
-  "service_settings": {
-    "adaptive_allocations": {
-      "enabled": true,
-      "min_number_of_allocations": 3,
-      "max_number_of_allocations": 10
-    },
-    "num_threads": 1
-  }
-}
-------------------------------------------------------------
-// TEST[skip:TBD]
+====
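
The `timeout` mentioned in the context above is a client-side setting. Below is a minimal sketch with the Python client, not part of this change, that raises the request timeout so the initial `PUT` has time to download and deploy ELSER; the host, API key, and the 10-minute value are placeholder assumptions.

[source,python]
----
from elasticsearch import Elasticsearch

# request_timeout applies to every request made through this client instance.
client = Elasticsearch(
    "https://localhost:9200",   # placeholder host
    api_key="<api-key>",        # placeholder credentials
    request_timeout=600,        # allow up to 10 minutes for the model download/deploy
)

# Same request as the console example, sent through the low-level helper so no
# version-specific inference method is assumed.
client.perform_request(
    "PUT",
    "/_inference/sparse_embedding/my-elser-model",
    headers={"accept": "application/json", "content-type": "application/json"},
    body={
        "service": "elser",
        "service_settings": {
            "adaptive_allocations": {
                "enabled": True,
                "min_number_of_allocations": 3,
                "max_number_of_allocations": 10,
            },
            "num_threads": 1,
        },
    },
)
----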
