docs/reference/inference/service-elser.asciidoc
31 additions, 30 deletions
@@ -102,10 +102,39 @@ If `adaptive_allocations` is enabled, do not set this value, because it's automa
 Sets the number of threads used by each model allocation during inference. This generally increases the speed per inference request. The inference process is compute-bound; `threads_per_allocation` must not exceed the number of available allocated processors per node.
 Must be a power of 2. Max allowed value is 32.
 
+[discrete]
+[[inference-example-elser-adaptive-allocation]]
+==== ELSER service example with adaptive allocations
+
+When adaptive allocations are enabled, the number of allocations of the model is set automatically based on the current load.
+
+NOTE: For more information on how to optimize your ELSER endpoints, refer to {ml-docs}/ml-nlp-elser.html#elser-recommendations[the ELSER recommendations] section in the model documentation.
+To learn more about model autoscaling, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] page.
+
+The following example shows how to create an {infer} endpoint called `my-elser-model` to perform a `sparse_embedding` task type and configure adaptive allocations.
+
+The request below will automatically download the ELSER model if it isn't already downloaded and then deploy the model.
 ==== ELSER service example without adaptive allocations
 
 The following example shows how to create an {infer} endpoint called `my-elser-model` to perform a `sparse_embedding` task type.
 Refer to the {ml-docs}/ml-nlp-elser.html[ELSER model documentation] for more info.
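The new section above refers to a request that this rendered hunk does not show. For context, a sketch of the kind of `PUT _inference` request the added paragraphs describe, following the {es} inference API; the endpoint name matches the text, while the allocation bounds and `num_threads` value are illustrative, not taken from the diff:

```console
PUT _inference/sparse_embedding/my-elser-model
{
    "service": "elser",
    "service_settings": {
        "adaptive_allocations": {
            "enabled": true,
            "min_number_of_allocations": 3,
            "max_number_of_allocations": 10
        },
        "num_threads": 1
    }
}
```

With `adaptive_allocations.enabled` set to `true`, no fixed `num_allocations` is given: the number of allocations scales with load between the configured minimum and maximum.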
@@ -151,32 +180,4 @@ You might see a 502 bad gateway error in the response when using the {kib} Conso
 This error usually just reflects a timeout, while the model downloads in the background.
 You can check the download progress in the {ml-app} UI.
 If using the Python client, you can set the `timeout` parameter to a higher value.
 ====
-
-[discrete]
-[[inference-example-elser-adaptive-allocation]]
-==== Setting adaptive allocations for the ELSER service
-
-NOTE: For more information on how to optimize your ELSER endpoints, refer to {ml-docs}/ml-nlp-elser.html#elser-recommendations[the ELSER recommendations] section in the model documentation.
-To learn more about model autoscaling, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] page.
-
-The following example shows how to create an {infer} endpoint called `my-elser-model` to perform a `sparse_embedding` task type and configure adaptive allocations.
-
-The request below will automatically download the ELSER model if it isn't already downloaded and then deploy the model.