docs/reference/inference/service-elser.asciidoc (27 additions, 28 deletions)
@@ -96,6 +96,33 @@ If `adaptive_allocations` is enabled, do not set this value, because it's automa
 Sets the number of threads used by each model allocation during inference. This generally increases the speed per inference request. The inference process is a compute-bound process; `threads_per_allocations` must not exceed the number of available allocated processors per node.
 Must be a power of 2. Max allowed value is 32.
 
+[discrete]
+[[inference-example-elser-adaptive-allocation]]
+==== Setting adaptive allocations for the ELSER service
+
+NOTE: For more information on how to optimize your ELSER endpoints, refer to {ml-docs}/ml-nlp-elser.html#elser-recommendations[the ELSER recommendations] section in the model documentation.
+To learn more about model autoscaling, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] page.
+
+The following example shows how to create an {infer} endpoint called `my-elser-model` to perform a `sparse_embedding` task type and configure adaptive allocations.
+
+The request below will automatically download the ELSER model if it isn't already downloaded and then deploy the model.
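The request that "the request below" refers to is truncated from this diff hunk. As a hedged sketch only (the allocation bounds and thread count here are illustrative values, not taken from this diff), a create-endpoint request with adaptive allocations for the ELSER service has this shape:

[source,console]
----
PUT _inference/sparse_embedding/my-elser-model
{
  "service": "elser",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 3,
      "max_number_of_allocations": 10
    },
    "num_threads": 1
  }
}
----

With `adaptive_allocations.enabled` set to `true`, the number of allocations scales with load between the configured minimum and maximum, which is why the surrounding text warns against also setting a fixed allocation count.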
@@ -146,31 +173,3 @@ This error usually just reflects a timeout, while the model downloads in the bac
 You can check the download progress in the {ml-app} UI.
 If using the Python client, you can set the `timeout` parameter to a higher value.
 ====
-
-[discrete]
-[[inference-example-elser-adaptive-allocation]]
-==== Setting adaptive allocations for the ELSER service
-
-NOTE: For more information on how to optimize your ELSER endpoints, refer to {ml-docs}/ml-nlp-elser.html#elser-recommendations[the ELSER recommendations] section in the model documentation.
-To learn more about model autoscaling, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] page.
-
-The following example shows how to create an {infer} endpoint called `my-elser-model` to perform a `sparse_embedding` task type and configure adaptive allocations.
-
-The request below will automatically download the ELSER model if it isn't already downloaded and then deploy the model.