
Commit 36e95ca

[DOCS] Improve inference API documentation (elastic#115235) (elastic#115525)
Co-authored-by: David Kyle <[email protected]>
1 parent b755d40 commit 36e95ca

File tree: 3 files changed (+104, -11 lines)

docs/reference/inference/inference-apis.asciidoc

Lines changed: 18 additions & 0 deletions
@@ -34,6 +34,24 @@ Elastic –, then create an {infer} endpoint by the <<put-inference-api>>.
 Now use <<semantic-search-semantic-text, semantic text>> to perform
 <<semantic-search, semantic search>> on your data.
 
+
+[discrete]
+[[default-endpoints]]
+=== Default {infer} endpoints
+
+Your {es} deployment contains preconfigured {infer} endpoints that make it easier for you to use them when defining `semantic_text` fields or {infer} processors.
+The following list contains the default {infer} endpoints listed by `inference_id`:
+
+* `.elser-2-elasticsearch`: uses the {ml-docs}/ml-nlp-elser.html[ELSER] built-in trained model for `sparse_embedding` tasks (recommended for English language texts)
+* `.multilingual-e5-small-elasticsearch`: uses the {ml-docs}/ml-nlp-e5.html[E5] built-in trained model for `text_embedding` tasks (recommended for non-English language texts)
+
+Use the `inference_id` of the endpoint in a <<semantic-text,`semantic_text`>> field definition or when creating an <<inference-processor,{infer} processor>>.
+The API call will automatically download and deploy the model, which might take a couple of minutes.
+Default {infer} endpoints have {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations] enabled.
+For these models, the minimum number of allocations is `0`.
+If there is no {infer} activity that uses the endpoint, the number of allocations scales down to `0` automatically after 15 minutes.
+
 include::delete-inference.asciidoc[]
 include::get-inference.asciidoc[]
 include::post-inference.asciidoc[]
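The default-endpoint usage described above can be sketched as a request body. Below is a minimal Python sketch, assuming a hypothetical index name `my-index`; the endpoint ID `.elser-2-elasticsearch` comes from the list above:

```python
import json

# Mapping body for PUT /my-index ("my-index" is a hypothetical example name).
# The `inference_id` references the preconfigured `.elser-2-elasticsearch`
# default endpoint listed above.
mapping = {
    "mappings": {
        "properties": {
            "content": {
                "type": "semantic_text",
                "inference_id": ".elser-2-elasticsearch",
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```

Because the default endpoint has adaptive allocations enabled with a minimum of `0`, the first indexing request may wait while the model deploys and scales up.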

docs/reference/inference/service-elasticsearch.asciidoc

Lines changed: 84 additions & 10 deletions
@@ -1,12 +1,9 @@
 [[infer-service-elasticsearch]]
 === Elasticsearch {infer} service
 
-Creates an {infer} endpoint to perform an {infer} task with the `elasticsearch`
-service.
+Creates an {infer} endpoint to perform an {infer} task with the `elasticsearch` service.
 
-NOTE: If you use the E5 model through the `elasticsearch` service, the API
-request will automatically download and deploy the model if it isn't downloaded
-yet.
+NOTE: If you use the ELSER or the E5 model through the `elasticsearch` service, the API request will automatically download and deploy the model if it isn't downloaded yet.
 
 
 [discrete]
@@ -56,6 +53,11 @@ These settings are specific to the `elasticsearch` service.
 (Optional, object)
 include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
 
+`deployment_id`:::
+(Optional, string)
+The `deployment_id` of an existing trained model deployment.
+When `deployment_id` is used, the `model_id` is optional.
+
 `enabled`::::
 (Optional, Boolean)
 include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]
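The `deployment_id`/`model_id` relationship described above can be sketched as a request body. A minimal Python sketch, assuming the built-in `.elser_model_2` deployment ID used elsewhere in this commit:

```python
import json

# Request body for PUT _inference/sparse_embedding/<endpoint-name>.
# When `deployment_id` points at an existing trained model deployment,
# `model_id` can be omitted, per the parameter description above.
body = {
    "service": "elasticsearch",
    "service_settings": {
        "deployment_id": ".elser_model_2",
    },
}

assert "model_id" not in body["service_settings"]  # optional in this case
print(json.dumps(body, indent=2))
```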
@@ -71,7 +73,7 @@ include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-min-number]
 `model_id`:::
 (Required, string)
 The name of the model to use for the {infer} task.
-It can be the ID of either a built-in model (for example, `.multilingual-e5-small` for E5) or a text embedding model already
+It can be the ID of either a built-in model (for example, `.multilingual-e5-small` for E5), or a text embedding model already
 {ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland].
 
 `num_allocations`:::
@@ -98,15 +100,44 @@ Returns the document instead of only the index. Defaults to `true`.
 =====
 
 
+[discrete]
+[[inference-example-elasticsearch-elser]]
+==== ELSER via the `elasticsearch` service
+
+The following example shows how to create an {infer} endpoint called `my-elser-model` to perform a `sparse_embedding` task type.
+
+The API request below will automatically download the ELSER model if it isn't already downloaded and then deploy the model.
+
+[source,console]
+------------------------------------------------------------
+PUT _inference/sparse_embedding/my-elser-model
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "adaptive_allocations": { <1>
+      "enabled": true,
+      "min_number_of_allocations": 1,
+      "max_number_of_allocations": 10
+    },
+    "num_threads": 1,
+    "model_id": ".elser_model_2" <2>
+  }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
+<1> Adaptive allocations will be enabled with a minimum of 1 and a maximum of 10 allocations.
+<2> The `model_id` must be the ID of one of the built-in ELSER models.
+Valid values are `.elser_model_2` and `.elser_model_2_linux-x86_64`.
+For further details, refer to the {ml-docs}/ml-nlp-elser.html[ELSER model documentation].
+
 [discrete]
 [[inference-example-elasticsearch]]
 ==== E5 via the `elasticsearch` service
 
-The following example shows how to create an {infer} endpoint called
-`my-e5-model` to perform a `text_embedding` task type.
+The following example shows how to create an {infer} endpoint called `my-e5-model` to perform a `text_embedding` task type.
 
-The API request below will automatically download the E5 model if it isn't
-already downloaded and then deploy the model.
+The API request below will automatically download the E5 model if it isn't already downloaded and then deploy the model.
 
 [source,console]
 ------------------------------------------------------------
@@ -185,3 +216,46 @@ PUT _inference/text_embedding/my-e5-model
 }
 ------------------------------------------------------------
 // TEST[skip:TBD]
+
+
+[discrete]
+[[inference-example-existing-deployment]]
+==== Using an existing model deployment with the `elasticsearch` service
+
+The following example shows how to use an existing model deployment when creating an {infer} endpoint.
+
+[source,console]
+------------------------------------------------------------
+PUT _inference/sparse_embedding/use_existing_deployment
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "deployment_id": ".elser_model_2" <1>
+  }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
+<1> The `deployment_id` of the existing model deployment.
+
+The API response contains the `model_id`, and the threads and allocations settings from the model deployment:
+
+[source,console-result]
+------------------------------------------------------------
+{
+  "inference_id": "use_existing_deployment",
+  "task_type": "sparse_embedding",
+  "service": "elasticsearch",
+  "service_settings": {
+    "num_allocations": 2,
+    "num_threads": 1,
+    "model_id": ".elser_model_2",
+    "deployment_id": ".elser_model_2"
+  },
+  "chunking_settings": {
+    "strategy": "sentence",
+    "max_chunk_size": 250,
+    "sentence_overlap": 1
+  }
+}
+------------------------------------------------------------
+// NOTCONSOLE
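A client can read the deployment settings back out of that response. Below is a minimal Python sketch that parses the documented response body (the JSON literal is copied from the example response above):

```python
import json

# Example create-endpoint response, copied from the docs above.
response_json = """
{
  "inference_id": "use_existing_deployment",
  "task_type": "sparse_embedding",
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 2,
    "num_threads": 1,
    "model_id": ".elser_model_2",
    "deployment_id": ".elser_model_2"
  },
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 250,
    "sentence_overlap": 1
  }
}
"""

resp = json.loads(response_json)
settings = resp["service_settings"]
# Total threads in use = allocations x threads per allocation.
total_threads = settings["num_allocations"] * settings["num_threads"]
print(total_threads)  # 2
```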

docs/reference/inference/service-elser.asciidoc

Lines changed: 2 additions & 1 deletion
@@ -2,6 +2,7 @@
 === ELSER {infer} service
 
 Creates an {infer} endpoint to perform an {infer} task with the `elser` service.
+You can also deploy ELSER by using the <<infer-service-elasticsearch>>.
 
 NOTE: The API request will automatically download and deploy the ELSER model if
 it isn't already downloaded.
@@ -128,7 +129,7 @@ If using the Python client, you can set the `timeout` parameter to a higher value.
 
 [discrete]
 [[inference-example-elser-adaptive-allocation]]
-==== Setting adaptive allocation for the ELSER service
+==== Setting adaptive allocations for the ELSER service
 
 NOTE: For more information on how to optimize your ELSER endpoints, refer to {ml-docs}/ml-nlp-elser.html#elser-recommendations[the ELSER recommendations] section in the model documentation.
 To learn more about model autoscaling, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] page.
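For the `elser` service, the adaptive allocations object takes the same shape as in the `elasticsearch`-service example earlier in this commit. A minimal sketch of the request body, assuming a hypothetical endpoint name `my-elser-model`:

```python
import json

# Request body for PUT _inference/sparse_embedding/my-elser-model
# (the endpoint name is a hypothetical example). Field names follow
# the adaptive-allocations example shown earlier in this commit.
body = {
    "service": "elser",
    "service_settings": {
        "adaptive_allocations": {
            "enabled": True,
            "min_number_of_allocations": 3,
            "max_number_of_allocations": 10,
        },
        "num_threads": 1,
    },
}

print(json.dumps(body, indent=2))
```

With adaptive allocations enabled, a fixed `num_allocations` is not supplied; the deployment scales between the configured minimum and maximum.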
