Commit 7837a96

[DOCS] Adds EIS reference docs (#120706)
1 parent e18baa1

3 files changed: +126 −0

Lines changed: 124 additions & 0 deletions

@@ -0,0 +1,124 @@
[[infer-service-elastic]]
=== Elastic {infer-cap} Service (EIS)

.New API reference
[sidebar]
--
For the most up-to-date API details, refer to {api-es}/group/endpoint-inference[{infer-cap} APIs].
--

Creates an {infer} endpoint to perform an {infer} task with the `elastic` service.


[discrete]
[[infer-service-elastic-api-request]]
==== {api-request-title}


`PUT /_inference/<task_type>/<inference_id>`

[discrete]
[[infer-service-elastic-api-path-params]]
==== {api-path-parms-title}


`<inference_id>`::
(Required, string)
include::inference-shared.asciidoc[tag=inference-id]

`<task_type>`::
(Required, string)
include::inference-shared.asciidoc[tag=task-type]
+
--
Available task types:

* `chat_completion`
* `sparse_embedding`
--

[NOTE]
====
The `chat_completion` task type only supports streaming and only through the `_unified` API.

include::inference-shared.asciidoc[tag=chat-completion-docs]
====

47+
[discrete]
48+
[[infer-service-elastic-api-request-body]]
49+
==== {api-request-body-title}
50+
51+
52+
`max_chunking_size`:::
53+
(Optional, integer)
54+
include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]
55+
56+
`overlap`:::
57+
(Optional, integer)
58+
include::inference-shared.asciidoc[tag=chunking-settings-overlap]
59+
60+
`sentence_overlap`:::
61+
(Optional, integer)
62+
include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]
63+
64+
`strategy`:::
65+
(Optional, string)
66+
include::inference-shared.asciidoc[tag=chunking-settings-strategy]
67+
68+
`service`::
69+
(Required, string)
70+
The type of service supported for the specified task type. In this case,
71+
`elastic`.
72+
73+
`service_settings`::
74+
(Required, object)
75+
include::inference-shared.asciidoc[tag=service-settings]
76+
77+
`model_id`:::
78+
(Required, string)
79+
The name of the model to use for the {infer} task.
80+
81+
`rate_limit`:::
82+
(Optional, object)
83+
By default, the `elastic` service sets the number of requests allowed per minute to `1000` in case of `sparse_embedding` and `240` in case of `chat_completion`.
84+
This helps to minimize the number of rate limit errors returned.
85+
To modify this, set the `requests_per_minute` setting of this object in your service settings:
86+
+
87+
--
88+
include::inference-shared.asciidoc[tag=request-per-minute-example]
89+
--
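
The request-body options above can be combined into a single JSON payload. The following is a minimal sketch, not taken from the commit: the model name, chunking values, and `requests_per_minute` override are hypothetical values chosen for illustration.

```python
import json

# Sketch (assumption, not from the commit): a request body for
# PUT /_inference/sparse_embedding/<inference_id> combining the settings
# documented above. Model name and chunking values are hypothetical.
body = {
    "service": "elastic",                # required: the `elastic` service
    "service_settings": {
        "model_id": "elser",             # required: model for the inference task
        "rate_limit": {
            "requests_per_minute": 500   # optional: override the sparse_embedding default of 1000
        }
    },
    "chunking_settings": {
        "strategy": "sentence",          # optional chunking strategy (hypothetical value)
        "max_chunking_size": 250,
        "sentence_overlap": 1
    }
}
print(json.dumps(body, indent=2))
```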


[discrete]
[[inference-example-elastic]]
==== Elastic {infer-cap} Service example


The following example shows how to create an {infer} endpoint called `elser-model-eis` to perform a `sparse_embedding` task type.

[source,console]
------------------------------------------------------------
PUT _inference/sparse_embedding/elser-model-eis
{
    "service": "elastic",
    "service_settings": {
        "model_id": "elser"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
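
As a rough illustration (not part of the commit), the endpoint path in the console example above can be composed from the path parameters and checked against the two task types the `elastic` service supports; the helper name below is hypothetical.

```python
# Hypothetical helper (not from the commit): build the inference endpoint
# path and validate the task type against the supported set listed above.
ALLOWED_TASK_TYPES = {"chat_completion", "sparse_embedding"}

def inference_path(task_type: str, inference_id: str) -> str:
    if task_type not in ALLOWED_TASK_TYPES:
        raise ValueError(f"unsupported task type: {task_type}")
    return f"/_inference/{task_type}/{inference_id}"

print(inference_path("sparse_embedding", "elser-model-eis"))
# → /_inference/sparse_embedding/elser-model-eis
```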

The following example shows how to create an {infer} endpoint called `chat-completion-endpoint` to perform a `chat_completion` task type.

[source,console]
------------------------------------------------------------
PUT /_inference/chat_completion/chat-completion-endpoint
{
    "service": "elastic",
    "service_settings": {
        "model_id": "model-1"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
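
Since the `chat_completion` task type only supports streaming through the `_unified` API (per the note above), requests to this endpoint carry an OpenAI-style `messages` array. A hedged sketch of such a payload, with illustrative content only:

```python
import json

# Sketch (assumption): an OpenAI-style `messages` payload for the unified
# chat completion API referenced in the note above. Content is illustrative.
payload = {
    "messages": [
        {"role": "user", "content": "Say this is a test"}
    ]
}
print(json.dumps(payload))
```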

docs/reference/inference/inference-apis.asciidoc

Lines changed: 1 addition & 0 deletions

@@ -136,6 +136,7 @@ include::chat-completion-inference.asciidoc[]
 include::put-inference.asciidoc[]
 include::stream-inference.asciidoc[]
 include::update-inference.asciidoc[]
+include::elastic-infer-service.asciidoc[]
 include::service-alibabacloud-ai-search.asciidoc[]
 include::service-amazon-bedrock.asciidoc[]
 include::service-anthropic.asciidoc[]

docs/reference/inference/put-inference.asciidoc

Lines changed: 1 addition & 0 deletions

@@ -59,6 +59,7 @@ The create {infer} API enables you to create an {infer} endpoint and configure a
 * Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
 ====

+You can create an {infer} endpoint that uses the <<infer-service-elastic>> to perform {infer} tasks as a service, without needing to deploy a model in your environment.

 The following integrations are available through the {infer} API.
 You can find the available task types next to the integration name.
