Skip to content
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
134 changes: 134 additions & 0 deletions docs/reference/inference/elastic-infer-service.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,134 @@
[[infer-service-elastic]]
=== Elastic {infer-cap} Service (EIS)

.New API reference
[sidebar]
--
For the most up-to-date API details, refer to {api-es}/group/endpoint-inference[{infer-cap} APIs].
--

Creates an {infer} endpoint to perform an {infer} task with the `elastic` service.


[discrete]
[[infer-service-elastic-api-request]]
==== {api-request-title}

`PUT /_inference/<task_type>/<inference_id>`

[discrete]
[[infer-service-elastic-api-path-params]]
==== {api-path-parms-title}

`<inference_id>`::
(Required, string)
include::inference-shared.asciidoc[tag=inference-id]

`<task_type>`::
(Required, string)
include::inference-shared.asciidoc[tag=task-type]
+
--
Available task types:

* `chat_completion`,
* `sparse_embedding`.
--

[NOTE]
====
The `chat_completion` task type only supports streaming and only through the `_unified` API.

include::inference-shared.asciidoc[tag=chat-completion-docs]
====

[discrete]
[[infer-service-elastic-api-request-body]]
==== {api-request-body-title}

`chunking_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=chunking-settings]

`max_chunking_size`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]

`overlap`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-overlap]

`sentence_overlap`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]

`strategy`:::
(Optional, string)
include::inference-shared.asciidoc[tag=chunking-settings-strategy]

`service`::
(Required, string)
The type of service supported for the specified task type. In this case,
`openai`.

`service_settings`::
(Required, object)
include::inference-shared.asciidoc[tag=service-settings]
+
--
These settings are specific to the `elser` service.
--

`model_id`:::
(Required, string)
The name of the model to use for the {infer} task.

`rate_limit`:::
(Optional, object)
By default, the `elastic` service sets the number of requests allowed per minute to `1000`.
This helps to minimize the number of rate limit errors returned.
To modify this, set the `requests_per_minute` setting of this object in your service settings:
+
--
include::inference-shared.asciidoc[tag=request-per-minute-example]
--

`task_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=task-settings]
+
.`task_settings` for the `chat_completion` task type
[%collapsible%closed]
=====
`user`:::
(Optional, string)
Specifies the user issuing the request, which can be used for abuse detection.
=====
+
.`task_settings` for the `sparse_embedding` task type
[%collapsible%closed]
=====
`user`:::
(optional, string)
Specifies the user issuing the request, which can be used for abuse detection.
=====


[discrete]
[[inference-example-elastic]]
==== Elastic {infer-cap} Service example

The following example shows how to create an {infer} endpoint called `elser-model-eis` to perform a `text_embedding` task type.

[source,console]
------------------------------------------------------------
PUT _inference/sparse_embedding/elser-model-eis
{
"service": "elastic",
"service_settings": {
"model_name": "elser"
}
}

------------------------------------------------------------
// TEST[skip:TBD]
1 change: 1 addition & 0 deletions docs/reference/inference/inference-apis.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -136,6 +136,7 @@ include::chat-completion-inference.asciidoc[]
include::put-inference.asciidoc[]
include::stream-inference.asciidoc[]
include::update-inference.asciidoc[]
include::elastic-infer-service.asciidoc[]
include::service-alibabacloud-ai-search.asciidoc[]
include::service-amazon-bedrock.asciidoc[]
include::service-anthropic.asciidoc[]
Expand Down
1 change: 1 addition & 0 deletions docs/reference/inference/put-inference.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ The create {infer} API enables you to create an {infer} endpoint and configure a
* Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
====

You can create an {infer} endpoint that uses the <<infer-service-elastic>> to perform {infer} tasks as a service without the need of deploying a model in your environment.

The following integrations are available through the {infer} API.
You can find the available task types next to the integration name.
Expand Down