124 changes: 124 additions & 0 deletions docs/reference/inference/elastic-infer-service.asciidoc
@@ -0,0 +1,124 @@
[[infer-service-elastic]]
=== Elastic {infer-cap} Service (EIS)

.New API reference
[sidebar]
--
For the most up-to-date API details, refer to {api-es}/group/endpoint-inference[{infer-cap} APIs].
--

Creates an {infer} endpoint to perform an {infer} task with the `elastic` service.


[discrete]
[[infer-service-elastic-api-request]]
==== {api-request-title}


`PUT /_inference/<task_type>/<inference_id>`

[discrete]
[[infer-service-elastic-api-path-params]]
==== {api-path-parms-title}


`<inference_id>`::
(Required, string)
include::inference-shared.asciidoc[tag=inference-id]

`<task_type>`::
(Required, string)
include::inference-shared.asciidoc[tag=task-type]
+
--
Available task types:

* `chat_completion`,
* `sparse_embedding`.
--

[NOTE]
====
The `chat_completion` task type only supports streaming and only through the `_unified` API.

include::inference-shared.asciidoc[tag=chat-completion-docs]
====

[discrete]
[[infer-service-elastic-api-request-body]]
==== {api-request-body-title}

`chunking_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=chunking-settings]


`max_chunk_size`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]

`overlap`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-overlap]

`sentence_overlap`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]

`strategy`:::
(Optional, string)
include::inference-shared.asciidoc[tag=chunking-settings-strategy]

`service`::
(Required, string)
The type of service supported for the specified task type. In this case,
`elastic`.

`service_settings`::
(Required, object)
include::inference-shared.asciidoc[tag=service-settings]

`model_id`:::
(Required, string)
The name of the model to use for the {infer} task.

`rate_limit`:::
(Optional, object)
By default, the `elastic` service sets the number of requests allowed per minute to `1000` for `sparse_embedding` and `240` for `chat_completion`.
This helps to minimize the number of rate limit errors returned.
To modify this, set the `requests_per_minute` setting of this object in your service settings:
+
--
include::inference-shared.asciidoc[tag=request-per-minute-example]
--
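
As a rough illustration of where the `rate_limit` object sits in the request body, the following Python sketch assembles the JSON payload for the `PUT /_inference/<task_type>/<inference_id>` request described above. The helper name `build_eis_endpoint_body` and the value `120` are hypothetical and chosen only for this example; the sketch builds the payload and does not contact a cluster.

```python
import json


def build_eis_endpoint_body(model_id, requests_per_minute=None):
    """Build a request body for an `elastic` service inference endpoint.

    Hypothetical helper for illustration; it is not part of any client library.
    """
    service_settings = {"model_id": model_id}
    # Only include `rate_limit` when overriding the service default.
    if requests_per_minute is not None:
        service_settings["rate_limit"] = {
            "requests_per_minute": requests_per_minute
        }
    return {"service": "elastic", "service_settings": service_settings}


# Example: override the default limit with an illustrative value of 120.
body = build_eis_endpoint_body("model-1", requests_per_minute=120)
print(json.dumps(body, indent=2))
```

Omitting `requests_per_minute` leaves `rate_limit` out of the body, so the service default for the task type applies.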


[discrete]
[[inference-example-elastic]]
==== Elastic {infer-cap} Service example


The following example shows how to create an {infer} endpoint called `elser-model-eis` to perform a `sparse_embedding` task type.

[source,console]
------------------------------------------------------------
PUT _inference/sparse_embedding/elser-model-eis
{
    "service": "elastic",
    "service_settings": {
        "model_id": "elser"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]

The following example shows how to create an {infer} endpoint called `chat-completion-endpoint` to perform a `chat_completion` task type.

[source,console]
------------------------------------------------------------
PUT /_inference/chat_completion/chat-completion-endpoint
{
    "service": "elastic",
    "service_settings": {
        "model_id": "model-1"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
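
For readers who script endpoint creation outside the console, the following is a minimal Python sketch of the same `PUT` request using only the standard library. The function name `make_create_endpoint_request` and the base URL `http://localhost:9200` are assumptions made for this example. The sketch only constructs the request object; sending it would need a reachable cluster and, in most deployments, authentication headers that are not shown here.

```python
import json
import urllib.request

# Task types listed for the `elastic` service in this reference.
SUPPORTED_TASK_TYPES = ("chat_completion", "sparse_embedding")


def make_create_endpoint_request(base_url, task_type, inference_id, model_id):
    """Build (but do not send) a PUT request that creates an inference endpoint.

    Hypothetical helper for illustration only.
    """
    if task_type not in SUPPORTED_TASK_TYPES:
        raise ValueError(f"unsupported task type: {task_type}")
    url = f"{base_url.rstrip('/')}/_inference/{task_type}/{inference_id}"
    body = {"service": "elastic", "service_settings": {"model_id": model_id}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# The base URL below is an assumed local address, not taken from this reference.
req = make_create_endpoint_request(
    "http://localhost:9200", "chat_completion", "chat-completion-endpoint", "model-1"
)
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would issue the request. Validating the task type up front avoids a round trip for a value this service does not list.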
1 change: 1 addition & 0 deletions docs/reference/inference/inference-apis.asciidoc
@@ -136,6 +136,7 @@ include::chat-completion-inference.asciidoc[]
include::put-inference.asciidoc[]
include::stream-inference.asciidoc[]
include::update-inference.asciidoc[]
include::elastic-infer-service.asciidoc[]
include::service-alibabacloud-ai-search.asciidoc[]
include::service-amazon-bedrock.asciidoc[]
include::service-anthropic.asciidoc[]
1 change: 1 addition & 0 deletions docs/reference/inference/put-inference.asciidoc
@@ -59,6 +59,7 @@ The create {infer} API enables you to create an {infer} endpoint and configure a
* Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
====

You can create an {infer} endpoint that uses the <<infer-service-elastic>> to perform {infer} tasks as a service, without needing to deploy a model in your environment.

The following integrations are available through the {infer} API.
You can find the available task types next to the integration name.