diff --git a/docs/reference/inference/elastic-infer-service.asciidoc b/docs/reference/inference/elastic-infer-service.asciidoc
new file mode 100644
index 0000000000000..f78bfa967cceb
--- /dev/null
+++ b/docs/reference/inference/elastic-infer-service.asciidoc
@@ -0,0 +1,124 @@
+[[infer-service-elastic]]
+=== Elastic {infer-cap} Service (EIS)
+
+.New API reference
+[sidebar]
+--
+For the most up-to-date API details, refer to {api-es}/group/endpoint-inference[{infer-cap} APIs].
+--
+
+Creates an {infer} endpoint to perform an {infer} task with the `elastic` service.
+
+
+[discrete]
+[[infer-service-elastic-api-request]]
+==== {api-request-title}
+
+
+`PUT /_inference/<task_type>/<inference_id>`
+
+[discrete]
+[[infer-service-elastic-api-path-params]]
+==== {api-path-parms-title}
+
+
+`<inference_id>`::
+(Required, string)
+include::inference-shared.asciidoc[tag=inference-id]
+
+`<task_type>`::
+(Required, string)
+include::inference-shared.asciidoc[tag=task-type]
++
+--
+Available task types:
+
+* `chat_completion`,
+* `sparse_embedding`.
+--
+
+[NOTE]
+====
+The `chat_completion` task type only supports streaming and only through the `_unified` API.
+
+include::inference-shared.asciidoc[tag=chat-completion-docs]
+====
+
+[discrete]
+[[infer-service-elastic-api-request-body]]
+==== {api-request-body-title}
+
+
+`chunking_settings`::
+(Optional, object)
+include::inference-shared.asciidoc[tag=chunking-settings]
+
+`max_chunk_size`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]
+
+`overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-overlap]
+
+`sentence_overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]
+
+`strategy`:::
+(Optional, string)
+include::inference-shared.asciidoc[tag=chunking-settings-strategy]
+
+`service`::
+(Required, string)
+The type of service supported for the specified task type. In this case,
+`elastic`.
+
+`service_settings`::
+(Required, object)
+include::inference-shared.asciidoc[tag=service-settings]
+
+`model_id`:::
+(Required, string)
+The name of the model to use for the {infer} task.
+
+`rate_limit`:::
+(Optional, object)
+By default, the `elastic` service sets the number of requests allowed per minute to `1000` for `sparse_embedding` and to `240` for `chat_completion`.
+This helps to minimize the number of rate limit errors returned.
+To modify this, set the `requests_per_minute` setting of this object in your service settings:
++
+--
+include::inference-shared.asciidoc[tag=request-per-minute-example]
+--
+
+
+[discrete]
+[[inference-example-elastic]]
+==== Elastic {infer-cap} Service example
+
+
+The following example shows how to create an {infer} endpoint called `elser-model-eis` to perform a `sparse_embedding` task type.
+
+[source,console]
+------------------------------------------------------------
+PUT _inference/sparse_embedding/elser-model-eis
+{
+    "service": "elastic",
+    "service_settings": {
+        "model_id": "elser"
+    }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
+
+The following example shows how to create an {infer} endpoint called `chat-completion-endpoint` to perform a `chat_completion` task type.
+
+[source,console]
+------------------------------------------------------------
+PUT /_inference/chat_completion/chat-completion-endpoint
+{
+    "service": "elastic",
+    "service_settings": {
+        "model_id": "model-1"
+    }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
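+
+Because the `chat_completion` task type is only available through the streaming API, creating the endpoint is usually followed by a streaming call.
+The following is a minimal sketch of such a call, assuming the `_unified` route mentioned above and an illustrative question; refer to the chat completion documentation for the full request format.
+
+[source,console]
+------------------------------------------------------------
+POST /_inference/chat_completion/chat-completion-endpoint/_unified
+{
+    "messages": [
+        {
+            "role": "user",
+            "content": "What is Elastic?" <1>
+        }
+    ]
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
+<1> The question text is only an illustration.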
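+
+To override a default rate limit, include the `rate_limit` object in the `service_settings` when creating the endpoint.
+The following sketch assumes a hypothetical endpoint name, `elser-model-eis-throttled`, and an example limit of `500` requests per minute.
+
+[source,console]
+------------------------------------------------------------
+PUT _inference/sparse_embedding/elser-model-eis-throttled
+{
+    "service": "elastic",
+    "service_settings": {
+        "model_id": "elser",
+        "rate_limit": {
+            "requests_per_minute": 500 <1>
+        }
+    }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
+<1> A hypothetical value; choose a limit that fits your workload.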
\ No newline at end of file
diff --git a/docs/reference/inference/inference-apis.asciidoc b/docs/reference/inference/inference-apis.asciidoc
index 6c97f388788f7..aa1d54de60391 100644
--- a/docs/reference/inference/inference-apis.asciidoc
+++ b/docs/reference/inference/inference-apis.asciidoc
@@ -136,6 +136,7 @@ include::chat-completion-inference.asciidoc[]
 include::put-inference.asciidoc[]
 include::stream-inference.asciidoc[]
 include::update-inference.asciidoc[]
+include::elastic-infer-service.asciidoc[]
 include::service-alibabacloud-ai-search.asciidoc[]
 include::service-amazon-bedrock.asciidoc[]
 include::service-anthropic.asciidoc[]
diff --git a/docs/reference/inference/put-inference.asciidoc b/docs/reference/inference/put-inference.asciidoc
index 4e149667d6298..6e33619c11e59 100644
--- a/docs/reference/inference/put-inference.asciidoc
+++ b/docs/reference/inference/put-inference.asciidoc
@@ -59,6 +59,7 @@ The create {infer} API enables you to create an {infer} endpoint and configure a
 * Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
 ====
 
+You can create an {infer} endpoint that uses the <<infer-service-elastic>> to perform {infer} tasks as a service without the need to deploy a model in your environment.
 The following integrations are available through the {infer} API.
 You can find the available task types next to the integration name.