diff --git a/docs/reference/inference/elastic-infer-service.asciidoc b/docs/reference/inference/elastic-infer-service.asciidoc
deleted file mode 100644
index 291b82798fffc..0000000000000
--- a/docs/reference/inference/elastic-infer-service.asciidoc
+++ /dev/null
@@ -1,108 +0,0 @@
-[[infer-service-elastic]]
-=== Elastic {infer-cap} Service (EIS)
-
-.New API reference
-[sidebar]
---
-For the most up-to-date API details, refer to {api-es}/group/endpoint-inference[{infer-cap} APIs].
---
-
-Creates an {infer} endpoint to perform an {infer} task with the `elastic` service.
-
-
-[discrete]
-[[infer-service-elastic-api-request]]
-==== {api-request-title}
-
-
-`PUT /_inference/<task_type>/<inference_id>`
-
-[discrete]
-[[infer-service-elastic-api-path-params]]
-==== {api-path-parms-title}
-
-
-`<inference_id>`::
-(Required, string)
-include::inference-shared.asciidoc[tag=inference-id]
-
-`<task_type>`::
-(Required, string)
-include::inference-shared.asciidoc[tag=task-type]
-+
---
-Available task types:
-
-* `chat_completion`
---
-
-[NOTE]
-====
-The `chat_completion` task type only supports streaming and only through the `_stream` API.
-
-include::inference-shared.asciidoc[tag=chat-completion-docs]
-====
-
-[discrete]
-[[infer-service-elastic-api-request-body]]
-==== {api-request-body-title}
-
-
-
-`max_chunk_size`:::
-(Optional, integer)
-include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]
-
-`overlap`:::
-(Optional, integer)
-include::inference-shared.asciidoc[tag=chunking-settings-overlap]
-
-`sentence_overlap`:::
-(Optional, integer)
-include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]
-
-`strategy`:::
-(Optional, string)
-include::inference-shared.asciidoc[tag=chunking-settings-strategy]
-
-`service`::
-(Required, string)
-The type of service supported for the specified task type. In this case,
-`elastic`.
-
-`service_settings`::
-(Required, object)
-include::inference-shared.asciidoc[tag=service-settings]
-
-`model_id`:::
-(Required, string)
-The name of the model to use for the {infer} task.
-
-`rate_limit`:::
-(Optional, object)
-By default, the `elastic` service sets the number of requests allowed per minute to `1000` in case of `sparse_embedding` and `240` in case of `chat_completion`.
-This helps to minimize the number of rate limit errors returned.
-To modify this, set the `requests_per_minute` setting of this object in your service settings:
-+
---
-include::inference-shared.asciidoc[tag=request-per-minute-example]
---
-
-
-[discrete]
-[[inference-example-elastic]]
-==== Elastic {infer-cap} Service example
-
-The following example shows how to create an {infer} endpoint called `chat-completion-endpoint` to perform a `chat_completion` task type.
-
-[source,console]
-------------------------------------------------------------
-PUT /_inference/chat_completion/chat-completion-endpoint
-{
-    "service": "elastic",
-    "service_settings": {
-        "model_id": "rainbow-sprinkles"
-    }
-}
-------------------------------------------------------------
-// TEST[skip:TBD]
\ No newline at end of file
diff --git a/docs/reference/inference/inference-apis.asciidoc b/docs/reference/inference/inference-apis.asciidoc
index 67b45aac678e6..9c5218324f229 100644
--- a/docs/reference/inference/inference-apis.asciidoc
+++ b/docs/reference/inference/inference-apis.asciidoc
@@ -138,7 +138,6 @@ include::chat-completion-inference.asciidoc[]
 include::put-inference.asciidoc[]
 include::stream-inference.asciidoc[]
 include::update-inference.asciidoc[]
-include::elastic-infer-service.asciidoc[]
 include::service-alibabacloud-ai-search.asciidoc[]
 include::service-amazon-bedrock.asciidoc[]
 include::service-anthropic.asciidoc[]
diff --git a/docs/reference/inference/put-inference.asciidoc b/docs/reference/inference/put-inference.asciidoc
index a9a7c9bb32a5d..73b036ef6880b 100644
--- a/docs/reference/inference/put-inference.asciidoc
+++ b/docs/reference/inference/put-inference.asciidoc
@@ -59,8 +59,6 @@ The create {infer} API enables you to create an {infer} endpoint and configure a
 * Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
 ====
 
-You can create an {infer} endpoint that uses the <<infer-service-elastic>> to perform {infer} tasks as a service without the need of deploying a model in your environment.
-
 The following integrations are available through the {infer} API.
 You can find the available task types next to the integration name.
 Click the links to review the configuration details of the integrations:
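
For reference, the removed `rate_limit` documentation deferred to the shared `requests_per_minute` example. A minimal sketch of what that override looks like in practice, assuming the `elastic` service accepts the same `rate_limit` object shape as the other {infer} services; the endpoint name and the `100` value are illustrative, not taken from the removed page:

[source,console]
------------------------------------------------------------
PUT /_inference/chat_completion/chat-completion-endpoint
{
    "service": "elastic",
    "service_settings": {
        "model_id": "rainbow-sprinkles",
        "rate_limit": {
            "requests_per_minute": 100
        }
    }
}
------------------------------------------------------------
// TEST[skip:TBD]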