diff --git a/docs/reference/inference/inference-apis.asciidoc b/docs/reference/inference/inference-apis.asciidoc index b291b464be498..ddcff1abc7dce 100644 --- a/docs/reference/inference/inference-apis.asciidoc +++ b/docs/reference/inference/inference-apis.asciidoc @@ -34,6 +34,24 @@ Elastic –, then create an {infer} endpoint by the <>. Now use <> to perform <> on your data. + +[discrete] +[[default-enpoints]] +=== Default {infer} endpoints + +Your {es} deployment contains some preconfigured {infer} endpoints that makes it easier for you to use them when defining `semantic_text` fields or {infer} processors. +The following list contains the default {infer} endpoints listed by `inference_id`: + +* `.elser-2-elasticsearch`: uses the {ml-docs}/ml-nlp-elser.html[ELSER] built-in trained model for `sparse_embedding` tasks (recommended for English language texts) +* `.multilingual-e5-small-elasticsearch`: uses the {ml-docs}/ml-nlp-e5.html[E5] built-in trained model for `text_embedding` tasks (recommended for non-English language texts) + +Use the `inference_id` of the endpoint in a <> field definition or when creating an <>. +The API call will automatically download and deploy the model which might take a couple of minutes. +Default {infer} enpoints have {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations] enabled. +For these models, the minimum number of allocations is `0`. +If there is no {infer} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes. + + include::delete-inference.asciidoc[] include::get-inference.asciidoc[] include::post-inference.asciidoc[] diff --git a/docs/reference/inference/service-elasticsearch.asciidoc b/docs/reference/inference/service-elasticsearch.asciidoc index efa0c78b8356f..259779a12134d 100644 --- a/docs/reference/inference/service-elasticsearch.asciidoc +++ b/docs/reference/inference/service-elasticsearch.asciidoc @@ -1,12 +1,9 @@ [[infer-service-elasticsearch]] === Elasticsearch {infer} service -Creates an {infer} endpoint to perform an {infer} task with the `elasticsearch` -service. +Creates an {infer} endpoint to perform an {infer} task with the `elasticsearch` service. -NOTE: If you use the E5 model through the `elasticsearch` service, the API -request will automatically download and deploy the model if it isn't downloaded -yet. +NOTE: If you use the ELSER or the E5 model through the `elasticsearch` service, the API request will automatically download and deploy the model if it isn't downloaded yet. [discrete] @@ -56,6 +53,11 @@ These settings are specific to the `elasticsearch` service. (Optional, object) include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation] +`deployment_id`::: +(Optional, string) +The `deployment_id` of an existing trained model deployment. +When `deployment_id` is used the `model_id` is optional. + `enabled`:::: (Optional, Boolean) include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled] @@ -71,7 +73,7 @@ include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-min-number] `model_id`::: (Required, string) The name of the model to use for the {infer} task. -It can be the ID of either a built-in model (for example, `.multilingual-e5-small` for E5) or a text embedding model already +It can be the ID of either a built-in model (for example, `.multilingual-e5-small` for E5), a text embedding model already {ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland]. `num_allocations`::: @@ -98,15 +100,44 @@ Returns the document instead of only the index. Defaults to `true`. ===== +[discrete] +[[inference-example-elasticsearch-elser]] +==== ELSER via the `elasticsearch` service + +The following example shows how to create an {infer} endpoint called `my-elser-model` to perform a `sparse_embedding` task type. + +The API request below will automatically download the ELSER model if it isn't already downloaded and then deploy the model. + +[source,console] +------------------------------------------------------------ +PUT _inference/sparse_embedding/my-elser-model +{ + "service": "elasticsearch", + "service_settings": { + "adaptive_allocations": { <1> + "enabled": true, + "min_number_of_allocations": 1, + "max_number_of_allocations": 10 + }, + "num_threads": 1, + "model_id": ".elser_model_2" <2> + } +} +------------------------------------------------------------ +// TEST[skip:TBD] +<1> Adaptive allocations will be enabled with the minimum of 1 and the maximum of 10 allocations. +<2> The `model_id` must be the ID of one of the built-in ELSER models. +Valid values are `.elser_model_2` and `.elser_model_2_linux-x86_64`. +For further details, refer to the {ml-docs}/ml-nlp-elser.html[ELSER model documentation]. + + [discrete] [[inference-example-elasticsearch]] ==== E5 via the `elasticsearch` service -The following example shows how to create an {infer} endpoint called -`my-e5-model` to perform a `text_embedding` task type. +The following example shows how to create an {infer} endpoint called `my-e5-model` to perform a `text_embedding` task type. -The API request below will automatically download the E5 model if it isn't -already downloaded and then deploy the model. +The API request below will automatically download the E5 model if it isn't already downloaded and then deploy the model. [source,console] ------------------------------------------------------------ @@ -185,3 +216,46 @@ PUT _inference/text_embedding/my-e5-model } ------------------------------------------------------------ // TEST[skip:TBD] + + +[discrete] +[[inference-example-existing-deployment]] +==== Using an existing model deployment with the `elasticsearch` service + +The following example shows how to use an already existing model deployment when creating an {infer} endpoint. + +[source,console] +------------------------------------------------------------ +PUT _inference/sparse_embedding/use_existing_deployment +{ + "service": "elasticsearch", + "service_settings": { + "deployment_id": ".elser_model_2" <1> + } +} +------------------------------------------------------------ +// TEST[skip:TBD] +<1> The `deployment_id` of the already existing model deployment. + +The API response contains the `model_id`, and the threads and allocations settings from the model deployment: + +[source,console-result] +------------------------------------------------------------ +{ + "inference_id": "use_existing_deployment", + "task_type": "sparse_embedding", + "service": "elasticsearch", + "service_settings": { + "num_allocations": 2, + "num_threads": 1, + "model_id": ".elser_model_2", + "deployment_id": ".elser_model_2" + }, + "chunking_settings": { + "strategy": "sentence", + "max_chunk_size": 250, + "sentence_overlap": 1 + } +} +------------------------------------------------------------ +// NOTCONSOLE \ No newline at end of file diff --git a/docs/reference/inference/service-elser.asciidoc b/docs/reference/inference/service-elser.asciidoc index 6afc2a2e3ef65..521fab0375584 100644 --- a/docs/reference/inference/service-elser.asciidoc +++ b/docs/reference/inference/service-elser.asciidoc @@ -2,6 +2,7 @@ === ELSER {infer} service Creates an {infer} endpoint to perform an {infer} task with the `elser` service. +You can also deploy ELSER by using the <>. NOTE: The API request will automatically download and deploy the ELSER model if it isn't already downloaded. @@ -128,7 +129,7 @@ If using the Python client, you can set the `timeout` parameter to a higher valu [discrete] [[inference-example-elser-adaptive-allocation]] -==== Setting adaptive allocation for the ELSER service +==== Setting adaptive allocations for the ELSER service NOTE: For more information on how to optimize your ELSER endpoints, refer to {ml-docs}/ml-nlp-elser.html#elser-recommendations[the ELSER recommendations] section in the model documentation. To learn more about model autoscaling, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] page.