Commit c82d2dc

committed
[DOCS] Add Elastic Rerank usage docs (#117625)
1 parent a7e31f1 commit c82d2dc

File tree: 3 files changed, +121 -23 lines changed

docs/reference/inference/service-elasticsearch.asciidoc

Lines changed: 34 additions & 7 deletions
@@ -69,15 +69,15 @@ include::inference-shared.asciidoc[tag=service-settings]
 These settings are specific to the `elasticsearch` service.
 --
 
-`adaptive_allocations`:::
-(Optional, object)
-include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
-
 `deployment_id`:::
 (Optional, string)
 The `deployment_id` of an existing trained model deployment.
 When `deployment_id` is used the `model_id` is optional.
 
+`adaptive_allocations`:::
+(Optional, object)
+include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
+
 `enabled`::::
 (Optional, Boolean)
 include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]
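The `adaptive_allocations` settings above only bound how far a deployment may scale. As a rough illustrative sketch (plain Python, not the actual {es} autoscaler, which measures inference load over time), the effective allocation count is the load-driven request clamped to the configured minimum and maximum:

```python
def target_allocations(desired: int, min_allocs: int, max_allocs: int) -> int:
    """Clamp a load-driven allocation request to the configured bounds.

    `desired` stands in for the autoscaler's load measurement; the real
    adaptive-allocations logic in Elasticsearch derives it from inference
    traffic, but the clamping idea is the same.
    """
    return max(min_allocs, min(desired, max_allocs))

# With min=1 and max=4, a spike asking for 10 allocations is capped at 4,
# and an idle period asking for 0 is floored at 1.
print(target_allocations(10, 1, 4))  # -> 4
print(target_allocations(0, 1, 4))   # -> 1
```

This is why `min_number_of_allocations` acts as a cost floor and `max_number_of_allocations` as a cost ceiling for the deployment.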
@@ -119,7 +119,6 @@ include::inference-shared.asciidoc[tag=task-settings]
 Returns the document instead of only the index. Defaults to `true`.
 =====
 
-
 [discrete]
 [[inference-example-elasticsearch-elser]]
 ==== ELSER via the `elasticsearch` service
@@ -137,7 +136,7 @@ PUT _inference/sparse_embedding/my-elser-model
     "adaptive_allocations": { <1>
       "enabled": true,
       "min_number_of_allocations": 1,
-      "max_number_of_allocations": 10
+      "max_number_of_allocations": 4
     },
     "num_threads": 1,
     "model_id": ".elser_model_2" <2>
@@ -150,6 +149,34 @@ PUT _inference/sparse_embedding/my-elser-model
 Valid values are `.elser_model_2` and `.elser_model_2_linux-x86_64`.
 For further details, refer to the {ml-docs}/ml-nlp-elser.html[ELSER model documentation].
 
+[discrete]
+[[inference-example-elastic-reranker]]
+==== Elastic Rerank via the `elasticsearch` service
+
+The following example shows how to create an {infer} endpoint called `my-elastic-rerank` to perform a `rerank` task type using the built-in Elastic Rerank cross-encoder model.
+
+The API request below will automatically download the Elastic Rerank model if it isn't already downloaded and then deploy the model.
+Once deployed, the model can be used for semantic re-ranking with a <<text-similarity-reranker-retriever-example-elastic-rerank,`text_similarity_reranker` retriever>>.
+
+[source,console]
+------------------------------------------------------------
+PUT _inference/rerank/my-elastic-rerank
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".rerank-v1", <1>
+    "num_threads": 1,
+    "adaptive_allocations": { <2>
+      "enabled": true,
+      "min_number_of_allocations": 1,
+      "max_number_of_allocations": 4
+    }
+  }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
+<1> The `model_id` must be the ID of the built-in Elastic Rerank model: `.rerank-v1`.
+<2> {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[Adaptive allocations] will be enabled with a minimum of 1 and a maximum of 4 allocations.
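Conceptually, a `rerank` endpoint like the one above scores each (query, document) pair with a cross-encoder and reorders the candidates by that score. A minimal sketch in plain Python, where the scoring function is a hypothetical stand-in rather than the Elastic Rerank model:

```python
from typing import Callable

def rerank(query: str, docs: list[str],
           score: Callable[[str, str], float]) -> list[tuple[str, float]]:
    """Score every (query, doc) pair and sort by descending relevance.

    `score` stands in for the cross-encoder: unlike a bi-encoder, it sees
    the query and the document together for each pair it scores.
    """
    scored = [(doc, score(query, doc)) for doc in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy scorer: fraction of query terms present in the document.
def toy_score(query: str, doc: str) -> float:
    terms = query.lower().split()
    return sum(t in doc.lower() for t in terms) / len(terms)

docs = ["The moon orbits the earth.",
        "A solar eclipse is when the moon hides the sun."]
ranked = rerank("moon hides the sun", docs, toy_score)
print(ranked[0][0])  # the eclipse document ranks first
```

The pairwise scoring is what makes cross-encoders more accurate (and more expensive) than embedding similarity, which is why re-ranking is applied only to a top-N candidate window.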
 
 [discrete]
 [[inference-example-elasticsearch]]
@@ -186,7 +213,7 @@ If using the Python client, you can set the `timeout` parameter to a higher value.
 
 [discrete]
 [[inference-example-eland]]
-==== Models uploaded by Eland via the elasticsearch service
+==== Models uploaded by Eland via the `elasticsearch` service
 
 The following example shows how to create an {infer} endpoint called
 `my-msmarco-minilm-model` to perform a `text_embedding` task type.

docs/reference/reranking/semantic-reranking.asciidoc

Lines changed: 11 additions & 9 deletions
@@ -85,14 +85,16 @@ In {es}, semantic re-rankers are implemented using the {es} <<inference-apis,Inference APIs>>.
 
 To use semantic re-ranking in {es}, you need to:
 
-. *Choose a re-ranking model*.
-Currently you can:
-
-** Integrate directly with the <<infer-service-cohere,Cohere Rerank inference endpoint>> using the `rerank` task type
-** Integrate directly with the <<infer-service-google-vertex-ai,Google Vertex AI inference endpoint>> using the `rerank` task type
-** Upload a model to {es} from Hugging Face with {eland-docs}/machine-learning.html#ml-nlp-pytorch[Eland]. You'll need to use the `text_similarity` NLP task type when loading the model using Eland. Refer to {ml-docs}/ml-nlp-model-ref.html#ml-nlp-model-ref-text-similarity[the Elastic NLP model reference] for a list of third-party text similarity models supported by {es} for semantic re-ranking.
-*** Then set up an <<inference-example-eland,{es} service inference endpoint>> with the `rerank` task type
-. *Create a `rerank` task using the <<put-inference-api,{es} Inference API>>*.
+. *Select and configure a re-ranking model*.
+You have the following options:
+.. Use the <<inference-example-elastic-reranker,Elastic Rerank>> cross-encoder model via the inference API's {es} service.
+.. Use the <<infer-service-cohere,Cohere Rerank inference endpoint>> to create a `rerank` endpoint.
+.. Use the <<infer-service-google-vertex-ai,Google Vertex AI inference endpoint>> to create a `rerank` endpoint.
+.. Upload a model to {es} from Hugging Face with {eland-docs}/machine-learning.html#ml-nlp-pytorch[Eland]. You'll need to use the `text_similarity` NLP task type when loading the model with Eland. Then set up an <<inference-example-eland,{es} service inference endpoint>> with the `rerank` task type.
++
+Refer to {ml-docs}/ml-nlp-model-ref.html#ml-nlp-model-ref-text-similarity[the Elastic NLP model reference] for a list of third-party text similarity models supported by {es} for semantic re-ranking.
+
+. *Create a `rerank` endpoint using the <<put-inference-api,{es} Inference API>>*.
 The Inference API creates an inference endpoint and configures your chosen machine learning model to perform the re-ranking task.
 . *Define a `text_similarity_reranker` retriever in your search request*.
 The retriever syntax makes it simple to configure both the retrieval and re-ranking of search results in a single API call.
@@ -117,7 +119,7 @@ POST _search
       }
     },
     "field": "text",
-    "inference_id": "my-cohere-rerank-model",
+    "inference_id": "elastic-rerank",
     "inference_text": "How often does the moon hide the sun?",
     "rank_window_size": 100,
     "min_score": 0.5

docs/reference/search/retriever.asciidoc

Lines changed: 76 additions & 7 deletions
@@ -11,6 +11,7 @@ This allows for complex behavior to be depicted in a tree-like structure, called
 [TIP]
 ====
 Refer to <<retrievers-overview>> for a high level overview of the retrievers abstraction.
+Refer to <<retrievers-examples, Retrievers examples>> for additional examples.
 ====
 
 The following retrievers are available:
@@ -382,16 +383,17 @@ Refer to <<semantic-reranking>> for a high level overview of semantic re-ranking.
 
 ===== Prerequisites
 
-To use `text_similarity_reranker` you must first set up a `rerank` task using the <<put-inference-api, Create {infer} API>>.
-The `rerank` task should be set up with a machine learning model that can compute text similarity.
+To use `text_similarity_reranker` you must first set up an inference endpoint for the `rerank` task using the <<put-inference-api, Create {infer} API>>.
+The endpoint should be set up with a machine learning model that can compute text similarity.
 Refer to {ml-docs}/ml-nlp-model-ref.html#ml-nlp-model-ref-text-similarity[the Elastic NLP model reference] for a list of third-party text similarity models supported by {es}.
 
-Currently you can:
+You have the following options:
 
-* Integrate directly with the <<infer-service-cohere,Cohere Rerank inference endpoint>> using the `rerank` task type
-* Integrate directly with the <<infer-service-google-vertex-ai,Google Vertex AI inference endpoint>> using the `rerank` task type
+* Use the built-in <<inference-example-elastic-reranker,Elastic Rerank>> cross-encoder model via the inference API's {es} service.
+* Use the <<infer-service-cohere,Cohere Rerank inference endpoint>> with the `rerank` task type.
+* Use the <<infer-service-google-vertex-ai,Google Vertex AI inference endpoint>> with the `rerank` task type.
 * Upload a model to {es} with {eland-docs}/machine-learning.html#ml-nlp-pytorch[Eland] using the `text_similarity` NLP task type.
-** Then set up an <<inference-example-eland,{es} service inference endpoint>> with the `rerank` task type
+** Then set up an <<inference-example-eland,{es} service inference endpoint>> with the `rerank` task type.
 ** Refer to the <<text-similarity-reranker-retriever-example-eland,example>> on this page for a step-by-step guide.
 
 ===== Parameters
@@ -436,13 +438,70 @@ Note that score calculations vary depending on the model used.
 Applies the specified <<query-dsl-bool-query, boolean query filter>> to the child <<retriever, retriever>>.
 If the child retriever already specifies any filters, then this top-level filter is applied in conjunction with the filter defined in the child retriever.
 
+[discrete]
+[[text-similarity-reranker-retriever-example-elastic-rerank]]
+==== Example: Elastic Rerank
+
+This example demonstrates how to deploy the Elastic Rerank model and use it to re-rank search results using the `text_similarity_reranker` retriever.
+
+Follow these steps:
+
+. Create an inference endpoint for the `rerank` task using the <<put-inference-api, Create {infer} API>>.
++
+[source,console]
+----
+PUT _inference/rerank/my-elastic-rerank
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".rerank-v1",
+    "num_threads": 1,
+    "adaptive_allocations": { <1>
+      "enabled": true,
+      "min_number_of_allocations": 1,
+      "max_number_of_allocations": 10
+    }
+  }
+}
+----
+// TEST[skip:uses ML]
+<1> {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[Adaptive allocations] will be enabled with a minimum of 1 and a maximum of 10 allocations.
++
+. Define a `text_similarity_reranker` retriever:
++
+[source,console]
+----
+POST _search
+{
+  "retriever": {
+    "text_similarity_reranker": {
+      "retriever": {
+        "standard": {
+          "query": {
+            "match": {
+              "text": "How often does the moon hide the sun?"
+            }
+          }
+        }
+      },
+      "field": "text",
+      "inference_id": "my-elastic-rerank",
+      "inference_text": "How often does the moon hide the sun?",
+      "rank_window_size": 100,
+      "min_score": 0.5
+    }
+  }
+}
+----
+// TEST[skip:uses ML]
+
 [discrete]
 [[text-similarity-reranker-retriever-example-cohere]]
 ==== Example: Cohere Rerank
 
 This example enables out-of-the-box semantic search by re-ranking top documents using the Cohere Rerank API.
 This approach eliminates the need to generate and store embeddings for all indexed documents.
-This requires a <<infer-service-cohere,Cohere Rerank inference endpoint>> using the `rerank` task type.
+This requires a <<infer-service-cohere,Cohere Rerank inference endpoint>> that is set up for the `rerank` task type.
 
 [source,console]
 ----
@@ -680,6 +739,12 @@ GET movies/_search
 <1> The `rule` retriever is the outermost retriever, applying rules to the search results that were previously reranked using the `rrf` retriever.
 <2> The `rrf` retriever returns results from all of its sub-retrievers, and the output of the `rrf` retriever is used as input to the `rule` retriever.
 
+[discrete]
+[[retriever-common-parameters]]
+=== Common usage guidelines
+
+[discrete]
+[[retriever-size-pagination]]
 ==== Using `from` and `size` with a retriever tree
 
 The <<search-from-param, `from`>> and <<search-size-param, `size`>>
@@ -688,12 +753,16 @@ parameters are provided globally as part of the general
 They are applied to all retrievers in a retriever tree, unless a specific retriever overrides the `size` parameter using a different parameter such as `rank_window_size`.
 However, the final search hits are always limited to `size`.
 
+[discrete]
+[[retriever-aggregations]]
 ==== Using aggregations with a retriever tree
 
 <<search-aggregations, Aggregations>> are globally specified as part of a search request.
 The query used for an aggregation is the combination of all leaf retrievers as `should`
 clauses in a <<query-dsl-bool-query, boolean query>>.
 
+[discrete]
+[[retriever-restrictions]]
 ==== Restrictions on search parameters when specifying a retriever
 
 When a retriever is specified as part of a search, the following elements are not allowed at the top-level.
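The global `from`/`size` behavior described in the usage guidelines above reduces to slicing the final merged, ranked hit list, regardless of how large a window individual retrievers used internally. A simplified sketch in plain Python:

```python
def paginate(ranked_hits: list[str], from_: int, size: int) -> list[str]:
    """Apply the global `from`/`size` parameters to the final ranked hits.

    Retrievers may internally work with larger windows (for example
    `rank_window_size`), but the response is always cut down like this.
    """
    return ranked_hits[from_:from_ + size]

hits = ["doc1", "doc2", "doc3", "doc4", "doc5"]
print(paginate(hits, from_=1, size=2))  # -> ['doc2', 'doc3']
```

Slicing past the end simply yields fewer hits, mirroring a search page that runs off the end of the result set.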
