-A retriever is a specification to describe top documents returned from a
-search. A retriever replaces other elements of the <<search-search, search API>>
+A retriever is a specification to describe top documents returned from a search.
+A retriever replaces other elements of the <<search-search, search API>>
 that also return top documents such as <<query-dsl, `query`>> and
-<<search-api-knn, `knn`>>. A retriever may have child retrievers where a
-retriever with two or more children is considered a compound retriever. This
-allows for complex behavior to be depicted in a tree-like structure, called
-the retriever tree, to better clarify the order of operations that occur
-during a search.
+<<search-api-knn, `knn`>>.
+A retriever may have child retrievers where a retriever with two or more children is considered a compound retriever.
+This allows for complex behavior to be depicted in a tree-like structure, called the retriever tree, to better clarify the order of operations that occur during a search.
 
 [TIP]
 ====
@@ -29,6 +27,9 @@ A <<rrf-retriever, retriever>> that produces top documents from <<rrf, reciproca
 `text_similarity_reranker`::
 A <<text-similarity-reranker-retriever, retriever>> that enhances search results by re-ranking documents based on semantic similarity to a specified inference text, using a machine learning model.
 
+`rule`::
+A <<rule-retriever, retriever>> that applies contextual <<query-rules>> to pin or exclude documents for specific queries.
+
 [[standard-retriever]]
 ==== Standard Retriever
 
@@ -44,8 +45,7 @@ Defines a query to retrieve a set of top documents.
 `filter`::
 (Optional, <<query-dsl, query object or list of query objects>>)
 +
-Applies a <<query-dsl-bool-query, boolean query filter>> to this retriever
-where all documents must match this query but do not contribute to the score.
+Applies a <<query-dsl-bool-query, boolean query filter>> to this retriever where all documents must match this query but do not contribute to the score.
@@ -72,8 +71,7 @@ Collapses the top documents by a specified key into a single top document per ke
 
 ===== Restrictions
 
-When a retriever tree contains a compound retriever (a retriever with two or more child
-retrievers) the <<search-after, search after>> parameter is not supported.
+When a retriever tree contains a compound retriever (a retriever with two or more child retrievers), the <<search-after, search after>> parameter is not supported.
 
 [discrete]
 [[standard-retriever-example]]
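The restriction above can be checked on the client before a request is sent. The following Python sketch assumes that the request body is held as a plain dict shaped like the JSON in this page's examples. `child_retrievers`, `has_compound_retriever`, and `validate_search_body` are hypothetical helper names and are not part of any Elasticsearch client. The sketch recognizes only the two nesting shapes used on this page: a `retrievers` list, as in `rrf`, and a single nested `retriever`, as in `text_similarity_reranker` and `rule`. Elasticsearch still performs its own validation.

```python
from typing import Any


def child_retrievers(node: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the direct children of a retriever node such as {"rrf": {...}}.

    Assumption: compound retrievers list children under "retrievers",
    while wrapping retrievers nest a single child under "retriever".
    """
    params = next(iter(node.values()), {})
    if "retrievers" in params:
        return list(params["retrievers"])
    if "retriever" in params:
        return [params["retriever"]]
    return []


def has_compound_retriever(node: dict[str, Any]) -> bool:
    """True if any retriever in the tree has two or more child retrievers."""
    children = child_retrievers(node)
    return len(children) >= 2 or any(has_compound_retriever(c) for c in children)


def validate_search_body(body: dict[str, Any]) -> None:
    """Reject `search_after` when the retriever tree contains a compound retriever."""
    tree = body.get("retriever")
    if tree is not None and "search_after" in body and has_compound_retriever(tree):
        raise ValueError("search_after is not supported with a compound retriever")
```

The check walks the whole tree rather than only the root. A compound retriever nested under a wrapper such as `rule` still makes the tree compound.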
@@ -143,11 +141,13 @@ GET /restaurants/_search
 }
 }
 ----
+
 <1> Opens the `retriever` object.
 <2> The `standard` retriever is used for defining traditional {es} queries.
 <3> The entry point for defining the search query.
 <4> The `bool` object allows for combining multiple query clauses logically.
-<5> The `should` array indicates conditions under which a document will match. Documents matching these conditions will increase their relevancy score.
+<5> The `should` array indicates conditions under which a document will match.
+Documents matching these conditions will increase their relevancy score.
 <6> The `match` object finds documents where the `region` field contains the word "Austria."
 <7> The `filter` array provides filtering conditions that must be met but do not contribute to the relevancy score.
 <8> The `term` object is used for exact matches, in this case, filtering documents by the `year` field.
@@ -178,8 +178,8 @@ Defines a <<knn-semantic-search, model>> to build a query vector.
 `k`::
 (Required, integer)
 +
-Number of nearest neighbors to return as top hits. This value must be fewer than
-or equal to `num_candidates`.
+Number of nearest neighbors to return as top hits.
+This value must be less than or equal to `num_candidates`.
 
 `num_candidates`::
 (Required, integer)
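The `k` constraint can be enforced while the request body is assembled. This minimal sketch assumes that the `knn` retriever takes the `field`, `query_vector`, `k`, and `num_candidates` keys described by the callouts on this page. `knn_retriever` is a hypothetical helper name, and the field name and vector in the usage line are placeholders. The check only duplicates Elasticsearch's server-side validation and does not replace it.

```python
from typing import Any


def knn_retriever(
    field: str,
    query_vector: list[float],
    k: int,
    num_candidates: int,
) -> dict[str, Any]:
    """Build a `knn` retriever clause (hypothetical helper).

    Mirrors the documented rule that `k` must be less than or equal to
    `num_candidates`.
    """
    if k > num_candidates:
        raise ValueError(
            f"k ({k}) must be less than or equal to num_candidates ({num_candidates})"
        )
    return {
        "knn": {
            "field": field,                    # Field that holds the document vectors.
            "query_vector": query_vector,      # Vector compared against document vectors.
            "k": k,                            # Nearest neighbors returned as top hits.
            "num_candidates": num_candidates,  # Size of the initial candidate set.
        }
    }


# Placeholder field name and vector, for illustration only.
body = {"retriever": knn_retriever("vector", [10.0, 22.0, 77.0], k=10, num_candidates=10)}
```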
@@ -222,16 +222,15 @@ GET /restaurants/_search
 <1> Configuration for k-nearest neighbor (knn) search, which is based on vector similarity.
 <2> Specifies the field name that contains the vectors.
 <3> The query vector against which document vectors are compared in the `knn` search.
-<4> The number of nearest neighbors to return as top hits. This value must be fewer than or equal to `num_candidates`.
+<4> The number of nearest neighbors to return as top hits.
+This value must be less than or equal to `num_candidates`.
 <5> The size of the initial candidate set from which the final `k` nearest neighbors are selected.
 
 [[rrf-retriever]]
 ==== RRF Retriever
 
-An <<rrf, RRF>> retriever returns top documents based on the RRF formula,
-equally weighting two or more child retrievers.
-Reciprocal rank fusion (RRF) is a method for combining multiple result
-sets with different relevance indicators into a single result set.
+An <<rrf, RRF>> retriever returns top documents based on the RRF formula, equally weighting two or more child retrievers.
+Reciprocal rank fusion (RRF) is a method for combining multiple result sets with different relevance indicators into a single result set.
 
 ===== Parameters
 
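The following minimal Python sketch of reciprocal rank fusion over ranked lists of document IDs shows what "equally weighting" means in practice. It assumes that ranks start at 1 and uses a rank constant of 60, the default of the RRF retriever's `rank_constant` parameter, which this excerpt does not show. Each document scores the sum of 1 / (rank_constant + rank) over every child result set it appears in. The per-child cut-off of 10 loosely mirrors `rank_window_size` and is illustrative only. The tie-break by document ID is a choice made for this sketch, not documented Elasticsearch behavior.

```python
from collections import defaultdict


def rrf_fuse(
    result_sets: list[list[str]],
    rank_constant: int = 60,
    rank_window_size: int = 10,
) -> list[tuple[str, float]]:
    """Fuse best-first lists of document IDs with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_sets:
        # Each child contributes only its top `rank_window_size` hits; ranks start at 1.
        for rank, doc_id in enumerate(results[:rank_window_size], start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties broken by ID only to keep the sketch deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


lexical = ["a", "b", "c"]    # e.g. hits from a `standard` retriever
semantic = ["c", "a", "d"]   # e.g. hits from a `knn` retriever
for doc_id, score in rrf_fuse([lexical, semantic]):
    print(doc_id, round(score, 5))
```

Only ranks enter the formula, so raw scores from a lexical query and a vector search never have to be normalized against each other.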
@@ -357,7 +356,8 @@ Refer to <<semantic-reranking>> for a high level overview of semantic re-ranking
 ===== Prerequisites
 
 To use `text_similarity_reranker` you must first set up a `rerank` task using the <<put-inference-api, Create {infer} API>>.
-The `rerank` task should be set up with a machine learning model that can compute text similarity. Refer to {ml-docs}/ml-nlp-model-ref.html#ml-nlp-model-ref-text-similarity[the Elastic NLP model reference] for a list of third-party text similarity models supported by {es}.
+The `rerank` task should be set up with a machine learning model that can compute text similarity.
+Refer to {ml-docs}/ml-nlp-model-ref.html#ml-nlp-model-ref-text-similarity[the Elastic NLP model reference] for a list of third-party text similarity models supported by {es}.
 
 Currently you can:
 
@@ -368,6 +368,7 @@ Currently you can:
 ** Refer to the <<text-similarity-reranker-retriever-example-eland,example>> on this page for a step-by-step guide.
 
 ===== Parameters
+
 `retriever`::
 (Required, <<retriever, retriever>>)
 +
@@ -376,7 +377,8 @@ The child retriever that generates the initial set of top documents to be re-ran
 `field`::
 (Required, `string`)
 +
-The document field to be used for text similarity comparisons. This field should contain the text that will be evaluated against the `inferenceText`.
+The document field to be used for text similarity comparisons.
+This field should contain the text that will be evaluated against the `inference_text`.
 
 `inference_id`::
 (Required, `string`)
@@ -391,25 +393,28 @@ The text snippet used as the basis for similarity comparison.
 `rank_window_size`::
 (Optional, `int`)
 +
-The number of top documents to consider in the re-ranking process. Defaults to `10`.
+The number of top documents to consider in the re-ranking process.
+Defaults to `10`.
 
 `min_score`::
 (Optional, `float`)
 +
-Sets a minimum threshold score for including documents in the re-ranked results. Documents with similarity scores below this threshold will be excluded. Note that score calculations vary depending on the model used.
+Sets a minimum threshold score for including documents in the re-ranked results.
+Documents with similarity scores below this threshold will be excluded.
+Note that score calculations vary depending on the model used.
 
 `filter`::
 (Optional, <<query-dsl, query object or list of query objects>>)
 +
 Applies the specified <<query-dsl-bool-query, boolean query filter>> to the child <<retriever, retriever>>.
-If the child retriever already specifies any filters, then this top-level filter is applied in conjuction
-with the filter defined in the child retriever.
+If the child retriever already specifies any filters, then this top-level filter is applied in conjunction with the filter defined in the child retriever.
-This example enables out-of-the-box semantic search by re-ranking top documents using the Cohere Rerank API. This approach eliminate the need to generate and store embeddings for all indexed documents.
+This example enables out-of-the-box semantic search by re-ranking top documents using the Cohere Rerank API.
+This approach eliminates the need to generate and store embeddings for all indexed documents.
 This requires a <<infer-service-cohere,Cohere Rerank inference endpoint>> using the `rerank` task type.
 
 [source,console]
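The following sketch shows how `rank_window_size` and `min_score` interact. It is a simplified model, not the Elasticsearch implementation. `score_fn` is a hypothetical stand-in for the `rerank` inference endpoint, and `word_overlap` is a toy similarity used only for illustration. Hits are assumed to be (document ID, field text) pairs ordered best-first by the child retriever, and any `filter` is assumed to have been applied by the child already.

```python
from typing import Callable, Optional


def rerank(
    hits: list[tuple[str, str]],
    inference_text: str,
    score_fn: Callable[[str, str], float],
    rank_window_size: int = 10,
    min_score: Optional[float] = None,
) -> list[tuple[str, float]]:
    """Re-rank the top `rank_window_size` child hits by similarity to `inference_text`.

    `hits` holds (doc_id, field_text) pairs, best first. `score_fn` is a
    hypothetical stand-in for the `rerank` inference endpoint.
    """
    # Only the first `rank_window_size` documents from the child are re-scored.
    window = hits[:rank_window_size]
    scored = [(doc_id, score_fn(inference_text, text)) for doc_id, text in window]
    if min_score is not None:
        # Exclude documents whose similarity score is below the threshold.
        scored = [(doc_id, score) for doc_id, score in scored if score >= min_score]
    # Highest similarity first; the sort is stable, so ties keep the child's order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


def word_overlap(query: str, text: str) -> float:
    """Toy similarity for this sketch: share of query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


hits = [
    ("d1", "a quiet family drama"),
    ("d2", "space adventure with robots"),
    ("d3", "robots and space travel drama"),
]
print(rerank(hits, "space robots", word_overlap, rank_window_size=3, min_score=0.5))
# [('d2', 1.0), ('d3', 1.0)]
```

A useful `min_score` depends on the score range of the model behind the endpoint, so a threshold tuned for one model rarely carries over to another.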
@@ -459,7 +464,9 @@ Follow these steps to load the model and create a semantic re-ranker.
 python -m pip install eland[pytorch]
 ----
 +
-. Upload the model to {es} using Eland. This example assumes you have an Elastic Cloud deployment and an API key. Refer to the https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html#ml-nlp-pytorch-auth[Eland documentation] for more authentication options.
+. Upload the model to {es} using Eland.
+This example assumes you have an Elastic Cloud deployment and an API key.
+Refer to the https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html#ml-nlp-pytorch-auth[Eland documentation] for more authentication options.
 +
 [source,sh]
 ----
@@ -517,14 +524,137 @@ POST movies/_search
 This retriever uses a standard `match` query to search the `movie` index for films tagged with the genre "drama".
 It then re-ranks the results based on semantic similarity to the text in the `inference_text` parameter, using the model we uploaded to {es}.
 
+[[rule-retriever]]
+==== Query Rules Retriever
+
+The `rule` retriever offers users fine-grained control over the search results by applying contextual <<query-rules>> to pin or exclude documents for specific queries.
+This retriever provides functionality similar to the <<query-dsl-rule-query>>, but works out of the box with other retrievers, including reranking retrievers such as <<text-similarity-reranker-retriever, text_similarity_reranker>> and <<rrf-retriever, rrf>>.
+
+===== Prerequisites
+
+To use the `rule` retriever you must first create one or more query rulesets using the <<query-rules-apis, query rules management APIs>>.
+
+===== Parameters
+
+`retriever`::
+(Required, <<retriever, retriever>>)
++
+The child retriever that returns the results to which the query rules are applied.
+This can be a standalone retriever such as the <<standard-retriever, standard>> or <<knn-retriever, knn>> retriever, or it can be a compound retriever.
+
+`ruleset_ids`::
+(Required, `array`)
++
+An array of one or more unique <<query-rules-apis, query ruleset>> IDs with query-based rules to match and apply as applicable.
+Rulesets and their associated rules are evaluated in the order in which they are specified in the query and ruleset.
+The maximum number of rulesets to specify is 10.
+
+`match_criteria`::
+(Required, `object`)
++
+Defines the match criteria to apply to rules in the given query ruleset.
+Match criteria should match the keys defined in the `criteria.metadata` field of the rule.
+
+`rank_window_size`::
+(Optional, `int`)
++
+The number of top documents to return from the `rule` retriever.
+Defaults to `10`.
+
+[discrete]
+[[rule-retriever-example]]
+==== Example: Rule retriever
+
+This example shows the rule retriever executed without any additional retrievers.
+It runs the query defined by the `retriever` and applies the rules from `my-ruleset` on top of the returned results.
+
+[source,console]
+----
+GET my-index/_search
+{
+  "retriever": {
+    "rule": {
+      "match_criteria": {
+        "query_string": "pugs"
+      },
+      "ruleset_ids": [
+        "my-ruleset"
+      ],
+      "retriever": {
+        "standard": {
+          "query": {
+            "query_string": {
+              "query": "pugs"
+            }
+          }
+        }
+      }
+    }
+  }
+}
+----
+
+[discrete]
+[[rule-retriever-example-rrf]]
+==== Example: Rule retriever combined with RRF
+
+This example shows how to combine the `rule` retriever with reranking retrievers such as <<rrf-retriever, rrf>> or <<text-similarity-reranker-retriever, text_similarity_reranker>>.
+
+[WARNING]
+====
+The `rule` retriever will apply rules to any documents returned from its defined `retriever` or any of its sub-retrievers.
+This means that for the best results, the `rule` retriever should be the outermost defined retriever.
+Nesting a `rule` retriever as a sub-retriever under a reranker such as `rrf` or `text_similarity_reranker` may not produce the expected results.
+====
+
+[source,console]
+----
+GET my-index/_search
+{
+  "retriever": {
+    "rule": {
+      "match_criteria": {
+        "query_string": "pugs"
+      },
+      "ruleset_ids": [
+        "my-ruleset"
+      ],
+      "retriever": {
+        "rrf": {
+          "retrievers": [
+            {
+              "standard": {
+                "query": {
+                  "query_string": {
+                    "query": "beagles"
+                  }
+                }
+              }
+            },
+            {
+              "standard": {
+                "query": {
+                  "query_string": {
+                    "query": "chihuahuas"
+                  }
+                }
+              }
+            }
+          ]
+        }
+      }
+    }
+  }
+}
+----
+
 ==== Using `from` and `size` with a retriever tree
 
 The <<search-from-param, `from`>> and <<search-size-param, `size`>>
 parameters are provided globally as part of the general
-<<search-search, search API>>. They are applied to all retrievers in a
-retriever tree unless a specific retriever overrides the `size` parameter
-using a different parameter such as `rank_window_size`. Though, the final
-search hits are always limited to `size`.
+<<search-search, search API>>.
+They are applied to all retrievers in a retriever tree unless a specific retriever overrides the `size` parameter using a different parameter such as `rank_window_size`.
+However, the final search hits are always limited to `size`.
 
 ==== Using aggregations with a retriever tree
 
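The two `rule` retriever examples in the hunk above differ only in the child retriever. A small request builder can therefore keep the `rule` retriever outermost, as the warning recommends, while enforcing the documented limits on `ruleset_ids`: one or more unique IDs, at most 10. This Python sketch only assembles the JSON body. `rule_retriever`, `query_string_retriever`, and `search_body` are hypothetical helper names. Sending the request, for example to `GET my-index/_search`, is left to whichever client you use.

```python
import json
from typing import Any, Optional

MAX_RULESETS = 10  # Documented maximum number of entries in `ruleset_ids`.


def rule_retriever(
    child: dict[str, Any],
    ruleset_ids: list[str],
    match_criteria: dict[str, Any],
    rank_window_size: Optional[int] = None,
) -> dict[str, Any]:
    """Wrap `child` in a `rule` retriever clause (hypothetical helper)."""
    if not 1 <= len(ruleset_ids) <= MAX_RULESETS:
        raise ValueError(f"ruleset_ids must contain 1 to {MAX_RULESETS} IDs")
    if len(set(ruleset_ids)) != len(ruleset_ids):
        raise ValueError("ruleset_ids must be unique")
    params: dict[str, Any] = {
        "match_criteria": match_criteria,
        "ruleset_ids": list(ruleset_ids),  # Rulesets are evaluated in this order.
        "retriever": child,
    }
    if rank_window_size is not None:
        # Only sent when set; the documented default is 10.
        params["rank_window_size"] = rank_window_size
    return {"rule": params}


def query_string_retriever(text: str) -> dict[str, Any]:
    """A `standard` retriever running a `query_string` query, as in the examples."""
    return {"standard": {"query": {"query_string": {"query": text}}}}


def search_body(retriever: dict[str, Any]) -> dict[str, Any]:
    """Place a retriever tree at the top level of a search request body."""
    return {"retriever": retriever}


# Rebuild the "Rule retriever combined with RRF" request with `rule` outermost.
rrf_child = {
    "rrf": {
        "retrievers": [
            query_string_retriever("beagles"),
            query_string_retriever("chihuahuas"),
        ]
    }
}
body = search_body(
    rule_retriever(rrf_child, ruleset_ids=["my-ruleset"], match_criteria={"query_string": "pugs"})
)
print(json.dumps(body, indent=2))
```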
@@ -534,12 +664,12 @@ clauses in a <<query-dsl-bool-query, boolean query>>.
 
 ==== Restrictions on search parameters when specifying a retriever
 
-When a retriever is specified as part of a search the following elements are not allowed
-at the top-level and instead are only allowed as elements of specific retrievers:
+When a retriever is specified as part of a search, the following elements are not allowed at the top level and instead are only allowed as elements of specific retrievers: