Merged
Changes from 3 commits
85 changes: 68 additions & 17 deletions docs/reference/search/retriever.asciidoc
@@ -81,7 +81,60 @@ retrievers) *only* the query element is allowed.
[[standard-retriever-example]]
==== Example

[source,js]
////
[source,console]
----
PUT /restaurants
{
"mappings": {
"properties": {
"region": { "type": "keyword" },
"year": { "type": "keyword" }
}
}
}

POST /restaurants/_bulk?refresh
{"index":{}}
{"region": "Austria", "year": "2019"}
{"index":{}}
{"region": "France", "year": "2019"}
{"index":{}}
{"region": "Austria", "year": "2020"}

PUT /my-embeddings
Member

Nitpick: It might be nicer to have a single index restaurants with the combined keyword and vector fields. This would make the examples a bit cleaner.

Contributor Author

Totally! I'll update :cat_frantically_typing:

{
"mappings": {
"properties": {
"vector": {
"type": "dense_vector",
"dims": 3
}
}
}
}

POST /my-embeddings/_bulk?refresh
{"index":{}}
{"vector": [10, 22, 77]}
{"index":{}}
{"vector": [10, 22, 78]}
{"index":{}}
{"vector": [10, 22, 79]}
{"index":{}}
{"vector": [10, 22, 80]}
----
// TESTSETUP

[source,console]
--------------------------------------------------
DELETE /restaurants
DELETE /my-embeddings
--------------------------------------------------
// TEARDOWN
////
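
The review suggestion to keep a single `restaurants` index could look roughly like the sketch below. It is not part of this PR: the combined mapping, the vectors attached to each restaurant document, and the `k` and `num_candidates` values are assumptions for illustration, reusing the field names and sample data from the setup above.

```console
# Hypothetical single-index setup: keyword and dense_vector fields together
PUT /restaurants
{
  "mappings": {
    "properties": {
      "region": { "type": "keyword" },
      "year": { "type": "keyword" },
      "vector": { "type": "dense_vector", "dims": 3 }
    }
  }
}

POST /restaurants/_bulk?refresh
{"index":{}}
{"region": "Austria", "year": "2019", "vector": [10, 22, 77]}
{"index":{}}
{"region": "France", "year": "2019", "vector": [10, 22, 78]}
{"index":{}}
{"region": "Austria", "year": "2020", "vector": [10, 22, 79]}

# Hybrid (lexical + kNN) query against the one index; parameter values are illustrative
GET /restaurants/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "standard": {
            "query": {
              "multi_match": {
                "query": "Austria",
                "fields": ["region"]
              }
            }
          }
        },
        {
          "knn": {
            "field": "vector",
            "query_vector": [10, 22, 77],
            "k": 3,
            "num_candidates": 3
          }
        }
      ]
    }
  }
}
```

With one index, the teardown would need only `DELETE /restaurants`, and the hybrid example could keep `GET /restaurants/_search` as its target instead of `GET /restaurants,my-embeddings/_search`.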

[source,console]
----
GET /restaurants/_search
{
@@ -109,7 +162,6 @@ GET /restaurants/_search
}
}
----
// NOTCONSOLE
<1> Opens the `retriever` object.
<2> The `standard` retriever is used for defining traditional {es} queries.
<3> The entry point for defining the search query.
@@ -171,7 +223,7 @@ The parameters `query_vector` and `query_vector_builder` cannot be used together
[[knn-retriever-example]]
==== Example

[source,js]
[source,console]
----
GET my-embeddings/_search
{
@@ -185,8 +237,7 @@ GET my-embeddings/_search
}
}
----
// NOTCONSOLE

// TEST[continued]
<1> Configuration for k-nearest neighbor (knn) search, which is based on vector similarity.
<2> Specifies the field name that contains the vectors.
<3> The query vector against which document vectors are compared in the `knn` search.
@@ -223,9 +274,9 @@ the retriever tree.

A simple hybrid search example (lexical search + dense vector search) combining a `standard` retriever with a `knn` retriever using RRF:

[source,js]
[source,console]
----
GET /restaurants/_search
GET /restaurants,my-embeddings/_search
Member

This goes back to my nitpick in tests, I'd like to see us continue to query a single index or alias in this example

{
"retriever": {
"rrf": { <1>
@@ -234,7 +285,7 @@ GET /restaurants/_search
"standard": { <3>
"query": {
"multi_match": {
"query": "San Francisco",
"query": "Austria",
"fields": [
"city",
"region"
@@ -258,7 +309,7 @@ GET /restaurants/_search
}
}
----
// NOTCONSOLE
// TEST[continued]
<1> Defines a retriever tree with an RRF retriever.
<2> The sub-retriever array.
<3> The first sub-retriever is a `standard` retriever.
@@ -272,7 +323,7 @@ GET /restaurants/_search

A more complex hybrid search example (lexical search + ELSER sparse vector search + dense vector search) using RRF:

[source,js]
[source,console]
----
GET movies/_search
{
@@ -316,7 +367,7 @@ GET movies/_search
}
}
----
// NOTCONSOLE
// TEST[skip:uses ELSER]
Member

👍 Nice comment explanations


[[text-similarity-reranker-retriever]]
==== Text Similarity Re-ranker Retriever
@@ -390,7 +441,7 @@ A text similarity re-ranker retriever is a compound retriever. Child retrievers
This example enables out-of-the-box semantic search by re-ranking top documents using the Cohere Rerank API. This approach eliminates the need to generate and store embeddings for all indexed documents.
This requires a <<infer-service-cohere,Cohere Rerank inference endpoint>> using the `rerank` task type.

[source,js]
[source,console]
----
GET /index/_search
{
@@ -414,7 +465,7 @@ GET /index/_search
}
}
----
// NOTCONSOLE
// TEST[skip:uses ML]

[discrete]
[[text-similarity-reranker-retriever-example-eland]]
@@ -452,7 +503,7 @@ eland_import_hub_model \
+
. Create an inference endpoint for the `rerank` task
+
[source,js]
[source,console]
----
PUT _inference/rerank/my-msmarco-minilm-model
{
@@ -464,11 +515,11 @@ PUT _inference/rerank/my-msmarco-minilm-model
}
}
----
// NOTCONSOLE
// TEST[skip:uses ML]
+
. Define a `text_similarity_rerank` retriever.
+
[source,js]
[source,console]
----
POST movies/_search
{
@@ -490,7 +541,7 @@ POST movies/_search
}
}
----
// NOTCONSOLE
// TEST[skip:uses ML]
+
This retriever uses a standard `match` query to search the `movies` index for films tagged with the genre "drama".
It then re-ranks the results based on semantic similarity to the text in the `inference_text` parameter, using the model we uploaded to {es}.