Adds nested knn search example #1718
Changes from 2 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -889,6 +889,169 @@ Now the result will contain the nearest found paragraph when searching. | |
} | ||
``` | ||
|
||
### Customize nested kNN search | ||
|
||
You can combine nested kNN search with top-level dense vector fields, filters, and `inner_hits` to support more advanced use cases. | ||
|
||
|
||
This approach is helpful when you: | ||
|
||
|
||
- Use your own model to generate vectors and don’t rely on the `semantic_text` field provided by Elastic’s semantic search capability. | ||
|
||
- Want to search both document titles and the smaller sections inside, like paragraphs. | ||
- Need to return not just matching documents, but also the exact passages that matched. | ||
|
||
#### Create the index mapping | ||
This example creates an index that stores a vector at the top level for the document title and multiple vectors inside a nested field for individual paragraphs. | ||
|
||
```console | ||
PUT multi_level_vectors | ||
{ | ||
"mappings": { | ||
"properties": { | ||
"title_vector": { | ||
"type": "dense_vector", | ||
"dims": 2, | ||
"index_options": { | ||
"type": "hnsw" | ||
} | ||
}, | ||
"paragraphs": { | ||
"type": "nested", | ||
"properties": { | ||
"text": { | ||
"type": "text" | ||
}, | ||
"vector": { | ||
"type": "dense_vector", | ||
"dims": 2, | ||
"index_options": { | ||
"type": "hnsw" | ||
} | ||
} | ||
} | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
|
||
#### Index the documents | ||
This step adds example documents with title and paragraph vectors. | ||
|
||
```console | ||
POST _bulk | ||
{ "index": { "_index": "multi_level_vectors", "_id": "1" } } | ||
{ "title_vector": [0.5, 0.4], "paragraphs": [ { "text": "First paragraph", "vector": [0.5, 0.4] }, { "text": "Second paragraph", "vector": [0.3, 0.8] } ] } | ||
{ "index": { "_index": "multi_level_vectors", "_id": "2" } } | ||
{ "title_vector": [0.1, 0.9], "paragraphs": [ { "text": "Another one", "vector": [0.1, 0.9] } ] } | ||
``` | ||
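If you generate the vectors in your own code, the bulk request body can be assembled programmatically. The following is a minimal sketch, not part of the original change: it uses only the Python standard library, and the `docs` dictionary and `build_bulk_body` helper are hypothetical names introduced for illustration. It produces the same newline-delimited JSON as the `_bulk` request above.

```python
import json

# Hypothetical in-memory documents that mirror the example above.
docs = {
    "1": {
        "title_vector": [0.5, 0.4],
        "paragraphs": [
            {"text": "First paragraph", "vector": [0.5, 0.4]},
            {"text": "Second paragraph", "vector": [0.3, 0.8]},
        ],
    },
    "2": {
        "title_vector": [0.1, 0.9],
        "paragraphs": [{"text": "Another one", "vector": [0.1, 0.9]}],
    },
}


def build_bulk_body(index_name, documents):
    """Return an NDJSON string: one action line, then one source line, per document."""
    lines = []
    for doc_id, source in documents.items():
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body("multi_level_vectors", docs)
print(bulk_body)
```

The resulting string can be sent as the body of `POST _bulk` with whichever HTTP client you already use.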
|
||
#### Run the search query | ||
This example searches for documents with relevant paragraph vectors. The `inner_hits` section returns the most relevant paragraphs from each matching document. | ||
|
||
```console | ||
POST multi_level_vectors/_search | ||
{ | ||
"_source": false, | ||
"fields": ["title_vector"], | ||
"knn": { | ||
"field": "paragraphs.vector", | ||
"query_vector": [0.5, 0.4], | ||
"k": 2, | ||
"num_candidates": 10, | ||
"inner_hits": { | ||
"size": 2, | ||
"name": "top_passages", | ||
"_source": false, | ||
"fields": ["paragraphs.text"] | ||
} | ||
} | ||
} | ||
``` | ||
|
||
In the `inner_hits` block, `size` controls how many matching paragraphs are returned for each top-level document. If your query includes multiple kNN clauses, give each `inner_hits` block a unique `name` so that the results don't conflict in the response. | ||
|
||
```json | ||
{ | ||
"took": 58, | ||
"timed_out": false, | ||
"_shards": { | ||
"total": 1, | ||
"successful": 1, | ||
"skipped": 0, | ||
"failed": 0 | ||
}, | ||
"hits": { | ||
"total": { "value": 2, "relation": "eq" }, <1> | ||
"max_score": 1, | ||
"hits": [ | ||
{ | ||
"_index": "multi_level_vectors", | ||
"_id": "1", | ||
"_score": 1, <2> | ||
"fields": { | ||
"title_vector": [0.5, 0.4] | ||
}, | ||
"inner_hits": { | ||
"top_passages": { | ||
"hits": { | ||
"total": { "value": 2, "relation": "eq" }, | ||
"max_score": 1, | ||
"hits": [ | ||
{ | ||
"_nested": { "field": "paragraphs", "offset": 0 }, <3> | ||
"_score": 1, | ||
"fields": { | ||
"paragraphs": [ { "text": ["First paragraph"] } ] <4> | ||
} | ||
}, | ||
{ | ||
"_nested": { "field": "paragraphs", "offset": 1 }, | ||
"_score": 0.92955077, | ||
"fields": { | ||
"paragraphs": [ { "text": ["Second paragraph"] } ] | ||
} | ||
} | ||
] | ||
} | ||
} | ||
} | ||
}, | ||
{ | ||
"_index": "multi_level_vectors", | ||
"_id": "2", | ||
"_score": 0.8535534, | ||
"fields": { | ||
"title_vector": [0.1, 0.9] | ||
}, | ||
"inner_hits": { | ||
"top_passages": { | ||
"hits": { | ||
"total": { "value": 1, "relation": "eq" }, | ||
"max_score": 0.8535534, | ||
"hits": [ | ||
{ | ||
"_nested": { "field": "paragraphs", "offset": 0 }, | ||
"_score": 0.8535534, | ||
"fields": { | ||
"paragraphs": [ { "text": ["Another one"] } ] | ||
} | ||
} | ||
] | ||
} | ||
} | ||
} | ||
} | ||
] | ||
} | ||
} | ||
``` | ||
|
||
1. Two top-level documents matched the query. | ||
2. The document's score, which equals the score of its best-matching paragraph. | ||
3. The `_nested` object identifies the matching paragraph by its `offset` in the `paragraphs` array. | ||
4. The actual matching paragraph text. | ||
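To see where the `_score` values in the response come from, you can reproduce them by hand. The sketch below is illustrative only. It assumes the fields use the default `cosine` similarity (the mapping above doesn't set `similarity`) and that the score is computed as `(1 + cosine) / 2`. The helper name `cosine_score` is hypothetical.

```python
import math


def cosine_score(query, vector):
    """Map cosine similarity from [-1, 1] to a score in [0, 1] as (1 + cos) / 2."""
    dot = sum(q * v for q, v in zip(query, vector))
    norm_q = math.sqrt(sum(q * q for q in query))
    norm_v = math.sqrt(sum(v * v for v in vector))
    return (1 + dot / (norm_q * norm_v)) / 2


query = [0.5, 0.4]

# Paragraph vectors from the indexed example documents, keyed by document ID.
paragraph_vectors = {
    "1": [[0.5, 0.4], [0.3, 0.8]],
    "2": [[0.1, 0.9]],
}

# Score every paragraph, then take the best paragraph score per document.
paragraph_scores = {
    doc_id: [cosine_score(query, vec) for vec in vectors]
    for doc_id, vectors in paragraph_vectors.items()
}
document_scores = {doc_id: max(scores) for doc_id, scores in paragraph_scores.items()}

for doc_id, scores in paragraph_scores.items():
    print(doc_id, [round(s, 6) for s in scores], round(document_scores[doc_id], 6))
```

Under these assumptions, the computed values agree with the response above to within floating-point precision: each document's score is the highest score among its paragraphs.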
|
||
|
||
### Limitations for approximate kNN search [approximate-knn-limitations] | ||
|
||
* When using kNN search in [{{ccs}}](../../../solutions/search/cross-cluster-search.md), the [`ccs_minimize_roundtrips`](../../../solutions/search/cross-cluster-search.md#ccs-min-roundtrips) option is not supported. | ||
|
I'm wondering if it would be better to make this guide its own subpage. This is quite complex and may make the page significantly longer – something to consider.
Good idea for a follow-up to rethink how we can break down this entire page into more manageable sub-pages —cc @kderusso
I'm pretty sure it just organically became one long append-only page because we didn't have sufficient nesting capacity in the old asciidoc system :)
Thanks for your suggestions! I’ve created an issue to address this later and improve the readability of the page.