Merged
Changes from 2 commits
163 changes: 163 additions & 0 deletions solutions/search/vector/knn.md
@@ -889,6 +889,169 @@ Now the result will contain the nearest found paragraph when searching.
}
```

### Customize nested kNN search
Contributor

I'm wondering if it would be better to make this guide its own subpage. This is quite complex and may make the page significantly longer – something to consider.

Contributor

Good idea for a follow-up to rethink how we can break down this entire page into more manageable sub-pages. cc @kderusso

I'm pretty sure it just organically became one long append-only page because we didn't have sufficient nesting capacity in the old asciidoc system :)

Contributor Author

Thanks for your suggestions! I’ve created an issue to address this later and improve the readability of the page.


You can combine nested kNN search with top-level dense vector fields, filters, and `inner_hits` to support more advanced use cases.
Member

Reading the ticket closely, I wonder if we could simplify this a bit. I think the use case of "create both at once" is interesting but my take on reading this ticket is more, "we have examples of how to create regular fields but there are no good examples in the same document on how to use nested fields." It feels like only examples using nested would provide more simple examples that could be used as building blocks. Since most people who are interested in using nested fields are doing so for chunking purposes, it may make their use case a bit more clear. WDYT?

Contributor Author

Thank you, I totally agree with your take! I updated the PR to focus only on the nested use case and removed the mixed example to keep it simpler and more relevant. What do you think about it?


This approach is helpful when you:

- Use your own model to generate vectors and don’t rely on the `semantic_text` field provided by Elastic’s semantic search capability.
- Want to search both document titles and the smaller sections inside, like paragraphs.
- Need to return not just matching documents, but also the exact passages that matched.

#### Create the index mapping
This example creates an index that stores a vector at the top level for the document title and multiple vectors inside a nested field for individual paragraphs.

```console
PUT multi_level_vectors
{
  "mappings": {
    "properties": {
      "title_vector": {
        "type": "dense_vector",
        "dims": 2,
        "index_options": {
          "type": "hnsw"
        }
      },
      "paragraphs": {
        "type": "nested",
        "properties": {
          "text": {
            "type": "text"
          },
          "vector": {
            "type": "dense_vector",
            "dims": 2,
            "index_options": {
              "type": "hnsw"
            }
          }
        }
      }
    }
  }
}
```
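
The mapping above uses `dims: 2` only to keep the example small. A minimal Python sketch of building the same request body for another vector size follows. The helper name `build_nested_vector_mapping` is hypothetical, and the script only prints JSON; it does not contact a cluster.

```python
import json


def build_nested_vector_mapping(dims: int) -> dict:
    """Build the mapping body shown above for a given vector dimension."""

    def vector_field() -> dict:
        # Both vector fields share one definition: an HNSW-indexed dense_vector.
        return {
            "type": "dense_vector",
            "dims": dims,
            "index_options": {"type": "hnsw"},
        }

    return {
        "mappings": {
            "properties": {
                "title_vector": vector_field(),
                "paragraphs": {
                    "type": "nested",
                    "properties": {
                        "text": {"type": "text"},
                        "vector": vector_field(),
                    },
                },
            }
        }
    }


mapping = build_nested_vector_mapping(dims=2)
print(json.dumps(mapping, indent=2))
```

With `dims=2` the printed body matches the `PUT multi_level_vectors` request above. Pass the output size of your own embedding model instead.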

#### Index the documents
This step adds example documents with title and paragraph vectors.

```console
POST _bulk
{ "index": { "_index": "multi_level_vectors", "_id": "1" } }
{ "title_vector": [0.5, 0.4], "paragraphs": [ { "text": "First paragraph", "vector": [0.5, 0.4] }, { "text": "Second paragraph", "vector": [0.3, 0.8] } ] }
{ "index": { "_index": "multi_level_vectors", "_id": "2" } }
{ "title_vector": [0.1, 0.9], "paragraphs": [ { "text": "Another one", "vector": [0.1, 0.9] } ] }
```
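
The bulk body is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. A short Python sketch of producing that body from in-memory documents follows. The helper name `build_bulk_body` and the `docs` dictionary are illustrative assumptions, and sending the body to the cluster is left out.

```python
import json


def build_bulk_body(index: str, docs: dict) -> str:
    """Serialize documents into the newline-delimited JSON used by _bulk."""
    lines = []
    for doc_id, source in docs.items():
        # Action line: which index and ID the next line should be written to.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


docs = {
    "1": {
        "title_vector": [0.5, 0.4],
        "paragraphs": [
            {"text": "First paragraph", "vector": [0.5, 0.4]},
            {"text": "Second paragraph", "vector": [0.3, 0.8]},
        ],
    },
    "2": {
        "title_vector": [0.1, 0.9],
        "paragraphs": [{"text": "Another one", "vector": [0.1, 0.9]}],
    },
}

body = build_bulk_body("multi_level_vectors", docs)
print(body, end="")
```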

#### Run the search query
This example searches for documents with relevant paragraph vectors. The `inner_hits` section returns the most relevant paragraphs from each matching document.

```console
POST multi_level_vectors/_search
{
  "_source": false,
  "fields": ["title_vector"],
  "knn": {
    "field": "paragraphs.vector",
    "query_vector": [0.5, 0.4],
    "k": 2,
    "num_candidates": 10,
    "inner_hits": {
      "size": 2,
      "name": "top_passages",
      "_source": false,
      "fields": ["paragraphs.text"]
    }
  }
}
```
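
The search request above can also be assembled programmatically when the query vector comes from your own model. The following is a sketch rather than a client implementation: the function name `build_nested_knn_search` and its parameter names are hypothetical, and the `num_candidates >= k` check is an assumption about what the API expects.

```python
import json


def build_nested_knn_search(query_vector, k, num_candidates,
                            passages_per_doc, name="top_passages"):
    """Build the nested kNN search body shown above."""
    # Assumption: the candidate pool should not be smaller than k.
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return {
        "_source": False,
        "fields": ["title_vector"],
        "knn": {
            "field": "paragraphs.vector",
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
            "inner_hits": {
                # Number of matching paragraphs returned per document.
                "size": passages_per_doc,
                # Key under which the passages appear in the response.
                "name": name,
                "_source": False,
                "fields": ["paragraphs.text"],
            },
        },
    }


search_body = build_nested_knn_search(
    [0.5, 0.4], k=2, num_candidates=10, passages_per_doc=2
)
print(json.dumps(search_body, indent=2))
```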

Use the `size` field in `inner_hits` to control how many matching paragraphs are returned for each top-level document. If your query includes multiple kNN clauses, give each `inner_hits` block a distinct `name` to avoid naming conflicts in the response.

The API returns a response similar to the following:

```json
{
  "took": 58,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": { "value": 2, "relation": "eq" }, <1>
    "max_score": 1,
    "hits": [
      {
        "_index": "multi_level_vectors",
        "_id": "1",
        "_score": 1, <2>
        "fields": {
          "title_vector": [0.5, 0.4]
        },
        "inner_hits": {
          "top_passages": {
            "hits": {
              "total": { "value": 2, "relation": "eq" },
              "max_score": 1,
              "hits": [
                {
                  "_nested": { "field": "paragraphs", "offset": 0 }, <3>
                  "_score": 1,
                  "fields": {
                    "paragraphs": [ { "text": ["First paragraph"] } ] <4>
                  }
                },
                {
                  "_nested": { "field": "paragraphs", "offset": 1 },
                  "_score": 0.92955077,
                  "fields": {
                    "paragraphs": [ { "text": ["Second paragraph"] } ]
                  }
                }
              ]
            }
          }
        }
      },
      {
        "_index": "multi_level_vectors",
        "_id": "2",
        "_score": 0.8535534,
        "fields": {
          "title_vector": [0.1, 0.9]
        },
        "inner_hits": {
          "top_passages": {
            "hits": {
              "total": { "value": 1, "relation": "eq" },
              "max_score": 0.8535534,
              "hits": [
                {
                  "_nested": { "field": "paragraphs", "offset": 0 },
                  "_score": 0.8535534,
                  "fields": {
                    "paragraphs": [ { "text": ["Another one"] } ]
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}
```

1. Two top-level documents matched the query.
2. The score of the document, based on the best matching paragraph.
3. The `_nested` object identifies each matching paragraph by its `offset` in the `paragraphs` array. Inner hits are listed best match first.
4. The actual matching paragraph text.
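
The scores in the response can be reproduced by hand. The sketch below assumes the vector fields use `cosine` similarity (the mapping above does not set `similarity` explicitly) and that the score is computed as `(1 + cosine) / 2`. Under those assumptions, each paragraph score and each document score, taken as the best paragraph score, lines up with the response above. The function name `cosine_score` is illustrative.

```python
import math


def cosine_score(query, vector):
    """Score a vector against the query as (1 + cosine similarity) / 2."""
    dot = sum(q * v for q, v in zip(query, vector))
    norms = math.sqrt(sum(q * q for q in query)) * math.sqrt(sum(v * v for v in vector))
    return (1 + dot / norms) / 2


query = [0.5, 0.4]

# Paragraph vectors of the two example documents, keyed by document ID.
paragraph_vectors = {
    "1": [[0.5, 0.4], [0.3, 0.8]],
    "2": [[0.1, 0.9]],
}

for doc_id, vectors in paragraph_vectors.items():
    paragraph_scores = [cosine_score(query, v) for v in vectors]
    # The document score is taken from its best matching paragraph.
    document_score = max(paragraph_scores)
    print(doc_id, [round(s, 6) for s in paragraph_scores], round(document_score, 6))
```

Small differences in the last digits are expected, because this sketch uses double-precision arithmetic.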


### Limitations for approximate kNN search [approximate-knn-limitations]

* When using kNN search in [{{ccs}}](../../../solutions/search/cross-cluster-search.md), the [`ccs_minimize_roundtrips`](../../../solutions/search/cross-cluster-search.md#ccs-min-roundtrips) option is not supported.