Merged
25 commits
d29e647
Support kNN filter on nested metadata
mayya-sharipova Oct 1, 2024
6e6abae
Update docs/changelog/113949.yaml
mayya-sharipova Oct 2, 2024
426db4d
Spotless
mayya-sharipova Oct 2, 2024
d9eb1e9
Address test failure
mayya-sharipova Oct 2, 2024
891b091
Fix test
mayya-sharipova Oct 2, 2024
923b6c0
Fix test failure
mayya-sharipova Oct 2, 2024
d9a13a9
Merge remote-tracking branch 'upstream/main' into knn_query_nested_fi…
mayya-sharipova Oct 2, 2024
9aa706f
Merge remote-tracking branch 'upstream/main' into knn_query_nested_fi…
mayya-sharipova Oct 25, 2024
9c56fff
Allow filters on both parent and nested metadata in the same knn query
mayya-sharipova Oct 25, 2024
afe3d8f
Merge remote-tracking branch 'upstream/main' into knn_query_nested_fi…
mayya-sharipova Jul 9, 2025
7bc25d3
Add documentation
mayya-sharipova Jul 9, 2025
e3b0251
Add pre-check for a query to be both on nested and parent metada field
mayya-sharipova Jul 9, 2025
a2f365e
Revert "Add pre-check for a query to be both on nested and parent met…
mayya-sharipova Jul 11, 2025
f423619
Add small changes
mayya-sharipova Jul 11, 2025
acbfccd
Add test for nested sibling docs
mayya-sharipova Jul 12, 2025
7cc0b09
Merge remote-tracking branch 'upstream/main' into knn_query_nested_fi…
mayya-sharipova Jul 12, 2025
1aa3605
Merge remote-tracking branch 'upstream/main' into knn_query_nested_fi…
mayya-sharipova Jul 29, 2025
5a818bc
Some adjustments
mayya-sharipova Jul 29, 2025
6c3230e
Merge remote-tracking branch 'upstream/main' into knn_query_nested_fi…
mayya-sharipova Jul 29, 2025
c0bb4ff
Update docs/changelog/113949.yaml
mayya-sharipova Jul 29, 2025
24e4943
Merge branch 'main' into knn_query_nested_filter
mayya-sharipova Jul 29, 2025
8f9cf54
Merge branch 'main' into knn_query_nested_filter
mayya-sharipova Jul 29, 2025
867b928
Merge branch 'main' into knn_query_nested_filter
mayya-sharipova Jul 29, 2025
cdf4e92
Merge branch 'main' into knn_query_nested_filter
mayya-sharipova Jul 29, 2025
d6d5f8e
Merge branch 'main' into knn_query_nested_filter
mayya-sharipova Jul 30, 2025
7 changes: 7 additions & 0 deletions docs/changelog/113949.yaml
@@ -0,0 +1,7 @@
pr: 113949
summary: Support kNN filter on nested metadata
area: Vector Search
type: enhancement
issues:
- 128803
- 106994
59 changes: 51 additions & 8 deletions docs/reference/query-languages/query-dsl/query-dsl-knn-query.md
@@ -203,10 +203,19 @@ POST my-image-index/_search
`knn` query can be used inside a nested query. The behaviour here is similar to [top level nested kNN search](docs-content://solutions/search/vector/knn.md#nested-knn-search):

* kNN search over nested dense_vectors diversifies the top results over the top-level document
-* `filter` over the top-level document metadata is supported and acts as a pre-filter
-* `filter` over `nested` field metadata is not supported
* `filter` over both the top-level document metadata and `nested` field metadata is supported and acts as a pre-filter

::::{note}
To ensure correct results, each individual filter must target either
the top-level metadata or the `nested` metadata, not both. However, a single
knn query supports multiple filters, where some filters can be over the
top-level metadata and some over nested metadata.
::::
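
The rule in the note above can be sketched in Python as a small client-side check. This is only an illustration: `leaf_fields`, `filter_scope` and `build_nested_knn_query` are hypothetical helpers, not part of Elasticsearch or its clients. Deciding scope by field-name prefix is a simplifying assumption, and the `paragraph` path and `title` field are taken from the sample queries in this page.

```python
import json

NESTED_PATH = "paragraph"  # nested path used in the documentation samples


def leaf_fields(clause):
    """Collect field names used by match/term leaves, descending into bool clauses."""
    fields = []
    for query_type, body in clause.items():
        if query_type in ("match", "term"):
            fields.extend(body.keys())
        elif query_type == "bool":
            for occur in ("must", "filter", "should", "must_not"):
                subs = body.get(occur, [])
                if isinstance(subs, dict):
                    subs = [subs]
                for sub in subs:
                    fields.extend(leaf_fields(sub))
    return fields


def filter_scope(clause, nested_path=NESTED_PATH):
    """Return "nested" or "top-level"; raise if one filter mixes both scopes."""
    scopes = {
        "nested" if field.startswith(nested_path + ".") else "top-level"
        for field in leaf_fields(clause)
    }
    if len(scopes) > 1:
        raise ValueError("a single filter must not mix nested and top-level fields")
    return scopes.pop() if scopes else "unknown"


def build_nested_knn_query(query_vector, filters, nested_path=NESTED_PATH):
    """Assemble a nested knn query body after checking every filter separately."""
    for clause in filters:
        filter_scope(clause, nested_path)
    return {
        "query": {
            "nested": {
                "path": nested_path,
                "query": {
                    "knn": {
                        "query_vector": query_vector,
                        "field": nested_path + ".vector",
                        "filter": filters,
                    }
                },
            }
        }
    }


# Two separate filters: one over nested metadata, one over top-level metadata.
filters = [
    {"match": {"paragraph.language": "EN"}},
    {"match": {"title": "essay"}},
]
body = build_nested_knn_query([0.45, 0.50], filters)
print(json.dumps(body, indent=2))
```

Keeping the two conditions as separate entries of the `filter` list, rather than combining them in one `bool` query, is what the note asks for.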

-A sample query can look like below:

Below is a sample query with a filter over nested metadata.
When scoring parent documents, this query considers only vectors that
have "paragraph.language" set to "EN".

```json
{
@@ -215,12 +224,46 @@ A sample query can look like below:
"path" : "paragraph",
"query" : {
"knn": {
-"query_vector": [
-0.45,
-45
-],
"query_vector": [0.45, 0.50],
"field": "paragraph.vector",
-"num_candidates": 2
"filter": {
"match": {
"paragraph.language": "EN"
}
}
}
}
}
}
}
```

Below is a sample query with two filters: one over nested metadata
and another over the top-level metadata. When scoring parent documents,
this query considers only vectors whose parent's title contains the word
"essay" and that have "paragraph.language" set to "EN".

```json
{
"query" : {
"nested" : {
"path" : "paragraph",
"query" : {
"knn": {
"query_vector": [0.45, 0.50],
"field": "paragraph.vector",
"filter": [
{
"match": {
"paragraph.language": "EN"
}
},
{
"match": {
"title": "essay"
}
}
]
}
}
}
@@ -16,6 +16,8 @@ setup:
nested:
type: nested
properties:
language:
type: keyword
paragraph_id:
type: keyword
vector:
@@ -27,6 +29,13 @@ setup:
type: hnsw
m: 16
ef_construction: 200
nested2:
type: nested
properties:
key:
type: keyword
value:
type: keyword

- do:
index:
@@ -37,8 +46,16 @@ setup:
nested:
- paragraph_id: 0
vector: [230.0, 300.33, -34.8988, 15.555, -200.0]
language: EN
- paragraph_id: 1
vector: [240.0, 300, -3, 1, -20]
language: FR
nested2:
- key: "category"
value: "domestic"
- key: "level"
value: "beginner"


- do:
index:
@@ -49,10 +66,18 @@ setup:
nested:
- paragraph_id: 0
vector: [-0.5, 100.0, -13, 14.8, -156.0]
language: EN
- paragraph_id: 2
vector: [0, 100.0, 0, 14.8, -156.0]
language: EN
- paragraph_id: 3
vector: [0, 1.0, 0, 1.8, -15.0]
language: FR
nested2:
- key: "category"
value: "wild"
- key: "level"
value: "beginner"

- do:
index:
@@ -63,6 +88,12 @@ setup:
nested:
- paragraph_id: 0
vector: [0.5, 111.3, -13.0, 14.8, -156.0]
language: FR
nested2:
- key: "category"
value: "domestic"
- key: "level"
value: "advanced"

- do:
indices.refresh: {}
@@ -461,3 +492,125 @@ setup:
- match: {hits.hits.0._id: "2"}
- length: {hits.hits.0.inner_hits.nested.hits.hits: 1}
- match: {hits.hits.0.inner_hits.nested.hits.hits.0.fields.nested.0.paragraph_id.0: "0"}


---
"Filter on nested fields":
  - requires:
      capabilities:
        - method: POST
          path: /_search
          capabilities: [ knn_filter_on_nested_fields ]
      test_runner_features: ["capabilities", "close_to"]
      reason: "Capability for filtering on nested fields required"

  - do:
      search:
        index: test
        body:
          _source: false
          knn:
            boost: 2
            field: nested.vector
            query_vector: [ -0.5, 90.0, -10, 14.8, -156.0 ]
            k: 3
            filter: { match: { nested.language: "EN" } }
            inner_hits: { size: 3, "fields": [ "nested.paragraph_id", "nested.language"], _source: false }

  - match: { hits.total.value: 2 }
  - match: { hits.hits.0._id: "2" }
  - match: { hits.hits.0.inner_hits.nested.hits.total.value: 2 }
  - match: { hits.hits.0.inner_hits.nested.hits.hits.0.fields.nested.0.paragraph_id.0: "0" }
  - match: { hits.hits.0.inner_hits.nested.hits.hits.0.fields.nested.0.language.0: "EN" }
  - match: { hits.hits.0.inner_hits.nested.hits.hits.1.fields.nested.0.paragraph_id.0: "2" }
  - match: { hits.hits.0.inner_hits.nested.hits.hits.1.fields.nested.0.language.0: "EN" }
  - close_to: { hits.hits.0._score: { value: 0.0182, error: 0.0001 } }
  - close_to: { hits.hits.0.inner_hits.nested.hits.hits.0._score: { value: 0.0182, error: 0.0001 } }
  - match: { hits.hits.1._id: "1" }
  - match: { hits.hits.1.inner_hits.nested.hits.total.value: 1 }
  - match: { hits.hits.1.inner_hits.nested.hits.hits.0.fields.nested.0.paragraph_id.0: "0" }
  - match: { hits.hits.1.inner_hits.nested.hits.hits.0.fields.nested.0.language.0: "EN" }


  - do:
      search:
        index: test
        body:
          _source: false
          knn:
            boost: 2
            field: nested.vector
            query_vector: [ -0.5, 90.0, -10, 14.8, -156.0 ]
            k: 3
            filter: { match: { nested.language: "FR" } }
            inner_hits: { size: 3, "fields": [ "nested.paragraph_id", "nested.language"], _source: false }

  - match: { hits.total.value: 3 }
  - match: { hits.hits.0._id: "3" }
  - match: { hits.hits.0.inner_hits.nested.hits.total.value: 1 }
  - match: { hits.hits.0.inner_hits.nested.hits.hits.0.fields.nested.0.paragraph_id.0: "0" }
  - match: { hits.hits.0.inner_hits.nested.hits.hits.0.fields.nested.0.language.0: "FR" }
  - close_to: { hits.hits.0._score: { value: 0.0043, error: 0.0001 } }
  - close_to: { hits.hits.0.inner_hits.nested.hits.hits.0._score: { value: 0.0043, error: 0.0001 } }
  - match: { hits.hits.1._id: "2" }
  - match: { hits.hits.1.inner_hits.nested.hits.total.value: 1 }
  - match: { hits.hits.1.inner_hits.nested.hits.hits.0.fields.nested.0.paragraph_id.0: "3" }
  - match: { hits.hits.1.inner_hits.nested.hits.hits.0.fields.nested.0.language.0: "FR" }
  - match: { hits.hits.2._id: "1" }
  - match: { hits.hits.2.inner_hits.nested.hits.total.value: 1 }
  - match: { hits.hits.2.inner_hits.nested.hits.hits.0.fields.nested.0.paragraph_id.0: "1" }
  - match: { hits.hits.2.inner_hits.nested.hits.hits.0.fields.nested.0.language.0: "FR" }

  # filter on both nested and parent metadata with 2 different filters
  - do:
      search:
        index: test
        body:
          _source: false
          knn:
            boost: 2
            field: nested.vector
            query_vector: [ -0.5, 90.0, -10, 14.8, -156.0 ]
            k: 3
            num_candidates: 10
            filter: [{ match: { nested.language: "FR" }}, {term: {name: "rabbit.jpg"}} ]
            inner_hits: { size: 3, "fields": [ "nested.paragraph_id", "nested.language"], _source: false }

  - match: { hits.total.value: 1 }
  - match: { hits.hits.0._id: "3" }
  - match: { hits.hits.0.inner_hits.nested.hits.total.value: 1 }
  - match: { hits.hits.0.inner_hits.nested.hits.hits.0.fields.nested.0.paragraph_id.0: "0" }
  - match: { hits.hits.0.inner_hits.nested.hits.hits.0.fields.nested.0.language.0: "FR" }
  - close_to: { hits.hits.0._score: { value: 0.0043, error: 0.0001 } }
  - close_to: { hits.hits.0.inner_hits.nested.hits.hits.0._score: { value: 0.0043, error: 0.0001 } }
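
The hit totals and inner-hit counts asserted for the single-language filters above can be sanity-checked by hand. The Python sketch below models only the nested `language` pre-filter over the setup data; it does not compute vector scores or ranking. The parent `_id` values "1", "2" and "3" are an assumption inferred from the assertions, since the setup hunks shown here do not include them, and `matching_paragraphs` is a hypothetical helper.

```python
# Nested paragraphs per parent document, copied from the setup section.
# ASSUMPTION: the parent ids are inferred from the test assertions.
DOCS = {
    "1": [
        {"paragraph_id": "0", "language": "EN"},
        {"paragraph_id": "1", "language": "FR"},
    ],
    "2": [
        {"paragraph_id": "0", "language": "EN"},
        {"paragraph_id": "2", "language": "EN"},
        {"paragraph_id": "3", "language": "FR"},
    ],
    "3": [
        {"paragraph_id": "0", "language": "FR"},
    ],
}


def matching_paragraphs(language):
    """Map each parent id to the paragraph ids that pass the language filter."""
    result = {}
    for doc_id, paragraphs in DOCS.items():
        ids = [p["paragraph_id"] for p in paragraphs if p["language"] == language]
        if ids:
            result[doc_id] = ids
    return result


en = matching_paragraphs("EN")
fr = matching_paragraphs("FR")
print(len(en), en)  # number of parents with at least one EN paragraph
print(len(fr), fr)  # number of parents with at least one FR paragraph
```

Under these assumptions the EN filter leaves two parents and the FR filter leaves three, which matches `hits.total.value` in the first two searches.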


---
"Test filter on sibling nested fields works":
  - requires:
      capabilities:
        - method: POST
          path: /_search
          capabilities: [ knn_filter_on_nested_fields ]
      test_runner_features: ["capabilities", "close_to"]
      reason: "Capability for filtering on nested fields required"

  - do:
      search:
        index: test
        body:
          _source: false
          knn:
            field: nested.vector
            query_vector: [ -0.5, 90.0, -10, 14.8, -156.0 ]
            filter:
              nested:
                path: nested2
                query:
                  bool:
                    filter:
                      - match:
                          nested2.key: "category"
                      - match:
                          nested2.value: "domestic"
  - match: { hits.total.value: 2}

Contributor Author
Here with the current implementation we can query nested sibling fields.

Member
I was sort of hoping that we could do something similar in the nested query since we look to the parent level. But it gets too complicated given the nested context?
