Commit a6efc73

Merge branch 'main' into esql-inference-runner-refactoring
2 parents ffa09c9 + c1d7e79 commit a6efc73

19 files changed

+730
-52
lines changed

docs/changelog/113949.yaml

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
pr: 113949
2+
summary: Support kNN filter on nested metadata
3+
area: Vector Search
4+
type: enhancement
5+
issues:
6+
- 128803
7+
- 106994

docs/reference/query-languages/query-dsl/query-dsl-knn-query.md

Lines changed: 51 additions & 8 deletions
@@ -203,10 +203,19 @@ POST my-image-index/_search
203203
`knn` query can be used inside a nested query. The behaviour here is similar to [top level nested kNN search](docs-content://solutions/search/vector/knn.md#nested-knn-search):
204204

205205
* kNN search over nested dense_vectors diversifies the top results over the top-level document
206-
* `filter` over the top-level document metadata is supported and acts as a pre-filter
207-
* `filter` over `nested` field metadata is not supported
206+
* `filter` over both the top-level document metadata and `nested` field metadata is supported and acts as a pre-filter
207+
208+
::::{note}
209+
To ensure correct results, each individual filter must be over either
210+
the top-level metadata or `nested` metadata. However, a single knn query
211+
supports multiple filters, where some filters can be over the top-level
212+
metadata and some over nested.
213+
::::
208214

209-
A sample query can look like below:
215+
216+
Below is a sample query with a filter over nested metadata.
217+
For scoring parent documents, this query only considers vectors that
218+
have "paragraph.language" set to "EN".
210219

211220
```json
212221
{
@@ -215,12 +224,46 @@ A sample query can look like below:
215224
"path" : "paragraph",
216225
"query" : {
217226
"knn": {
218-
"query_vector": [
219-
0.45,
220-
45
221-
],
227+
"query_vector": [0.45, 0.50],
222228
"field": "paragraph.vector",
223-
"num_candidates": 2
229+
"filter": {
230+
"match": {
231+
"paragraph.language": "EN"
232+
}
233+
}
234+
}
235+
}
236+
}
237+
}
238+
}
239+
```
240+
241+
Below is a sample query with two filters: one over nested metadata
242+
and another over the top-level metadata. For scoring parent documents,
243+
this query only considers vectors whose parent's title contains the word
244+
"essay" and that have "paragraph.language" set to "EN".
245+
246+
```json
247+
{
248+
"query" : {
249+
"nested" : {
250+
"path" : "paragraph",
251+
"query" : {
252+
"knn": {
253+
"query_vector": [0.45, 0.50],
254+
"field": "paragraph.vector",
255+
"filter": [
256+
{
257+
"match": {
258+
"paragraph.language": "EN"
259+
}
260+
},
261+
{
262+
"match": {
263+
"title": "essay"
264+
}
265+
}
266+
]
224267
}
225268
}
226269
}
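
A minimal sketch, not part of this commit, of how a client could assemble the two request bodies shown in the documentation change above. The helper name `nested_knn_query` and its parameters are hypothetical; only the resulting JSON shape is taken from the diff. Each filter clause is kept separate so that one clause targets nested metadata and another targets top-level metadata, as the note requires.

```python
import json


def nested_knn_query(path, vector_field, query_vector, filters):
    """Build a nested kNN query body (hypothetical helper).

    `filters` is a list of query clauses. Each clause should target either
    nested metadata or top-level metadata, never both in the same clause.
    """
    knn = {"query_vector": list(query_vector), "field": vector_field}
    if len(filters) == 1:
        # A single filter is sent as an object, as in the first docs example.
        knn["filter"] = filters[0]
    elif filters:
        # Several filters are sent as an array, as in the second docs example.
        knn["filter"] = list(filters)
    return {"query": {"nested": {"path": path, "query": {"knn": knn}}}}


# Filter over nested metadata only.
nested_only = nested_knn_query(
    "paragraph",
    "paragraph.vector",
    [0.45, 0.50],
    [{"match": {"paragraph.language": "EN"}}],
)

# One filter over nested metadata and one over top-level metadata.
nested_and_top_level = nested_knn_query(
    "paragraph",
    "paragraph.vector",
    [0.45, 0.50],
    [
        {"match": {"paragraph.language": "EN"}},
        {"match": {"title": "essay"}},
    ],
)

print(json.dumps(nested_and_top_level, indent=2))
```

The printed body can be sent as the payload of a `_search` request with any HTTP client.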

muted-tests.yml

Lines changed: 3 additions & 3 deletions
@@ -407,9 +407,6 @@ tests:
407407
- class: org.elasticsearch.xpack.esql.analysis.VerifierTests
408408
method: testMatchInsideEval
409409
issue: https://github.com/elastic/elasticsearch/issues/131336
410-
- class: org.elasticsearch.packaging.test.DockerTests
411-
method: test022InstallPluginsFromLocalArchive
412-
issue: https://github.com/elastic/elasticsearch/issues/116866
413410
- class: org.elasticsearch.packaging.test.DockerTests
414411
method: test071BindMountCustomPathWithDifferentUID
415412
issue: https://github.com/elastic/elasticsearch/issues/120917
@@ -623,6 +620,9 @@ tests:
623620
- class: org.elasticsearch.indices.cluster.FieldCapsForceConnectTimeoutIT
624621
method: testTimeoutSetting
625622
issue: https://github.com/elastic/elasticsearch/issues/132179
623+
- class: org.elasticsearch.index.mapper.vectors.DenseVectorFieldIndexTypeUpdateIT
624+
method: "testDenseVectorMappingUpdate {initialType=bbq_flat updateType=bbq_disk #2}"
625+
issue: https://github.com/elastic/elasticsearch/issues/132184
626626

627627
# Examples:
628628
#

qa/packaging/src/test/java/org/elasticsearch/packaging/test/DockerTests.java

Lines changed: 1 addition & 1 deletion
@@ -205,7 +205,7 @@ public void test022InstallPluginsFromLocalArchive() {
205205

206206
listPluginArchive().forEach(System.out::println);
207207
assertThat("Expected " + plugin + " to not be installed", listPlugins(), not(hasItems(plugin)));
208-
assertThat("Expected " + plugin + " available in archive", listPluginArchive(), hasSize(16));
208+
assertThat("Expected " + plugin + " available in archive", listPluginArchive(), hasItems(containsString(plugin)));
209209

210210
// Stuff the proxy settings with garbage, so any attempt to go out to the internet would fail
211211
sh.getEnv()

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search.vectors/100_knn_nested_search.yml

Lines changed: 153 additions & 0 deletions
@@ -16,6 +16,8 @@ setup:
1616
nested:
1717
type: nested
1818
properties:
19+
language:
20+
type: keyword
1921
paragraph_id:
2022
type: keyword
2123
vector:
@@ -27,6 +29,13 @@ setup:
2729
type: hnsw
2830
m: 16
2931
ef_construction: 200
32+
nested2:
33+
type: nested
34+
properties:
35+
key:
36+
type: keyword
37+
value:
38+
type: keyword
3039

3140
- do:
3241
index:
@@ -37,8 +46,16 @@ setup:
3746
nested:
3847
- paragraph_id: 0
3948
vector: [230.0, 300.33, -34.8988, 15.555, -200.0]
49+
language: EN
4050
- paragraph_id: 1
4151
vector: [240.0, 300, -3, 1, -20]
52+
language: FR
53+
nested2:
54+
- key: "category"
55+
value: "domestic"
56+
- key: "level"
57+
value: "beginner"
58+
4259

4360
- do:
4461
index:
@@ -49,10 +66,18 @@ setup:
4966
nested:
5067
- paragraph_id: 0
5168
vector: [-0.5, 100.0, -13, 14.8, -156.0]
69+
language: EN
5270
- paragraph_id: 2
5371
vector: [0, 100.0, 0, 14.8, -156.0]
72+
language: EN
5473
- paragraph_id: 3
5574
vector: [0, 1.0, 0, 1.8, -15.0]
75+
language: FR
76+
nested2:
77+
- key: "category"
78+
value: "wild"
79+
- key: "level"
80+
value: "beginner"
5681

5782
- do:
5883
index:
@@ -63,6 +88,12 @@ setup:
6388
nested:
6489
- paragraph_id: 0
6590
vector: [0.5, 111.3, -13.0, 14.8, -156.0]
91+
language: FR
92+
nested2:
93+
- key: "category"
94+
value: "domestic"
95+
- key: "level"
96+
value: "advanced"
6697

6798
- do:
6899
indices.refresh: {}
@@ -461,3 +492,125 @@ setup:
461492
- match: {hits.hits.0._id: "2"}
462493
- length: {hits.hits.0.inner_hits.nested.hits.hits: 1}
463494
- match: {hits.hits.0.inner_hits.nested.hits.hits.0.fields.nested.0.paragraph_id.0: "0"}
495+
496+
497+
---
498+
"Filter on nested fields":
499+
- requires:
500+
capabilities:
501+
- method: POST
502+
path: /_search
503+
capabilities: [ knn_filter_on_nested_fields ]
504+
test_runner_features: ["capabilities", "close_to"]
505+
reason: "Capability for filtering on nested fields required"
506+
507+
- do:
508+
search:
509+
index: test
510+
body:
511+
_source: false
512+
knn:
513+
boost: 2
514+
field: nested.vector
515+
query_vector: [ -0.5, 90.0, -10, 14.8, -156.0 ]
516+
k: 3
517+
filter: { match: { nested.language: "EN" } }
518+
inner_hits: { size: 3, "fields": [ "nested.paragraph_id", "nested.language"], _source: false }
519+
520+
- match: { hits.total.value: 2 }
521+
- match: { hits.hits.0._id: "2" }
522+
- match: { hits.hits.0.inner_hits.nested.hits.total.value: 2 }
523+
- match: { hits.hits.0.inner_hits.nested.hits.hits.0.fields.nested.0.paragraph_id.0: "0" }
524+
- match: { hits.hits.0.inner_hits.nested.hits.hits.0.fields.nested.0.language.0: "EN" }
525+
- match: { hits.hits.0.inner_hits.nested.hits.hits.1.fields.nested.0.paragraph_id.0: "2" }
526+
- match: { hits.hits.0.inner_hits.nested.hits.hits.1.fields.nested.0.language.0: "EN" }
527+
- close_to: { hits.hits.0._score: { value: 0.0182, error: 0.0001 } }
528+
- close_to: { hits.hits.0.inner_hits.nested.hits.hits.0._score: { value: 0.0182, error: 0.0001 } }
529+
- match: { hits.hits.1._id: "1" }
530+
- match: { hits.hits.1.inner_hits.nested.hits.total.value: 1 }
531+
- match: { hits.hits.1.inner_hits.nested.hits.hits.0.fields.nested.0.paragraph_id.0: "0" }
532+
- match: { hits.hits.1.inner_hits.nested.hits.hits.0.fields.nested.0.language.0: "EN" }
533+
534+
535+
- do:
536+
search:
537+
index: test
538+
body:
539+
_source: false
540+
knn:
541+
boost: 2
542+
field: nested.vector
543+
query_vector: [ -0.5, 90.0, -10, 14.8, -156.0 ]
544+
k: 3
545+
filter: { match: { nested.language: "FR" } }
546+
inner_hits: { size: 3, "fields": [ "nested.paragraph_id", "nested.language"], _source: false }
547+
548+
- match: { hits.total.value: 3 }
549+
- match: { hits.hits.0._id: "3" }
550+
- match: { hits.hits.0.inner_hits.nested.hits.total.value: 1 }
551+
- match: { hits.hits.0.inner_hits.nested.hits.hits.0.fields.nested.0.paragraph_id.0: "0" }
552+
- match: { hits.hits.0.inner_hits.nested.hits.hits.0.fields.nested.0.language.0: "FR" }
553+
- close_to: { hits.hits.0._score: { value: 0.0043, error: 0.0001 } }
554+
- close_to: { hits.hits.0.inner_hits.nested.hits.hits.0._score: { value: 0.0043, error: 0.0001 } }
555+
- match: { hits.hits.1._id: "2" }
556+
- match: { hits.hits.1.inner_hits.nested.hits.total.value: 1 }
557+
- match: { hits.hits.1.inner_hits.nested.hits.hits.0.fields.nested.0.paragraph_id.0: "3" }
558+
- match: { hits.hits.1.inner_hits.nested.hits.hits.0.fields.nested.0.language.0: "FR" }
559+
- match: { hits.hits.2._id: "1" }
560+
- match: { hits.hits.2.inner_hits.nested.hits.total.value: 1 }
561+
- match: { hits.hits.2.inner_hits.nested.hits.hits.0.fields.nested.0.paragraph_id.0: "1" }
562+
- match: { hits.hits.2.inner_hits.nested.hits.hits.0.fields.nested.0.language.0: "FR" }
563+
564+
# filter on both nested and parent metadata with 2 different filters
565+
- do:
566+
search:
567+
index: test
568+
body:
569+
_source: false
570+
knn:
571+
boost: 2
572+
field: nested.vector
573+
query_vector: [ -0.5, 90.0, -10, 14.8, -156.0 ]
574+
k: 3
575+
num_candidates: 10
576+
filter: [{ match: { nested.language: "FR" }}, {term: {name: "rabbit.jpg"}} ]
577+
inner_hits: { size: 3, "fields": [ "nested.paragraph_id", "nested.language"], _source: false }
578+
579+
- match: { hits.total.value: 1 }
580+
- match: { hits.hits.0._id: "3" }
581+
- match: { hits.hits.0.inner_hits.nested.hits.total.value: 1 }
582+
- match: { hits.hits.0.inner_hits.nested.hits.hits.0.fields.nested.0.paragraph_id.0: "0" }
583+
- match: { hits.hits.0.inner_hits.nested.hits.hits.0.fields.nested.0.language.0: "FR" }
584+
- close_to: { hits.hits.0._score: { value: 0.0043, error: 0.0001 } }
585+
- close_to: { hits.hits.0.inner_hits.nested.hits.hits.0._score: { value: 0.0043, error: 0.0001 } }
586+
587+
588+
---
589+
"Test filter on sibling nested fields works":
590+
- requires:
591+
capabilities:
592+
- method: POST
593+
path: /_search
594+
capabilities: [ knn_filter_on_nested_fields ]
595+
test_runner_features: ["capabilities", "close_to"]
596+
reason: "Capability for filtering on nested fields required"
597+
598+
- do:
599+
search:
600+
index: test
601+
body:
602+
_source: false
603+
knn:
604+
field: nested.vector
605+
query_vector: [ -0.5, 90.0, -10, 14.8, -156.0 ]
606+
filter:
607+
nested:
608+
path: nested2
609+
query:
610+
bool:
611+
filter:
612+
- match:
613+
nested2.key: "category"
614+
- match:
615+
nested2.value: "domestic"
616+
- match: { hits.total.value: 2}
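
The expected hit counts in the tests above can be checked by hand against the fixture data. The following is a small in-memory sketch, not Elasticsearch code and not part of this commit. It assumes the three fixture documents have IDs "1", "2" and "3" in setup order (inferred from the assertions, since the IDs are not visible in this hunk), and it models only the filter step, not vector scoring.

```python
# Fixture data copied from the setup section of the YAML test.
# Document IDs are an assumption inferred from the test assertions.
DOCS = {
    "1": {
        "languages": ["EN", "FR"],
        "nested2": [("category", "domestic"), ("level", "beginner")],
    },
    "2": {
        "languages": ["EN", "EN", "FR"],
        "nested2": [("category", "wild"), ("level", "beginner")],
    },
    "3": {
        "languages": ["FR"],
        "nested2": [("category", "domestic"), ("level", "advanced")],
    },
}


def docs_with_language(language):
    """IDs of documents with at least one nested paragraph in `language`."""
    return sorted(i for i, d in DOCS.items() if language in d["languages"])


def docs_with_sibling_pair(key, value):
    """IDs of documents where one `nested2` object has both key and value.

    Nested semantics: both conditions must hold on the same nested object.
    """
    return sorted(i for i, d in DOCS.items() if (key, value) in d["nested2"])


en_hits = docs_with_language("EN")
fr_hits = docs_with_language("FR")
sibling_hits = docs_with_sibling_pair("category", "domestic")

print(len(en_hits), len(fr_hits), len(sibling_hits))
```

The three counts correspond to the `hits.total.value` assertions of 2, 3 and 2 in the tests. A pair such as key "level" with value "domestic" matches nothing here, because no single `nested2` object carries both.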
