Add inner hits support to semantic query #111834
Changes from 24 commits
@@ -0,0 +1,5 @@
pr: 111834
summary: Add inner hits support to semantic query
area: Search
type: enhancement
issues: []
@@ -40,9 +40,209 @@ The `semantic_text` field to perform the query on.
(Required, string)
The query text to be searched for on the field.

`chunks`::
(Optional, object)
The passage ranking configuration.

See <<semantic-query-passage-ranking, Passage ranking with the `semantic` query>> for more information.

> Reviewer: I don't think we usually collapse query DSL properties, based on looking at a few other pages. It's probably not a huge deal but inconsistent with pages like …
>
> Mikep86: I like the look of the collapsible blocks better personally, that's why I went with it :) @leemthompo Any guidance you can offer here?

.Properties of `chunks`
[%collapsible%open]
====
`from`::
(Optional, integer)
The offset from the first chunk to fetch.

Used to paginate through the chunks.
Defaults to `0`.

||
`size`:: | ||
(Optional, integer) | ||
The maximum number of chunks to return. | ||
Mikep86 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
Defaults to `3`. | ||
==== | ||
|
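The `from` and `size` options act like a slice over the document's chunks once they are sorted by relevance. A minimal sketch of that windowing semantics (the helper name is hypothetical, plain Python, not an Elasticsearch API):

```python
def select_chunks(ranked_chunks, frm=0, size=3):
    # Window over chunks already sorted by descending score,
    # mirroring the documented defaults: from=0, size=3.
    return ranked_chunks[frm:frm + size]

chunks = ["chunk A", "chunk B", "chunk C", "chunk D"]
print(select_chunks(chunks))                 # first three chunks (defaults)
print(select_chunks(chunks, frm=1, size=1))  # second-best chunk only
```

An out-of-range `from` simply yields an empty window, matching ordinary slice behavior.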
||
Refer to <<semantic-search-semantic-text,this tutorial>> to learn more about semantic search using `semantic_text` and `semantic` query. | ||
|
||
[discrete]
[[semantic-query-passage-ranking]]
==== Passage ranking with the `semantic` query
The `chunks` parameter can be used for _passage ranking_, which allows you to determine which chunk(s) in the document best match the query.
For example, if you have a document that covers varying topics:

[source,console]
------------------------------------------------------------
POST my-index/_doc/lake_tahoe
{
  "inference_field": [
    "Lake Tahoe is the largest alpine lake in North America",
    "When hiking in the area, please be on alert for bears"
  ]
}
------------------------------------------------------------
// TEST[skip:TBD]
You can use passage ranking to find the chunk that best matches your query:

[source,console]
------------------------------------------------------------
GET my-index/_search
{
  "query": {
    "semantic": {
      "field": "inference_field",
      "query": "mountain lake",
      "chunks": { }
    }
  }
}
------------------------------------------------------------
// TEST[skip:TBD]

[source,console-result]
------------------------------------------------------------
{
  "took": 67,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 10.844536,
    "hits": [
      {
        "_index": "my-index",
        "_id": "lake_tahoe",
        "_score": 10.844536,
        "_source": {
          ...
        },
        "inner_hits": { <1>
          "inference_field": {
            "hits": {
              "total": {
                "value": 2,
                "relation": "eq"
              },
              "max_score": 10.844536,
              "hits": [
                {
                  "_index": "my-index",
                  "_id": "lake_tahoe",
                  "_nested": {
                    "field": "inference_field.inference.chunks",
                    "offset": 0
                  },
                  "_score": 10.844536,
                  "_source": {
                    "text": "Lake Tahoe is the largest alpine lake in North America"
                  }
                },
                {
                  "_index": "my-index",
                  "_id": "lake_tahoe",
                  "_nested": {
                    "field": "inference_field.inference.chunks",
                    "offset": 1
                  },
                  "_score": 3.2726858,
                  "_source": {
                    "text": "When hiking in the area, please be on alert for bears"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}
------------------------------------------------------------
<1> Ranked passages will be returned using the <<inner-hits,`inner_hits` response format>>, with `<inner_hits_name>` set to the `semantic_text` field name.

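Because ranked passages arrive in the standard `inner_hits` format, a client can extract them with ordinary dictionary traversal. A sketch over a response shaped like the example above (the helper name is made up; plain Python, no client library):

```python
def ranked_passages(response, field):
    # Collect (chunk offset, score, text) tuples from the field's
    # inner hits; Elasticsearch returns them best-first per document.
    out = []
    for hit in response["hits"]["hits"]:
        for ih in hit["inner_hits"][field]["hits"]["hits"]:
            out.append((ih["_nested"]["offset"], ih["_score"], ih["_source"]["text"]))
    return out

# Trimmed-down stand-in for the response shown above.
response = {
    "hits": {"hits": [{
        "_id": "lake_tahoe",
        "inner_hits": {"inference_field": {"hits": {"hits": [
            {"_nested": {"offset": 0}, "_score": 10.844536,
             "_source": {"text": "Lake Tahoe is the largest alpine lake in North America"}},
            {"_nested": {"offset": 1}, "_score": 3.2726858,
             "_source": {"text": "When hiking in the area, please be on alert for bears"}},
        ]}}}
    }]}
}
best = ranked_passages(response, "inference_field")[0]
print(best[2])  # the top-ranked passage text
```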
By default, the top three matching chunks will be returned.
You can use the `size` parameter to control the number of chunks returned and the `from` parameter to page through the matching chunks:

[source,console]
------------------------------------------------------------
GET my-index/_search
{
  "query": {
    "semantic": {
      "field": "inference_field",
      "query": "mountain lake",
      "chunks": {
        "from": 1,
        "size": 1
      }
    }
  }
}
------------------------------------------------------------
// TEST[skip:TBD]

[source,console-result]
------------------------------------------------------------
{
  "took": 42,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 10.844536,
    "hits": [
      {
        "_index": "my-index",
        "_id": "lake_tahoe",
        "_score": 10.844536,
        "_source": {
          ...
        },
        "inner_hits": {
          "inference_field": {
            "hits": {
              "total": {
                "value": 2,
                "relation": "eq"
              },
              "max_score": 10.844536,
              "hits": [
                {
                  "_index": "my-index",
                  "_id": "lake_tahoe",
                  "_nested": {
                    "field": "inference_field.inference.chunks",
                    "offset": 1
                  },
                  "_score": 3.2726858,
                  "_source": {
                    "text": "When hiking in the area, please be on alert for bears"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}
------------------------------------------------------------

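To walk through all matching chunks, a client can advance `from` by `size` on each request. A sketch of that paging arithmetic (hypothetical helper, no Elasticsearch calls involved):

```python
def chunk_pages(total_chunks, size):
    # Yield successive {"from": ..., "size": ...} windows until the
    # total number of matching chunks has been covered.
    frm = 0
    while frm < total_chunks:
        yield {"from": frm, "size": min(size, total_chunks - frm)}
        frm += size

print(list(chunk_pages(5, 2)))
# [{'from': 0, 'size': 2}, {'from': 2, 'size': 2}, {'from': 4, 'size': 1}]
```

Each yielded window would be placed in the `chunks` object of a subsequent `semantic` query.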
[discrete]
[[hybrid-search-semantic]]
==== Hybrid search with the `semantic` query

@@ -121,7 +321,7 @@ GET my-index/_search

[discrete]
[[advanced-search]]
-=== Advanced search on `semantic_text` fields
+==== Advanced search on `semantic_text` fields

The `semantic` query uses default settings for searching on `semantic_text` fields for ease of use.
If you want to fine-tune a search on a `semantic_text` field, you need to know the task type used by the `inference_id` configured in `semantic_text`.

@@ -135,7 +335,7 @@ on a `semantic_text` field, it is not supported to use the `semantic_query` on a

[discrete]
[[search-sparse-inference]]
-==== Search with `sparse_embedding` inference
+===== Search with `sparse_embedding` inference

When the {infer} endpoint uses a `sparse_embedding` model, you can use a <<query-dsl-sparse-vector-query,`sparse_vector` query>> on a <<semantic-text,`semantic_text`>> field in the following way:

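Conceptually, a `sparse_embedding` model scores a chunk by the overlap of weighted tokens between query and chunk. A toy illustration of that idea (not Elasticsearch's actual scoring formula; the token weights below are invented):

```python
def sparse_score(query_tokens, doc_tokens):
    # Dot product over the tokens both sides share; tokens missing
    # from either side contribute nothing to the score.
    return sum(w * doc_tokens[t] for t, w in query_tokens.items() if t in doc_tokens)

query = {"lake": 1.2, "mountain": 0.9}
doc = {"lake": 0.8, "alpine": 0.7, "tahoe": 1.1}
score = sparse_score(query, doc)  # only "lake" overlaps, so 1.2 * 0.8
```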
@@ -164,7 +364,7 @@ You can customize the `sparse_vector` query to include specific settings, like <

[discrete]
[[search-text-inferece]]
-==== Search with `text_embedding` inference
+===== Search with `text_embedding` inference

When the {infer} endpoint uses a `text_embedding` model, you can use a <<query-dsl-knn-query,`knn` query>> on a `semantic_text` field in the following way:
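Under a `text_embedding` model, the `knn` query ranks chunks by vector similarity. A toy cosine-similarity sketch of that ranking (the 3-dimensional vectors below are invented for illustration; real embeddings have hundreds of dimensions):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical embeddings for the two chunks from the earlier example.
chunks = {
    "lake chunk": [0.9, 0.1, 0.2],
    "bear chunk": [0.1, 0.8, 0.3],
}
query_vec = [0.85, 0.15, 0.25]
best = max(chunks, key=lambda name: cosine(query_vec, chunks[name]))
print(best)  # the chunk whose embedding is closest to the query's
```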