Commit 8f21547

Merge remote-tracking branch 'elastic/main' into constant-blocks
2 parents 276759a + 0dbf9f7 commit 8f21547

76 files changed: +2365 -1556 lines changed

docs/changelog/132387.yaml

Lines changed: 6 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,6 @@
1+
pr: 132387
2+
summary: "[ExtraHop & QualysGAV] Add `manage`, `create_index`, `read`, `index`, `write`, `delete`, permission for third party agent indices `kibana_system`"
3+
area: Authorization
4+
type: enhancement
5+
issues:
6+
- 131825

docs/changelog/132410.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
pr: 132410
2+
summary: Add support for retrieving semantic_text's indexed chunks via fields API
3+
area: Vector Search
4+
type: feature
5+
issues: []

docs/changelog/132497.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
pr: 132497
2+
summary: Add cache miss and read metrics
3+
area: Searchable Snapshots
4+
type: enhancement
5+
issues: []

docs/reference/elasticsearch/mapping-reference/semantic-text.md

Lines changed: 71 additions & 39 deletions
@@ -282,6 +282,34 @@ PUT test-index/_doc/1
282282
* Others (such as `elastic` and `elasticsearch`) will automatically truncate
283283
the input.
284284

285+
## Retrieving indexed chunks
286+
```{applies_to}
287+
stack: ga 9.2
288+
serverless: ga
289+
```
290+
291+
You can retrieve the individual chunks generated by your semantic field’s chunking
292+
strategy using the [fields parameter](/reference/elasticsearch/rest-apis/retrieve-selected-fields.md#search-fields-param):
293+
294+
```console
295+
POST test-index/_search
296+
{
297+
"query": {
298+
"ids" : {
299+
"values" : ["1"]
300+
}
301+
},
302+
"fields": [
303+
{
304+
"field": "semantic_text_field",
305+
"format": "chunks" <1>
306+
}
307+
]
308+
}
309+
```
310+
311+
1. Use `"format": "chunks"` to return the field’s text as the original text chunks that were indexed.
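
As a rough illustration of the request above, the following Python sketch builds the same search body programmatically. The helper name `chunks_request` is hypothetical and not part of any Elasticsearch client; the sketch only constructs the JSON body and sends nothing.

```python
import json


def chunks_request(field, doc_ids):
    """Build a search body that asks for a semantic_text field's indexed chunks.

    `chunks_request` is an illustrative helper, not a client API.
    """
    return {
        # Restrict the search to the given document IDs.
        "query": {"ids": {"values": list(doc_ids)}},
        # "format": "chunks" requests the original text chunks that were indexed.
        "fields": [{"field": field, "format": "chunks"}],
    }


request_body = chunks_request("semantic_text_field", ["1"])
print(json.dumps(request_body, indent=2))
```

The resulting body can be sent to `POST test-index/_search` with whichever HTTP or Elasticsearch client you already use.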
312+
285313
## Extracting relevant fragments from semantic text [semantic-text-highlighting]
286314

287315
You can extract the most relevant fragments from a semantic text field by using
@@ -311,27 +339,6 @@ POST test-index/_search
311339
2. Sorts the most relevant highlighted fragments by score when set to `score`. By default,
312340
fragments will be output in the order they appear in the field (order: none).
313341

314-
To use the `semantic` highlighter to view chunks in the order which they were indexed with no scoring,
315-
use the `match_all` query to retrieve them in the order they appear in the document:
316-
317-
```console
318-
POST test-index/_search
319-
{
320-
"query": {
321-
"match_all": {}
322-
},
323-
"highlight": {
324-
"fields": {
325-
"my_semantic_field": {
326-
"number_of_fragments": 5 <1>
327-
}
328-
}
329-
}
330-
}
331-
```
332-
333-
1. This will return the first 5 chunks, set this number higher to retrieve more chunks.
334-
335342
Highlighting is supported on fields other than semantic_text. However, if you
336343
want to restrict highlighting to the semantic highlighter and return no
337344
fragments when the field is not of type semantic_text, you can explicitly
@@ -359,6 +366,49 @@ PUT test-index
359366

360367
1. Ensures that highlighting is applied exclusively to semantic_text fields.
361368

369+
To retrieve all fragments from the `semantic` highlighter in their original indexing order
370+
without scoring, use a `match_all` query as the `highlight_query`.
371+
This ensures fragments are returned in the order they appear in the document:
372+
373+
```console
374+
POST test-index/_search
375+
{
376+
"query": {
377+
"ids": {
378+
"values": ["1"]
379+
}
380+
},
381+
"highlight": {
382+
"fields": {
383+
"my_semantic_field": {
384+
"number_of_fragments": 5, <1>
385+
"highlight_query": { "match_all": {} }
386+
}
387+
}
388+
}
389+
}
390+
```
391+
392+
1. Returns the first 5 fragments. Increase this value to retrieve additional fragments.
393+
394+
## Updates and partial updates for `semantic_text` fields [semantic-text-updates]
395+
396+
When updating documents that contain `semantic_text` fields, it’s important to understand how inference is triggered:
397+
398+
* **Full document updates**
399+
When you perform a full document update, **all `semantic_text` fields will re-run inference** even if their values did not change. This ensures that the embeddings are always consistent with the current document state but can increase ingestion costs.
400+
401+
* **Partial updates using the Bulk API**
402+
Partial updates that **omit `semantic_text` fields** and are submitted through the [Bulk API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-bulk) will **reuse the existing embeddings** stored in the index. In this case, inference is **not triggered** for fields that were not updated, which can significantly reduce processing time and cost.
403+
404+
* **Partial updates using the Update API**
405+
When using the [Update API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-update) with a `doc` object that **omits `semantic_text` fields**, inference **will still run** on all `semantic_text` fields. This means that even if the field values are not changed, embeddings will be re-generated.
406+
407+
If you want to avoid unnecessary inference and keep existing embeddings:
408+
409+
* Use **partial updates through the Bulk API**.
410+
* Omit any `semantic_text` fields that did not change from the `doc` object in your request.
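
The recommendation above can be sketched in Python as follows. This is a minimal example, assuming an index named `test-index` and a hypothetical non-semantic field called `title`; the helper `bulk_partial_update` is illustrative only and simply builds the newline-delimited body that the Bulk API expects.

```python
import json


def bulk_partial_update(index, doc_id, changed_fields):
    """Build an NDJSON Bulk API body for a single partial update.

    `bulk_partial_update` is an illustrative helper, not a client API.
    Only fields that actually changed should be passed in, so that
    unchanged `semantic_text` fields are omitted from the `doc` object.
    """
    action = {"update": {"_index": index, "_id": doc_id}}
    payload = {"doc": changed_fields}
    # The Bulk API body is newline-delimited JSON and must end with a newline.
    return json.dumps(action) + "\n" + json.dumps(payload) + "\n"


# `title` is a hypothetical regular field; no semantic_text field is included.
body = bulk_partial_update("test-index", "1", {"title": "Updated title"})
print(body, end="")
```

Because the `doc` object leaves out the `semantic_text` fields, a request like this is the case described above in which the existing embeddings are reused.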
411+
362412
## Customizing `semantic_text` indexing [custom-indexing]
363413

364414
`semantic_text` uses defaults for indexing data based on the {{infer}} endpoint
@@ -404,24 +454,6 @@ PUT my-index-000004
404454
}
405455
```
406456

407-
### Customizing using ingest pipelines [custom-by-pipelines]
408-
```{applies_to}
409-
stack: ga 9.0
410-
```
411-
412-
In case you want to customize data indexing, use the
413-
[`sparse_vector`](/reference/elasticsearch/mapping-reference/sparse-vector.md)
414-
or [`dense_vector`](/reference/elasticsearch/mapping-reference/dense-vector.md)
415-
field types and create an ingest pipeline with an
416-
[{{infer}} processor](/reference/enrich-processor/inference-processor.md) to
417-
generate the embeddings.
418-
[This tutorial](docs-content://solutions/search/semantic-search/semantic-search-inference.md)
419-
walks you through the process. In these cases - when you use `sparse_vector` or
420-
`dense_vector` field types instead of the `semantic_text` field type to
421-
customize indexing - using the
422-
[`semantic_query`](/reference/query-languages/query-dsl/query-dsl-semantic-query.md)
423-
is not supported for querying the field data.
424-
425457
## Updates to `semantic_text` fields [update-script]
426458

427459
For indices containing `semantic_text` fields, updates that use scripts have the
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
1+
% This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
2+
3+
```esql
4+
FROM books METADATA _score
5+
| WHERE MATCH(description, "hobbit") OR MATCH(author, "Tolkien")
6+
| SORT _score DESC
7+
| LIMIT 100
8+
| RERANK rerank_score = "hobbit" ON description, author WITH { "inference_id" : "test_reranker" }
9+
| EVAL original_score = _score, _score = rerank_score + original_score
10+
| SORT _score
11+
| LIMIT 3
12+
| KEEP title, original_score, rerank_score, _score
13+
```
14+
15+
| title:text | original_score:double | rerank_score:double | _score:double |
16+
| --- | --- | --- | --- |
17+
| Poems from the Hobbit | 4.012462615966797 | 0.001396648003719747 | 0.001396648003719747 |
18+
| The Lord of the Rings - Boxed Set | 3.768855094909668 | 0.0010020040208473802 | 0.001396648003719747 |
19+
| Return of the King Being the Third Part of The Lord of the Rings | 3.6248698234558105 | 9.000900317914784E-4 | 0.001396648003719747 |
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
1+
% This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
2+
3+
```esql
4+
FROM books METADATA _score
5+
| WHERE MATCH(description, "hobbit")
6+
| SORT _score DESC
7+
| LIMIT 100
8+
| RERANK "hobbit" ON description WITH { "inference_id" : "test_reranker" }
9+
| LIMIT 3
10+
| KEEP title, _score
11+
```
12+
13+
| title:text | _score:double |
14+
| --- | --- |
15+
| Poems from the Hobbit | 0.0015673980815336108 |
16+
| A Tolkien Compass: Including J. R. R. Tolkien's Guide to the Names in The Lord of the Rings | 0.007936508394777775 |
17+
| Return of the King Being the Third Part of The Lord of the Rings | 9.960159659385681E-4 |
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
% This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
2+
3+
```esql
4+
FROM books METADATA _score
5+
| WHERE MATCH(description, "hobbit") OR MATCH(author, "Tolkien")
6+
| SORT _score DESC
7+
| LIMIT 100
8+
| RERANK rerank_score = "hobbit" ON description, author WITH { "inference_id" : "test_reranker" }
9+
| SORT rerank_score
10+
| LIMIT 3
11+
| KEEP title, _score, rerank_score
12+
```
13+
14+
| title:text | _score:double | rerank_score:double |
15+
| --- | --- | --- |
16+
| Return of the Shadow | 2.8181066513061523 | 5.740527994930744E-4 |
17+
| Return of the King Being the Third Part of The Lord of the Rings | 3.6248698234558105 | 9.000900317914784E-4 |
18+
| The Lays of Beleriand | 1.3002015352249146 | 9.36329597607255E-4 |

docs/reference/query-languages/esql/_snippets/commands/layout/rerank.md

Lines changed: 6 additions & 50 deletions
@@ -100,61 +100,17 @@ If you don't want to increase the timeout limit, try the following:
100100

101101
Rerank search results using a simple query and a single field:
102102

103-
```esql
104-
FROM books
105-
| WHERE MATCH(title, "science fiction")
106-
| SORT _score DESC
107-
| LIMIT 100
108-
| RERANK "science fiction" ON (title) WITH { "inference_id" : "my_reranker" }
109-
| LIMIT 3
110-
| KEEP title, _score
111-
```
112103

113-
| title:keyword | _score:double |
114-
|---------------|---------------|
115-
| Neuromancer | 0.98 |
116-
| Dune | 0.95 |
117-
| Foundation | 0.92 |
104+
:::{include} ../examples/rerank.csv-spec/simple-query.md
105+
:::
118106

119107
Rerank search results using a query and multiple fields, and store the new score
120108
in a column named `rerank_score`:
121109

122-
```esql
123-
FROM movies
124-
| WHERE MATCH(title, "dystopian future") OR MATCH(synopsis, "dystopian future")
125-
| SORT _score DESC
126-
| LIMIT 100
127-
| RERANK rerank_score = "dystopian future" ON (title, synopsis) WITH { "inference_id" : "my_reranker" }
128-
| SORT rerank_score DESC
129-
| LIMIT 5
130-
| KEEP title, _score, rerank_score
131-
```
132-
133-
| title:keyword | _score:double | rerank_score:double |
134-
|-----------------|---------------|---------------------|
135-
| Blade Runner | 8.75 | 0.99 |
136-
| The Matrix | 9.12 | 0.97 |
137-
| Children of Men | 8.50 | 0.96 |
138-
| Akira | 8.99 | 0.94 |
139-
| Gattaca | 8.65 | 0.91 |
110+
:::{include} ../examples/rerank.csv-spec/two-queries.md
111+
:::
140112

141113
Combine the original score with the reranked score:
142114

143-
```esql
144-
FROM movies
145-
| WHERE MATCH(title, "dystopian future") OR MATCH(synopsis, "dystopian future")
146-
| SORT _score DESC
147-
| LIMIT 100
148-
| RERANK rerank_score = "dystopian future" ON (title, synopsis) WITH { "inference_id" : "my_reranker" }
149-
| EVAL original_score = _score, _score = rerank_score + original_score
150-
| SORT _score DESC
151-
| LIMIT 2
152-
| KEEP title, original_score, rerank_score, _score
153-
```
154-
155-
| title:keyword | original_score:double | rerank_score:double | _score:double |
156-
|---------------|-----------------------|---------------------|---------------|
157-
| The Matrix | 9.12 | 0.97 | 10.09 |
158-
| Akira | 8.99 | 0.94 | 9.93 |
159-
160-
115+
:::{include} ../examples/rerank.csv-spec/combine.md
116+
:::

docs/reference/query-languages/esql/_snippets/functions/layout/copy_sign.md

Lines changed: 3 additions & 0 deletions
Some generated files are not rendered by default.

docs/release-notes/breaking-changes.md

Lines changed: 7 additions & 0 deletions
@@ -12,6 +12,13 @@ If you are migrating from a version prior to version 9.0, you must first upgrade
1212

1313
% ## Next version [elasticsearch-nextversion-breaking-changes]
1414

15+
```{applies_to}
16+
stack: coming 9.1.1
17+
```
18+
## 9.1.1 [elasticsearch-9.1.1-breaking-changes]
19+
20+
There are no breaking changes associated with this release.
21+
1522
## 9.1.0 [elasticsearch-9.1.0-breaking-changes]
1623

1724
Discovery-Plugins:
