
Commit 056e019

Merge branch 'main' into es-12587-cps-enable-remote-cluster-server-port
2 parents: c0e4634 + a7b434b

File tree: 90 files changed, +2587 −707 lines


docs/changelog/131341.yaml

Lines changed: 5 additions & 0 deletions

@@ -0,0 +1,5 @@
+pr: 131341
+summary: Consider min/max from predicates when transform date_trunc/bucket to `round_to`
+area: ES|QL
+type: enhancement
+issues: []

docs/changelog/131536.yaml

Lines changed: 1 addition & 1 deletion

@@ -1,5 +1,5 @@
 pr: 131536
-summary: "Component Templates: Add `{created,modified}_date`"
+summary: "Component Templates: Add created and modified date"
 area: Ingest Node
 type: enhancement
 issues: []

docs/changelog/132083.yaml

Lines changed: 5 additions & 0 deletions

@@ -0,0 +1,5 @@
+pr: 132083
+summary: "Index template: Add created_date and modified_date"
+area: Ingest Node
+type: enhancement
+issues: []

docs/changelog/132143.yaml

Lines changed: 6 additions & 0 deletions

@@ -0,0 +1,6 @@
+pr: 132143
+summary: Consider min/max from predicates when transform date_trunc/bucket to `round_to`
+  option 2
+area: ES|QL
+type: enhancement
+issues: []

docs/changelog/132387.yaml

Lines changed: 6 additions & 0 deletions

@@ -0,0 +1,6 @@
+pr: 132387
+summary: "[ExtraHop & QualysGAV] Add `manage`, `create_index`, `read`, `index`, `write`, `delete`, permission for third party agent indices `kibana_system`"
+area: Authorization
+type: enhancement
+issues:
+ - 131825

docs/changelog/132410.yaml

Lines changed: 5 additions & 0 deletions

@@ -0,0 +1,5 @@
+pr: 132410
+summary: Add support for retrieving semantic_text's indexed chunks via fields API
+area: Vector Search
+type: feature
+issues: []

docs/changelog/132459.yaml

Lines changed: 5 additions & 0 deletions

@@ -0,0 +1,5 @@
+pr: 132459
+summary: Small fixes for COPY_SIGN
+area: ES|QL
+type: bug
+issues: []

docs/changelog/132497.yaml

Lines changed: 5 additions & 0 deletions

@@ -0,0 +1,5 @@
+pr: 132497
+summary: Add cache miss and read metrics
+area: Searchable Snapshots
+type: enhancement
+issues: []

docs/changelog/132511.yaml

Lines changed: 5 additions & 0 deletions

@@ -0,0 +1,5 @@
+pr: 132511
+summary: Handle special regex cases for version fields
+area: Search
+type: bug
+issues: []

docs/reference/elasticsearch/mapping-reference/semantic-text.md

Lines changed: 71 additions & 39 deletions

@@ -282,6 +282,34 @@ PUT test-index/_doc/1
 * Others (such as `elastic` and `elasticsearch`) will automatically truncate
   the input.
 
+## Retrieving indexed chunks
+```{applies_to}
+stack: ga 9.2
+serverless: ga
+```
+
+You can retrieve the individual chunks generated by your semantic field’s chunking
+strategy using the [fields parameter](/reference/elasticsearch/rest-apis/retrieve-selected-fields.md#search-fields-param):
+
+```console
+POST test-index/_search
+{
+  "query": {
+    "ids" : {
+      "values" : ["1"]
+    }
+  },
+  "fields": [
+    {
+      "field": "semantic_text_field",
+      "format": "chunks" <1>
+    }
+  ]
+}
+```
+
+1. Use `"format": "chunks"` to return the field’s text as the original text chunks that were indexed.
+
 ## Extracting relevant fragments from semantic text [semantic-text-highlighting]
 
 You can extract the most relevant fragments from a semantic text field by using
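The new `"format": "chunks"` request added in this hunk can also be sketched programmatically. A minimal Python sketch that builds the same `_search` body as a plain dict (index name, document ID, and `semantic_text_field` are the example values from the diff above, not required names):

```python
import json


def chunks_search_body(doc_id: str, field: str) -> dict:
    """Build a _search request body that asks the fields API to return a
    semantic_text field's indexed chunks via "format": "chunks"."""
    return {
        "query": {"ids": {"values": [doc_id]}},
        "fields": [{"field": field, "format": "chunks"}],
    }


body = chunks_search_body("1", "semantic_text_field")
# Serialize exactly what would be POSTed to test-index/_search.
print(json.dumps(body))
```

The same dict can be passed to an Elasticsearch client's search call; building it separately just makes the request shape easy to inspect and test.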
@@ -311,27 +339,6 @@ POST test-index/_search
 2. Sorts the most relevant highlighted fragments by score when set to `score`. By default,
    fragments will be output in the order they appear in the field (order: none).
 
-To use the `semantic` highlighter to view chunks in the order which they were indexed with no scoring,
-use the `match_all` query to retrieve them in the order they appear in the document:
-
-```console
-POST test-index/_search
-{
-  "query": {
-    "match_all": {}
-  },
-  "highlight": {
-    "fields": {
-      "my_semantic_field": {
-        "number_of_fragments": 5 <1>
-      }
-    }
-  }
-}
-```
-
-1. This will return the first 5 chunks, set this number higher to retrieve more chunks.
-
 Highlighting is supported on fields other than semantic_text. However, if you
 want to restrict highlighting to the semantic highlighter and return no
 fragments when the field is not of type semantic_text, you can explicitly
@@ -359,6 +366,49 @@ PUT test-index
 
 1. Ensures that highlighting is applied exclusively to semantic_text fields.
 
+To retrieve all fragments from the `semantic` highlighter in their original indexing order
+without scoring, use a `match_all` query as the `highlight_query`.
+This ensures fragments are returned in the order they appear in the document:
+
+```console
+POST test-index/_search
+{
+  "query": {
+    "ids": {
+      "values": ["1"]
+    }
+  },
+  "highlight": {
+    "fields": {
+      "my_semantic_field": {
+        "number_of_fragments": 5, <1>
+        "highlight_query": { "match_all": {} }
+      }
+    }
+  }
+}
+```
+
+1. Returns the first 5 fragments. Increase this value to retrieve additional fragments.
+
+## Updates and partial updates for `semantic_text` fields [semantic-text-updates]
+
+When updating documents that contain `semantic_text` fields, it’s important to understand how inference is triggered:
+
+* **Full document updates**
+  When you perform a full document update, **all `semantic_text` fields will re-run inference** even if their values did not change. This ensures that the embeddings are always consistent with the current document state but can increase ingestion costs.
+
+* **Partial updates using the Bulk API**
+  Partial updates that **omit `semantic_text` fields** and are submitted through the [Bulk API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-bulk) will **reuse the existing embeddings** stored in the index. In this case, inference is **not triggered** for fields that were not updated, which can significantly reduce processing time and cost.
+
+* **Partial updates using the Update API**
+  When using the [Update API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-update) with a `doc` object that **omits `semantic_text` fields**, inference **will still run** on all `semantic_text` fields. This means that even if the field values are not changed, embeddings will be re-generated.
+
+If you want to avoid unnecessary inference and keep existing embeddings:
+
+* Use **partial updates through the Bulk API**.
+* Omit any `semantic_text` fields that did not change from the `doc` object in your request.
+
 ## Customizing `semantic_text` indexing [custom-indexing]
 
 `semantic_text` uses defaults for indexing data based on the {{infer}} endpoint
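The Bulk-vs-Update distinction documented in this hunk can be illustrated with a small sketch. The helper below is hypothetical (not part of any Elasticsearch client); it builds the NDJSON payload for a Bulk API partial update that deliberately omits unchanged `semantic_text` fields, which is the pattern the docs recommend for reusing stored embeddings:

```python
import json


def bulk_partial_update(index: str, doc_id: str, changed_fields: dict) -> str:
    """Build Bulk API NDJSON for one partial update. Any semantic_text
    field omitted from `changed_fields` keeps its stored embeddings,
    so inference is not re-run for it (per the semantic_text docs)."""
    action = {"update": {"_index": index, "_id": doc_id}}
    payload = {"doc": changed_fields}
    return json.dumps(action) + "\n" + json.dumps(payload) + "\n"


# Update only a non-semantic field; the semantic_text field is omitted
# on purpose, so its embeddings are reused rather than regenerated.
ndjson = bulk_partial_update("test-index", "1", {"title": "new title"})
print(ndjson)
```

Sending the same `{"doc": ...}` through the Update API instead would still re-run inference on all `semantic_text` fields, which is why the Bulk route is the cheaper one here.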
@@ -404,24 +454,6 @@ PUT my-index-000004
 }
 ```
 
-### Customizing using ingest pipelines [custom-by-pipelines]
-```{applies_to}
-stack: ga 9.0
-```
-
-In case you want to customize data indexing, use the
-[`sparse_vector`](/reference/elasticsearch/mapping-reference/sparse-vector.md)
-or [`dense_vector`](/reference/elasticsearch/mapping-reference/dense-vector.md)
-field types and create an ingest pipeline with an
-[{{infer}} processor](/reference/enrich-processor/inference-processor.md) to
-generate the embeddings.
-[This tutorial](docs-content://solutions/search/semantic-search/semantic-search-inference.md)
-walks you through the process. In these cases - when you use `sparse_vector` or
-`dense_vector` field types instead of the `semantic_text` field type to
-customize indexing - using the
-[`semantic_query`](/reference/query-languages/query-dsl/query-dsl-semantic-query.md)
-is not supported for querying the field data.
-
 ## Updates to `semantic_text` fields [update-script]
 
 For indices containing `semantic_text` fields, updates that use scripts have the
