Commit 2699fe0

Merge remote-tracking branch 'upstream/main' into ilm-explain-invalid-json
* upstream/main:
  Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=false} elastic#137691
  Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=true} elastic#137690
  [LTR] Fix feature display order when using explain. (elastic#137671)
  Remove extra RemoteClusterService instances in unit test (elastic#137647)
  Fix `ComponentTemplatesFileSettingsIT.testSettingsApplied` (elastic#137669)
  Consolidates troubleshooting content into the "Returning semantic field embeddings in _source" section (elastic#137233)
  Update bundled JDK to 25.0.1 (elastic#137640)
  resolve indices for prefixed _all expressions (elastic#137330)
  ESQL: Add TopN support for exponential histograms (elastic#137313)
  allows field caps to be cross project (elastic#137530)
  ESQL: Add exponential histogram percentile function (elastic#137553)
  Wait for nodes to have downloaded databases in `GeoIpDownloaderIT` (elastic#137636)
  Tighten on when THROTTLE decision can be returned (elastic#136794)
  Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeMetricsIT test elastic#137655
  Add a test for two little known conditional processor paths (elastic#137645)
  Extract a common ORIGIN constant (elastic#137612)
  Remove early phase failure in batched (elastic#136889)
  Returning correct index mode from get data streams api (elastic#137646)
  [ML] Manage AD results indices (elastic#136065)
2 parents 7f4fcec + bcb1402 commit 2699fe0

86 files changed: +3490 -1280 lines changed

build-tools-internal/version.properties

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ elasticsearch = 9.3.0
 lucene = 10.3.1
 
 bundled_jdk_vendor = openjdk
-bundled_jdk = 25+36@bd75d5f9689641da8e1daabeccb5528b
+bundled_jdk = 25.0.1+8@2fbf10d8c78e40bd87641c434705079d
 # optional dependencies
 spatial4j = 0.7
 jts = 1.15.0

docs/changelog/136065.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 136065
+summary: Nightly maintenance for anomaly detection results indices to keep to manageable size.
+area: Machine Learning
+type: enhancement
+issues: []

docs/changelog/136889.yaml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pr: 136889
+summary: Remove early phase failure in batched
+area: Search
+type: bug
+issues:
+ - 134151

docs/changelog/137530.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 137530
+summary: Allows field caps to be cross project
+area: Search
+type: enhancement
+issues: []

docs/changelog/137640.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 137640
+summary: Update bundled JDK to Java 25.0.1+8
+area: Packaging
+type: upgrade
+issues: []

docs/changelog/137671.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 137671
+summary: "[LTR] Fix feature display order when using explain"
+area: Search
+type: bug
+issues: []

docs/reference/elasticsearch/configuration-reference/machine-learning-settings.md

Lines changed: 3 additions & 0 deletions
@@ -86,6 +86,9 @@ $$$xpack.ml.max_open_jobs$$$
 `xpack.ml.nightly_maintenance_requests_per_second`
 : ([Dynamic](docs-content://deploy-manage/stack-settings.md#dynamic-cluster-setting)) The rate at which the nightly maintenance task deletes expired model snapshots and results. The setting is a proxy to the [`requests_per_second`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-delete-by-query) parameter used in the delete by query requests and controls throttling. When the {{operator-feature}} is enabled, this setting can be updated only by operator users. Valid values must be greater than `0.0` or equal to `-1.0`, where `-1.0` means a default value is used. Defaults to `-1.0`
 
+`xpack.ml.results_index_rollover_max_size`
+: ([Dynamic](docs-content://deploy-manage/stack-settings.md#dynamic-cluster-setting)) The maximum size the anomaly detection results indices can reach before being rolled over by the nightly maintenance task. When the {{operator-feature}} is enabled, this setting can be updated only by operator users. Valid values must be greater than or equal to `-1B`. A value of `-1B` means the indices will never be rolled over. A value of `0B` means the indices will always be rolled over, regardless of size. Defaults to `50GB`.
+
 `xpack.ml.node_concurrent_job_allocations`
 : ([Dynamic](docs-content://deploy-manage/stack-settings.md#dynamic-cluster-setting)) The maximum number of jobs that can concurrently be in the `opening` state on each node. Typically, jobs spend a small amount of time in this state before they move to `open` state. Jobs that must restore large models when they are opening spend more time in the `opening` state. When the {{operator-feature}} is enabled, this setting can be updated only by operator users. Defaults to `2`.
 
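
Reviewer note: the three cases documented for `xpack.ml.results_index_rollover_max_size` (`-1B`, `0B`, and a positive size) can be read as the decision rule sketched below. This is a minimal illustration only, not the maintenance task's actual code; the class `RolloverSizeSketch`, the method `shouldRollOver`, and the inclusive comparison at the boundary are all assumptions made for the example.

```java
/** Hypothetical helper illustrating the documented setting semantics; not Elasticsearch code. */
final class RolloverSizeSketch {

    /**
     * Decides whether a results index should be rolled over, with sizes in bytes.
     * maxSizeBytes stands in for xpack.ml.results_index_rollover_max_size:
     * -1 means never roll over, 0 means always roll over, anything else is a threshold.
     */
    static boolean shouldRollOver(long indexSizeBytes, long maxSizeBytes) {
        if (maxSizeBytes < -1) {
            // The docs state that valid values are greater than or equal to -1B.
            throw new IllegalArgumentException("max size must be >= -1B, got " + maxSizeBytes);
        }
        if (maxSizeBytes == -1) {
            return false; // rollover disabled
        }
        if (maxSizeBytes == 0) {
            return true; // roll over regardless of size
        }
        // Assumption: the boundary is inclusive; the docs only say "maximum size".
        return indexSizeBytes >= maxSizeBytes;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long defaultMax = 50 * gb; // documented default of 50GB
        System.out.println(shouldRollOver(10 * gb, defaultMax));
        System.out.println(shouldRollOver(60 * gb, defaultMax));
        System.out.println(shouldRollOver(1L, 0L));
        System.out.println(shouldRollOver(Long.MAX_VALUE, -1L));
    }
}
```

Treating `-1` and `0` as explicit branches keeps the two sentinel values visible instead of folding them into the size comparison.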

docs/reference/elasticsearch/mapping-reference/semantic-text.md

Lines changed: 116 additions & 40 deletions
@@ -48,7 +48,7 @@ the field mappings.
 
 :::::::{tab-set}
 
-::::::{tab-item} Using the default ELSER on EIS endpoint on Serverless
+::::::{tab-item} Default ELSER on EIS endpoint on {{serverless-short}}
 
 ```{applies_to}
 serverless: ga
@@ -72,7 +72,7 @@ PUT my-index-000001
 
 ::::::
 
-::::::{tab-item} Using the preconfigured ELSER on EIS endpoint in Cloud
+::::::{tab-item} Preconfigured ELSER on EIS endpoint in Cloud
 
 ```{applies_to}
 stack: ga 9.2
@@ -98,7 +98,7 @@ PUT my-index-000001
 
 ::::::
 
-::::::{tab-item} Using the default ELSER endpoint
+::::::{tab-item} Default ELSER endpoint
 
 If you use the preconfigured `.elser-2-elasticsearch` endpoint, you can set up `semantic_text` with the following API request:
 
@@ -544,9 +544,14 @@ stack: ga 9.2
 serverless: ga
 ```
 
+:::{important}
+Starting with {{es}} 9.2, the recommended method for retrieving embeddings has changed from that used in previous versions.
+For instructions on retrieving embeddings in versions earlier than 9.2, refer to [Returning semantic field embeddings using `fields`](#return-embeddings-fields).
+:::
+
 By default, the embeddings generated for `semantic_text` fields are stored internally and **not included in `_source`** when retrieving documents.
 
-To include the full inference fields, including their embeddings, in `_source`, set the `_source.exclude_vectors` option to `false`.
+To include the full {{infer}} fields, including their embeddings, in `_source`, set the `_source.exclude_vectors` option to `false`.
 This works with the
 [Get](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-get),
 [Search](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search),
@@ -565,18 +570,23 @@ POST my-index/_search
   }
 }
 ```
-% TEST[skip:Requires inference endpoint]
+% TEST[skip:Requires {{infer}} endpoint]
 
 The embeddings will appear under `_inference_fields` in `_source`.
 
 **Use cases**
+
 Including embeddings in `_source` is useful when you want to:
 
 * Reindex documents into another index **with the same `inference_id`** without re-running inference.
 * Export or migrate documents while preserving their embeddings.
 * Inspect or debug the raw embeddings generated for your content.
 
 ### Example: Reindex while preserving embeddings
+```{applies_to}
+stack: ga 9.2
+serverless: ga
+```
 
 ```console
 POST _reindex
@@ -592,7 +602,7 @@ POST _reindex
   }
 }
 ```
-% TEST[skip:Requires inference endpoint]
+% TEST[skip:Requires {{infer}} endpoint]
 
 1. Sends the source documents with their stored embeddings to the destination index.
 
@@ -602,16 +612,110 @@ the documents will **fail the reindex task**.
 Matching `inference_id` values are required to reuse the existing embeddings.
 ::::
 
-This allows documents to be re-indexed without triggering inference again, **as long as the target `semantic_text` field uses the same `inference_id` as the source**.
+This allows documents to be re-indexed without triggering {{infer}} again, **as long as the target `semantic_text` field uses the same `inference_id` as the source**.
 
-::::{note}
-**For versions prior to 9.2.0**
+### Example: Troubleshooting semantic_text fields [troubleshooting-semantic-text-fields]
+```{applies_to}
+stack: ga 9.2
+serverless: ga
+```
+
+To verify that your embeddings look correct, you can retrieve the {{infer}} data that `semantic_text` normally hides from search results.
 
-Older versions do not support the `exclude_vectors` option to retrieve the embeddings of the semantic text fields.
-To return the `_inference_fields`, use the `fields` option in a search request instead:
+To retrieve the stored embeddings in {{es}} 9.2 and later, set the `exclude_vectors` parameter to `false` in the `_source` field. This ensures that the vector data, which is excluded by default, is included in the search response.
 
 ```console
 POST test-index/_search
+{
+  "_source": {
+    "exclude_vectors": false
+  },
+  "query": {
+    "match": {
+      "my_semantic_field": "Which country is Paris in?"
+    }
+  }
+}
+```
+% TEST[skip:Requires {{infer}} endpoint]
+
+This will return verbose chunked embeddings content that is used to perform
+semantic search for `semantic_text` fields:
+
+```console-response
+{
+  "took": 18,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": { "value": 1, "relation": "eq" },
+    "max_score": 16.532316,
+    "hits": [
+      {
+        "_index": "test-index",
+        "_id": "1",
+        "_score": 16.532316,
+        "_source": {
+          "my_semantic_field": "Paris is the capital of France.",
+          "_inference_fields": {
+            "my_semantic_field": {
+              "inference": {
+                "inference_id": ".elser-2-elasticsearch", <1>
+                "model_settings": { <2>
+                  "service": "elasticsearch",
+                  "task_type": "sparse_embedding"
+                },
+                "chunks": {
+                  "my_semantic_field": [
+                    {
+                      "start_offset": 0,
+                      "end_offset": 31,
+                      "embeddings": { <3>
+                        "airport": 0.12011719,
+                        "brussels": 0.032836914,
+                        "capital": 2.1328125,
+                        "capitals": 0.6386719,
+                        "capitol": 1.2890625,
+                        "cities": 0.78125,
+                        "city": 1.265625,
+                        "continent": 0.26953125,
+                        "country": 0.59765625,
+                        ...
+                      }
+                    }
+                  ]
+                }
+              }
+            }
+          }
+        }
+      }
+    ]
+  }
+}
+```
+% TEST[skip:Requires {{infer}} endpoint]
+1. The {{infer}} endpoint used to generate embeddings.
+2. Lists details about the model used to generate embeddings, such as the service name and task type.
+3. The embeddings generated for this chunk.
+
+## Returning semantic field embeddings using `fields` [return-embeddings-fields]
+
+:::{important}
+This method for returning semantic field embeddings is recommended only for {{es}} versions earlier than 9.2.
+For version 9.2 and later, use the [`exclude_vectors`](#troubleshooting-semantic-text-fields) parameter instead.
+:::
+
+To retrieve stored embeddings, use the `fields` parameter with `_inference_fields`. This lets you include the vector data that is not shown by default in the response.
+The `fields` parameter only works with the `_search` endpoint.
+
+```console
+POST my-index/_search
 {
   "query": {
     "match": {
@@ -623,11 +727,7 @@ POST test-index/_search
   ]
 }
 ```
-% TEST[skip:Requires inference endpoint]
-
-This returns the chunked embeddings used for semantic search under `_inference_fields` in `_source`.
-Note that the `fields` option is **not** available for the Reindex API.
-::::
+% TEST[skip:Requires {{infer}} endpoint]
 
 ## Customizing `semantic_text` indexing [custom-indexing]
 
@@ -741,30 +841,6 @@ You can query `semantic_text` fields using the following query types:
 
 - [Semantic query](/reference/query-languages/query-dsl/query-dsl-semantic-query.md): We don't recommend this legacy query type for _new_ projects, because the alternatives in this list enable more flexibility and customization. The `semantic` query remains available to support existing implementations.
 
-
-## Troubleshooting semantic_text fields [troubleshooting-semantic-text-fields]
-
-If you want to verify that your embeddings look correct, you can view the
-inference data that `semantic_text` typically hides using `fields`.
-
-```console
-POST test-index/_search
-{
-  "query": {
-    "match": {
-      "my_semantic_field": "Which country is Paris in?"
-    }
-  },
-  "fields": [
-    "_inference_fields"
-  ]
-}
-```
-% TEST[skip:Requires inference endpoint]
-
-This will return verbose chunked embeddings content that is used to perform
-semantic search for `semantic_text` fields.
-
 ### Document count discrepancy in `_cat/indices`
 
 When an index contains a `semantic_text` field, the `docs.count` value returned by the [`_cat/indices`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-indices) API may be higher than the number of documents you indexed.

modules/data-streams/src/main/java/org/elasticsearch/datastreams/action/TransportGetDataStreamsAction.java

Lines changed: 6 additions & 0 deletions
@@ -224,6 +224,12 @@ static IndexMode resolveMode(
                 indexMode = Enum.valueOf(IndexMode.class, rawMode.toUpperCase(Locale.ROOT));
             }
         }
+        if (indexMode == null) {
+            String rawMode = settings.get(IndexSettings.MODE.getKey());
+            if (rawMode != null) {
+                indexMode = Enum.valueOf(IndexMode.class, rawMode.toUpperCase(Locale.ROOT));
+            }
+        }
         return indexMode;
     }
 
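
Reviewer note: the added block is a second lookup that runs only when the earlier one left `indexMode` as `null`. The stand-alone sketch below shows that fallback pattern under stated assumptions: `IndexModeFallbackSketch`, its nested `IndexMode` enum, and the two plain maps named `primary` and `fallback` are hypothetical stand-ins for the real Elasticsearch types, and the key `index.mode` is assumed to be what `IndexSettings.MODE.getKey()` returns. The hunk does not show which settings object feeds the first lookup, so the sketch does not model that.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration of a "first source, then fallback" index mode lookup. */
final class IndexModeFallbackSketch {

    /** Stand-in enum; the real org.elasticsearch.index.IndexMode is not used here. */
    enum IndexMode { STANDARD, TIME_SERIES, LOGSDB, LOOKUP }

    /** Assumed setting key for the index mode. */
    static final String MODE_KEY = "index.mode";

    /** Parses the raw setting without validation; returns null when the key is absent. */
    static IndexMode parse(Map<String, String> settings) {
        String rawMode = settings.get(MODE_KEY);
        if (rawMode == null) {
            return null;
        }
        // Same conversion as in the diff: upper-case with a fixed locale, then Enum.valueOf.
        return Enum.valueOf(IndexMode.class, rawMode.toUpperCase(Locale.ROOT));
    }

    /** Uses the primary settings first and consults the fallback only when nothing was found. */
    static IndexMode resolveMode(Map<String, String> primary, Map<String, String> fallback) {
        IndexMode indexMode = parse(primary);
        if (indexMode == null) {
            indexMode = parse(fallback);
        }
        return indexMode;
    }

    public static void main(String[] args) {
        System.out.println(resolveMode(Map.of(MODE_KEY, "logsdb"), Map.of(MODE_KEY, "standard")));
        System.out.println(resolveMode(Map.of(), Map.of(MODE_KEY, "time_series")));
        System.out.println(resolveMode(Map.of(), Map.of()));
    }
}
```

`Locale.ROOT` keeps the upper-casing independent of the JVM's default locale, which matters for inputs such as `time_series` under locales with special casing rules for `i`.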

modules/ingest-common/src/yamlRestTest/resources/rest-api-spec/test/ingest/210_conditional_processor.yml

Lines changed: 41 additions & 0 deletions
@@ -221,3 +221,44 @@ teardown:
         index: test
         id: "2"
       catch: missing
+
+---
+"Test conditionals support params and statements":
+  - do:
+      ingest.put_pipeline:
+        id: "my_pipeline"
+        body: >
+          {
+            "description": "_description",
+            "processors": [
+              {
+                "set" : {
+                  "if" : {
+                    "source": "def hit = params.success_codes.containsKey(ctx.code); return hit != null && hit == true;",
+                    "params": {
+                      "success_codes" : {
+                        "10": true,
+                        "20": true
+                      }
+                    }
+                  },
+                  "field" : "result",
+                  "value" : "success"
+                }
+              }
+            ]
+          }
+  - match: { acknowledged: true }
+
+  - do:
+      index:
+        index: test
+        id: "1"
+        pipeline: "my_pipeline"
+        body: { code: "20" }
+
+  - do:
+      get:
+        index: test
+        id: "1"
+  - match: { _source.result: "success" }
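
Reviewer note: the `if` source in this test is a multi-statement script that reads a `params` map. The plain Java sketch below mirrors its logic to make the behavior easier to reason about. It is an analogy only, not how Painless executes the script; `ConditionalParamsSketch` and `isSuccess` are hypothetical names.

```java
import java.util.Map;

/** Hypothetical Java analogue of the conditional script used in the YAML test. */
final class ConditionalParamsSketch {

    /**
     * Mirrors: def hit = params.success_codes.containsKey(ctx.code);
     *          return hit != null && hit == true;
     */
    static boolean isSuccess(Map<String, Boolean> successCodes, Object code) {
        // containsKey never returns null, so the null check is redundant but kept for fidelity.
        Boolean hit = successCodes.containsKey(code);
        return hit != null && hit == true;
    }

    public static void main(String[] args) {
        Map<String, Boolean> successCodes = Map.of("10", true, "20", true);
        System.out.println(isSuccess(successCodes, "20")); // string key present in the map
        System.out.println(isSuccess(successCodes, "30")); // key absent
        System.out.println(isSuccess(successCodes, 20));   // Integer key does not equal the string "20"
    }
}
```

In this analogue the lookup succeeds only for a string key, which is consistent with the test indexing `code` as the string `"20"`.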
