Commit dda5531

Merge branch 'main' into 2025/10/24/new-id-format
2 parents 136a267 + ec0efaf commit dda5531

File tree: 64 files changed, +2962 -1936 lines

docs/changelog/137023.yaml

Lines changed: 5 additions & 0 deletions

@@ -0,0 +1,5 @@
+pr: 137023
+summary: Support choosing the downsampling method in data stream lifecycle
+area: "Data streams"
+type: enhancement
+issues: []

docs/changelog/137331.yaml

Lines changed: 5 additions & 0 deletions

@@ -0,0 +1,5 @@
+pr: 137331
+summary: Add ES93BloomFilterStoredFieldsFormat for efficient field existence checks
+area: TSDB
+type: enhancement
+issues: []

docs/changelog/137442.yaml

Lines changed: 5 additions & 0 deletions

@@ -0,0 +1,5 @@
+pr: 137442
+summary: Handle ._original stored fields with fls
+area: "Authorization"
+type: bug
+issues: []

docs/reference/elasticsearch/mapping-reference/semantic-text.md

Lines changed: 102 additions & 13 deletions

@@ -25,9 +25,12 @@ type make it simpler to perform semantic search on your data. The
 with [match](/reference/query-languages/query-dsl/query-dsl-match-query.md), [sparse_vector](/reference/query-languages/query-dsl/query-dsl-sparse-vector-query.md)
 or [knn](/reference/query-languages/query-dsl/query-dsl-knn-query.md) queries.
 
-If you don’t specify an inference endpoint, the `inference_id` field defaults to
-`.elser-2-elasticsearch`, a preconfigured endpoint for the elasticsearch
-service.
+{applies_to}`serverless: ga` If you don’t specify an {{infer}} endpoint, the `inference_id` field defaults to
+`.elser-v2-elastic`, a preconfigured endpoint for the `elasticsearch` service.
+This endpoint uses the [Elastic {{infer-cap}} Service (EIS)](docs-content://explore-analyze/elastic-inference/eis.md#elser-on-eis).
+
+{applies_to}`stack: ga 9.0` If you don’t specify an {{infer}} endpoint, the `inference_id` field defaults to
+`.elser-2-elasticsearch`, a preconfigured endpoint for the `elasticsearch` service.
 
 Using `semantic_text`, you won’t need to specify how to generate embeddings for
 your data, or how to index it. The {{infer}} endpoint automatically determines
@@ -43,10 +46,15 @@ You can use either preconfigured endpoints in your `semantic_text` fields which
 are ideal for most use cases or create custom endpoints and reference them in
 the field mappings.
 
-### Using the default ELSER endpoint
+:::::::{tab-set}
+
+::::::{tab-item} Using the default ELSER on EIS endpoint on Serverless
+
+```{applies_to}
+serverless: ga
+```
 
-If you use the preconfigured `.elser-2-elasticsearch` endpoint, you can set up
-`semantic_text` with the following API request:
+If you use the default `.elser-v2-elastic` endpoint that runs on EIS, you can set up `semantic_text` with the following API request:
 
 ```console
 PUT my-index-000001
@@ -62,10 +70,58 @@ PUT my-index-000001
 ```
 % TEST[skip:Requires inference endpoint]
 
+::::::
+
+::::::{tab-item} Using the preconfigured ELSER on EIS endpoint in Cloud
+
+```{applies_to}
+stack: ga 9.2
+deployment:
+  self: unavailable
+```
+
+If you use the preconfigured `.elser-v2-elastic` endpoint that runs on EIS, you can set up `semantic_text` with the following API request:
+
+```console
+PUT my-index-000001
+{
+  "mappings": {
+    "properties": {
+      "inference_field": {
+        "type": "semantic_text",
+        "inference_id": ".elser-v2-elastic"
+      }
+    }
+  }
+}
+```
+
+::::::
+
+::::::{tab-item} Using the default ELSER endpoint
+
+If you use the preconfigured `.elser-2-elasticsearch` endpoint, you can set up `semantic_text` with the following API request:
+
+```console
+PUT my-index-000001
+{
+  "mappings": {
+    "properties": {
+      "inference_field": {
+        "type": "semantic_text"
+      }
+    }
+  }
+}
+```
+
+::::::
+
+:::::::
+
 ### Using a custom endpoint
 
-To use a custom {{infer}} endpoint instead of the default
-`.elser-2-elasticsearch`, you
+To use a custom {{infer}} endpoint instead of the default, you
 must [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put)
 and specify its `inference_id` when setting up the `semantic_text` field type.
 
@@ -110,14 +166,48 @@ PUT my-index-000003
 % TEST[skip:Requires inference endpoint]
 
 ### Using ELSER on EIS
+
 ```{applies_to}
-stack: preview 9.1
-serverless: preview
+stack: preview 9.1 ga 9.2
+deployment:
+  self: unavailable
+serverless: ga
 ```
 
 If you use the preconfigured `.elser-2-elastic` endpoint that utilizes the ELSER model as a service through the Elastic Inference Service ([ELSER on EIS](docs-content://explore-analyze/elastic-inference/eis.md#elser-on-eis)), you can
 set up `semantic_text` with the following API request:
 
+:::::::{tab-set}
+
+::::::{tab-item} Using ELSER on EIS on Serverless
+
+```{applies_to}
+serverless: ga
+```
+
+```console
+PUT my-index-000001
+{
+  "mappings": {
+    "properties": {
+      "inference_field": {
+        "type": "semantic_text"
+      }
+    }
+  }
+}
+```
+
+::::::
+
+::::::{tab-item} Using ELSER on EIS in Cloud
+
+```{applies_to}
+stack: ga 9.2
+deployment:
+  self: unavailable
+```
+
+```console
+PUT my-index-000001
+{
@@ -133,10 +223,9 @@ PUT my-index-000001
 ```
 % TEST[skip:Requires inference endpoint]
 
-::::{note}
-While we do encourage experimentation, we do not recommend implementing production use cases on top of this feature while it is in Technical Preview.
+::::::
 
-::::
+:::::::
 
 ## Parameters for `semantic_text` fields [semantic-text-params]
 
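To make the custom-endpoint path referenced in this diff concrete, here is a minimal sketch (not part of this commit) pairing the create {{infer}} API with a `semantic_text` mapping; the endpoint name `my-elser-endpoint`, the index name, and the service settings are illustrative assumptions:

```console
# Hypothetical endpoint name; service settings are illustrative.
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true
    },
    "num_threads": 1,
    "model_id": ".elser_model_2"
  }
}

# Reference the endpoint from the semantic_text field mapping.
PUT my-index-000002
{
  "mappings": {
    "properties": {
      "inference_field": {
        "type": "semantic_text",
        "inference_id": "my-elser-endpoint"
      }
    }
  }
}
```
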
docs/reference/query-languages/esql/_snippets/commands/layout/fork.md

Lines changed: 2 additions & 0 deletions

@@ -17,6 +17,8 @@ FORK ( <processing_commands> ) ( <processing_commands> ) ... ( <processing_comma
 The `FORK` processing command creates multiple execution branches to operate
 on the same input data and combines the results in a single output table. A discriminator column (`_fork`) is added to identify which branch each row came from.
 
+Together with the [`FUSE`](/reference/query-languages/esql/commands/fuse.md) command, `FORK` enables hybrid search to combine and score results from multiple queries. To learn more about using {{esql}} for search, refer to [ES|QL for search](docs-content://solutions/search/esql-for-search.md).
+
 **Branch identification:**
 - The `_fork` column identifies each branch with values like `fork1`, `fork2`, `fork3`
 - Values correspond to the order branches are defined

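As a usage illustration of the branch behavior described above (not part of this commit), a minimal sketch through the ES|QL query API, assuming an `employees` index with an `emp_no` field:

```console
POST /_query
{
  "query": """
    FROM employees
    | FORK ( WHERE emp_no == 10001 )
           ( WHERE emp_no == 10002 )
    | KEEP emp_no, _fork
    | SORT emp_no
  """
}
```

Each surviving row carries `fork1` or `fork2` in the `_fork` column, matching the order in which the branches are declared.
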
modules/data-streams/src/main/java/org/elasticsearch/datastreams/lifecycle/DataStreamLifecycleService.java

Lines changed: 31 additions & 11 deletions

@@ -524,7 +524,10 @@ Set<Index> maybeExecuteDownsampling(ProjectState projectState, DataStream dataSt
                 // - has matching downsample rounds
                 // - is read-only
                 // So let's wait for an in-progress downsampling operation to succeed or trigger the last matching round
-                affectedIndices.addAll(waitForInProgressOrTriggerDownsampling(dataStream, backingIndexMeta, downsamplingRounds, project));
+                var downsamplingMethod = dataStream.getDataLifecycle().downsamplingMethod();
+                affectedIndices.addAll(
+                    waitForInProgressOrTriggerDownsampling(dataStream, backingIndexMeta, downsamplingRounds, downsamplingMethod, project)
+                );
             }
         }
 
@@ -541,6 +544,7 @@ private Set<Index> waitForInProgressOrTriggerDownsampling(
         DataStream dataStream,
         IndexMetadata backingIndex,
         List<DataStreamLifecycle.DownsamplingRound> downsamplingRounds,
+        DownsampleConfig.SamplingMethod downsamplingMethod,
         ProjectMetadata project
     ) {
         assert dataStream.getIndices().contains(backingIndex.getIndex())
@@ -556,7 +560,7 @@ private Set<Index> waitForInProgressOrTriggerDownsampling(
             String downsampleIndexName = DownsampleConfig.generateDownsampleIndexName(
                 DOWNSAMPLED_INDEX_PREFIX,
                 backingIndex,
-                round.config().getFixedInterval()
+                round.fixedInterval()
             );
             IndexMetadata targetDownsampleIndexMeta = project.index(downsampleIndexName);
             boolean targetDownsampleIndexExists = targetDownsampleIndexMeta != null;
@@ -568,7 +572,8 @@ private Set<Index> waitForInProgressOrTriggerDownsampling(
                     INDEX_DOWNSAMPLE_STATUS.get(targetDownsampleIndexMeta.getSettings()),
                     round,
                     lastRound,
-                    index,
+                    downsamplingMethod,
+                    backingIndex,
                     targetDownsampleIndexMeta.getIndex()
                 );
                 if (downsamplingNotComplete.isEmpty() == false) {
@@ -580,7 +585,7 @@ private Set<Index> waitForInProgressOrTriggerDownsampling(
                 // no maintenance needed for previously started downsampling actions and we are on the last matching round so it's time
                 // to kick off downsampling
                 affectedIndices.add(index);
-                downsampleIndexOnce(round, project.id(), indexName, downsampleIndexName);
+                downsampleIndexOnce(round, downsamplingMethod, project.id(), backingIndex, downsampleIndexName);
             }
         }
     }
@@ -592,16 +597,30 @@
      */
     private void downsampleIndexOnce(
         DataStreamLifecycle.DownsamplingRound round,
+        DownsampleConfig.SamplingMethod requestedDownsamplingMethod,
         ProjectId projectId,
-        String sourceIndex,
+        IndexMetadata sourceIndexMetadata,
         String downsampleIndexName
     ) {
+        // When an index is already downsampled with a method, we require all later downsampling rounds to use the same method.
+        // This is necessary to preserve the relation of the downsampled index to the raw data. For example, if an index is already
+        // downsampled and downsampled it again to 1 hour; we know that a document represents either the aggregated raw data of an hour
+        // or the last value of the raw data within this hour. If we mix the methods, we cannot derive any meaning from them.
+        // Furthermore, data stream lifecycle is configured on the data stream level and not on the individual index level, meaning that
+        // when a user changes downsampling method, some indices would not be able to be downsampled anymore.
+        // For this reason, when we encounter an already downsampled index, we use the source downsampling method which might be different
+        // from the requested one.
+        var sourceIndexSamplingMethod = DownsampleConfig.SamplingMethod.fromIndexMetadata(sourceIndexMetadata);
+        String sourceIndex = sourceIndexMetadata.getIndex().getName();
         DownsampleAction.Request request = new DownsampleAction.Request(
             TimeValue.THIRTY_SECONDS /* TODO should this be longer/configurable? */,
             sourceIndex,
             downsampleIndexName,
             null,
-            round.config()
+            new DownsampleConfig(
+                round.fixedInterval(),
+                sourceIndexSamplingMethod == null ? requestedDownsamplingMethod : sourceIndexSamplingMethod
+            )
         );
         transportActionsDeduplicator.executeOnce(
             Tuple.tuple(projectId, request),
@@ -632,11 +651,12 @@ private Set<Index> evaluateDownsampleStatus(
         IndexMetadata.DownsampleTaskStatus downsampleStatus,
         DataStreamLifecycle.DownsamplingRound currentRound,
         DataStreamLifecycle.DownsamplingRound lastRound,
-        Index backingIndex,
+        DownsampleConfig.SamplingMethod downsamplingMethod,
+        IndexMetadata backingIndex,
         Index downsampleIndex
     ) {
         Set<Index> affectedIndices = new HashSet<>();
-        String indexName = backingIndex.getName();
+        String indexName = backingIndex.getIndex().getName();
         String downsampleIndexName = downsampleIndex.getName();
         return switch (downsampleStatus) {
             case UNKNOWN -> {
@@ -683,15 +703,15 @@ private Set<Index> evaluateDownsampleStatus(
                 // NOTE that the downsample request is made through the deduplicator so it will only really be executed if
                 // there isn't one already in-flight. This can happen if a previous request timed-out, failed, or there was a
                 // master failover and data stream lifecycle needed to restart
-                downsampleIndexOnce(currentRound, projectId, indexName, downsampleIndexName);
-                affectedIndices.add(backingIndex);
+                downsampleIndexOnce(currentRound, downsamplingMethod, projectId, backingIndex, downsampleIndexName);
+                affectedIndices.add(backingIndex.getIndex());
                 yield affectedIndices;
             }
             case SUCCESS -> {
                 if (dataStream.getIndices().contains(downsampleIndex) == false) {
                     // at this point the source index is part of the data stream and the downsample index is complete but not
                     // part of the data stream. we need to replace the source index with the downsample index in the data stream
-                    affectedIndices.add(backingIndex);
+                    affectedIndices.add(backingIndex.getIndex());
                     replaceBackingIndexWithDownsampleIndexOnce(projectId, dataStream, indexName, downsampleIndexName);
                 }
                 yield affectedIndices;

modules/data-streams/src/main/java/org/elasticsearch/datastreams/lifecycle/rest/RestPutDataStreamLifecycleAction.java

Lines changed: 12 additions & 2 deletions

@@ -21,6 +21,7 @@
 
 import java.io.IOException;
 import java.util.List;
+import java.util.Set;
 
 import static org.elasticsearch.rest.RestRequest.Method.PUT;
 import static org.elasticsearch.rest.RestUtils.getAckTimeout;
@@ -29,6 +30,9 @@
 @ServerlessScope(Scope.PUBLIC)
 public class RestPutDataStreamLifecycleAction extends BaseRestHandler {
 
+    private static final String SUPPORTS_DOWNSAMPLING_METHOD = "dlm.downsampling_method";
+    private static final Set<String> CAPABILITIES = Set.of(SUPPORTS_DOWNSAMPLING_METHOD);
+
     @Override
     public String getName() {
         return "put_data_lifecycles_action";
@@ -44,13 +48,14 @@ protected RestChannelConsumer prepareRequest(RestRequest request, NodeClient cli
         try (XContentParser parser = request.contentParser()) {
             PutDataStreamLifecycleAction.Request putLifecycleRequest = PutDataStreamLifecycleAction.Request.parseRequest(
                 parser,
-                (dataRetention, enabled, downsampling) -> new PutDataStreamLifecycleAction.Request(
+                (dataRetention, enabled, downsamplingRounds, downsamplingMethod) -> new PutDataStreamLifecycleAction.Request(
                     getMasterNodeTimeout(request),
                     getAckTimeout(request),
                     Strings.splitStringByCommaToArray(request.param("name")),
                     dataRetention,
                     enabled,
-                    downsampling
+                    downsamplingRounds,
+                    downsamplingMethod
                 )
             );
             putLifecycleRequest.indicesOptions(IndicesOptions.fromRequest(request, putLifecycleRequest.indicesOptions()));
@@ -61,4 +66,9 @@ protected RestChannelConsumer prepareRequest(RestRequest request, NodeClient cli
             );
         }
     }
+
+    @Override
+    public Set<String> supportedCapabilities() {
+        return CAPABILITIES;
+    }
 }
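The `dlm.downsampling_method` capability registered above advertises the new parser field to clients and REST tests. For orientation, a hedged sketch of a lifecycle update that would exercise it; the top-level `downsampling_method` field name and the `last_value` value are assumptions inferred from the parser lambda and `DownsampleConfig.SamplingMethod` in this diff, not confirmed request syntax:

```console
PUT _data_stream/my-data-stream/_lifecycle
{
  "data_retention": "90d",
  "downsampling": [
    { "after": "1d", "fixed_interval": "5m" },
    { "after": "30d", "fixed_interval": "1h" }
  ],
  "downsampling_method": "last_value"
}
```

Per the comment added in DataStreamLifecycleService, once an index has been downsampled with one method, later rounds keep using that method even if the configured method changes.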
