Enable a sparse doc values index for `@timestamp` in time-series indices #123191

jordan-powers · 2025-02-21T21:35:58Z

This patch builds on the work done in #122161 by also enabling the sparse doc values index for @timestamp in time-series indices.

elasticsearchmachine · 2025-02-21T21:38:00Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

martijnvg

LGTM 👍

martijnvg · 2025-02-24T09:52:43Z

server/src/main/java/org/elasticsearch/index/IndexVersions.java

    public static final IndexVersion TIMESTAMP_DOC_VALUES_SPARSE_INDEX = def(9_011_0_00, Version.LUCENE_10_1_0);
    public static final IndexVersion TIME_SERIES_ID_DOC_VALUES_SPARSE_INDEX = def(9_012_0_00, Version.LUCENE_10_1_0);
    public static final IndexVersion SYNTHETIC_SOURCE_STORE_ARRAYS_NATIVELY_KEYWORD = def(9_013_0_00, Version.LUCENE_10_1_0);
+    public static final IndexVersion TSDB_TIMESTAMP_DOC_VALUES_SPARSE_INDEX = def(9_014_0_00, Version.LUCENE_10_1_0);


I don't think we actually need an index version here, given that the doc values skipper is only enabled in snapshot builds. I think the same applies to the previous doc values skipper changes. But let's keep it for now, given that we also added index versions for those earlier changes.

Yeah, I wasn't certain we needed the new IndexVersion, but I figured better safe than sorry
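For readers unfamiliar with index-version gating, the following is a minimal sketch of the idea behind `indexCreatedVersion.onOrAfter(...)`. `IndexVersionSketch` is a hypothetical stand-in, not Elasticsearch's `IndexVersion` class; only the numeric ids are taken from the diff above.

```java
// Hypothetical stand-in for an index version: just a comparable numeric id.
final class IndexVersionSketch {
    // Ids copied from the diff above; the names mirror it for readability only.
    static final IndexVersionSketch TIMESTAMP_DOC_VALUES_SPARSE_INDEX = new IndexVersionSketch(9_011_0_00);
    static final IndexVersionSketch TSDB_TIMESTAMP_DOC_VALUES_SPARSE_INDEX = new IndexVersionSketch(9_014_0_00);

    final int id;

    IndexVersionSketch(int id) {
        this.id = id;
    }

    // True when an index with this version was created at or after the given gate.
    boolean onOrAfter(IndexVersionSketch gate) {
        return id >= gate.id;
    }
}
```

Under this model, an index created at id 9_012_0_00 passes the first gate but not the second, so a separate constant only matters when indices created between the two ids need to behave differently.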

martijnvg · 2025-02-24T09:53:52Z

server/src/main/java/org/elasticsearch/index/mapper/DateFieldMapper.java

+                && indexSortConfig.hasSortOnField(fullFieldName)
+                && DataStreamTimestampFieldMapper.DEFAULT_PATH.equals(fullFieldName);
+        } else if (IndexMode.TIME_SERIES.equals(indexMode)) {
+            return indexCreatedVersion.onOrAfter(IndexVersions.TSDB_TIMESTAMP_DOC_VALUES_SPARSE_INDEX)


Maybe add a comment here explaining why we don't check for index sorting in time series index mode?

Thinking more about this maybe this can be simplified to this:

    return indexCreatedVersion.onOrAfter(IndexVersions.TIMESTAMP_DOC_VALUES_SPARSE_INDEX)
        && useDocValuesSkipper
        && hasDocValues
        && (indexMode == IndexMode.LOGSDB || indexMode == IndexMode.TIME_SERIES)
        && indexSortConfig != null
        && indexSortConfig.hasSortOnField(fullFieldName)
        && DataStreamTimestampFieldMapper.DEFAULT_PATH.equals(fullFieldName);

We don't have to check index sorting for tsdb, but it should always exist and the timestamp should be part of it.
This makes the check easier to read.

So then should I take out the IndexVersions.TSDB_TIMESTAMP_DOC_VALUES_SPARSE_INDEX and just use the IndexVersions.TIMESTAMP_DOC_VALUES_SPARSE_INDEX for both index modes?

Yes, I think that is fine, since this only applies to snapshot builds and bwc isn't an issue.
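The simplified check discussed in this thread can be sketched in a self-contained form. Everything below is a hypothetical stand-in for illustration: `IndexModeSketch` replaces `IndexMode`, a nullable set of field names replaces `indexSortConfig`, a boolean replaces the `onOrAfter` version gate, and `"@timestamp"` is the assumed value of `DataStreamTimestampFieldMapper.DEFAULT_PATH`.

```java
import java.util.Set;

// Hypothetical stand-in for Elasticsearch's IndexMode.
enum IndexModeSketch { STANDARD, LOGSDB, TIME_SERIES }

final class SkipperDecisionSketch {
    // Assumption: the default timestamp path is "@timestamp".
    static final String TIMESTAMP_PATH = "@timestamp";

    static boolean shouldUseDocValuesSkipper(
        boolean createdOnOrAfterGate,   // stands in for the index version gate
        boolean useDocValuesSkipper,    // stands in for the index setting
        boolean hasDocValues,
        IndexModeSketch indexMode,
        Set<String> indexSortFields,    // null means no index sort is configured
        String fullFieldName
    ) {
        return createdOnOrAfterGate
            && useDocValuesSkipper
            && hasDocValues
            && (indexMode == IndexModeSketch.LOGSDB || indexMode == IndexModeSketch.TIME_SERIES)
            && indexSortFields != null
            && indexSortFields.contains(fullFieldName)
            && TIMESTAMP_PATH.equals(fullFieldName);
    }
}
```

One condition covers both index modes, so a time-series index is held to the same index sort requirement as logsdb, which it is expected to satisfy anyway.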


martijnvg

Based on the latest changes, I left two more comments.

martijnvg · 2025-02-25T12:48:43Z

server/src/main/java/org/elasticsearch/common/lucene/uid/PerThreadIDVersionAndSeqNoLookup.java

        this.loadedTimestampRange = loadTimestampRange;
        // Also check for the existence of the timestamp field, because sometimes a segment can only contain tombstone documents,
        // which don't have any mapped fields (also not the timestamp field) and just some meta fields like _id, _seq_no etc.
-        if (loadTimestampRange && reader.getFieldInfos().fieldInfo(DataStream.TIMESTAMP_FIELD_NAME) != null) {


Maybe leave this if statement intact and add the following check below it:

    if (IndexSettings.DOC_VALUES_SKIPPER.isEnabled()) {
        DocValuesSkipper skipper = reader.getDocValuesSkipper(DataStream.TIMESTAMP_FIELD_NAME);
        assert skipper != null : "no skipper for reader:" + reader + " and parent:" + reader.getContext().parent.reader();
        minTimestamp = skipper.minValue();
        maxTimestamp = skipper.maxValue();
    } else {
        PointValues tsPointValues = reader.getPointValues(DataStream.TIMESTAMP_FIELD_NAME);
        assert tsPointValues != null : "no timestamp field for reader:" + reader + " and parent:" + reader.getContext().parent.reader();
        minTimestamp = LongPoint.decodeDimension(tsPointValues.getMinPackedValue(), 0);
        maxTimestamp = LongPoint.decodeDimension(tsPointValues.getMaxPackedValue(), 0);
    }

The reason is that when loadTimestampRange is requested and the timestamp field exists, only two scenarios are possible: either there are PointValues for the timestamp field or, if the feature flag is enabled, a skipper should exist for the timestamp field.

From what I can tell, there isn't an easy way to get the current IndexSettings instance into this method. I could update the constructor signature to take some extra info (either the whole IndexSettings instance or just the boolean indicating if doc values skipper is enabled), but it seems easier to just check the FieldInfo.

No need to get an IndexSettings instance, IndexSettings.DOC_VALUES_SKIPPER is a static field.

Oh I see, you're saying I should check the feature flag, not the index setting. But what if the feature flag is enabled but the skipper is disabled by the index setting? Then we still want to check the point values

Good point. Let's keep this as is then.

I actually still had to update this check. I'm keeping the same logic, still checking the FieldInfo to see if the doc values skipper is enabled, but it seems that retrieving the FieldInfo even when loadTimestampRange is false was causing CI to fail (something to do with the translogInMemorySegmentCount that I don't fully understand).

I think there are cases where we fake the reader, for example when we read from the translog for realtime get. I suspect those IndexReaders sometimes don't work well with field infos.
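The outcome of this thread can be summarized in a small model: only look up field metadata when the range was requested, treat a missing field as a tombstone-only segment, and prefer the skipper over point values when it is present. The sketch below uses hypothetical stand-in types (`TimestampFieldSketch`, `TimestampRangeSketch`), not Lucene's `FieldInfo`, `DocValuesSkipper`, or `PointValues`, and it decides on the presence of a skipper range rather than on `docValuesSkipIndexType()`.

```java
import java.util.function.Supplier;

// Hypothetical view of the timestamp field in one segment.
final class TimestampFieldSketch {
    final long[] skipperRange; // {min, max} from the sparse doc values index, or null if absent
    final long[] pointsRange;  // {min, max} from the points index, or null if absent

    TimestampFieldSketch(long[] skipperRange, long[] pointsRange) {
        this.skipperRange = skipperRange;
        this.pointsRange = pointsRange;
    }
}

final class TimestampRangeSketch {
    // Returns {min, max}, or null when the range was not requested or the field is absent.
    static long[] load(boolean loadTimestampRange, Supplier<TimestampFieldSketch> fieldLookup) {
        if (loadTimestampRange == false) {
            return null; // short-circuit: the lookup is never invoked
        }
        TimestampFieldSketch field = fieldLookup.get();
        if (field == null) {
            return null; // e.g. a segment that only holds tombstone documents
        }
        if (field.skipperRange != null) {
            return field.skipperRange;
        }
        if (field.pointsRange == null) {
            throw new IllegalStateException("timestamp field has neither a skipper nor point values");
        }
        return field.pointsRange;
    }
}
```

Passing the lookup as a `Supplier` makes the ordering explicit: a reader that cannot serve field metadata is only a problem when the range is actually requested.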

martijnvg · 2025-02-25T12:50:52Z

server/src/main/java/org/elasticsearch/index/mapper/DateFieldMapper.java

+                && indexSortConfig.hasSortOnField(fullFieldName)
+                && DataStreamTimestampFieldMapper.DEFAULT_PATH.equals(fullFieldName);
+        } else if (IndexMode.TIME_SERIES.equals(indexMode)) {
+            return indexCreatedVersion.onOrAfter(IndexVersions.TSDB_TIMESTAMP_DOC_VALUES_SPARSE_INDEX)


Thinking more about this maybe this can be simplified to this:

    return indexCreatedVersion.onOrAfter(IndexVersions.TIMESTAMP_DOC_VALUES_SPARSE_INDEX)
        && useDocValuesSkipper
        && hasDocValues
        && (indexMode == IndexMode.LOGSDB || indexMode == IndexMode.TIME_SERIES)
        && indexSortConfig != null
        && indexSortConfig.hasSortOnField(fullFieldName)
        && DataStreamTimestampFieldMapper.DEFAULT_PATH.equals(fullFieldName);

We don't have to check index sorting for tsdb, but it should always exist and the timestamp should be part of it.
This makes the check easier to read.

martijnvg · 2025-02-25T18:30:26Z

server/src/main/java/org/elasticsearch/common/lucene/uid/PerThreadIDVersionAndSeqNoLookup.java

+        if (loadTimestampRange && info != null) {
+            if (info.docValuesSkipIndexType() == DocValuesSkipIndexType.RANGE) {
+                DocValuesSkipper skipper = reader.getDocValuesSkipper(DataStream.TIMESTAMP_FIELD_NAME);
+                minTimestamp = skipper.minValue();


Let's add an assert here: assert skipper != null : "no skipper for reader:" + reader + " and parent:" + reader.getContext().parent.reader();


martijnvg · 2025-02-26T19:44:15Z

server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java

    // Timeseries indices' leaf readers should be sorted by desc order of their timestamp field, as it allows search time optimizations
    public static final Comparator<LeafReader> TIMESERIES_LEAF_READERS_SORTER = Comparator.comparingLong((LeafReader r) -> {
        try {
+            FieldInfo info = r.getFieldInfos().fieldInfo(TIMESTAMP_FIELD_NAME);
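As an illustration of the ordering described in the comment on this hunk, here is a minimal sketch that sorts segments by descending maximum timestamp. `LeafSketch` is a hypothetical stand-in for a `LeafReader` whose maximum timestamp has already been read (from the skipper or from point values); it is not the comparator from `DataStream.java`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a leaf reader with a known max timestamp.
final class LeafSketch {
    final String name;
    final long maxTimestamp;

    LeafSketch(String name, long maxTimestamp) {
        this.name = name;
        this.maxTimestamp = maxTimestamp;
    }
}

final class LeafSorterSketch {
    // Descending by max timestamp, so the segment with the newest data comes first.
    static final Comparator<LeafSketch> NEWEST_FIRST =
        Comparator.comparingLong((LeafSketch leaf) -> leaf.maxTimestamp).reversed();

    // Returns the leaf names in sorted order without mutating the input list.
    static List<String> sortedNames(List<LeafSketch> leaves) {
        List<LeafSketch> copy = new ArrayList<>(leaves);
        copy.sort(NEWEST_FIRST);
        List<String> names = new ArrayList<>();
        for (LeafSketch leaf : copy) {
            names.add(leaf.name);
        }
        return names;
    }
}
```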


martijnvg · 2025-02-26T19:47:59Z

server/src/main/java/org/elasticsearch/index/mapper/DateFieldMapper.java

            if (isIndexed() == false && pointsMetadataAvailable == false && hasDocValues()) {
-                // we don't have a quick way to run this check on doc values, so fall back to default assuming we are within bounds
-                return Relation.INTERSECTS;
+                if (hasDocValuesSkipper() == false) {


Did you make this change because a test failed or as an optimization? Asking because I don't expect this change to be required.

If it's not required, I doubt we should include this particular change now. I'd prefer to make this change in isolation and add the necessary tests for it.

There was a failing test, org.elasticsearch.index.shard.SearchIdleIT.testSearchIdleBoolQueryMatchOneIndex

I did add tests to DateFieldTypeTests.java to test isFieldWithinQuery when the doc values skipper is enabled. But it makes sense to me to make this change in isolation. If you want, I can break this out into a separate PR and just mute the test for now

> If you want, I can break this out into a separate PR and just mute the test for now

Maybe change the testSearchIdleBoolQueryMatchOneIndex(...) test to set the index.mapping.use_doc_values_skipper setting to false if the doc_values_skipper feature flag is enabled? Then I think the test should pass without adding this logic.

Then we can add this logic with associated tests in a follow up?

Ok, follow-up PR #123930 opened.

martijnvg · 2025-02-26T19:57:19Z

server/src/main/java/org/elasticsearch/common/lucene/uid/PerThreadIDVersionAndSeqNoLookup.java

        this.loadedTimestampRange = loadTimestampRange;
        // Also check for the existence of the timestamp field, because sometimes a segment can only contain tombstone documents,
        // which don't have any mapped fields (also not the timestamp field) and just some meta fields like _id, _seq_no etc.
-        if (loadTimestampRange && reader.getFieldInfos().fieldInfo(DataStream.TIMESTAMP_FIELD_NAME) != null) {


I think there are cases where we fake the reader, for example when we read from the translog for realtime get. I suspect those IndexReaders sometimes don't work well with field infos.


…ipper (#123930) When running a timestamp range query, as an optimization we check whether the query range overlaps with the total range of values within a shard before executing the query on that shard. That way, if the ranges are disjoint, we can skip execution for that shard. To get the range of values within a shard, we usually use the PointValues index on the shard. However, when the doc values skipper is enabled, the point values index is not (the reason for the skipper is to reduce storage overhead by removing the point values index). In this case, we instead need to get the range of values within the shard from the skipper. This patch implements that logic. Follow-up to #123191
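The overlap check described in this commit message can be sketched as a pure function over two inclusive ranges. `RelationSketch` and `ShardRangeSketch` are hypothetical names for illustration; the shard's minimum and maximum are assumed to have been read already, from point values or from the skipper.

```java
// Hypothetical stand-in for the relation between a shard's values and a query range.
enum RelationSketch { WITHIN, INTERSECTS, DISJOINT }

final class ShardRangeSketch {
    // Compares the query's inclusive range [from, to] with the shard's values in [shardMin, shardMax].
    static RelationSketch relate(long shardMin, long shardMax, long from, long to) {
        if (to < shardMin || from > shardMax) {
            return RelationSketch.DISJOINT;   // no value in the shard can match: skip the shard
        }
        if (from <= shardMin && to >= shardMax) {
            return RelationSketch.WITHIN;     // every value in the shard matches
        }
        return RelationSketch.INTERSECTS;     // some values may match: run the query
    }
}
```

When neither the point values nor the skipper can supply a range, the conservative answer is `INTERSECTS`, which matches the fallback shown in the `DateFieldMapper` hunk above.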


jordan-powers added 2 commits February 21, 2025 12:58

Test that @timestamp's DateFieldMapper is using docValuesSkipper

e251d86

Update DateFieldMapper#shouldUseDocValuesSkipper to support TSDB

d3ad4e6

elasticsearchmachine added needs:triage Requires assignment of a team area label v9.1.0 labels Feb 21, 2025

jordan-powers added >non-issue :StorageEngine/TSDB You know, for Metrics and removed needs:triage Requires assignment of a team area label labels Feb 21, 2025

jordan-powers self-assigned this Feb 21, 2025

jordan-powers added the Team:StorageEngine label Feb 21, 2025

martijnvg approved these changes Feb 24, 2025


jordan-powers added 8 commits February 24, 2025 09:12

Add comment explaining tsdb sorting

9e51083

Merge remote-tracking branch 'upstream/main' into tsdb-timestamp-spar…

d1ca991

…se-index

Update hasTimestampField() check to consider doc values skipper

7cd7974

Get min/max timestamp from DocValuesSkipper if enabled

bd40507

Get max value for LeafReader sorting from docvalues skipper

ce9f792

Merge remote-tracking branch 'upstream/main' into tsdb-timestamp-spar…

b325699

…se-index

Add missing FieldInfo null check

da28201

Merge remote-tracking branch 'upstream/main' into tsdb-timestamp-spar…

0b3d514

…se-index

martijnvg requested changes Feb 25, 2025


Add non-null assertion to doc values skipper lookup

645949a

martijnvg reviewed Feb 25, 2025


jordan-powers added 3 commits February 25, 2025 10:41

Avoid FieldInfo lookup if loadTimestampRange is false

677f2f7

Simplify DateFieldMapper#shouldUseDocValuesSkipper check

26cb18c

Merge remote-tracking branch 'upstream/main' into tsdb-timestamp-spar…

c8607b2

…se-index

jordan-powers force-pushed the tsdb-timestamp-sparse-index branch from daa77ba to c8607b2 Compare February 25, 2025 18:53

jordan-powers added 4 commits February 25, 2025 14:10

Use doc values skipper for DateFieldType#isFieldWithinQuery

bc89576

Merge remote-tracking branch 'upstream/main' into tsdb-timestamp-spar…

e7cee43

…se-index

DateFieldTypeTests#isFieldWithinRangeTestCase include doc values skipper

675e175

Merge remote-tracking branch 'upstream/main' into tsdb-timestamp-spar…

3bee698

…se-index

martijnvg approved these changes Feb 26, 2025


jordan-powers added 3 commits February 27, 2025 20:34

Break out isFieldWithinQuery optimization into separate PR

752881a

Update SearchIdleIT test to disable doc values skipper

6ffac04

Merge remote-tracking branch 'upstream/main' into tsdb-timestamp-spar…

e9c619a

…se-index

jordan-powers merged commit 737ab62 into elastic:main Mar 3, 2025
17 checks passed

jordan-powers deleted the tsdb-timestamp-sparse-index branch March 3, 2025 19:29

jordan-powers mentioned this pull request Mar 3, 2025

Fix timestamp range query optimization for indices with doc values skipper #123930

Merged

3 participants
