Enable a sparse doc values index for `@timestamp` in LogsDB #122161

salvatore-campagna · 2025-02-10T10:20:30Z

This PR extends the work done in #121751 by enabling a sparse doc values index for the @timestamp field in LogsDB.

Similar to the previous PR, the setting index.mapping.use_doc_values_skipper will override the index mapping parameter when all of the following conditions are met:

The index mode is LogsDB.
The field name is @timestamp.
Index sorting is configured on @timestamp (regardless of whether it is a primary sort field or not).
Doc values are enabled.

This ensures that only one index structure is defined on the @timestamp field:

If the conditions above are met, the inverted index is replaced with a sparse doc values index.
This prevents both the inverted index and sparse doc values index from being enabled together, reducing unnecessary storage overhead.
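
The conditions above can be sketched as a single predicate. This is only a minimal illustration, not the code in this PR: the class `TimestampSkipperDecision`, its methods, and the `Structure` enum are hypothetical names, and the index mode and sort fields are modeled as plain strings.

```java
import java.util.List;

/** Hypothetical stand-in for the decision described above; not the Elasticsearch implementation. */
final class TimestampSkipperDecision {

    /**
     * Which structure backs the field. INDEXED is the regular index structure
     * (the description calls it the inverted index; the review thread refers to points).
     */
    enum Structure { INDEXED, DOC_VALUES_SKIPPER, NONE }

    /** Returns true only when every condition listed in the description holds. */
    static boolean useSkipper(
        boolean useDocValuesSkipperSetting, // value of index.mapping.use_doc_values_skipper
        String indexMode,                   // assumed string form, e.g. "logsdb" or "standard"
        String fieldName,
        List<String> indexSortFields,       // fields named in the index sort configuration
        boolean hasDocValues
    ) {
        return useDocValuesSkipperSetting
            && "logsdb".equals(indexMode)
            && "@timestamp".equals(fieldName)
            && indexSortFields.contains(fieldName) // any sort position, not only the primary one
            && hasDocValues;
    }

    /** The skipper wins over the index parameter, so both structures are never enabled together. */
    static Structure structureFor(boolean useSkipper, boolean indexParam) {
        if (useSkipper) {
            return Structure.DOC_VALUES_SKIPPER;
        }
        return indexParam ? Structure.INDEXED : Structure.NONE;
    }
}
```

For example, under these assumptions `useSkipper(true, "logsdb", "@timestamp", List.of("host.name", "@timestamp"), true)` holds even though `@timestamp` is only the secondary sort field, and `structureFor` then ignores the index parameter.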

This change aligns with our goal of optimizing LogsDB for storage efficiency while (possibly) maintaining reasonable query latency. It will let us run benchmarks and evaluate the impact of sparse indexing on the @timestamp field too.

salvatore-campagna · 2025-02-10T11:00:51Z

server/src/main/java/org/elasticsearch/index/IndexSettings.java

        Property.Final
    );

+    public static final FeatureFlag DOC_VALUES_SKIPPER = new FeatureFlag("doc_values_skipper");


This change will not be here once we merge #121751

salvatore-campagna · 2025-02-10T11:01:06Z

server/src/main/java/org/elasticsearch/index/IndexSortConfig.java

        }
    }

+    public boolean hasSortOnFiled(final String fieldName) {


This change will not be here once we merge #121751

salvatore-campagna · 2025-02-10T11:01:54Z

server/src/main/java/org/elasticsearch/index/mapper/DataStreamTimestampFieldMapper.java


        DateFieldMapper dateFieldMapper = (DateFieldMapper) mapper;
-        if (dateFieldMapper.fieldType().isIndexed() == false) {
+        if (dateFieldMapper.fieldType().isIndexed() == false && dateFieldMapper.fieldType().hasDocValuesSkipper() == false) {


This is necessary because, for LogsDB, we disable the inverted index on @timestamp and replace it with a doc values sparse index.
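
A rough sketch of the relaxed check follows. It assumes the two accessors from the diff are reduced to plain booleans; the class `TimestampFieldCheck`, the method `validate`, and the exception message are hypothetical and not taken from the Elasticsearch source.

```java
/** Hypothetical reduction of the validation in the diff to two boolean inputs. */
final class TimestampFieldCheck {

    /**
     * Rejects the field only when it has neither the regular index structure
     * nor a doc values skipper, mirroring the condition added in the diff.
     */
    static void validate(boolean isIndexed, boolean hasDocValuesSkipper) {
        if (isIndexed == false && hasDocValuesSkipper == false) {
            // Illustrative message only; the real message is not shown in this conversation.
            throw new IllegalArgumentException("@timestamp needs an index structure or a doc values skipper");
        }
    }
}
```

With this shape, a LogsDB `@timestamp` field that is not indexed but has a skipper passes the check, which the original single condition would have rejected.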

salvatore-campagna · 2025-02-10T11:02:34Z

server/src/main/java/org/elasticsearch/index/mapper/DateFieldMapper.java

-                context.buildFullName(leafName()),
-                index.getValue() && indexCreatedVersion.isLegacyIndexVersion() == false,
+                fullFieldName,
+                hasDocValuesSkipper == false && index.getValue() && indexCreatedVersion.isLegacyIndexVersion() == false,


This is to make sure we do not have both a sparse doc values index and an inverted index.

salvatore-campagna · 2025-02-10T11:03:35Z

server/src/main/java/org/elasticsearch/index/mapper/DateFieldMapper.java

            DataStreamTimestampFieldMapper.storeTimestampValueForReuse(context.doc(), timestamp);
        }

+        if (hasDocValuesSkipper && hasDocValues) {


Here the doc values skipper has precedence over the index parameter only when the setting is actually enabled.

I was looking for where setDocValuesType(...) gets invoked, but I see that SortedNumericDocValuesField.indexedField(...) adds the doc value with the doc values skipper enabled.

Yeah... it actually took me a bit to find that out. I was looking for the setter too. In other cases we have the Lucene FieldType and call something like setXYZ to enable doc values, the inverted index, and so on, passing some value to the setter method, but this one is different. I think that is because public static SortedNumericDocValuesField indexedField(String name, long value) uses a static (immutable, see the call to freeze) instance of FieldType:

    static {
        TYPE.setDocValuesType(DocValuesType.SORTED_NUMERIC);
        TYPE.freeze();

        INDEXED_TYPE = new FieldType(TYPE);
        INDEXED_TYPE.setDocValuesSkipIndexType(DocValuesSkipIndexType.RANGE);
        INDEXED_TYPE.freeze();
    }

    public static SortedNumericDocValuesField indexedField(String name, long value) {
        return new SortedNumericDocValuesField(name, value, INDEXED_TYPE);
    }

Having a setter would require the caller to create the field type.

Moreover a constructor with signature public SortedNumericDocValuesField(String name, long value) already exists and probably predates introduction of the new doc values sparse index.

The other one is private instead

    private SortedNumericDocValuesField(String name, Long value, FieldType fieldType) {
        super(name, fieldType);
        fieldsData = value;
    }

In the end, I guess this is all about avoiding instantiating multiple FieldType instances and using a shared static one instead.
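
The shared, frozen type pattern discussed here can be illustrated without Lucene on the classpath. In the sketch below, `FieldType` and `NumericField` are simplified stand-ins written for this example (string-valued settings, no Lucene base classes); only the shape of the static initializer and the `indexedField` factory follows the snippet quoted above.

```java
/** Self-contained illustration of sharing frozen field types; not Lucene code. */
final class SharedFieldTypeSketch {

    /** Minimal stand-in for a field type that becomes immutable once frozen. */
    static final class FieldType {
        private String docValuesType = "NONE";
        private String skipIndexType = "NONE";
        private boolean frozen;

        FieldType() {}

        /** Copies the settings; the copy starts out unfrozen so it can be adjusted. */
        FieldType(FieldType other) {
            this.docValuesType = other.docValuesType;
            this.skipIndexType = other.skipIndexType;
        }

        void setDocValuesType(String type) { checkNotFrozen(); docValuesType = type; }
        void setDocValuesSkipIndexType(String type) { checkNotFrozen(); skipIndexType = type; }
        void freeze() { frozen = true; }
        String docValuesType() { return docValuesType; }
        String skipIndexType() { return skipIndexType; }

        private void checkNotFrozen() {
            if (frozen) {
                throw new IllegalStateException("this FieldType is already frozen");
            }
        }
    }

    /** Stand-in for a doc values field with a plain and a skipper-enabled variant. */
    static final class NumericField {
        static final FieldType TYPE = new FieldType();
        private static final FieldType INDEXED_TYPE;

        static {
            TYPE.setDocValuesType("SORTED_NUMERIC");
            TYPE.freeze();
            INDEXED_TYPE = new FieldType(TYPE);
            INDEXED_TYPE.setDocValuesSkipIndexType("RANGE");
            INDEXED_TYPE.freeze();
        }

        final String name;
        final long value;
        final FieldType type;

        /** Plain doc values, no skipper. */
        NumericField(String name, long value) { this(name, value, TYPE); }

        private NumericField(String name, long value, FieldType type) {
            this.name = name;
            this.value = value;
            this.type = type;
        }

        /** Factory for the skipper-enabled variant; every instance shares INDEXED_TYPE. */
        static NumericField indexedField(String name, long value) {
            return new NumericField(name, value, INDEXED_TYPE);
        }
    }
}
```

Because both types are frozen in the static initializer, callers cannot mutate them, and every field created through `indexedField` reuses the same type instance rather than allocating a new one.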

salvatore-campagna · 2025-02-10T11:04:11Z

server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java

            final String fullFieldName
        ) {
-            if (FieldMapper.DOC_VALUES_SPARSE_INDEX.isEnabled()
+            if (IndexSettings.DOC_VALUES_SKIPPER.isEnabled()


This change will not be here once we merge #121751

salvatore-campagna · 2025-02-10T11:04:22Z

server/src/test/java/org/elasticsearch/index/mapper/KeywordFieldMapperTests.java


    public void testFieldTypeWithSkipDocValues_LogsDbMode() throws IOException {
-        assumeTrue("Needs feature flag to be enabled", FieldMapper.DOC_VALUES_SPARSE_INDEX.isEnabled());
+        assumeTrue("Needs feature flag to be enabled", IndexSettings.DOC_VALUES_SKIPPER.isEnabled());


This change will not be here once we merge #121751

salvatore-campagna · 2025-02-10T11:56:47Z

server/src/main/java/org/elasticsearch/common/settings/IndexScopedSettings.java

-        IndexSettings.INDEX_MAPPER_SOURCE_MODE_SETTING,
-        IndexSettings.RECOVERY_USE_SYNTHETIC_SOURCE_SETTING,
-        InferenceMetadataFieldsMapper.USE_LEGACY_SEMANTIC_TEXT_FORMAT,
+    public static final Set<Setting<?>> BUILT_IN_INDEX_SETTINGS;


This change will not be here once we merge #121751

salvatore-campagna · 2025-02-11T14:00:41Z

test/framework/src/main/java/org/elasticsearch/cluster/metadata/DataStreamTestHelper.java

        Map<String, Object> fieldsMapping = new HashMap<>();
        fieldsMapping.put("enabled", true);
        MappingParserContext mockedParserContext = mock(MappingParserContext.class);
+        when(mockedParserContext.getIndexSettings()).thenReturn(


This is needed because in LogsDB we need to know the index creation version to determine whether to use the doc values sparse index.

martijnvg

I left a few comments.

martijnvg · 2025-02-12T15:23:08Z

.../src/test/java/org/elasticsearch/datastreams/mapper/DataStreamTimestampFieldMapperTests.java

+        } else {
+            assertTrue(timestampMapper.fieldType().isIndexed());
+            assertFalse(timestampMapper.fieldType().hasDocValuesSkipper());
+        }


Let's just add an assumption on the feature flag here? Otherwise, when we remove the feature flag, this else branch will never be executed, and I think future readers of this test may wonder how the index setting gets set.

What do you mean by "assumption"? An assert or something else?

assumeTrue(IndexSettings.DOC_VALUES_SKIPPER.isEnabled())

martijnvg · 2025-02-12T15:28:26Z

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/logsdb/10_settings.yml

+          query:
+            match_all: {}
+
+  - match: { hits.total.value: 6 }


maybe add a search with a range query on @timestamp?

I am not sure what the purpose of adding a search with a range query is. The range query won't fail depending on whether there is a doc values sparse index, right?

A match_all query always succeeds, no matter whether we have points or doc values. A range query will fail when there are neither doc values nor points.

martijnvg · 2025-02-12T15:40:45Z

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/logsdb/10_settings.yml

  - match: { test.settings.index.mode: "logsdb" }

+---
+create logs index without doc values sparse index:


Maybe also add a test that doesn't set the index.mapping.use_doc_values_skipper index setting and check that @timestamp doesn't have points.

martijnvg · 2025-02-12T15:41:27Z

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/logsdb/10_settings.yml

+        index: test
+
+  - is_true: test
+  - match: { test.settings.index.mode: "logsdb" }


Maybe check with the disk analyze API whether points exist for the @timestamp field.

martijnvg · 2025-02-12T15:50:39Z

server/src/main/java/org/elasticsearch/index/mapper/DateFieldMapper.java

            DataStreamTimestampFieldMapper.storeTimestampValueForReuse(context.doc(), timestamp);
        }

+        if (hasDocValuesSkipper && hasDocValues) {


I was looking for where setDocValuesType(...) gets invoked, but I see that SortedNumericDocValuesField.indexedField(...) adds the doc value with the doc values skipper enabled.

martijnvg

LGTM

martijnvg · 2025-02-17T09:28:34Z

server/src/main/java/org/elasticsearch/index/mapper/DateFieldMapper.java

+                indexSortConfig,
+                fullFieldName
+            );
+            boolean hasInvertedIndex = hasDocValuesSkipper == false


Suggested change

boolean hasInvertedIndex = hasDocValuesSkipper == false

boolean hasPoints = hasDocValuesSkipper == false

?

This patch extends the work done in #122161 and #121751 to also use the doc values sparse index for the _tsid fields in time-series mode indices.

…ces (#123191) This patch builds on the work done in #122161 by also enabling the sparse doc values index for @timestamp in time-series indices.

feature: enable a sparse doc values index for @timestamp in LogsDB

6999ec7

salvatore-campagna self-assigned this Feb 10, 2025

salvatore-campagna added >non-issue :StorageEngine/Logs You know, for Logs labels Feb 10, 2025

elasticsearchmachine added the v9.1.0 label Feb 10, 2025

fix: constructor with hasDocValuesSkipper set to false

ffec507

salvatore-campagna commented Feb 10, 2025

fix: adapt test results to build type

035961b

salvatore-campagna added the test-release Trigger CI checks against release build label Feb 10, 2025

salvatore-campagna commented Feb 10, 2025

salvatore-campagna and others added 12 commits February 10, 2025 14:46

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

31aa6bb

…ting

fix: typo in method name

bbc84ab

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

d93d7ce

…ting

fix: rename variable to useDocValuesSkipper

a13c553

fix: gate timestamp doc values sparse idnex behind index version

d99614c

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

146d40d

…ting

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

4ab00c7

…ting

fix: gate pointsMetadataAvailable

dda2afb

fix: improve readability

0c9e1a8

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

003f1bb

…ting

test: logsdb test without doc values skipper

9a80a93

test: add two more tests

53fdd68

salvatore-campagna commented Feb 11, 2025

salvatore-campagna and others added 6 commits February 11, 2025 15:57

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

27a2bcd

…ting

fix: add missing feature flag

e62614f

fix: gate test execution

454f64f

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

07b4c35

…ting

not:rename method for consistency with host.name

20a4f5b

fix: version 8.18.1

0393874

salvatore-campagna marked this pull request as ready for review February 12, 2025 11:38

salvatore-campagna and others added 5 commits February 12, 2025 13:55

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

1e08b57

…ting

fix: checkstyle line longer thatn 140

ff02d40

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

66a116e

…ting

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

c162427

…ting

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

077d48d

…ting

martijnvg reviewed Feb 12, 2025

salvatore-campagna and others added 4 commits February 13, 2025 16:08

fix: assume on feature flag and todo

a976d54

fix: additional yaml test and range query

833a8db

test: check disk usage for points and doc values

0f4aa54

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

d1578c1

…ting

salvatore-campagna requested a review from martijnvg February 13, 2025 15:28

salvatore-campagna and others added 9 commits February 13, 2025 16:51

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

bc66ebd

…ting

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

ec662e0

…ting

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

db86770

…ting

test: move yaml tests to logsdb xpack plugin

c18febe

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

58222a0

…ting

fix: missing doc values feature flag

99dba18

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

1447c81

…ting

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

ce3b3f8

…ting

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

0d7c8ff

…ting

jordan-powers mentioned this pull request Feb 16, 2025

Doc values sparse index on _tsid fields #122699

Merged

salvatore-campagna added 2 commits February 17, 2025 09:31

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

a346eaa

…ting

Merge branch 'main' into feature/timestamp-doc-values-spars-index-set…

6e14fa1

…ting

martijnvg approved these changes Feb 17, 2025

fix: rename hasInvertedIndex to hasPoints

6634a98

salvatore-campagna merged commit 780cac5 into elastic:main Feb 17, 2025
18 checks passed

jordan-powers added a commit that referenced this pull request Feb 19, 2025

Doc values sparse index on _tsid fields (#122699)

bf31ee6

This patch extends the work done in #122161 and #121751 to also use the doc values sparse index for the _tsid fields in time-series mode indices.

jordan-powers mentioned this pull request Feb 21, 2025

Enable a sparse doc values index for @timestamp in time-series indices #123191

Merged
