Skip to content

Conversation

@salvatore-campagna
Copy link
Contributor

@salvatore-campagna salvatore-campagna commented Feb 10, 2025

This PR extends the work done in #121751 by enabling a sparse doc values index for the @timestamp field in LogsDB.

Similar to the previous PR, the setting index.mapping.use_doc_values_skipper will override the index mapping parameter when all of the following conditions are met:

  • The index mode is LogsDB.
  • The field name is @timestamp.
  • Index sorting is configured on @timestamp (regardless of whether it is a primary sort field or not).
  • Doc values are enabled.

This ensures that only one index structure is defined on the @timestamp field:

  • If the conditions above are met, the inverted index is replaced with a sparse doc values index.
  • This prevents both the inverted index and sparse doc values index from being enabled together, reducing unnecessary storage overhead.

This change aligns with our goal of optimizing LogsDB for storage efficiency while (possibly) maintaining reasonable query latency performance. It will enable us running benchmarks and evaluate the impact of sparse indexing on the @timestamp field too.

Property.Final
);

public static final FeatureFlag DOC_VALUES_SKIPPER = new FeatureFlag("doc_values_skipper");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will not be here once we merge #121751

}
}

public boolean hasSortOnFiled(final String fieldName) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will not be here once we merge #121751


DateFieldMapper dateFieldMapper = (DateFieldMapper) mapper;
if (dateFieldMapper.fieldType().isIndexed() == false) {
if (dateFieldMapper.fieldType().isIndexed() == false && dateFieldMapper.fieldType().hasDocValuesSkipper() == false) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary to account for the fact that we disable the inverted index on @timestamp replacing it with a doc values sparse index for LogsDB.

context.buildFullName(leafName()),
index.getValue() && indexCreatedVersion.isLegacyIndexVersion() == false,
fullFieldName,
hasDocValuesSkipper == false && index.getValue() && indexCreatedVersion.isLegacyIndexVersion() == false,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to make sure we do not have both a sparse doc values index and an inverted index.

DataStreamTimestampFieldMapper.storeTimestampValueForReuse(context.doc(), timestamp);
}

if (hasDocValuesSkipper && hasDocValues) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the doc values skipper has precedence over the index parameter only when the setting is actually enabled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking where setDocValuesType(...) gets invoked, but I see that SortedNumericDocValuesField.indexedField(...) adding doc value with doc value skipper enabled.

Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah...it actually took me a bit to find out that. I was looking for the setter too. In other cases we have the Lucene FieldType and call something like setXYZ to enabled doc values or inverted index and so on passing some value to the setter method...but this one is different. I think that is done because in public static SortedNumericDocValuesField indexedField(String name, long value) a static (immutable, see call to freeze) instance of FieldType is used

static {
    TYPE.setDocValuesType(DocValuesType.SORTED_NUMERIC);
    TYPE.freeze();

    INDEXED_TYPE = new FieldType(TYPE);
    INDEXED_TYPE.setDocValuesSkipIndexType(DocValuesSkipIndexType.RANGE);
    INDEXED_TYPE.freeze();
  }

  public static SortedNumericDocValuesField indexedField(String name, long value) {
    return new SortedNumericDocValuesField(name, value, INDEXED_TYPE);
  }

having a setter would required the caller to create the type field.

Moreover a constructor with signature public SortedNumericDocValuesField(String name, long value) already exists and probably predates introduction of the new doc values sparse index.

The other one is private instead

private SortedNumericDocValuesField(String name, Long value, FieldType fieldType) {
    super(name, fieldType);
    fieldsData = value;
  }

In the end this is all about avoid instantiation multiple FieldType instances I guess and using a shared static one.

final String fullFieldName
) {
if (FieldMapper.DOC_VALUES_SPARSE_INDEX.isEnabled()
if (IndexSettings.DOC_VALUES_SKIPPER.isEnabled()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will not be here once we merge #121751


public void testFieldTypeWithSkipDocValues_LogsDbMode() throws IOException {
assumeTrue("Needs feature flag to be enabled", FieldMapper.DOC_VALUES_SPARSE_INDEX.isEnabled());
assumeTrue("Needs feature flag to be enabled", IndexSettings.DOC_VALUES_SKIPPER.isEnabled());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will not be here once we merge #121751

@salvatore-campagna salvatore-campagna added the test-release Trigger CI checks against release build label Feb 10, 2025
IndexSettings.INDEX_MAPPER_SOURCE_MODE_SETTING,
IndexSettings.RECOVERY_USE_SYNTHETIC_SOURCE_SETTING,
InferenceMetadataFieldsMapper.USE_LEGACY_SEMANTIC_TEXT_FORMAT,
public static final Set<Setting<?>> BUILT_IN_INDEX_SETTINGS;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will not be here once we merge #121751

Map<String, Object> fieldsMapping = new HashMap<>();
fieldsMapping.put("enabled", true);
MappingParserContext mockedParserContext = mock(MappingParserContext.class);
when(mockedParserContext.getIndexSettings()).thenReturn(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed because in LogsDB we need to know the index creation version to determine if using the doc values sparse index.

@salvatore-campagna salvatore-campagna marked this pull request as ready for review February 12, 2025 11:38
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comment.

Comment on lines 276 to 279
} else {
assertTrue(timestampMapper.fieldType().isIndexed());
assertFalse(timestampMapper.fieldType().hasDocValuesSkipper());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just add an assumption with feature flag here? Otherwise when we remove the feature flag, this else statement will never be invoked and I think future readers of the this test may wonder how the index setting gets set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean with "assumption"? An assert or something else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assumeTrue(IndexSettings.DOC_VALUES_SKIPPER.isEnabled())

query:
match_all: {}

- match: { hits.total.value: 6 }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a search with a range query on @timestamp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what is the purpose of adding a search with a range query. The range query won't fail depending on whether there is a doc values sparse index right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A match_all query always succeeds no matter if we have points or doc values. A range query will fail when there is no doc value and points.

- match: { test.settings.index.mode: "logsdb" }

---
create logs index without doc values sparse index:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also add a test that doesn't set index.mapping.use_doc_values_skipper index setting and check that @timestamp does't have points.

index: test

- is_true: test
- match: { test.settings.index.mode: "logsdb" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check with the disk analyze api whether points exists for @timestamp field

DataStreamTimestampFieldMapper.storeTimestampValueForReuse(context.doc(), timestamp);
}

if (hasDocValuesSkipper && hasDocValues) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking where setDocValuesType(...) gets invoked, but I see that SortedNumericDocValuesField.indexedField(...) adding doc value with doc value skipper enabled.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

indexSortConfig,
fullFieldName
);
boolean hasInvertedIndex = hasDocValuesSkipper == false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
boolean hasInvertedIndex = hasDocValuesSkipper == false
boolean hasPoints = hasDocValuesSkipper == false

?

@salvatore-campagna salvatore-campagna merged commit 780cac5 into elastic:main Feb 17, 2025
18 checks passed
jordan-powers added a commit that referenced this pull request Feb 19, 2025
This patch extends the work done in #122161 and #121751 to also use the doc values sparse index for the _tsid fields in time-series mode indices.
jordan-powers added a commit that referenced this pull request Mar 3, 2025
…ces (#123191)

This patch builds on the work done in #122161 by also enabling the sparse doc values index for @timestamp in time-series indices.
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Mar 11, 2025
…ces (elastic#123191)

This patch builds on the work done in elastic#122161 by also enabling the sparse doc values index for @timestamp in time-series indices.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :StorageEngine/Logs You know, for Logs Team:StorageEngine test-release Trigger CI checks against release build v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants