-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Enable a sparse doc values index for @timestamp in LogsDB
#122161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable a sparse doc values index for @timestamp in LogsDB
#122161
Conversation
| Property.Final | ||
| ); | ||
|
|
||
| public static final FeatureFlag DOC_VALUES_SKIPPER = new FeatureFlag("doc_values_skipper"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will not be here once we merge #121751
| } | ||
| } | ||
|
|
||
| public boolean hasSortOnFiled(final String fieldName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will not be here once we merge #121751
|
|
||
| DateFieldMapper dateFieldMapper = (DateFieldMapper) mapper; | ||
| if (dateFieldMapper.fieldType().isIndexed() == false) { | ||
| if (dateFieldMapper.fieldType().isIndexed() == false && dateFieldMapper.fieldType().hasDocValuesSkipper() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is necessary to account for the fact that we disable the inverted index on @timestamp replacing it with a doc values sparse index for LogsDB.
| context.buildFullName(leafName()), | ||
| index.getValue() && indexCreatedVersion.isLegacyIndexVersion() == false, | ||
| fullFieldName, | ||
| hasDocValuesSkipper == false && index.getValue() && indexCreatedVersion.isLegacyIndexVersion() == false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to make sure we do not have both a sparse doc values index and an inverted index.
| DataStreamTimestampFieldMapper.storeTimestampValueForReuse(context.doc(), timestamp); | ||
| } | ||
|
|
||
| if (hasDocValuesSkipper && hasDocValues) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the doc values skipper has precedence over the index parameter only when the setting is actually enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking where setDocValuesType(...) gets invoked, but I see that SortedNumericDocValuesField.indexedField(...) adding doc value with doc value skipper enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah...it actually took me a bit to find out that. I was looking for the setter too. In other cases we have the Lucene FieldType and call something like setXYZ to enabled doc values or inverted index and so on passing some value to the setter method...but this one is different. I think that is done because in public static SortedNumericDocValuesField indexedField(String name, long value) a static (immutable, see call to freeze) instance of FieldType is used
static {
TYPE.setDocValuesType(DocValuesType.SORTED_NUMERIC);
TYPE.freeze();
INDEXED_TYPE = new FieldType(TYPE);
INDEXED_TYPE.setDocValuesSkipIndexType(DocValuesSkipIndexType.RANGE);
INDEXED_TYPE.freeze();
}
public static SortedNumericDocValuesField indexedField(String name, long value) {
return new SortedNumericDocValuesField(name, value, INDEXED_TYPE);
}
having a setter would required the caller to create the type field.
Moreover a constructor with signature public SortedNumericDocValuesField(String name, long value) already exists and probably predates introduction of the new doc values sparse index.
The other one is private instead
private SortedNumericDocValuesField(String name, Long value, FieldType fieldType) {
super(name, fieldType);
fieldsData = value;
}
In the end this is all about avoid instantiation multiple FieldType instances I guess and using a shared static one.
| final String fullFieldName | ||
| ) { | ||
| if (FieldMapper.DOC_VALUES_SPARSE_INDEX.isEnabled() | ||
| if (IndexSettings.DOC_VALUES_SKIPPER.isEnabled() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will not be here once we merge #121751
|
|
||
| public void testFieldTypeWithSkipDocValues_LogsDbMode() throws IOException { | ||
| assumeTrue("Needs feature flag to be enabled", FieldMapper.DOC_VALUES_SPARSE_INDEX.isEnabled()); | ||
| assumeTrue("Needs feature flag to be enabled", IndexSettings.DOC_VALUES_SKIPPER.isEnabled()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will not be here once we merge #121751
| IndexSettings.INDEX_MAPPER_SOURCE_MODE_SETTING, | ||
| IndexSettings.RECOVERY_USE_SYNTHETIC_SOURCE_SETTING, | ||
| InferenceMetadataFieldsMapper.USE_LEGACY_SEMANTIC_TEXT_FORMAT, | ||
| public static final Set<Setting<?>> BUILT_IN_INDEX_SETTINGS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will not be here once we merge #121751
| Map<String, Object> fieldsMapping = new HashMap<>(); | ||
| fieldsMapping.put("enabled", true); | ||
| MappingParserContext mockedParserContext = mock(MappingParserContext.class); | ||
| when(mockedParserContext.getIndexSettings()).thenReturn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed because in LogsDB we need to know the index creation version to determine if using the doc values sparse index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few comment.
| } else { | ||
| assertTrue(timestampMapper.fieldType().isIndexed()); | ||
| assertFalse(timestampMapper.fieldType().hasDocValuesSkipper()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just add an assumption with feature flag here? Otherwise when we remove the feature flag, this else statement will never be invoked and I think future readers of the this test may wonder how the index setting gets set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean with "assumption"? An assert or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assumeTrue(IndexSettings.DOC_VALUES_SKIPPER.isEnabled())
| query: | ||
| match_all: {} | ||
|
|
||
| - match: { hits.total.value: 6 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a search with a range query on @timestamp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what is the purpose of adding a search with a range query. The range query won't fail depending on whether there is a doc values sparse index right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A match_all query always succeeds no matter if we have points or doc values. A range query will fail when there is no doc value and points.
| - match: { test.settings.index.mode: "logsdb" } | ||
|
|
||
| --- | ||
| create logs index without doc values sparse index: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also add a test that doesn't set index.mapping.use_doc_values_skipper index setting and check that @timestamp does't have points.
| index: test | ||
|
|
||
| - is_true: test | ||
| - match: { test.settings.index.mode: "logsdb" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe check with the disk analyze api whether points exists for @timestamp field
| DataStreamTimestampFieldMapper.storeTimestampValueForReuse(context.doc(), timestamp); | ||
| } | ||
|
|
||
| if (hasDocValuesSkipper && hasDocValues) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking where setDocValuesType(...) gets invoked, but I see that SortedNumericDocValuesField.indexedField(...) adding doc value with doc value skipper enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| indexSortConfig, | ||
| fullFieldName | ||
| ); | ||
| boolean hasInvertedIndex = hasDocValuesSkipper == false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| boolean hasInvertedIndex = hasDocValuesSkipper == false | |
| boolean hasPoints = hasDocValuesSkipper == false |
?
…ces (#123191) This patch builds on the work done in #122161 by also enabling the sparse doc values index for @timestamp in time-series indices.
…ces (elastic#123191) This patch builds on the work done in elastic#122161 by also enabling the sparse doc values index for @timestamp in time-series indices.
This PR extends the work done in #121751 by enabling a sparse doc values index for the
@timestampfield in LogsDB.Similar to the previous PR, the setting
index.mapping.use_doc_values_skipperwill override theindexmapping parameter when all of the following conditions are met:@timestamp.@timestamp(regardless of whether it is a primary sort field or not).This ensures that only one index structure is defined on the
@timestampfield:This change aligns with our goal of optimizing LogsDB for storage efficiency while (possibly) maintaining reasonable query latency performance. It will enable us running benchmarks and evaluate the impact of sparse indexing on the
@timestampfield too.