Enable skippers on `@timestamp` and `host.name` fields in LogsDB #134221
Conversation
Pinging @elastic/es-storage-engine (Team:StorageEngine)

Hi @romseygeek, I've created a changelog YAML for you.
Thanks Alan, LGTM!

I think doc value skippers will now also be enabled for the `_tsid` field? (in snapshot versions) (If I understand `TimeSeriesIdFieldMapper#getInstance(...)` correctly)
docs/changelog/134221.yaml (Outdated)

```diff
@@ -0,0 +1,5 @@
+ pr: 134221
```
Let's label this as a non-issue? Because released versions will not run with doc value skippers enabled. And when the feature flag gets removed, then we label it as a feature or enhancement.
Yes, that looks right. It doesn't remove the backing index for this field though, it only enables the skipper.

Yes, the
…ers' into benchmark/main-enabled-skippers
```diff
  assertThat(dataProfiles, hasSize(1));
  List<OperatorStatus> ops = dataProfiles.get(0).operators();
- assertThat(ops, hasSize(5));
+ assertThat(ops, hasSize(6));
```
This change adds an extra ValuesSourceReaderOperator step into this ESQL query - does that sound right to you @martijnvg ?
Do you know which field(s) both values source reader operations are for?
The new one is for `@timestamp`:

```json
{"operator":"ValuesSourceReaderOperator[fields = [@timestamp]]","status":{"readers_built":{"@timestamp:column_at_a_time:BlockDocValuesReader.SingletonLongs":3},"values_loaded":98,"process_nanos":887208,"pages_received":3,"pages_emitted":3,"rows_received":98,"rows_emitted":98}}
```
The other is for `_tsid`, `cpu` and `cluster`, and is unchanged from before:

```json
{"operator":"ValuesSourceReaderOperator[fields = [_tsid, cpu, cluster]]","status":{"readers_built":{"_tsid:column_at_a_time:BlockDocValuesReader.SingletonOrdinals":3,"cluster:column_at_a_time:BlockDocValuesReader.SingletonOrdinals":3,"cpu:column_at_a_time:BlockDocValuesReader.SingletonDoubles":3},"values_loaded":294,"process_nanos":1226959,"pages_received":3,"pages_emitted":3,"rows_received":98,"rows_emitted":98}}
```
Can you check whether the LuceneSourceOperator pushes down the buckets as filters? For example, do you see `IndexOrDocValuesQuery(indexQuery=@timestamp:[1713139320000 TO 1713139379999], dvQuery=@timestamp:[1713139320000 TO 1713139379999])` in `processedQueries` of the status of the `LuceneSourceOperator`? Maybe this change results in the compute engine no longer pushing down filters?
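For context on why that pushdown matters: an `IndexOrDocValuesQuery`-style wrapper holds two equivalent queries and, per segment, executes whichever is cheaper given how many candidate documents the other clauses produce. The toy Java sketch below (illustrative names and threshold, not Lucene's actual code) shows the shape of that cost-based choice:

```java
// Toy sketch of an IndexOrDocValuesQuery-style strategy choice.
// Names and the threshold are illustrative, not Lucene's implementation.
public class IndexOrDocValuesSketch {

    /**
     * Returns "index" when leading the iteration from the index structure is
     * cheap enough, otherwise "doc_values" to verify only the candidate docs
     * that the other query clauses produce.
     */
    static String chooseStrategy(long indexQueryCost, long candidateDocs) {
        // Heuristic: if there are far fewer candidates than matching docs,
        // checking doc values per candidate beats walking the index.
        return candidateDocs < indexQueryCost / 8 ? "doc_values" : "index";
    }

    public static void main(String[] args) {
        // Selective other clauses: verify candidates against doc values.
        System.out.println(chooseStrategy(1_000_000, 100));  // doc_values
        // Broad other clauses: lead with the index query instead.
        System.out.println(chooseStrategy(1_000, 10_000));   // index
    }
}
```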
With this change, the field caps API for the `@timestamp` field will no longer report that it is indexed, and that is why I think the query plan is different, with this as a result.

I don't think this blocks merging this PR. Maybe we need to think about marking a field as indexed if it has doc value skippers and is part of the index sort, before we release the feature flag?
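The rule proposed above could be sketched roughly as follows. This is a hypothetical helper, not actual Elasticsearch code; the method and parameter names are invented for illustration:

```java
// Hypothetical sketch of the proposed field caps rule, not actual
// Elasticsearch code: report a field as "indexed" if it has an inverted
// index, or if it has a doc-value skipper and is part of the index sort.
public class FieldCapsSketch {

    static boolean reportAsIndexed(boolean hasInvertedIndex,
                                   boolean hasDocValueSkipper,
                                   boolean inIndexSort) {
        return hasInvertedIndex || (hasDocValueSkipper && inIndexSort);
    }

    public static void main(String[] args) {
        // @timestamp in LogsDB with this PR: no inverted index, but a
        // skipper plus index sorting would still count as indexed.
        System.out.println(reportAsIndexed(false, true, true));  // true
        // A skipper alone, without index sorting, would not.
        System.out.println(reportAsIndexed(false, true, false)); // false
    }
}
```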
I checked `LuceneSourceOperator` with this change and there aren't any `IndexOrDocValuesQuery` instances in the processed queries.

So I think this is good. Let's merge it when CI is green.
Buildkite benchmark this with tsdb please

Buildkite benchmark this with tsdb please
@romseygeek As both baseline and contender failed in https://buildkite.com/elastic/elasticsearch-pull-request-performance-benchmark/builds/33, and also nightly failed for
@gbanasiak thanks, I was checking in the wrong place for errors and I thought it was transient, but it looks as though there's a problem with aggregate metric doubles. I'll dig further.
Buildkite benchmark this with tsdb please |
I am confused... I kicked off that last benchmark at 11:33, but following the
Buildkite benchmark this with tsdb please |
Trying a benchmark again after I've merged in
💚 Build Succeeded

This build ran two tsdb benchmarks to evaluate the performance impact of this PR. History

cc @romseygeek
This was enabled and hidden behind a feature flag a while ago, but then disabled when performance suffered. Recent benchmarks show that performance is likely to be back on par, given changes made in Lucene in the intervening time.

This commit re-enables the skippers for the `@timestamp` and `host.name` fields in LogsDB, keeping everything gated behind the feature flag.
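For context on what a doc-value skipper buys: per-block min/max metadata over index-sorted doc values lets a range query skip whole blocks without touching their documents, even with no inverted index or points on the field. The toy Java sketch below illustrates the idea only; the block layout and names are invented and do not reflect Lucene's actual skipper format:

```java
import java.util.List;

// Toy illustration of doc-value skipping: with per-block min/max metadata
// over index-sorted @timestamp values, a range query can rule out whole
// blocks up front. Not Lucene's actual skipper data structure.
public class SkipperSketch {

    /** A run of doc values with precomputed min/max bounds. */
    record Block(long min, long max, int docCount) {}

    /** Count the docs that must actually be inspected for range [lo, hi]. */
    static int docsInspected(List<Block> blocks, long lo, long hi) {
        int inspected = 0;
        for (Block b : blocks) {
            if (b.max() < lo || b.min() > hi) {
                continue; // bounds prove no overlap: skip the whole block
            }
            inspected += b.docCount(); // must scan this block's values
        }
        return inspected;
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
            new Block(0, 99, 100),
            new Block(100, 199, 100),
            new Block(200, 299, 100));
        // Only the middle block overlaps [150, 160]; the other 200 docs
        // are skipped via the block bounds alone.
        System.out.println(docsInspected(blocks, 150, 160)); // 100
    }
}
```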