@dnhatn
Member

@dnhatn dnhatn commented Aug 26, 2025

For each pipeline in rate aggregation, we use 3 drivers and execute one shard at a time; therefore, the maximum parallelism per query on data nodes is 3. Unlike non-rate aggregations, we cannot partition shards into multiple slices for concurrent execution, which is one of several limitations of rate aggregation. These changes adjust the data node executor to run multiple pipelines simultaneously, with each pipeline handling a single shard to increase parallelism when there are multiple target shards on data nodes.

This change is expected to improve benchmark performance threefold.
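
A minimal sketch of the idea described above, not the actual data node executor code: each target shard gets its own pipeline, and several pipelines run at once instead of one shard at a time. The class `ShardPipelineSketch`, the method `runPipeline`, and the `maxConcurrentPipelines` parameter are hypothetical names used only for illustration, and each pipeline is collapsed into a single task rather than the 3 drivers mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ShardPipelineSketch {

    // Hypothetical stand-in for one pipeline processing exactly one shard.
    // A real pipeline would read the shard and compute rates; this just
    // returns a placeholder value derived from the shard id.
    static long runPipeline(int shardId) {
        return shardId * 10L;
    }

    // Runs one pipeline per shard, with at most maxConcurrentPipelines
    // pipelines executing at the same time. Results are returned in the
    // same order as the input shard ids.
    static List<Long> runAll(List<Integer> shardIds, int maxConcurrentPipelines) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrentPipelines);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int shardId : shardIds) {
                Callable<Long> pipeline = () -> runPipeline(shardId);
                futures.add(pool.submit(pipeline));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> future : futures) {
                results.add(future.get()); // wait for each shard's pipeline
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException("pipeline failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With a single target shard this degenerates to the previous behavior; the extra parallelism only appears when a data node holds multiple target shards.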

@dnhatn dnhatn added >non-issue :StorageEngine/TSDB You know, for Metrics :Analytics/Compute Engine Analytics in ES|QL labels Aug 26, 2025
@dnhatn dnhatn marked this pull request as ready for review August 26, 2025 23:43
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine labels Aug 26, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Member

@martijnvg martijnvg left a comment

LGTM, thanks Nhat!

@dnhatn
Member Author

dnhatn commented Aug 27, 2025

Thanks Martijn!

@dnhatn dnhatn merged commit 186aaa4 into elastic:main Aug 27, 2025
33 checks passed
@dnhatn dnhatn deleted the rate-excution branch August 27, 2025 05:53
@kkrik-es
Contributor

LGTM, looking forward to seeing the impact.

if (p.operators().stream().anyMatch(s -> s.status() instanceof TimeSeriesSourceOperator.Status)) {
    assertThat(p.operators(), hasSize(2));
    TimeSeriesSourceOperator.Status status = (TimeSeriesSourceOperator.Status) p.operators().get(0).status();
    assertThat(status.processedShards(), hasSize(1));
Contributor

I saw a failure in my PR:

REPRODUCE WITH: ./gradlew ":x-pack:plugin:esql:internalClusterTest" --tests "org.elasticsearch.xpack.esql.action.TimeSeriesIT.testProfile" -Dtests.seed=ADBEEDF1C962A1C8 -Dtests.locale=nso -Dtests.timezone=Europe/Samara -Druntime.java=24

java.lang.AssertionError:
Expected: a collection with size <1>
     but: collection size was <0>
    at __randomizedtesting.SeedInfo.seed([ADBEEDF1C962A1C8:FCE458746D2F19E1]:0)
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
    at org.elasticsearch.test.ESTestCase.assertThat(ESTestCase.java:2706)
    at org.elasticsearch.xpack.esql.action.TimeSeriesIT.testProfile(TimeSeriesIT.java:556)

Member Author

Thanks Kostas! I will look into it.

Labels

:Analytics/Compute Engine Analytics in ES|QL >non-issue :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine v9.2.0

4 participants