Skip to content

Conversation

@dnhatn
Copy link
Member

@dnhatn dnhatn commented May 27, 2025

Reverted in #129107

Currently, the Limit operator does not combine small pages into larger ones; it simply passes them along, except for chunking pages larger than the limit. This change integrates EstimatesRowSize into Limit and adjusts it to emit larger pages. As a result, pages up to twice the pageSize may be emitted, which is preferable to emitting undersized pages. This should reduce the number of transport requests and responses between clusters or coordinator-data nodes for queries without TopN or STATS when target shards produce small pages due to their size or highly selective filters.

@elasticsearchmachine
Copy link
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

@dnhatn dnhatn requested review from idegtiarenko and nik9000 May 27, 2025 21:34
@dnhatn dnhatn marked this pull request as ready for review May 27, 2025 21:35
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 27, 2025
OperatorStatus exchangeSink = driverProfile.operators().get(2);
assertThat(exchangeSink.status(), instanceOf(ExchangeSinkOperator.Status.class));
ExchangeSinkOperator.Status exchangeStatus = (ExchangeSinkOperator.Status) exchangeSink.status();
assertThat(exchangeStatus.pagesReceived(), lessThanOrEqualTo(1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting this to be strictly equalTo(1). When this could be 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be 0 if early termination kicks in.

@dnhatn dnhatn added the auto-backport Automatically create backport pull requests when merged label May 28, 2025

public LimitOperator(Limiter limiter) {
private final int pageSize;
private int pendingRows;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this one below the final ones? I just want to keep the mutable ones not mixed in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I regrouped these in c94c72c

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@dnhatn
Copy link
Member Author

dnhatn commented May 29, 2025

@idegtiarenko @nik9000 Thanks!

@dnhatn dnhatn merged commit 1ab2e6c into elastic:main May 29, 2025
18 checks passed
@dnhatn dnhatn deleted the merge-pages-limit branch May 29, 2025 23:04
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 128531

dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request May 30, 2025
Currently, the Limit operator does not combine small pages into larger 
ones; it simply passes them along, except for chunking pages larger than
the limit. This change integrates EstimatesRowSize into Limit and
adjusts it to emit larger pages. As a result, pages up to twice the
pageSize may be emitted, which is preferable to emitting undersized
pages. This should reduce the number of transport requests and responses
between clusters or coordinator-data nodes for queries without TopN or
STATS when target shards produce small pages due to their size or highly
selective filters.
joshua-adams-1 pushed a commit to joshua-adams-1/elasticsearch that referenced this pull request Jun 3, 2025
Currently, the Limit operator does not combine small pages into larger 
ones; it simply passes them along, except for chunking pages larger than
the limit. This change integrates EstimatesRowSize into Limit and
adjusts it to emit larger pages. As a result, pages up to twice the
pageSize may be emitted, which is preferable to emitting undersized
pages. This should reduce the number of transport requests and responses
between clusters or coordinator-data nodes for queries without TopN or
STATS when target shards produce small pages due to their size or highly
selective filters.
Samiul-TheSoccerFan pushed a commit to Samiul-TheSoccerFan/elasticsearch that referenced this pull request Jun 5, 2025
Currently, the Limit operator does not combine small pages into larger 
ones; it simply passes them along, except for chunking pages larger than
the limit. This change integrates EstimatesRowSize into Limit and
adjusts it to emit larger pages. As a result, pages up to twice the
pageSize may be emitted, which is preferable to emitting undersized
pages. This should reduce the number of transport requests and responses
between clusters or coordinator-data nodes for queries without TopN or
STATS when target shards produce small pages due to their size or highly
selective filters.
dnhatn added a commit that referenced this pull request Jun 9, 2025
This PR reverts #128531.

With #128531, the Limit operator was updated to combine smaller pages 
into a larger page to reduce overhead, such as the number of exchange 
requests. However, this has a significant implication: the combined
larger page does not retain the attributes of the blocks from the
smaller pages. For example, if the smaller pages have ordinal-based
BytesRef blocks, the larger page will not. This can cause a significant
slowdown if subsequent operators have optimizations for ordinal-based
blocks. The Enrich operator has such optimizations, and our benchmarks
have shown this performance regression.

One possible solution to reduce the regression is to set a threshold 
(e.g., 1000 rows), above which the Limit operator would pass the page
along without combining. However, even with a threshold of 1000, the
performance regression does not go away completely. Alternatively, we
could allow exchange requests to return multiple pages (up to the page
size limit). To minimize risk, this PR reverts the previous change, and
we will reintroduce a new change later
@dnhatn dnhatn added >non-issue and removed >enhancement backport pending v8.19.0 auto-backport Automatically create backport pull requests when merged labels Jun 9, 2025
@dnhatn
Copy link
Member Author

dnhatn commented Jun 9, 2025

Reverted in #129107

benchaplin pushed a commit to benchaplin/elasticsearch that referenced this pull request Jun 9, 2025
This PR reverts elastic#128531.

With elastic#128531, the Limit operator was updated to combine smaller pages 
into a larger page to reduce overhead, such as the number of exchange 
requests. However, this has a significant implication: the combined
larger page does not retain the attributes of the blocks from the
smaller pages. For example, if the smaller pages have ordinal-based
BytesRef blocks, the larger page will not. This can cause a significant
slowdown if subsequent operators have optimizations for ordinal-based
blocks. The Enrich operator has such optimizations, and our benchmarks
have shown this performance regression.

One possible solution to reduce the regression is to set a threshold 
(e.g., 1000 rows), above which the Limit operator would pass the page
along without combining. However, even with a threshold of 1000, the
performance regression does not go away completely. Alternatively, we
could allow exchange requests to return multiple pages (up to the page
size limit). To minimize risk, this PR reverts the previous change, and
we will reintroduce a new change later
valeriy42 pushed a commit to valeriy42/elasticsearch that referenced this pull request Jun 12, 2025
This PR reverts elastic#128531.

With elastic#128531, the Limit operator was updated to combine smaller pages 
into a larger page to reduce overhead, such as the number of exchange 
requests. However, this has a significant implication: the combined
larger page does not retain the attributes of the blocks from the
smaller pages. For example, if the smaller pages have ordinal-based
BytesRef blocks, the larger page will not. This can cause a significant
slowdown if subsequent operators have optimizations for ordinal-based
blocks. The Enrich operator has such optimizations, and our benchmarks
have shown this performance regression.

One possible solution to reduce the regression is to set a threshold 
(e.g., 1000 rows), above which the Limit operator would pass the page
along without combining. However, even with a threshold of 1000, the
performance regression does not go away completely. Alternatively, we
could allow exchange requests to return multiple pages (up to the page
size limit). To minimize risk, this PR reverts the previous change, and
we will reintroduce a new change later
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants