Skip to content

Conversation

@dnhatn
Copy link
Member

@dnhatn dnhatn commented Jun 7, 2025

This PR reverts #128531.

With #128531, the Limit operator was updated to combine smaller pages into a larger page to reduce overhead, such as the number of exchange requests. However, this has a significant implication: the combined larger page does not retain the attributes of the blocks from the smaller pages. For example, if the smaller pages have ordinal-based BytesRef blocks, the larger page will not. This can cause a significant slowdown if subsequent operators have optimizations for ordinal-based blocks. The Enrich operator has such optimizations, and our benchmarks have shown this performance regression.

One possible solution to reduce the regression is to set a threshold (e.g., 1000 rows), above which the Limit operator would pass the page along without combining. However, even with a threshold of 1000, the performance regression does not go away completely. Alternatively, we could allow exchange requests to return multiple pages (up to the page size limit). To minimize risk, this PR reverts the previous change, and we will reintroduce a new change later

@dnhatn dnhatn changed the title Revert Combine small pages in Limit Revert "Combine small pages in Limit" Jun 7, 2025
@dnhatn dnhatn marked this pull request as ready for review June 8, 2025 21:56
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 8, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@nik9000
Copy link
Member

nik9000 commented Jun 9, 2025

This can cause a significant slowdown if subsequent operators have optimizations for ordinal-based blocks.

I wrote some stuff that uses those too! I should have thought about that. Good thing we have the benchmarks.

@dnhatn
Copy link
Member Author

dnhatn commented Jun 9, 2025

Thanks for reviewing, Ivan and Nik!

@dnhatn dnhatn merged commit 0cac912 into elastic:main Jun 9, 2025
18 checks passed
@dnhatn dnhatn deleted the revert-merge-limit branch June 9, 2025 15:24
benchaplin pushed a commit to benchaplin/elasticsearch that referenced this pull request Jun 9, 2025
This PR reverts elastic#128531.

With elastic#128531, the Limit operator was updated to combine smaller pages 
into a larger page to reduce overhead, such as the number of exchange 
requests. However, this has a significant implication: the combined
larger page does not retain the attributes of the blocks from the
smaller pages. For example, if the smaller pages have ordinal-based
BytesRef blocks, the larger page will not. This can cause a significant
slowdown if subsequent operators have optimizations for ordinal-based
blocks. The Enrich operator has such optimizations, and our benchmarks
have shown this performance regression.

One possible solution to reduce the regression is to set a threshold 
(e.g., 1000 rows), above which the Limit operator would pass the page
along without combining. However, even with a threshold of 1000, the
performance regression does not go away completely. Alternatively, we
could allow exchange requests to return multiple pages (up to the page
size limit). To minimize risk, this PR reverts the previous change, and
we will reintroduce a new change later
valeriy42 pushed a commit to valeriy42/elasticsearch that referenced this pull request Jun 12, 2025
This PR reverts elastic#128531.

With elastic#128531, the Limit operator was updated to combine smaller pages 
into a larger page to reduce overhead, such as the number of exchange 
requests. However, this has a significant implication: the combined
larger page does not retain the attributes of the blocks from the
smaller pages. For example, if the smaller pages have ordinal-based
BytesRef blocks, the larger page will not. This can cause a significant
slowdown if subsequent operators have optimizations for ordinal-based
blocks. The Enrich operator has such optimizations, and our benchmarks
have shown this performance regression.

One possible solution to reduce the regression is to set a threshold 
(e.g., 1000 rows), above which the Limit operator would pass the page
along without combining. However, even with a threshold of 1000, the
performance regression does not go away completely. Alternatively, we
could allow exchange requests to return multiple pages (up to the page
size limit). To minimize risk, this PR reverts the previous change, and
we will reintroduce a new change later
dnhatn added a commit that referenced this pull request Aug 29, 2025
This test failure was fixed in #129107

Closes #128721
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants