-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[8.x] Optimize ST_EXTENT_AGG for geo_shape and cartesian_shape (#119889) #122276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
elasticsearchmachine
merged 4 commits into
elastic:8.x
from
craigtaverner:8.x-st_extent_optimize_geoshape
Feb 12, 2025
Merged
[8.x] Optimize ST_EXTENT_AGG for geo_shape and cartesian_shape (#119889) #122276
elasticsearchmachine
merged 4 commits into
elastic:8.x
from
craigtaverner:8.x-st_extent_optimize_geoshape
Feb 12, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
) Support for `ST_EXTENT_AGG` was added in elastic#118829, and then partially optimized in elastic#118829. This optimization worked only for cartesian_shape fields, and worked by extracting the Extent from the doc-values and re-encoding it as a WKB `BBOX` geometry. This does not work for geo_shape, where we need to retain all 6 integers stored in the doc-values, in order to perform the datelline choice only at reduce time during the final phase of the aggregation. Since both geo_shape and cartesian_shape perform the aggregations using integers, and the original Extent values in the doc-values are integers, this PR expands the previous optimization by: * Saving all Extent values into a multi-valued field in an IntBlock for both cartesian_shape and geo_shape * Simplifying the logic around merging intermediate states for all cases (geo/cartesian and grouped and non-grouped aggs) * Widening test cases for testing more combinations of aggregations and types, and fixing a few bugs found * Enhancing cartesian extent to convert from 6 ints to 4 ints at block loading time (for efficiency) * Fixing bugs in both cartesian and geo extents for generating intermediate state with missing groups (flaky tests in serverless) * Moved the int order to always match Rectangle for 4-int and Extent for 6-int cases (improved internal consistency) Since the PR already changed the meaning of the invalid/infinite values of the intermediate state integers, it was already not compatible with the previous cluster versions. We disabled mixed-cluster testing to prevent errors as a result of that. This leaves us the opportunity to make further changes that are mixed-cluster incompatible, hence the decision to perform this consistency update now.
💔 Backport failed
You can use sqren/backport to manually backport by running |
craigtaverner
added a commit
to craigtaverner/elasticsearch
that referenced
this pull request
Feb 12, 2025
…ic#119889) (elastic#122276) * Optimize ST_EXTENT_AGG for geo_shape and cartesian_shape (elastic#119889) Support for `ST_EXTENT_AGG` was added in elastic#118829, and then partially optimized in elastic#118829. This optimization worked only for cartesian_shape fields, and worked by extracting the Extent from the doc-values and re-encoding it as a WKB `BBOX` geometry. This does not work for geo_shape, where we need to retain all 6 integers stored in the doc-values, in order to perform the datelline choice only at reduce time during the final phase of the aggregation. Since both geo_shape and cartesian_shape perform the aggregations using integers, and the original Extent values in the doc-values are integers, this PR expands the previous optimization by: * Saving all Extent values into a multi-valued field in an IntBlock for both cartesian_shape and geo_shape * Simplifying the logic around merging intermediate states for all cases (geo/cartesian and grouped and non-grouped aggs) * Widening test cases for testing more combinations of aggregations and types, and fixing a few bugs found * Enhancing cartesian extent to convert from 6 ints to 4 ints at block loading time (for efficiency) * Fixing bugs in both cartesian and geo extents for generating intermediate state with missing groups (flaky tests in serverless) * Moved the int order to always match Rectangle for 4-int and Extent for 6-int cases (improved internal consistency) Since the PR already changed the meaning of the invalid/infinite values of the intermediate state integers, it was already not compatible with the previous cluster versions. We disabled mixed-cluster testing to prevent errors as a result of that. This leaves us the opportunity to make further changes that are mixed-cluster incompatible, hence the decision to perform this consistency update now. * Regenerate generated files
Backport to 8.18 done manually in #122420 |
elasticsearchmachine
pushed a commit
that referenced
this pull request
Feb 13, 2025
…) (#122276) (#122420) * Optimize ST_EXTENT_AGG for geo_shape and cartesian_shape (#119889) Support for `ST_EXTENT_AGG` was added in #118829, and then partially optimized in #118829. This optimization worked only for cartesian_shape fields, and worked by extracting the Extent from the doc-values and re-encoding it as a WKB `BBOX` geometry. This does not work for geo_shape, where we need to retain all 6 integers stored in the doc-values, in order to perform the datelline choice only at reduce time during the final phase of the aggregation. Since both geo_shape and cartesian_shape perform the aggregations using integers, and the original Extent values in the doc-values are integers, this PR expands the previous optimization by: * Saving all Extent values into a multi-valued field in an IntBlock for both cartesian_shape and geo_shape * Simplifying the logic around merging intermediate states for all cases (geo/cartesian and grouped and non-grouped aggs) * Widening test cases for testing more combinations of aggregations and types, and fixing a few bugs found * Enhancing cartesian extent to convert from 6 ints to 4 ints at block loading time (for efficiency) * Fixing bugs in both cartesian and geo extents for generating intermediate state with missing groups (flaky tests in serverless) * Moved the int order to always match Rectangle for 4-int and Extent for 6-int cases (improved internal consistency) Since the PR already changed the meaning of the invalid/infinite values of the intermediate state integers, it was already not compatible with the previous cluster versions. We disabled mixed-cluster testing to prevent errors as a result of that. This leaves us the opportunity to make further changes that are mixed-cluster incompatible, hence the decision to perform this consistency update now. * Regenerate generated files
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
:Analytics/ES|QL
AKA ESQL
:Analytics/Geo
Indexing, search aggregations of geo points and shapes
auto-backport
Automatically create backport pull requests when merged
auto-merge-without-approval
Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!)
backport
Team:Analytics
Meta label for analytical engine team (ESQL/Aggs/Geo)
v8.19.0
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Manual backport of #119889