Skip to content

Conversation

craigtaverner
Copy link
Contributor

Manual backport of #119889

)

Support for `ST_EXTENT_AGG` was added in elastic#118829, and then partially optimized in elastic#118829. This optimization worked only for cartesian_shape fields, and worked by extracting the Extent from the doc-values and re-encoding it as a WKB `BBOX` geometry. This does not work for geo_shape, where we need to retain all 6 integers stored in the doc-values, in order to perform the datelline choice only at reduce time during the final phase of the aggregation.

Since both geo_shape and cartesian_shape perform the aggregations using integers, and the original Extent values in the doc-values are integers, this PR expands the previous optimization by:
* Saving all Extent values into a multi-valued field in an IntBlock for both cartesian_shape and geo_shape
* Simplifying the logic around merging intermediate states for all cases (geo/cartesian and grouped and non-grouped aggs)
* Widening test cases for testing more combinations of aggregations and types, and fixing a few bugs found
* Enhancing cartesian extent to convert from 6 ints to 4 ints at block loading time (for efficiency)
* Fixing bugs in both cartesian and geo extents for generating intermediate state with missing groups (flaky tests in serverless)
* Moved the int order to always match Rectangle for 4-int and Extent for 6-int cases (improved internal consistency)

Since the PR already changed the meaning of the invalid/infinite values of the intermediate state integers, it was already not compatible with the previous cluster versions. We disabled mixed-cluster testing to prevent errors as a result of that. This leaves us the opportunity to make further changes that are mixed-cluster incompatible, hence the decision to perform this consistency update now.
@craigtaverner craigtaverner added :Analytics/Geo Indexing, search aggregations of geo points and shapes backport Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v8.18.1 v8.19.0 labels Feb 11, 2025
@craigtaverner craigtaverner requested review from GalLalouche and ivancea and removed request for GalLalouche February 11, 2025 16:27
@craigtaverner craigtaverner added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Feb 12, 2025
@elasticsearchmachine elasticsearchmachine merged commit 7bf974c into elastic:8.x Feb 12, 2025
15 checks passed
@craigtaverner craigtaverner deleted the 8.x-st_extent_optimize_geoshape branch February 12, 2025 12:44
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.18 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 122276

craigtaverner added a commit to craigtaverner/elasticsearch that referenced this pull request Feb 12, 2025
…ic#119889) (elastic#122276)

* Optimize ST_EXTENT_AGG for geo_shape and cartesian_shape (elastic#119889)

Support for `ST_EXTENT_AGG` was added in elastic#118829, and then partially optimized in elastic#118829. This optimization worked only for cartesian_shape fields, and worked by extracting the Extent from the doc-values and re-encoding it as a WKB `BBOX` geometry. This does not work for geo_shape, where we need to retain all 6 integers stored in the doc-values, in order to perform the datelline choice only at reduce time during the final phase of the aggregation.

Since both geo_shape and cartesian_shape perform the aggregations using integers, and the original Extent values in the doc-values are integers, this PR expands the previous optimization by:
* Saving all Extent values into a multi-valued field in an IntBlock for both cartesian_shape and geo_shape
* Simplifying the logic around merging intermediate states for all cases (geo/cartesian and grouped and non-grouped aggs)
* Widening test cases for testing more combinations of aggregations and types, and fixing a few bugs found
* Enhancing cartesian extent to convert from 6 ints to 4 ints at block loading time (for efficiency)
* Fixing bugs in both cartesian and geo extents for generating intermediate state with missing groups (flaky tests in serverless)
* Moved the int order to always match Rectangle for 4-int and Extent for 6-int cases (improved internal consistency)

Since the PR already changed the meaning of the invalid/infinite values of the intermediate state integers, it was already not compatible with the previous cluster versions. We disabled mixed-cluster testing to prevent errors as a result of that. This leaves us the opportunity to make further changes that are mixed-cluster incompatible, hence the decision to perform this consistency update now.

* Regenerate generated files
@craigtaverner
Copy link
Contributor Author

Backport to 8.18 done manually in #122420

elasticsearchmachine pushed a commit that referenced this pull request Feb 13, 2025
…) (#122276) (#122420)

* Optimize ST_EXTENT_AGG for geo_shape and cartesian_shape (#119889)

Support for `ST_EXTENT_AGG` was added in #118829, and then partially optimized in #118829. This optimization worked only for cartesian_shape fields, and worked by extracting the Extent from the doc-values and re-encoding it as a WKB `BBOX` geometry. This does not work for geo_shape, where we need to retain all 6 integers stored in the doc-values, in order to perform the datelline choice only at reduce time during the final phase of the aggregation.

Since both geo_shape and cartesian_shape perform the aggregations using integers, and the original Extent values in the doc-values are integers, this PR expands the previous optimization by:
* Saving all Extent values into a multi-valued field in an IntBlock for both cartesian_shape and geo_shape
* Simplifying the logic around merging intermediate states for all cases (geo/cartesian and grouped and non-grouped aggs)
* Widening test cases for testing more combinations of aggregations and types, and fixing a few bugs found
* Enhancing cartesian extent to convert from 6 ints to 4 ints at block loading time (for efficiency)
* Fixing bugs in both cartesian and geo extents for generating intermediate state with missing groups (flaky tests in serverless)
* Moved the int order to always match Rectangle for 4-int and Extent for 6-int cases (improved internal consistency)

Since the PR already changed the meaning of the invalid/infinite values of the intermediate state integers, it was already not compatible with the previous cluster versions. We disabled mixed-cluster testing to prevent errors as a result of that. This leaves us the opportunity to make further changes that are mixed-cluster incompatible, hence the decision to perform this consistency update now.

* Regenerate generated files
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL :Analytics/Geo Indexing, search aggregations of geo points and shapes auto-backport Automatically create backport pull requests when merged auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants