feat(rust/sedona-spatial-join): Add config to disable spatial join reordering#733

Open
2010YOUY01 wants to merge 1 commit into apache:main from 2010YOUY01:config-spatial-join-reordering
@2010YOUY01
Contributor

Motivation

Heuristic-based join reordering can make the wrong choice. An option to disable it gives manual control over spatial join order: the execution order then matches the query order.

This configuration only affects spatial joins, not regular joins, for greater flexibility.

Demo

In sedona-cli

> explain select t1.name from '/Users/yongting/Code/sedona-db/submodules/geoarrow-data/natural-earth/files/natural-earth_cities_geo.parquet' as t1
join '/Users/yongting/Code/sedona-db/submodules/geoarrow-data/natural-earth/files/natural-earth_countries_geo.parquet' as t2
on st_intersects(t1.geometry, t2.geometry);
┌───────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│   plan_type   ┆                                                                                           plan                                                                                          │
│      utf8     ┆                                                                                           utf8                                                                                          │
╞═══════════════╪═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ logical_plan  ┆ Projection: t1.name                                                                                                                                                                     │
│               ┆   SpatialJoin: join_type=Inner, filter=st_intersects(t1.geometry, t2.geometry)                                                                                                          │
│               ┆     SubqueryAlias: t1                                                                                                                                                                   │
│               ┆       TableScan: /Users/yongting/Code/sedona-db/submodules/geoarrow-data/natural-earth/files/natural-earth_cities_geo.parquet projection=[name, geometry]                               │
│               ┆     SubqueryAlias: t2                                                                                                                                                                   │
│               ┆       TableScan: /Users/yongting/Code/sedona-db/submodules/geoarrow-data/natural-earth/files/natural-earth_countries_geo.parquet projection=[geometry]                                  │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ physical_plan ┆ SpatialJoinExec: join_type=Inner, on=ST_intersects(geometry@0, geometry@1), projection=[name@1]                                                                                         │
│               ┆   DataSourceExec: file_groups={1 group: [[Users/yongting/Code/sedona-db/submodules/geoarrow-data/natural-earth/files/natural-earth_countries_geo.parquet]]}, projection=[geometry], fil │
│               ┆ e_type=parquet                                                                                                                                                                          │
│               ┆   ProbeShuffleExec: partitioning=RoundRobinBatch(1)                                                                                                                                     │
│               ┆     DataSourceExec: file_groups={1 group: [[Users/yongting/Code/sedona-db/submodules/geoarrow-data/natural-earth/files/natural-earth_cities_geo.parquet]]}, projection=[name, geometry] │
│               ┆ , file_type=parquet                                                                                                                                                                     │
│               ┆                                                                                                                                                                                         │
└───────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
2 row(s)/2 column(s) fetched.
Elapsed 0.005 seconds.

> SET sedona.spatial_join.spatial_join_reordering = false;

0 row(s)/0 column(s) fetched.
Elapsed 0.003 seconds.

> explain select t1.name from '/Users/yongting/Code/sedona-db/submodules/geoarrow-data/natural-earth/files/natural-earth_cities_geo.parquet' as t1
join '/Users/yongting/Code/sedona-db/submodules/geoarrow-data/natural-earth/files/natural-earth_countries_geo.parquet' as t2
on st_intersects(t1.geometry, t2.geometry);
┌───────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│   plan_type   ┆                                                                                           plan                                                                                          │
│      utf8     ┆                                                                                           utf8                                                                                          │
╞═══════════════╪═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ logical_plan  ┆ Projection: t1.name                                                                                                                                                                     │
│               ┆   SpatialJoin: join_type=Inner, filter=st_intersects(t1.geometry, t2.geometry)                                                                                                          │
│               ┆     SubqueryAlias: t1                                                                                                                                                                   │
│               ┆       TableScan: /Users/yongting/Code/sedona-db/submodules/geoarrow-data/natural-earth/files/natural-earth_cities_geo.parquet projection=[name, geometry]                               │
│               ┆     SubqueryAlias: t2                                                                                                                                                                   │
│               ┆       TableScan: /Users/yongting/Code/sedona-db/submodules/geoarrow-data/natural-earth/files/natural-earth_countries_geo.parquet projection=[geometry]                                  │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ physical_plan ┆ SpatialJoinExec: join_type=Inner, on=ST_intersects(geometry@1, geometry@0), projection=[name@0]                                                                                         │
│               ┆   DataSourceExec: file_groups={1 group: [[Users/yongting/Code/sedona-db/submodules/geoarrow-data/natural-earth/files/natural-earth_cities_geo.parquet]]}, projection=[name, geometry],  │
│               ┆ file_type=parquet                                                                                                                                                                       │
│               ┆   ProbeShuffleExec: partitioning=RoundRobinBatch(1)                                                                                                                                     │
│               ┆     DataSourceExec: file_groups={1 group: [[Users/yongting/Code/sedona-db/submodules/geoarrow-data/natural-earth/files/natural-earth_countries_geo.parquet]]}, projection=[geometry], f │
│               ┆ ile_type=parquet                                                                                                                                                                        │
│               ┆                                                                                                                                                                                         │
└───────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
2 row(s)/2 column(s) fetched.
Elapsed 0.005 seconds.

/// 3. Do not swap the join order if join reordering is disabled or no relevant
/// statistics are available.
fn should_swap_join_order(
    spatial_join_options: &SpatialJoinOptions,
Contributor Author

Use SpatialJoinOptions arg instead of a boolean flag, since this function will likely require additional options in the future and this approach is more extensible.
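A minimal Python sketch of the options-struct approach described here, for illustration only. It is not the Rust implementation in this PR: the row-count parameters and the "swap when the left input is larger" rule are assumptions, chosen to match the demo above, where the smaller countries file ends up as the first child when reordering is enabled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpatialJoinOptions:
    # Mirrors the `sedona.spatial_join.spatial_join_reordering` setting from the demo.
    # Further options can be added later without changing function signatures.
    spatial_join_reordering: bool = True


def should_swap_join_order(
    options: SpatialJoinOptions,
    left_num_rows: Optional[int],   # hypothetical statistic; None means unknown
    right_num_rows: Optional[int],  # hypothetical statistic; None means unknown
) -> bool:
    """Return True when the two join inputs should be swapped."""
    # Do not swap when join reordering is disabled ...
    if not options.spatial_join_reordering:
        return False
    # ... or when no relevant statistics are available.
    if left_num_rows is None or right_num_rows is None:
        return False
    # ASSUMPTION: put the smaller input first; the real heuristic may differ.
    return left_num_rows > right_num_rows
```

Because the function takes the whole options object, a future setting only needs a new field on `SpatialJoinOptions`; existing call sites stay unchanged.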

    return file_names[:2]


def test_spatial_join_reordering_can_be_disabled_e2e(geoarrow_data):
Contributor Author

One consideration is the test’s running time: the two Parquet files are around 10 KB, and this test runs in about 0.1 s on my machine.

I couldn’t find another way to perform the e2e test. The sd_random_geometry() table function does not seem to produce statistics, so spatial join reordering cannot occur.

Hope this is okay.

Member

Definitely ok! (Parquet tests use some of the larger files to ensure everything works with realistic input)

@2010YOUY01 changed the title from "feat(rust/sedona-spatial-join): Add config to disable join reordering" to "feat(rust/sedona-spatial-join): Add config to disable spatial join reordering" on Mar 24, 2026
Member

@paleolimbot left a comment

Thank you!

This is all good from my end, although I usually think of "join ordering" as involving multiple joins (this is more like choosing the indexed side of a single join). I am also not sure I have a better idea of what to call the option 🙂

Comment on lines +111 to +112
    assert path_left.exists(), f"Missing test asset: {path_left}"
    assert path_right.exists(), f"Missing test asset: {path_right}"
Member

I believe we have sedonadb.testing.skip_if_not_exists() for this (so the tests can run without the submodule)
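To illustrate the difference between asserting and skipping, here is a stdlib-only sketch. `skip_if_missing` is a hypothetical stand-in, not the real `sedonadb.testing.skip_if_not_exists()`, whose exact signature is not shown in this thread.

```python
import unittest
from pathlib import Path


def skip_if_missing(path):
    """Skip the current test (rather than fail it) when a test asset is absent.

    Hypothetical helper for illustration; not part of sedonadb.
    """
    path = Path(path)
    if not path.exists():
        # SkipTest marks the test as skipped instead of failed, so the suite
        # can still pass when the data submodule has not been checked out.
        raise unittest.SkipTest(f"Missing test asset: {path}")
    return path
```

With a helper like this, the two `assert ... .exists()` lines would become two calls that skip the test when the submodule is missing.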

Comment on lines +72 to +77
def _plan_text(df):
    query_plan = df.to_pandas()
    return "\n".join(query_plan.iloc[:, 1].astype(str).tolist())


def _spatial_join_side_file_names(plan_text):
Member

I opened #734 so that we can hopefully make this easier some day.
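For context, a rough sketch of the kind of plan-text parsing these helpers perform. The body of `_spatial_join_side_file_names` is not shown in this thread, so the regex below is an assumption: it expects each scan to list exactly one Parquet file, formatted as in the `EXPLAIN` output in the demo.

```python
import re

# Captures the file name that directly precedes the closing "]]" of a
# `file_groups={1 group: [[...]]}` entry in the physical plan text.
_PARQUET_NAME = re.compile(r"([\w.-]+\.parquet)\]\]")


def spatial_join_side_file_names(plan_text):
    """Return the Parquet file names of the first two scans, in plan order."""
    file_names = _PARQUET_NAME.findall(plan_text)
    return file_names[:2]


# Shortened paths; the operator lines follow the demo output above.
sample_plan = "\n".join(
    [
        "SpatialJoinExec: join_type=Inner, on=ST_intersects(geometry@1, geometry@0), projection=[name@0]",
        "  DataSourceExec: file_groups={1 group: [[data/natural-earth_cities_geo.parquet]]}, projection=[name, geometry], file_type=parquet",
        "  ProbeShuffleExec: partitioning=RoundRobinBatch(1)",
        "    DataSourceExec: file_groups={1 group: [[data/natural-earth_countries_geo.parquet]]}, projection=[geometry], file_type=parquet",
    ]
)
print(spatial_join_side_file_names(sample_plan))
```

The first name returned belongs to the first child of `SpatialJoinExec` and the second to the child under `ProbeShuffleExec`, so comparing the list before and after the `SET` statement shows whether the inputs were swapped.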
