feat(query): Runtime Filter support spatial index join #19530
b41sh merged 4 commits into databendlabs:main
Pull request overview
This PR adds runtime filter support for spatial index joins, enabling block-level pruning for probe-side tables with spatial indexes when using predicates like st_within, st_intersects, st_contains, and st_equals. It also simplifies the spatial index format by removing redundant srid and invalid_rows columns.
Changes:
- Adds spatial runtime filter infrastructure: build-side constructs an RTree, transmits it through the runtime filter framework, and probe-side uses it for two-level pruning (stats-based coarse + RTree fine)
- Removes `srid` and `invalid_rows` columns from spatial index, deriving SRID from `SpatialStatistics` and adding `has_empty_rect`/`is_valid` fields
- Extends `FuseBlockPartInfo` to carry spatial index location and spatial statistics for runtime pruning
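The two-level pruning in the first bullet can be illustrated with a minimal sketch. Everything here is hypothetical and is not the PR's code: `Rect`, `SpatialFilter`, and `can_prune_block` are illustrative names, a linear scan stands in for the real `RTree` query, and SRID handling is left out entirely.

```rust
/// Axis-aligned bounding box (hypothetical stand-in for the rect type used by the index).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub min_x: f64,
    pub min_y: f64,
    pub max_x: f64,
    pub max_y: f64,
}

impl Rect {
    /// True when the two boxes overlap or touch.
    pub fn intersects(&self, other: &Rect) -> bool {
        self.min_x <= other.max_x
            && other.min_x <= self.max_x
            && self.min_y <= other.max_y
            && other.min_y <= self.max_y
    }
}

/// Hypothetical build-side filter: the overall bounds plus the individual rects.
pub struct SpatialFilter {
    pub bounds: Rect,
    pub rects: Vec<Rect>,
}

/// Returns true when a probe-side block can be skipped.
/// `block_bbox` plays the role of the block-level statistics;
/// `index_rects` plays the role of the rects stored in the block's spatial index.
pub fn can_prune_block(filter: &SpatialFilter, block_bbox: &Rect, index_rects: &[Rect]) -> bool {
    // Level 1 (coarse): the block bbox does not touch the build-side bounds at all.
    if !block_bbox.intersects(&filter.bounds) {
        return true;
    }
    // Level 2 (fine): no rect in the block's index intersects any build-side rect.
    // A real implementation would query an RTree instead of scanning linearly.
    !index_rects
        .iter()
        .any(|r| filter.rects.iter().any(|f| f.intersects(r)))
}
```

The coarse check needs only block metadata, so the index file is read only for blocks that survive it; that ordering is the point of the two levels in this sketch.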
Reviewed changes
Copilot reviewed 58 out of 59 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/query/catalog/src/runtime_filter_info.rs` | Adds `RuntimeFilterSpatial` struct and spatial stats tracking |
| `src/query/functions/src/lib.rs` | Adds `GENERAL_SPATIAL_FUNCTIONS` constant |
| `src/query/settings/src/settings_default.rs` | Adds `spatial_runtime_filter_threshold` setting |
| `src/query/service/src/physical_plans/physical_hash_join.rs` | Extracts spatial join conditions from non-equi filters for runtime filter building |
| `src/query/service/src/physical_plans/runtime_filter/builder.rs` | Marks spatial filters with `is_spatial` flag |
| `src/query/service/src/physical_plans/runtime_filter/types.rs` | Adds `is_spatial` field to `PhysicalRuntimeFilter` |
| `src/query/service/src/pipelines/processors/transforms/hash_join/runtime_filter/spatial.rs` | New: RTree building, compaction, merging, and bounds extraction |
| `src/query/service/src/pipelines/processors/transforms/hash_join/runtime_filter/local_builder.rs` | Adds spatial bbox extraction during build-side processing |
| `src/query/service/src/pipelines/processors/transforms/hash_join/runtime_filter/convert.rs` | Converts spatial packets to `RuntimeFilterSpatial` entries |
| `src/query/service/src/pipelines/processors/transforms/hash_join/runtime_filter/merge.rs` | Merges spatial filter packets across partitions |
| `src/query/service/src/pipelines/processors/transforms/hash_join/runtime_filter/packet.rs` | Adds `SpatialPacket` to runtime filter packet |
| `src/query/storages/fuse/src/pruning/spatial_runtime_pruner.rs` | New: Two-level spatial pruning (stats + RTree index) |
| `src/query/storages/fuse/src/pruning/spatial_index_pruner.rs` | Refactored to use `SpatialIndexReader` and `SpatialStatistics` |
| `src/query/storages/fuse/src/io/read/spatial_index/spatial_index_reader.rs` | New: Shared spatial index file reader |
| `src/query/storages/fuse/src/io/write/spatial_index_writer.rs` | Removes `srid`/`invalid_rows` columns, only stores RTree with valid rects |
| `src/query/storages/fuse/src/fuse_part.rs` | Adds spatial index location and spatial stats to `FuseBlockPartInfo` |
| `src/query/storages/fuse/src/operations/read_partitions.rs` | Propagates spatial stats and index location to part info |
| `src/query/storages/fuse/src/operations/read/parquet_data_transform_reader.rs` | Integrates spatial runtime pruner into read pipeline |
| `src/query/storages/fuse/src/operations/read/native_data_transform_reader.rs` | Integrates spatial runtime pruner into native read pipeline |
| `src/query/storages/fuse/src/statistics/spatial_stats.rs` | Adds `has_empty_rect`, `is_valid` fields; changes `finalize` to always return stats |
| `src/query/storages/common/table_meta/src/meta/v2/statistics.rs` | Adds `has_empty_rect` and `is_valid` to `SpatialStatistics` |
| `src/query/storages/common/index/src/range_index.rs` | Handles `Option<Rect>` for empty geometries |
| `src/query/storages/common/index/src/spatial_predicate.rs` | Uses `GENERAL_SPATIAL_FUNCTIONS` constant, handles empty rects |
| Various join implementation files | Thread `spatial_threshold` through join types |
| Test files | Adds spatial join tests and runtime filter unit tests |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d3076c0fac
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
This PR targets spatial join predicates where the probe side can benefit from block pruning, such as:
- `JOIN ... ON st_within(t1.geom, t2.geom)`
- `JOIN ... ON st_intersects(t1.geom, t2.geom)`
- `JOIN ... ON st_contains(t1.geom, t2.geom)`
- `JOIN ... ON st_equals(t1.geom, t2.geom)`

It is especially useful when the build side is smaller and the probe side is a large spatial table with a spatial index.
Implementation Overview
- Build an `RTree` from build-side geometries and transmit it through the existing runtime filter framework.
- Control the filter size with `spatial_runtime_filter_threshold`. If the number of `RTree` items exceeds the threshold, we compact the tree to reduce its size before sending.
- Coarse pruning on the probe side uses `SpatialStatistics` (block-level bbox + SRID).
- Fine pruning on the probe side uses the spatial index (`RTree` intersection checks).

Other Changes
- Removes the `srid` and `invalid_rows` columns: `SRID` can be derived from `SpatialStatistics`, so the extra column is redundant. `invalid_rows` was only used for row-level checks, which are no longer required in the block-level pruning flow.
- Extends `FuseBlockPartInfo` to carry spatial index location and spatial statistics for runtime pruning.
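The threshold-based compaction mentioned in the implementation overview is not spelled out in this description, so the following is only one plausible sketch of the idea, not the PR's algorithm. It assumes that compaction may merge neighbouring rects into their union bounding box, which keeps pruning conservative (a merged box covers everything the originals covered). `Rect`, `union`, and `compact` are hypothetical names.

```rust
/// Axis-aligned bounding box (hypothetical stand-in for the rect type in the filter).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub min_x: f64,
    pub min_y: f64,
    pub max_x: f64,
    pub max_y: f64,
}

/// Smallest box covering both inputs.
pub fn union(a: Rect, b: Rect) -> Rect {
    Rect {
        min_x: a.min_x.min(b.min_x),
        min_y: a.min_y.min(b.min_y),
        max_x: a.max_x.max(b.max_x),
        max_y: a.max_y.max(b.max_y),
    }
}

/// Reduce `rects` to at most `threshold` boxes (assumed strategy, not the PR's).
/// Rects are sorted by `min_x`, split into equally sized runs, and each run is
/// replaced by its union box.
pub fn compact(mut rects: Vec<Rect>, threshold: usize) -> Vec<Rect> {
    let threshold = threshold.max(1); // treat 0 as "merge everything into one box"
    if rects.len() <= threshold {
        return rects; // already small enough: send unchanged
    }
    rects.sort_by(|a, b| a.min_x.total_cmp(&b.min_x));
    // Ceiling division: how many rects each merged box has to absorb.
    let group = (rects.len() + threshold - 1) / threshold;
    rects
        .chunks(group)
        .map(|run| {
            run.iter()
                .copied()
                .reduce(union)
                .expect("chunks() never yields an empty slice")
        })
        .collect()
}
```

Merging trades selectivity for size: fewer, larger boxes prune fewer blocks but are cheaper to transmit and to test on the probe side.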
Tests
Type of change