@pwrliang
Contributor

This PR adds GPU-accelerated spatial join support to SedonaDB using NVIDIA CUDA and the libgpuspatial library. GPU execution is automatically enabled when available and provides significant performance improvements for large-scale spatial joins.

Core Features

  • GPU Spatial Join Execution: Implemented GpuSpatialJoinExec physical plan that leverages CUDA for parallel spatial join operations
  • Auto-detection: GPU is automatically detected and enabled when building with --features gpu
  • Optimizer Integration: Spatial join optimizer automatically routes queries to GPU when enabled and hardware is available
  • CPU Fallback: Gracefully falls back to CPU execution when GPU is unavailable or encounters errors
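
As a rough illustration of the CPU fallback behaviour described above (not code from this PR), the pattern can be sketched as below. The function name `run_with_cpu_fallback`, its callable parameters, and the use of `RuntimeError` to represent a GPU failure are all assumptions made for this sketch; the actual logic lives in the Rust optimizer and execution plan.

```python
from typing import Callable, List, Tuple


def run_with_cpu_fallback(
    gpu_join: Callable[[], List],
    cpu_join: Callable[[], List],
    gpu_available: bool,
) -> Tuple[List, str]:
    """Return (result, backend), preferring the GPU path when it is usable."""
    if gpu_available:
        try:
            return gpu_join(), "gpu"
        except RuntimeError:
            # Assumption: GPU failures surface as RuntimeError in this sketch.
            # Fall through to the CPU path instead of failing the query.
            pass
    return cpu_join(), "cpu"


if __name__ == "__main__":
    def failing_gpu_join() -> List:
        raise RuntimeError("simulated CUDA error")

    # The GPU path fails, so the CPU result is returned.
    print(run_with_cpu_fallback(failing_gpu_join, lambda: ["row"], True))
```

The same wrapper covers both cases listed above: no GPU present (`gpu_available=False`) and a GPU that fails during execution.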

Testing

  • Added SQL integration test test_gpu_spatial_join_sql with guaranteed-intersecting geometries
  • Test validates both ST_Intersects and ST_Contains predicates via SQL EXPLAIN and execution
  • Fixed optimizer schema validation to work correctly with GPU execution plans
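
To make "guaranteed-intersecting geometries" concrete, here is a minimal sketch of one way to build such data: each point is placed at the centre of its own square polygon, so both ST_Intersects and ST_Contains (polygon contains point) hold for every pair. The helper name `intersecting_pairs` and its parameters are hypothetical and are not taken from the test in this PR.

```python
from typing import List, Tuple


def intersecting_pairs(n: int, size: float = 1.0, gap: float = 2.0) -> List[Tuple[str, str]]:
    """Build n (polygon WKT, point WKT) pairs where the point lies strictly inside the polygon."""
    pairs = []
    for i in range(n):
        # Space the squares out along the x axis so they do not overlap each other.
        x0, y0 = i * gap, 0.0
        x1, y1 = x0 + size, y0 + size
        polygon = f"POLYGON (({x0} {y0}, {x1} {y0}, {x1} {y1}, {x0} {y1}, {x0} {y0}))"
        # The centre of the square is strictly interior, so containment also holds.
        point = f"POINT ({x0 + size / 2} {y0 + size / 2})"
        pairs.append((polygon, point))
    return pairs


if __name__ == "__main__":
    for polygon_wkt, point_wkt in intersecting_pairs(2):
        print(polygon_wkt, "|", point_wkt)
```

Because `gap` is larger than `size`, each point matches exactly one polygon, which makes the expected join row count easy to assert.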

Configuration

  • GPU support is compiled in by building with --features gpu and auto-enables when hardware is detected. It can also be controlled per session:
  # Disable GPU for entire session
  ctx.sql("SET sedona.spatial_join.gpu.enable = false")

  # Enable GPU for entire session
  ctx.sql("SET sedona.spatial_join.gpu.enable = true")

  # Check current setting
  result = ctx.sql("SHOW sedona.spatial_join.gpu.enable")
  result.show()

  # Set other GPU options
  ctx.sql("SET sedona.spatial_join.gpu.min_rows_threshold = 100000")
  ctx.sql("SET sedona.spatial_join.gpu.device_id = 0")
  ctx.sql("SET sedona.spatial_join.gpu.fallback_to_cpu = true")
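
If several of these options need to be set together, the SET statements can be generated from a dictionary. This is only a convenience sketch: the helper `gpu_set_statements` is hypothetical, and the only names taken from this PR are the `sedona.spatial_join.gpu.*` keys shown above.

```python
from typing import Dict, List, Union


def gpu_set_statements(options: Dict[str, Union[bool, int]]) -> List[str]:
    """Render one SET statement per sedona.spatial_join.gpu.* option."""
    statements = []
    for key, value in options.items():
        # bool must be checked before int because bool is a subclass of int.
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        else:
            rendered = str(value)
        statements.append(f"SET sedona.spatial_join.gpu.{key} = {rendered}")
    return statements


if __name__ == "__main__":
    for stmt in gpu_set_statements({"min_rows_threshold": 100000, "device_id": 0}):
        print(stmt)
```

Each generated string can then be passed to `ctx.sql(...)` in the same way as the statements above.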

Testing

# Run GPU spatial join tests (requires CUDA-capable GPU)
cargo test --package sedona-spatial-join --features gpu test_gpu_spatial_join_sql -- --nocapture --ignored

# Build CLI with GPU support
cargo build --bin sedona-cli --features gpu --release

# Verify GPU execution via EXPLAIN
./target/release/sedona-cli -c "EXPLAIN SELECT * FROM polygons JOIN points ON ST_Intersects(polygons.geom, points.geom)"
# Should show: GpuSpatialJoinExec
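
The EXPLAIN check can also be automated once the plan text has been captured, for example from the CLI output above. The sketch below only looks for the operator name `GpuSpatialJoinExec` in a string; the helper name `plan_uses_gpu_join` and the sample plan text are made up for illustration and do not reproduce real sedona-cli output.

```python
def plan_uses_gpu_join(plan_text: str) -> bool:
    """Return True if the physical plan text mentions the GPU join operator."""
    return "GpuSpatialJoinExec" in plan_text


if __name__ == "__main__":
    # Hypothetical plan text; the real EXPLAIN output will look different.
    sample_plan = "physical_plan | GpuSpatialJoinExec: on=ST_Intersects(...)"
    print(plan_uses_gpu_join(sample_plan))
```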

@pwrliang changed the title from "feat(c/sedona-libgpuspatial) Add GPU-accelerated spatial joins" to "[WIP] feat(c/sedona-libgpuspatial) Add GPU-accelerated spatial joins" on Nov 15, 2025
@jiayuasu
Member

@pwrliang now the CI is working

Member

@paleolimbot left a comment

This is amazing! It has been very cool to watch this project evolve over the last six months and I know this represents a huge amount of work.

This is a large change and I wanted to leave a few high-level things to think about while you're polishing this up.

  • I see a bit of commented-out code in some of the files...feel free to file GitHub issues if that code represents a future piece of work that needs doing (and remove the commented-out code)!
  • I see CUDA-specific tests, which are great! If there are portions of the code that aren't well-covered by tests we do need to add them (we can open follow-on issues and do this in follow-up PRs too)
  • Because we're an Apache project we need the licensing and provenance of the files to be clear. I see some copyright notices from Nvidia...there's a place in LICENSE.md to acknowledge subdirectories where code was copied. We also need license headers on all the files (there's a script in scripts/ that can help do mass addition of the license header to a bunch of files at once).
  • It looks like you've done a great job ensuring the casual contributor doesn't have to deal with the GPU build complexity using default-members. That was one of my initial concerns but it looks great so far.

Give me a ping when you're ready for me to take a look!

@pwrliang
Contributor Author

@paleolimbot Hi Dewey, thanks for the review. This PR now resolves the license issues and passes almost all of the CI jobs, so I'd welcome any suggestions you have. @zhangfengcdt wrote the Rust part that hooks sedona-db up to libgpuspatial, so the credit for that work goes to him.

@zhangfengcdt
Member

@pwrliang Thanks for opening this PR! As per our discussion, we could break this large PR down into smaller ones to make it manageable and reviewable. Ideally: (1) libgpuspatial (C++ and CUDA) with tests, (2) the GPU spatial join module in Rust, and (3) build pipelines and e2e tests.

Let me know if you can reduce this PR to include only (1), with the proper cleanup and Apache license headers. We can continue with (2) and (3) once this one merges. Thanks!
