113 commits
39bcb05
feat: Add GPU-accelerated spatial joins based on libgpuspatial C++/CUDA
zhangfengcdt Sep 30, 2025
02a7623
Merge branch 'main' of github.com:zhangfengcdt/sedona-db into feature…
zhangfengcdt Sep 30, 2025
b4cddec
fix fmt error
zhangfengcdt Sep 30, 2025
b280b83
Merge branch 'main' of github.com:zhangfengcdt/sedona-db into feature…
zhangfengcdt Oct 1, 2025
46de0ca
Add gpu config, execution structure, and feature flag
zhangfengcdt Oct 1, 2025
671cfba
restructure
zhangfengcdt Oct 1, 2025
185ad5e
implement stream
zhangfengcdt Oct 1, 2025
d427683
refactor to use simplified approach
zhangfengcdt Oct 1, 2025
6f8d3c0
implement tests
zhangfengcdt Oct 1, 2025
966a896
add tests and ci pipeline
zhangfengcdt Oct 1, 2025
191bd85
disable running gpu tests
zhangfengcdt Oct 1, 2025
a2ef437
add libgpuspatial source
zhangfengcdt Oct 1, 2025
fbf2238
test rust build
zhangfengcdt Oct 1, 2025
f8234e6
temporarily disable other workflows
zhangfengcdt Oct 1, 2025
6afd6ef
wip
zhangfengcdt Oct 1, 2025
205bbd5
fix rust-gpu build ci pipeline
zhangfengcdt Oct 1, 2025
50bf5aa
fix ci build
zhangfengcdt Oct 1, 2025
50b4885
fix cmake
zhangfengcdt Oct 1, 2025
167571e
fix ubuntu version
zhangfengcdt Oct 1, 2025
e464812
install from vcpkg.json
zhangfengcdt Oct 1, 2025
94bb0f0
fix vcpkg not found issue
zhangfengcdt Oct 1, 2025
91beb9b
add vkpkg.json
zhangfengcdt Oct 1, 2025
5f5a8bd
install gcc 10
zhangfengcdt Oct 1, 2025
91457d9
use gcc for cuda compatibility
zhangfengcdt Oct 2, 2025
869318e
gcc 10
zhangfengcdt Oct 2, 2025
58af42e
fix build.rs
zhangfengcdt Oct 2, 2025
4b219f5
force use gcc 10
zhangfengcdt Oct 2, 2025
2694d1d
use cuda 12.4 with gcc 11
zhangfengcdt Oct 2, 2025
7928564
add cuda repository before install cuda
zhangfengcdt Oct 2, 2025
939b5f6
cleanup disk space
zhangfengcdt Oct 2, 2025
f8e9270
add libclang-dev
zhangfengcdt Oct 2, 2025
be48742
Enabled ffi feature for arrow-array dependency
zhangfengcdt Oct 2, 2025
18cc4e0
update cuda lib path for linker
zhangfengcdt Oct 2, 2025
35c816f
skip building cuda driver tests
zhangfengcdt Oct 2, 2025
64bd566
fix unsafe impl Send for GpuSpatialJoinerContext to mark it as thread…
zhangfengcdt Oct 2, 2025
d0a4a1b
fix one more Send issue
zhangfengcdt Oct 2, 2025
d7ea4c3
add entire build with gpu
zhangfengcdt Oct 2, 2025
fbeffe4
exclude sedona-s2geography build
zhangfengcdt Oct 2, 2025
9acff8d
add debug info to gpu calls
zhangfengcdt Oct 3, 2025
3e4238e
fix compile
zhangfengcdt Oct 3, 2025
6215c4a
Fix gpu tests on aws ec2 instance with gpu hardware (#2)
zhangfengcdt Oct 4, 2025
790c840
Fix the schema not matching when projections are presented issue (#3)
zhangfengcdt Oct 7, 2025
d7def87
Merge branch 'main' of github.com:zhangfengcdt/sedona-db into feature…
zhangfengcdt Oct 8, 2025
36fdfca
revert cargo lock file
zhangfengcdt Oct 8, 2025
a936507
Merge pull request #5 from zhangfengcdt/feature/gpu-spatial-join-trym…
zhangfengcdt Oct 8, 2025
52bbb2c
Bugfix
pwrliang Oct 9, 2025
90339ca
Enable all tests
pwrliang Oct 9, 2025
e7ffabd
Merge pull request #6 from pwrliang/fix/gpu-spatial-join
zhangfengcdt Oct 9, 2025
192ddda
Fix compiling issue
pwrliang Oct 9, 2025
06c5694
Merge pull request #7 from pwrliang/fix/gpu-spatial-join
zhangfengcdt Oct 9, 2025
e1f8375
fix the single partition join issue
Oct 14, 2025
5aa3c1a
remove unused debug info
Oct 14, 2025
9adc6fc
wip - fix single partition output with limit clause
Oct 16, 2025
8398517
switch to use common spatial index approach
Oct 16, 2025
2084d99
some optimizations
Oct 16, 2025
2705632
Making the benchmark program to read local files
pwrliang Oct 17, 2025
984a7ad
Optimize PIP
pwrliang Oct 17, 2025
e95a1aa
enable commented out code
pwrliang Oct 17, 2025
c22900c
Implemented PIP with ray tracing, debugging
pwrliang Oct 22, 2025
c106cf2
Simple polygon passed tests
pwrliang Oct 22, 2025
6b3d212
add debug information to diagnose performance issue
Oct 23, 2025
f1fdc0b
delete source md
Oct 27, 2025
01f6b64
Correctness check is passed, need cleanup and optimizations
pwrliang Oct 29, 2025
46282c6
Cleanup finished
pwrliang Oct 31, 2025
a1c3316
Optimize BVH construction
pwrliang Oct 31, 2025
328df19
Finished Polygon-point query
pwrliang Oct 31, 2025
f95e96e
Remove unused code
pwrliang Oct 31, 2025
17cddd7
Performance debugging
pwrliang Oct 31, 2025
44e18ab
don't use encoder
pwrliang Nov 1, 2025
afedb3c
enable disabled code
pwrliang Nov 1, 2025
6116e49
Print more information
pwrliang Nov 1, 2025
2e2d5ee
Optimize BVH building
pwrliang Nov 1, 2025
369e64e
Batch execution
pwrliang Nov 2, 2025
1e9607f
Merge pull request #8 from pwrliang/gpu-optimize
pwrliang Nov 2, 2025
0642f19
fix gpu test fail
Nov 3, 2025
9bb5da1
A better fix for PIP
pwrliang Nov 3, 2025
63e1b67
Reduce memory usage
pwrliang Nov 4, 2025
d6b0a01
WIP: A parallel WKB loader
pwrliang Nov 10, 2025
850447b
Loader basically is working
pwrliang Nov 12, 2025
fc5ef6f
Finish test cases
pwrliang Nov 12, 2025
ab87507
Cleanup spatial joiner
pwrliang Nov 12, 2025
6dc0672
Bug fixes
pwrliang Nov 13, 2025
f9beaf4
Introduce thread pool
pwrliang Nov 13, 2025
75a44dd
Bug fix and overloading for multipoint - (multi-)polygon
pwrliang Nov 13, 2025
45e471b
Use RAPIDS's cmake
pwrliang Nov 14, 2025
2d5f7ce
generating config with rapids_export
pwrliang Nov 14, 2025
8b3c5cf
Fix linking
pwrliang Nov 14, 2025
8da41ff
Reorg files
pwrliang Nov 14, 2025
db0deeb
Bugfix and better logging
pwrliang Nov 14, 2025
930a7ac
Disable some Rust logs
pwrliang Nov 14, 2025
ce75dcf
Log WKT parsing time
pwrliang Nov 14, 2025
0376b87
Use pragma once and add licenses
pwrliang Nov 14, 2025
a376e87
Fix include order
pwrliang Nov 14, 2025
8548017
Bugfix
pwrliang Nov 14, 2025
1314494
Remove benchmark program
pwrliang Nov 15, 2025
71226e4
Change log-level
pwrliang Nov 16, 2025
7a0e1ae
Merge upstream code
pwrliang Nov 16, 2025
bb3e784
Fix tests
pwrliang Nov 16, 2025
49d618e
Restore yml changes
pwrliang Nov 16, 2025
843d981
Calculate chunks according to free memory.
pwrliang Nov 17, 2025
2343e43
Debugging CI
pwrliang Nov 17, 2025
8f9d4f8
Merge branch 'main' of https://github.com/apache/sedona-db into gpu
pwrliang Nov 17, 2025
a27de08
Merge branch 'gpu' of github.com:pwrliang/sedona-db into gpu
pwrliang Nov 17, 2025
bd44d4b
Add licenses
pwrliang Nov 18, 2025
a12379f
Fix some license issues
pwrliang Nov 18, 2025
5d820ef
Fixes with pre-commit
pwrliang Nov 18, 2025
25320a0
Fix some lints
pwrliang Nov 18, 2025
3d0ea67
Fix some lints
pwrliang Nov 18, 2025
9f5d685
Remove commented out code
pwrliang Nov 18, 2025
392b22b
Remove commented out code
pwrliang Nov 19, 2025
58fcba3
Rewrite some code
pwrliang Nov 19, 2025
f147aba
Fix license issues
pwrliang Nov 19, 2025
42ebc68
Try to fix issues reported by clippy
pwrliang Nov 19, 2025
256 changes: 256 additions & 0 deletions .github/workflows/rust-gpu.yml
@@ -0,0 +1,256 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# This workflow compiles CUDA code on GitHub-hosted runners (ubuntu-latest).
# CUDA compilation (nvcc) works WITHOUT GPU hardware - only needs CUDA toolkit.
# GPU runtime execution requires actual GPU, so tests are commented out.
#
name: rust-gpu

on:
  pull_request:
    branches:
      - main
    paths:
      - 'c/sedona-libgpuspatial/**'
      - 'rust/sedona-spatial-join-gpu/**'
      - '.github/workflows/rust-gpu.yml'

  push:
    branches:
      - main
    paths:
      - 'c/sedona-libgpuspatial/**'
      - 'rust/sedona-spatial-join-gpu/**'
      - '.github/workflows/rust-gpu.yml'

concurrency:
  group: ${{ github.repository }}-${{ github.ref }}-${{ github.workflow }}-rust-gpu
  cancel-in-progress: true

permissions:
  contents: read

defaults:
  run:
    shell: bash -l -eo pipefail {0}

# Set workflow timeout to 90 minutes for CUDA compilation
# Expected: ~45-60 minutes first time, ~10-15 minutes cached
env:
  WORKFLOW_TIMEOUT_MINUTES: 90
  # At GEOS updated to 3.14.0
  VCPKG_REF: 5a01de756c28279ddfdd2b061d1c75710a6255fa

jobs:
  rust-gpu-build:
    # Using GitHub-hosted runner to compile CUDA code
    # CUDA compilation works without GPU hardware (only needs CUDA toolkit)
    # GPU tests are skipped (no GPU hardware for runtime execution)
    # TODO: Once GPU runner is ready, enable GPU tests with:
    #   runs-on: [self-hosted, gpu, linux, cuda]
    strategy:
      fail-fast: false
      matrix:
        name: [ "clippy", "docs", "test", "build" ]

    name: "${{ matrix.name }}"
    runs-on: ubuntu-latest
    timeout-minutes: 60
    env:
      CARGO_INCREMENTAL: 0
      # Disable debug info completely to save disk space
      CARGO_PROFILE_DEV_DEBUG: 0
      CARGO_PROFILE_TEST_DEBUG: 0
      # Limit parallel compilation to reduce memory pressure (GPU compilation is intensive)
      CARGO_BUILD_JOBS: 4

    steps:
      - uses: actions/checkout@v4
        with:
          submodules: 'recursive'

      - name: Clone vcpkg
        uses: actions/checkout@v4
        with:
          repository: microsoft/vcpkg
          ref: ${{ env.VCPKG_REF }}
          path: vcpkg

      # Set up environment variables for vcpkg and CUDA
      - name: Set up environment variables and bootstrap vcpkg
        env:
          VCPKG_ROOT: ${{ github.workspace }}/vcpkg
          CMAKE_TOOLCHAIN_FILE: ${{ github.workspace }}/vcpkg/scripts/buildsystems/vcpkg.cmake
          # CUDA configuration (CUDA 12.4 installs to /usr/local/cuda-12.4)
          CUDA_HOME: /usr/local/cuda-12.4
        run: |
          cd vcpkg
          ./bootstrap-vcpkg.sh
          cd ..

          echo "VCPKG_ROOT=$VCPKG_ROOT" >> $GITHUB_ENV
          echo "PATH=$VCPKG_ROOT:$PATH" >> $GITHUB_ENV
          echo "CMAKE_TOOLCHAIN_FILE=$CMAKE_TOOLCHAIN_FILE" >> $GITHUB_ENV
          echo "/usr/local/cuda/bin" >> $GITHUB_PATH

      # Free up disk space before build
      - name: Free disk space
        run: |
          # Remove unnecessary packages to free up ~10GB
          sudo apt-get remove -y '^dotnet-.*' '^llvm-.*' 'php.*' '^mongodb-.*' '^mysql-.*' azure-cli google-chrome-stable firefox powershell mono-devel libgl1-mesa-dri || true
          sudo apt-get autoremove -y
          sudo apt-get clean

          # Remove large directories
          sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL || true

      # Install system dependencies including CUDA toolkit for compilation
      - name: Install system dependencies
        run: |
          sudo apt-get update

          # Install transport tools for Kitware CMake (needed for newer CMake)
          sudo apt-get install -y apt-transport-https ca-certificates gnupg software-properties-common wget

          # Add Kitware repository for CMake
          wget -qO - https://apt.kitware.com/keys/kitware-archive-latest.asc | sudo apt-key add -
          sudo apt-add-repository 'deb https://apt.kitware.com/ubuntu/ jammy main'
          sudo apt-get update

          # Install build tools
          sudo apt-get install -y build-essential pkg-config cmake flex bison

          # Install libclang for bindgen (Rust FFI binding generator)
          sudo apt-get install -y libclang-dev

          # Verify compiler and CMake versions
          gcc --version
          g++ --version
          cmake --version

          # Install GEOS for spatial operations
          sudo apt-get install -y libgeos-dev libzstd-dev

          # Install CUDA toolkit for compilation (nvcc)
          # Note: CUDA compilation works without GPU hardware
          # GPU runtime tests still require actual GPU
          # CUDA 12.4 supports GCC 11 (default on Ubuntu 22.04)
          if ! command -v nvcc &> /dev/null; then
            echo "Installing CUDA 12.4 toolkit for compilation..."

            # Add NVIDIA CUDA repository
            wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
            sudo dpkg -i cuda-keyring_1.1-1_all.deb
            sudo apt-get update

            # Remove any existing CUDA toolkit
            sudo apt purge cuda-toolkit* -y || true

            # Install CUDA 12.4 specifically
            sudo apt-get install -y cuda-toolkit-12-4

            # Set CUDA path
            echo "/usr/local/cuda-12.4/bin" >> $GITHUB_PATH

            nvcc --version
          else
            echo "CUDA toolkit already installed: $(nvcc --version)"
          fi

      # Cache vcpkg installed packages (expensive to rebuild)
      - name: Cache vcpkg binaries
        id: cache-vcpkg
        uses: actions/cache@v4
        with:
          path: vcpkg/packages
          # Bump the number at the end of this line to force a new dependency build
          key: vcpkg-installed-${{ runner.os }}-${{ runner.arch }}-${{ env.VCPKG_REF }}-2

      # Install vcpkg dependencies from vcpkg.json manifest
      - name: Install vcpkg dependencies
        if: steps.cache-vcpkg.outputs.cache-hit != 'true'
        run: |
          ./vcpkg/vcpkg install abseil openssl gtest
          # Clean up vcpkg buildtrees and downloads to save space
          rm -rf vcpkg/buildtrees
          rm -rf vcpkg/downloads

      - name: Use stable Rust
        id: rust
        run: |
          rustup toolchain install stable --no-self-update
          rustup default stable

      - uses: Swatinem/rust-cache@v2
        with:
          prefix-key: "rust-gpu-v3"
          # Cache key includes GPU packages and vcpkg config
          key: "${{ runner.os }}-${{ hashFiles('c/sedona-libgpuspatial/**', 'vcpkg.json') }}"

      # Build WITH GPU feature to compile CUDA code
      # CUDA compilation (nvcc) works without GPU hardware
      # Only GPU runtime execution requires actual GPU
      - name: Build libgpuspatial (with CUDA compilation)
        run: |
          echo "=== Building libgpuspatial WITH GPU feature ==="
          echo "Compiling CUDA code using nvcc (no GPU hardware needed for compilation)"
          echo "Note: First build with CUDA takes 45-60 minutes (CMake + CUDA compilation)"
          echo "Subsequent builds: 10-15 minutes (cached)"
          echo ""
          echo "Build started at: $(date)"
          # Build library only (skip tests - they require CUDA driver which isn't available)
          # --lib builds only the library, not test binaries
          cargo build --locked --package sedona-libgpuspatial --lib --features gpu --verbose

      - name: Build GPU spatial join (with GPU feature)
        run: |
          echo "=== Building GPU spatial join package WITH GPU feature ==="
          echo "Building Rust GPU spatial join (depends on libgpuspatial)"
          echo ""
          # Build library only (skip tests - they require CUDA driver)
          cargo build --locked --package sedona-spatial-join-gpu --lib --features gpu --verbose

      - name: Build entire workspace with GPU features
        run: |
          echo "=== Building entire SedonaDB workspace WITH GPU features ==="
          echo "Verifying GPU packages integrate with rest of codebase"
          echo ""
          # Build entire workspace with GPU features enabled
          # Exclude sedonadb (Python extension, requires maturin)
          # Exclude sedona-s2geography (has GCC 11 compatibility issues, unrelated to GPU)
          # Build libs only (skip tests - they require CUDA driver)
          cargo build --workspace --exclude sedonadb --exclude sedona-s2geography --lib --features gpu --verbose

      # GPU tests commented out - no GPU hardware on GitHub runners
      # Uncomment these when running on self-hosted GPU runner

      # - name: Test libgpuspatial
      #   run: |
      #     echo "Running libgpuspatial tests with GPU..."
      #     cargo test --package sedona-libgpuspatial --features gpu -- --nocapture

      # - name: Test GPU spatial join (structure tests)
      #   run: |
      #     echo "Running structure tests (don't require GPU execution)..."
      #     cargo test --package sedona-spatial-join-gpu --features gpu

      # - name: Test GPU functional tests (require GPU)
      #   run: |
      #     echo "Running GPU functional tests (require actual GPU)..."
      #     cargo test --package sedona-spatial-join-gpu --features gpu -- --ignored --nocapture
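
Note on the "Set up environment variables and bootstrap vcpkg" step: it hands values to later steps by appending lines to the files named by `$GITHUB_ENV` and `$GITHUB_PATH`. The following is a minimal local sketch of that hand-off, not part of this PR. It assumes plain temp files stand in for the runner-provided files, and the `apply_github_files` helper and the `/tmp/workspace` path are hypothetical names used only for illustration.

```shell
set -eu

# Stand-ins for the files the Actions runner provides (assumption: on a real
# runner these variables are already set and point at runner-managed files).
GITHUB_ENV="$(mktemp)"
GITHUB_PATH="$(mktemp)"

VCPKG_ROOT="/tmp/workspace/vcpkg"   # hypothetical workspace path

# What the step does: append KEY=VALUE lines and PATH entries.
echo "VCPKG_ROOT=$VCPKG_ROOT" >> "$GITHUB_ENV"
echo "CMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake" >> "$GITHUB_ENV"
echo "$VCPKG_ROOT" >> "$GITHUB_PATH"
echo "/usr/local/cuda-12.4/bin" >> "$GITHUB_PATH"

# Hypothetical helper that imitates the hand-off between steps:
# export each KEY=VALUE line, then prepend each PATH entry in file order.
apply_github_files() {
  while IFS= read -r line; do
    if [ -n "$line" ]; then export "$line"; fi
  done < "$GITHUB_ENV"
  while IFS= read -r line; do
    if [ -n "$line" ]; then PATH="$line:$PATH"; fi
  done < "$GITHUB_PATH"
  export PATH
}

apply_github_files
echo "VCPKG_ROOT=$VCPKG_ROOT"
echo "first PATH entry: ${PATH%%:*}"

rm -f "$GITHUB_ENV" "$GITHUB_PATH"
```

In this sketch the directories go through `$GITHUB_PATH` rather than a `PATH=...` line in `$GITHUB_ENV`, since `$GITHUB_PATH` is the documented mechanism for extending `PATH` in later steps. Values written this way become visible only to subsequent steps, not to the step that writes them.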
4 changes: 4 additions & 0 deletions .gitignore
@@ -49,3 +49,7 @@ __pycache__

# .env file for release management
dev/release/.env


venv/
