GPU-first embedded analytics database with graceful degradation: GPU → SIMD → Scalar
- Cost-based dispatch: GPU only when compute > 5x transfer time
- Morsel-based paging: Out-of-core execution (128MB chunks)
- JIT WGSL compiler: Kernel fusion for single-pass execution
- GPU kernels: SUM, MIN, MAX, COUNT, AVG, fused filter+sum
- SIMD fallback: Trueno integration (AVX-512/AVX2/SSE2)
- SQL interface: SELECT, WHERE, aggregations, ORDER BY, LIMIT
[dependencies]
trueno-db = "0.3"
# Optional: GPU acceleration
trueno-db = { version = "0.3", features = ["gpu"] }use trueno_db::query::{QueryEngine, QueryExecutor};
use trueno_db::storage::StorageEngine;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let storage = StorageEngine::load_parquet("data/events.parquet")?;
let engine = QueryEngine::new();
let executor = QueryExecutor::new();
let plan = engine.parse(
"SELECT COUNT(*), SUM(value), AVG(value) FROM events WHERE value > 100"
)?;
let result = executor.execute(&plan, &storage)?;
Ok(())
}SIMD Aggregation (1M rows, AMD Threadripper 7960X):
| Operation | SIMD | Scalar | Speedup |
|---|---|---|---|
| SUM | 228µs | 634µs | 2.78x |
| MIN | 228µs | 1,048µs | 4.60x |
| MAX | 228µs | 257µs | 1.13x |
| AVG | 228µs | 634µs | 2.78x |
┌─────────────────────────────────────────────┐
│ SQL Interface │
│ (QueryEngine / Parser) │
├─────────────────────────────────────────────┤
│ Query Executor │
│ (cost-based dispatch, morsel paging) │
├──────────┬──────────┬───────────────────────┤
│ GPU │ SIMD │ Scalar │
│ (WGSL) │ (Trueno) │ (fallback) │
├──────────┴──────────┴───────────────────────┤
│ Storage Engine │
│ (columnar, Parquet, morsel-based I/O) │
└─────────────────────────────────────────────┘
- Query Layer: SQL parser produces logical plans, cost-based optimizer selects GPU/SIMD/Scalar backend
- Execution Layer: Morsel-based paging (128MB chunks) enables out-of-core processing
- GPU Backend: JIT-compiled WGSL kernels with fused filter+aggregate passes
- SIMD Backend: Trueno-powered AVX-512/AVX2/SSE2 vectorized aggregations
- Storage Layer: Columnar storage with Parquet I/O and late materialization
Load and manage columnar data:
let storage = StorageEngine::load_parquet("data.parquet")?;
let storage = StorageEngine::from_batches(batches);Parse and execute SQL queries:
let engine = QueryEngine::new();
let plan = engine.parse("SELECT SUM(value) FROM data WHERE id > 10")?;
let result = QueryExecutor::new().execute(&plan, &storage)?;Direct GPU kernel access (requires gpu feature):
let gpu = GpuEngine::new().await?;
let sum = gpu.sum_i32(&data).await?;
let filtered = gpu.fused_filter_sum(&data, threshold, "gt").await?;Heap-based top-K selection for ORDER BY ... LIMIT queries:
let top = top_k_descending(&batch, column_idx, k)?;cargo run --example sql_query_interface --release
cargo run --example benchmark_shootout --release
cargo run --example gaming_leaderboards --release
cargo run --example gpu_aggregations --features gpu --releasecargo test --lib # Unit tests (66 tests)
cargo test # All tests including integration
cargo test --doc # Documentation tests
make quality-gate # Full quality gate: lint + test + coverageProperty-based tests cover storage invariants and top-K correctness via proptest.
make build # Build
make test # Run tests
make quality-gate # lint + test + coverage
make bench-comparisonContributions are welcome! Please see the CONTRIBUTING.md guide for details.
Minimum Supported Rust Version: 1.75
MIT