Status: Work in Progress
This project aims to provide a datalake table format optimized for streaming data writes, built on a Rust core.
```bash
./build.sh
```

This builds:

- vine-core: Rust library for Vine
- vine-spark: Spark DataSource V2 connector

```scala
// Write streaming data
spark.readStream
  .format("vine")
  .load("input-path")
  .writeStream
  .format("vine")
  .option("path", "/data/my-table")
  // Spark requires a checkpoint location for most streaming sinks;
  // the path here is illustrative.
  .option("checkpointLocation", "/data/my-table/_checkpoint")
  .start()
```

```scala
// Read with Spark SQL
val df = spark.read.format("vine").load("/data/my-table")
df.show()
```

```
┌─────────────────────────────────────┐
│ Query Engines (Spark, Trino)        │
└──────────────┬──────────────────────┘
               │ DataSource API
┌──────────────▼──────────────────────┐
│ Connectors (vine-spark/vine-trino)  │
└──────────────┬──────────────────────┘
               │ JNI
┌──────────────▼──────────────────────┐
│ Rust Core (vine-core)               │
│  - Fast Parquet writes              │
│  - Date-based partitioning          │
└──────────────┬──────────────────────┘
               │
┌──────────────▼──────────────────────┐
│ Storage (Parquet files)             │
│  2024-12-26/data_143025.parquet     │
│  2024-12-27/data_091500.parquet     │
└─────────────────────────────────────┘
```
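
The JNI hop above is where the JVM connectors cross into vine-core. A minimal sketch of what that boundary could look like follows; the class name (dev.vine.core.VineNative), the method name, and the use of the jni crate are all assumptions, not the project's actual exported API:

```rust
// Cargo.toml (assumed): jni = "0.21"
use jni::objects::{JClass, JString};
use jni::sys::jboolean;
use jni::JNIEnv;

// Hypothetical native entry point. The Scala side would declare a matching
// `@native def writeBatch(tablePath: String): Boolean` on a class whose fully
// qualified name (dev.vine.core.VineNative here) mangles to this symbol.
#[no_mangle]
pub extern "system" fn Java_dev_vine_core_VineNative_writeBatch<'local>(
    mut env: JNIEnv<'local>,
    _class: JClass<'local>,
    table_path: JString<'local>,
) -> jboolean {
    // Copy the Java string into an owned Rust String before doing any work.
    let path: String = match env.get_string(&table_path) {
        Ok(s) => s.into(),
        Err(_) => return 0, // JNI_FALSE: path argument was unreadable
    };

    // Hand off to vine-core's Parquet writer here (elided in this sketch).
    let _ = path;
    1 // JNI_TRUE
}
```

Keeping the exported surface to a few coarse-grained calls like this keeps JNI crossing overhead off the per-row hot path.
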
| Component | Language | Status | Purpose |
|---|---|---|---|
| vine-core | Rust | WIP | Write-optimized datalake table format |
| vine-spark | Scala | WIP | Spark DataSource V2 connector |
| vine-trino | Java | Planned | Trino connector (not started) |
- Files: Apache Parquet (columnar)
- Partitioning: Date-based directories (`YYYY-MM-DD/data_HHMMSS.parquet`); see the sketch below
- Metadata: JSON schema file (`vine_meta.json`)
- Types: integer, string, boolean, double
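
As an illustration of that layout, a writer might derive the target file path from the write timestamp roughly as below. The function name and the chrono dependency are assumptions for this sketch, not vine-core's actual API:

```rust
// Cargo.toml (assumed): chrono = "0.4"
use chrono::{DateTime, Utc};

// Hypothetical helper: map a write timestamp onto the documented layout,
// YYYY-MM-DD/data_HHMMSS.parquet, relative to the table root.
fn partition_path(ts: DateTime<Utc>) -> String {
    format!(
        "{}/data_{}.parquet",
        ts.format("%Y-%m-%d"), // date-based partition directory
        ts.format("%H%M%S")    // time-of-write file name suffix
    )
}

fn main() {
    // Prints something like "2024-12-26/data_143025.parquet".
    println!("{}", partition_path(Utc::now()));
}
```
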
Rust Core

```bash
cd vine-core
cargo build --release
cargo test
```

Spark Connector

```bash
cd vine-spark
sbt clean assembly
```

Prerequisites:

- Rust 1.70+
- Scala 2.13, sbt 1.x
- Java 11
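
Once the connector assembly is built, it can be tried in a local Spark shell. The jar path below assumes sbt-assembly's default output location and an illustrative version; use whatever `sbt assembly` actually produces:

```bash
# Jar name and version are illustrative; check vine-spark/target/scala-2.13/
# for the file your build actually emitted.
spark-shell --jars vine-spark/target/scala-2.13/vine-spark-assembly-0.1.0.jar
```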