diff --git a/README.md b/README.md index 52707ba..29f20a2 100644 --- a/README.md +++ b/README.md @@ -5,12 +5,152 @@ -# Rivet DB - > _Query everything — **instantly**._ -**RivetDB** lets you run Trino-style federated queries over a smart, queryable cache. -Remote data is fetched just-in-time and cached for ultra-fast reuse. +# RivetDB + +Query everything — instantly. + +RivetDB is a high-performance, federated query engine built on a smart, just-in-time cache. +It provides a Trino-style SQL interface for querying remote data sources without needing heavy infrastructure. +The goal is to unify data access across systems while still getting fast, reliable performance on a single node. + +RivetDB takes insperation from the [DuckDB](https://duckdb.org/) and SmallData community, which has shown how much you can get out of a single machine when you pair simplicity, vectorized execution, and efficient columnar formats. RivetDB applies that same thinking to federated data, using Arrow-native execution to make remote sources feel local. We think this is going to be especially important as agents query disparate systems and want to avoid scattering data fetching logic. + +Under the hood, RivetDB is built in Rust and powered by [Apache DataFusion](https://datafusion.apache.org/). The combination gives us strong safety guarantees, solid performance, and a proven execution engine that plays well with Arrow. + +RivetDB is under active development, and the APIs will continue to shift as we move toward a stable 1.0 release. +ment, and the APIs will continue to shift as we move toward a stable 1.0 release. + + +> **🚧 Early Development:** We are targeting a public preview with caching, lookup APIs, and a CLI in **Q1 2026**. +> Expect breaking changes until the API surface stabilizes. + +--- + +## Getting Started + +RivetDB is evolving quickly and installation instructions will change. +To experiment with the current engine: + +```bash +docker pull ghcr.io/hotdata-dev/rivetdb:latest +docker run -p 3000:3000 ghcr.io/hotdata-dev/rivetdb:latest +``` + +Alternatively/additionally, you can build and run locally: +```bash + cargo run --bin server config-local.toml +``` + +--- + + +## What RivetDB does Today + +- Just-in-time retrieval of remote data +- Federated SQL interface inspired by Trino +- Rust-powered engine focused on performance and correctness +- Basic caching and early internal APIs +- Initial examples and tests +- **Current adapter support:** + - **Postgres** + - **DuckDB** + - **MotherDuck** + +This foundation supports the larger roadmap described below. + +--- + +## Vision + +RivetDB aims to become a unified query engine that eliminates challenges working with between disparate data systems. The project emphasizes: + +- A consistent SQL interface for structured, semi-structured, and remote data sources +- Intelligent caching that adapts to query patterns and reduces data movement +- Tight integration with modern data platforms +- Tooling that gives developers fast introspection into data, metadata, and performance +- Clear APIs and predictable behavior + +The long-term trajectory includes distributed caching, richer connectors, real-time introspection, and seamless orchestration integration. + +--- ## Roadmap -Coming soon! We're planning to release the first version in early January. \ No newline at end of file + +Roadmap items below correspond directly to open issues in the repository. +For the latest view, consult: https://github.com/hotdata-dev/rivetdb/issues + +### **📍 Version 0.1 — Core Feature Set (Current Development)** + +These features define the first stable preview of RivetDB: + +- **Parquet Metadata Cache** + Cache Parquet metadata to optimize planning and repeated reads. + +- **Result Lookup API** + Provide structured access to intermediate query results for debugging and tooling. + +- **Query Metadata API** + Expose information about query execution: planning, caching, durations, and more. + +- **Optional Session IDs** + Support optional sessions for multi-query workflows. + +- **Table Caching Support** + Enable caching of entire remote tables for repeated or incremental queries. + +- **Arrow Flight SQL Support** + Add high-performance transport for large result sets and client libraries. + +- **RivetDB CLI** + Build a command-line interface for running queries, inspecting cache state, and interacting with the engine. + +These are high-priority items and represent the minimum surface for an early release. + +--- + +### **🚀 Future Themes (Post-0.1)** + +These represent emerging priorities after the first public preview: + +- **Additional Connectors** + Planned support for major databases and warehouses, including: + - MySQL / MariaDB + - SQLite + - BigQuery + - Snowflake + - Redshift + - Databricks / Unity Catalog + - ClickHouse + - Athena + - Synapse / Fabric + - Generic JDBC + - REST and streaming sources + - Cloud storage systems (S3, GCS, Azure Blob) + +- **Distributed Caching & Invalidation** + Peer-aware caching and consistency guarantees across nodes. + +- **Observability & Telemetry** + Metrics, tracing, execution timelines, and debugger-friendly diagnostics. + +- **Developer Tooling & IDE Integrations** + Schema discovery, query explain tools, and interactive exploration. + +--- + +## Feature Status (At-a-Glance) + +| Feature | Status | +|--------|--------| +| Parquet Metadata Cache | Planned | +| Result Lookup API | Planned | +| Query Metadata API | Planned | +| Table Caching | Planned | +| Arrow Flight SQL | Planned | +| RivetDB CLI | Planned | +| Cache | Alpha | +| Current Connectors: Postgres, DuckDB, MotherDuck | Alpha | +| Observability | Backlog | +| Additional Connectors | Backlog |