BtrBlocks is an efficient columnar compression format for data lakes, introduced in the SIGMOD 2023 paper "BtrBlocks: Efficient Columnar Compression for Data Lakes".
This library provides an interface to the BtrBlocks C++ library through ergonomic Rust types. It does this with bindings to a custom C++ wrapper around the original BtrBlocks library. Features:
- Column compression and decompression for the supported data types (integer, float64, and string)
- DataFusion TableProvider integration
- Compression from and decompression into CSV format
- Decompression streams (`ChunkedDecompressionStream` and `CsvDecompressionStream`) for on-demand decompression
- Interoperability with different object stores (local filesystem, Amazon S3, Google Cloud Storage, Azure Blob Storage, and HTTP file servers)
- Mounting of remote BtrBlocks-compressed files as local files, with on-demand decompression per read request
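The idea behind on-demand decompression streams can be illustrated with a lazy Rust iterator. The sketch below is hypothetical and self-contained: `CompressedChunk`, `decompress_chunk`, and `LazyChunkStream` are illustrative names, not part of the `btrblocks_rs` API, and simple run-length encoding stands in for the actual BtrBlocks compression schemes. It only shows that a chunk is decompressed when the consumer asks for it, not before.

```rust
/// Hypothetical stand-in for a compressed chunk: run-length encoded
/// (value, run_length) pairs. Real BtrBlocks chunks use other schemes.
struct CompressedChunk {
    runs: Vec<(i32, usize)>,
}

/// Decodes a single chunk. Only called when the stream is advanced.
fn decompress_chunk(chunk: &CompressedChunk) -> Vec<i32> {
    let mut out = Vec::new();
    for &(value, len) in &chunk.runs {
        out.extend(std::iter::repeat(value).take(len));
    }
    out
}

/// Hypothetical lazy stream over a sequence of compressed chunks.
struct LazyChunkStream<'a> {
    chunks: std::slice::Iter<'a, CompressedChunk>,
    /// Number of chunks decoded so far (kept only to make laziness visible).
    decoded: usize,
}

impl<'a> LazyChunkStream<'a> {
    fn new(chunks: &'a [CompressedChunk]) -> Self {
        Self { chunks: chunks.iter(), decoded: 0 }
    }
}

impl<'a> Iterator for LazyChunkStream<'a> {
    type Item = Vec<i32>;

    /// Decompresses exactly one chunk per call; returns None when exhausted.
    fn next(&mut self) -> Option<Self::Item> {
        let chunk = self.chunks.next()?;
        self.decoded += 1;
        Some(decompress_chunk(chunk))
    }
}

fn main() {
    let chunks = vec![
        CompressedChunk { runs: vec![(7, 3), (1, 2)] },
        CompressedChunk { runs: vec![(4, 4)] },
        CompressedChunk { runs: vec![(9, 1)] },
    ];
    let mut stream = LazyChunkStream::new(&chunks);
    let first = stream.next().unwrap();
    println!("{:?}", first); // [7, 7, 7, 1, 1]
    println!("decoded {} of {} chunks", stream.decoded, chunks.len()); // decoded 1 of 3 chunks
}
```

Because `next` decodes one chunk per call, a consumer that stops early never pays for decompressing the remaining chunks.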
To generate and open the library documentation, use cargo:

`cargo doc --open`

To compile the Rust library, the C++ project must first be compiled so that the bindings can be generated. This process is automated in the `build.rs` file; however, the dependencies for compiling the C++ BtrBlocks project must be available in your shell. If you use Nix, you can use the project's `flake.nix` to drop into a development shell:
`nix develop .`

Then, compile the project with cargo:
`cargo build --release`

After compiling the Rust library, run the unit tests with cargo:
`cargo t`

The project also includes a CLI program that uses the `btrblocks_rs` Rust library under the hood and offers several commands for working with the BtrBlocks format:
- `from-csv`: Compress a CSV file into the btr format
- `to-csv`: Decompress a btr file into CSV format
- `mount-csv`: Mount a new file system with FUSE and expose the decompressed CSV file there
- `print-csv`: Decompress a btr-compressed file into CSV and print the result to stdout
- `query`: Run an SQL query on the given btr-compressed file
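One plausible way to serve `mount-csv` read requests with on-demand decompression, assuming the decompressed size of each chunk is known up front (for example, from file metadata), is to map the requested byte range to only the chunks that overlap it, so every other chunk stays compressed. The sketch below is hypothetical: `chunks_for_read` is an illustrative helper, not code taken from the CLI.

```rust
use std::ops::Range;

/// Hypothetical helper: returns the indices of the chunks whose decompressed
/// bytes overlap the read request [offset, offset + len).
/// `chunk_sizes[i]` is the (assumed known) decompressed size of chunk i.
fn chunks_for_read(chunk_sizes: &[u64], offset: u64, len: u64) -> Range<usize> {
    // ends[i] is the exclusive end offset of chunk i in the decompressed file.
    let mut ends = Vec::with_capacity(chunk_sizes.len());
    let mut total = 0u64;
    for &size in chunk_sizes {
        total += size;
        ends.push(total);
    }
    // Clamp the request to the file size; reads past EOF touch no chunks.
    let end = offset.saturating_add(len).min(total);
    if offset >= end {
        return 0..0;
    }
    // First chunk that ends after `offset`, i.e. contains the first byte.
    let first = ends.partition_point(|&e| e <= offset);
    // First chunk that ends at or after `end`, i.e. contains the last byte.
    let last = ends.partition_point(|&e| e < end);
    first..last + 1
}

fn main() {
    let sizes: [u64; 3] = [100, 100, 100];
    // Bytes 150..250 span the second and third chunks.
    println!("{:?}", chunks_for_read(&sizes, 150, 100)); // 1..3
}
```

Only the chunks in the returned range would need to be decompressed to answer the read; the caller would then slice the requested bytes out of their output.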
Make sure the dependencies for compiling the C++ BtrBlocks library are available, then compile the CLI with cargo:
`cargo build --features="cli" --no-default-features --release`