|
17 | 17 |
|
18 | 18 | #![deny(missing_docs)] |
19 | 19 |
|
20 | | -//! `datafusion-materialized-views` implements algorithms and functionality for materialized views in DataFusion. |
| 20 | +//! # datafusion-materialized-views |
| 21 | +//! |
| 22 | +//! `datafusion-materialized-views` provides algorithms and core functionality for working with materialized views in [DataFusion](https://arrow.apache.org/datafusion/). |
| 23 | +//! |
| 24 | +//! ## Key Features |
| 25 | +//! |
| 26 | +//! - **Incremental View Maintenance**: Tracks dependencies between Hive-partitioned tables and their materialized views, so users can determine which partitions need to be refreshed when source data changes. This is exposed through user-defined table functions (UDTFs) such as `mv_dependencies` and `stale_files`. |
| 27 | +//! - **Query Rewriting**: Implements a view matching optimizer that rewrites queries to automatically leverage materialized views when beneficial, based on the techniques described in the paper [Optimizing Queries Using Materialized Views: A Practical, Scalable Solution](https://dsg.uwaterloo.ca/seminars/notes/larson-paper.pdf). |
| 28 | +//! - **Pluggable Metadata Sources**: Supports custom metadata sources for incremental view maintenance, with default support for object store metadata via the `FileMetadata` and `RowMetadataRegistry` components. |
| 29 | +//! - **Extensible Table Abstractions**: Defines traits such as `ListingTableLike` and `Materialized` to abstract over Hive-partitioned tables and materialized views, enabling custom implementations and easy registration for use in the maintenance and rewriting logic. |
| 30 | +//! |
| 31 | +//! ## Typical Workflow |
| 32 | +//! |
| 33 | +//! 1. **Define and Register Views**: Define a custom table type that implements the `Materialized` trait, and register it using `register_materialized`. |
| 34 | +//! 2. **Metadata Initialization**: Set up `FileMetadata` and `RowMetadataRegistry` to track file-level and row-level metadata. |
| 35 | +//! 3. **Dependency Tracking**: Use the `mv_dependencies` UDTF to generate build graphs for materialized views, and `stale_files` to identify partitions that require recomputation. |
| 36 | +//! 4. **Query Optimization**: Enable the query rewriting optimizer to transparently rewrite queries to use materialized views where possible. |
| 37 | +//! |
| 38 | +//! ## Example |
| 39 | +//! |
| 40 | +//! See the README and integration tests for a full walkthrough of setting up and maintaining a materialized view, including dependency tracking and query rewriting. |
| 41 | +//! |
| 42 | +//! ## Limitations |
| 43 | +//! |
| 44 | +//! - Currently supports only Hive-partitioned tables in object storage, with the smallest update unit being a file. |
| 45 | +//! - Future work may generalize to other storage backends and partitioning schemes. |
| 46 | +//! |
| 47 | +//! ## References |
| 48 | +//! |
| 49 | +//! - [Optimizing Queries Using Materialized Views: A Practical, Scalable Solution](https://dsg.uwaterloo.ca/seminars/notes/larson-paper.pdf) |
| 50 | +//! - [DataFusion documentation](https://datafusion.apache.org/) |
21 | 51 |
|
22 | 52 | /// Code for incremental view maintenance against Hive-partitioned tables. |
23 | 53 | /// |
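The added docs describe incremental view maintenance as deciding which partitions to refresh when source data changes, with a file as the smallest update unit. The sketch below illustrates that idea in plain Rust, using only the standard library. It is not the crate's implementation and does not call `mv_dependencies` or `stale_files`; `FileInfo`, `partition_value`, `latest_by_partition`, and `stale_partitions` are hypothetical names. It also assumes the source table and the view share a single Hive partition column, which is a simplification.

```rust
use std::collections::BTreeMap;

/// Hypothetical file record: a Hive-style path plus a last-modified
/// timestamp in seconds. This is not a type from the crate.
struct FileInfo {
    path: &'static str,
    last_modified: u64,
}

/// Extracts the value of a Hive partition column from a path,
/// e.g. `date` in `.../date=2024-01-01/part.parquet`.
fn partition_value<'a>(path: &'a str, column: &str) -> Option<&'a str> {
    path.split('/').find_map(|segment| {
        let (key, value) = segment.split_once('=')?;
        (key == column).then_some(value)
    })
}

/// Computes the newest modification time seen for each partition value.
fn latest_by_partition(files: &[FileInfo], column: &str) -> BTreeMap<String, u64> {
    let mut latest = BTreeMap::new();
    for file in files {
        if let Some(value) = partition_value(file.path, column) {
            let entry = latest.entry(value.to_string()).or_insert(0);
            *entry = (*entry).max(file.last_modified);
        }
    }
    latest
}

/// Returns the partitions whose view output is missing or older than the
/// newest source file in the same partition, in sorted order.
fn stale_partitions(sources: &[FileInfo], view: &[FileInfo], column: &str) -> Vec<String> {
    let source_latest = latest_by_partition(sources, column);
    let view_latest = latest_by_partition(view, column);
    source_latest
        .into_iter()
        .filter(|(partition, source_time)| match view_latest.get(partition) {
            Some(view_time) => view_time < source_time,
            None => true,
        })
        .map(|(partition, _)| partition)
        .collect()
}
```

Comparing timestamps per partition keeps the refresh granularity at the partition level while still reacting to individual file changes. A real dependency graph would also have to handle views whose partition columns are derived from, rather than identical to, the source columns.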
|