Commit f3d5eb1

Improve the doc (#95)
1 parent 3910e12 commit f3d5eb1

3 files changed (+35, -2 lines)

src/lib.rs

Lines changed: 31 additions & 1 deletion
@@ -17,7 +17,37 @@
 
 #![deny(missing_docs)]
 
-//! `datafusion-materialized-views` implements algorithms and functionality for materialized views in DataFusion.
+//! # datafusion-materialized-views
+//!
+//! `datafusion-materialized-views` provides robust algorithms and core functionality for working with materialized views in [DataFusion](https://arrow.apache.org/datafusion/).
+//!
+//! ## Key Features
+//!
+//! - **Incremental View Maintenance**: Efficiently tracks dependencies between Hive-partitioned tables and their materialized views, allowing users to determine which partitions need to be refreshed when source data changes. This is achieved via UDTFs such as `mv_dependencies` and `stale_files`.
+//! - **Query Rewriting**: Implements a view matching optimizer that rewrites queries to automatically leverage materialized views when beneficial, based on the techniques described in the [paper](https://dsg.uwaterloo.ca/seminars/notes/larson-paper.pdf).
+//! - **Pluggable Metadata Sources**: Supports custom metadata sources for incremental view maintenance, with default support for object store metadata via the `FileMetadata` and `RowMetadataRegistry` components.
+//! - **Extensible Table Abstractions**: Defines traits such as `ListingTableLike` and `Materialized` to abstract over Hive-partitioned tables and materialized views, enabling custom implementations and easy registration for use in the maintenance and rewriting logic.
+//!
+//! ## Typical Workflow
+//!
+//! 1. **Define and Register Views**: Implement a custom table type that implements the `Materialized` trait, and register it using `register_materialized`.
+//! 2. **Metadata Initialization**: Set up `FileMetadata` and `RowMetadataRegistry` to track file-level and row-level metadata.
+//! 3. **Dependency Tracking**: Use the `mv_dependencies` UDTF to generate build graphs for materialized views, and `stale_files` to identify partitions that require recomputation.
+//! 4. **Query Optimization**: Enable the query rewriting optimizer to transparently rewrite queries to use materialized views where possible.
+//!
+//! ## Example
+//!
+//! See the README and integration tests for a full walkthrough of setting up and maintaining a materialized view, including dependency tracking and query rewriting.
+//!
+//! ## Limitations
+//!
+//! - Currently supports only Hive-partitioned tables in object storage, with the smallest update unit being a file.
+//! - Future work may generalize to other storage backends and partitioning schemes.
+//!
+//! ## References
+//!
+//! - [Optimizing Queries Using Materialized Views: A Practical, Scalable Solution](https://dsg.uwaterloo.ca/seminars/notes/larson-paper.pdf)
+//! - [DataFusion documentation](https://datafusion.apache.org/)
 
 /// Code for incremental view maintenance against Hive-partitioned tables.
 ///
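
The new "Incremental View Maintenance" bullet describes deciding which partitions need a refresh when source data changes. One way to picture that check is a per-partition timestamp comparison. The sketch below uses only the standard library; `SourceFile`, `stale_partitions`, and the integer timestamps are hypothetical names chosen for illustration and are not the crate's `stale_files` implementation.

```rust
use std::collections::BTreeMap;

/// Hypothetical record for one source file; not a type from this crate.
pub struct SourceFile {
    /// Hive partition the file belongs to, e.g. "date=2024-01-01".
    pub partition: String,
    /// Last-modified time as seconds since the Unix epoch.
    pub last_modified: u64,
}

/// Returns, in sorted order, the partitions whose materialized output is
/// missing or older than the newest source file in the same partition.
pub fn stale_partitions(
    sources: &[SourceFile],
    targets: &BTreeMap<String, u64>, // partition -> time of last refresh
) -> Vec<String> {
    // Newest source timestamp seen for each partition.
    let mut newest: BTreeMap<&str, u64> = BTreeMap::new();
    for file in sources {
        let entry = newest.entry(file.partition.as_str()).or_insert(0);
        *entry = (*entry).max(file.last_modified);
    }

    newest
        .into_iter()
        .filter(|(partition, source_time)| match targets.get(*partition) {
            // Stale when the output predates the newest source file.
            Some(target_time) => target_time < source_time,
            // Never materialized, so it needs a first build.
            None => true,
        })
        .map(|(partition, _)| partition.to_string())
        .collect()
}
```

A partition is reported when its output is absent or older than any of its inputs, which matches the limitation stated in the doc comment that the smallest update unit is a file.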

src/materialized/dependencies.rs

Lines changed: 2 additions & 1 deletion
@@ -62,7 +62,8 @@ use crate::materialized::META_COLUMN;
 
 use super::{cast_to_materialized, row_metadata::RowMetadataRegistry, util, Materialized};
 
-/// A table function that shows build targets and dependencies for a materialized view:
+/// A table function that, for a given materialized view, lists all the output data objects (build targets)
+/// generated during its construction or refresh, as well as all the source data objects (dependencies) it relies on.
 ///
 /// ```ignore
 /// fn mv_dependencies(table_ref: Utf8) -> Table
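
The reworded doc comment pairs each build target with the source objects it depends on. As a rough illustration of how such a pairing can be derived for Hive-partitioned data, the sketch below assumes the simplest case: the view is partitioned by a subset of the source table's partition columns, so the target directory can be read off the source path. `hive_partition_values` and `build_target` are hypothetical helpers, not functions from this crate, and the real `mv_dependencies` output may be computed quite differently.

```rust
/// Extracts `key=value` segments from a Hive-style path such as
/// "s3://bucket/t1/year=2024/month=01/data.parquet".
/// Assumption: file names themselves do not contain '='.
pub fn hive_partition_values(path: &str) -> Vec<(String, String)> {
    path.split('/')
        .filter_map(|segment| {
            let (key, value) = segment.split_once('=')?;
            if key.is_empty() {
                None
            } else {
                Some((key.to_string(), value.to_string()))
            }
        })
        .collect()
}

/// Maps one source file to the target directory it feeds, given the
/// view's root location and its partition columns (hypothetical inputs).
/// Returns `None` if the source path lacks one of the view's columns.
pub fn build_target(
    view_root: &str,
    view_columns: &[&str],
    source_path: &str,
) -> Option<String> {
    let values = hive_partition_values(source_path);
    let mut target = view_root.trim_end_matches('/').to_string();
    for column in view_columns {
        // Look up the value the source file carries for this column.
        let (_, value) = values.iter().find(|(key, _)| key == column)?;
        target.push_str(&format!("/{}={}", column, value));
    }
    target.push('/');
    Some(target)
}
```

Calling `build_target` for every source file and collecting the `(target, source)` pairs gives a small build graph of the kind the doc comment describes.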

src/materialized/util.rs

Lines changed: 2 additions & 0 deletions
@@ -21,6 +21,7 @@ use datafusion::catalog::{CatalogProviderList, TableProvider};
 use datafusion_common::{DataFusionError, Result};
 use datafusion_sql::ResolvedTableReference;
 
+/// Retrieves a table from the catalog list given a resolved table reference.
 pub fn get_table(
     catalog_list: &dyn CatalogProviderList,
     table_ref: &ResolvedTableReference,
@@ -35,6 +36,7 @@ pub fn get_table(
 
     // NOTE: this is bad, we are calling async code in a sync context.
     // We should file an issue about async in UDTFs.
+    // See: https://github.com/apache/datafusion/issues/17663
     futures::executor::block_on(schema.table(table_ref.table.as_ref()))
         .map_err(|e| e.context(format!("couldn't get table '{}'", table_ref.table)))?
         .ok_or_else(|| DataFusionError::Plan(format!("no such table {}", table_ref.schema)))
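
The NOTE in this hunk flags the call to `block_on` from synchronous code. To make the concern concrete, here is a minimal, standard-library-only sketch of what a `block_on` does: it polls the future on the calling thread and parks that thread while the future is pending. The `block_on` below is hand-written for illustration and is not `futures::executor::block_on`; `TableLookup` and `get_table_id` are hypothetical stand-ins for the async catalog lookup and its synchronous wrapper.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Unparks the blocked thread when the future signals progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drives `future` to completion on the calling thread.
/// The thread is parked (blocked) whenever the future is pending.
pub fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical stand-in for an async catalog lookup: pending on the
/// first poll, ready on the second.
pub struct TableLookup {
    pub name: &'static str,
    pub polled: bool,
}

impl Future for TableLookup {
    type Output = Option<usize>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        if self.polled {
            // Pretend that only a table named "t1" exists.
            Poll::Ready(if self.name == "t1" { Some(1) } else { None })
        } else {
            self.polled = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

/// Synchronous wrapper in the style of `get_table` (illustrative only).
pub fn get_table_id(name: &'static str) -> Result<usize, String> {
    block_on(TableLookup { name, polled: false })
        .ok_or_else(|| format!("no such table {}", name))
}
```

Because the calling thread does nothing else until the future completes, invoking a wrapper like this from a thread that an async runtime also relies on can stall other tasks, which is the hazard the NOTE and the linked issue point at.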
