Commit bc469c3

feat(datafusion): Implement IcebergWriteExec for DataFusion write support (#1585)
## Which issue does this PR close?

- Closes #1545
- See the original draft PR: #1511

## What changes are included in this PR?

- Added `IcebergWriteExec`, which writes the output of the input execution plan to Parquet files and returns the serialized data files.

## Are these changes tested?

Added unit tests.
1 parent 5805af5 commit bc469c3

5 files changed: +629 -0 lines changed

Cargo.lock

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default.

crates/iceberg/src/spec/table_metadata.rs

Lines changed: 12 additions & 0 deletions
@@ -119,6 +119,18 @@ pub const PROPERTY_COMMIT_TOTAL_RETRY_TIME_MS: &str = "commit.retry.total-timeou
 /// Default value for total maximum retry time (ms).
 pub const PROPERTY_COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT: u64 = 30 * 60 * 1000; // 30 minutes

+/// Default file format for data files
+pub const PROPERTY_DEFAULT_FILE_FORMAT: &str = "write.format.default";
+/// Default file format for delete files
+pub const PROPERTY_DELETE_DEFAULT_FILE_FORMAT: &str = "write.delete.format.default";
+/// Default value for data file format
+pub const PROPERTY_DEFAULT_FILE_FORMAT_DEFAULT: &str = "parquet";
+
+/// Target file size for newly written files.
+pub const PROPERTY_WRITE_TARGET_FILE_SIZE_BYTES: &str = "write.target-file-size-bytes";
+/// Default target file size
+pub const PROPERTY_WRITE_TARGET_FILE_SIZE_BYTES_DEFAULT: usize = 512 * 1024 * 1024; // 512 MB
+
 /// Reference to [`TableMetadata`].
 pub type TableMetadataRef = Arc<TableMetadata>;
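
The new constants pair each table property key with a default value. As a minimal sketch of how a writer might resolve them from a table's property map (the helpers `default_file_format` and `target_file_size_bytes` are hypothetical and not part of this commit; only the constants are copied from the diff), a lookup with fallback could look like this:

```rust
use std::collections::HashMap;
use std::num::ParseIntError;

// Constants copied verbatim from the diff to table_metadata.rs.
pub const PROPERTY_DEFAULT_FILE_FORMAT: &str = "write.format.default";
pub const PROPERTY_DEFAULT_FILE_FORMAT_DEFAULT: &str = "parquet";
pub const PROPERTY_WRITE_TARGET_FILE_SIZE_BYTES: &str = "write.target-file-size-bytes";
pub const PROPERTY_WRITE_TARGET_FILE_SIZE_BYTES_DEFAULT: usize = 512 * 1024 * 1024;

/// Hypothetical helper: returns the configured data file format,
/// or the default when the property is absent.
pub fn default_file_format(props: &HashMap<String, String>) -> &str {
    props
        .get(PROPERTY_DEFAULT_FILE_FORMAT)
        .map(String::as_str)
        .unwrap_or(PROPERTY_DEFAULT_FILE_FORMAT_DEFAULT)
}

/// Hypothetical helper: returns the configured target file size in bytes,
/// the default when the property is absent, or a parse error when the
/// stored value is not a valid unsigned integer.
pub fn target_file_size_bytes(
    props: &HashMap<String, String>,
) -> Result<usize, ParseIntError> {
    match props.get(PROPERTY_WRITE_TARGET_FILE_SIZE_BYTES) {
        Some(raw) => raw.parse::<usize>(),
        None => Ok(PROPERTY_WRITE_TARGET_FILE_SIZE_BYTES_DEFAULT),
    }
}
```

Returning a `Result` for the size lookup surfaces a malformed property value to the caller instead of silently falling back to the default; whether the actual write path does this is not shown in the rendered diff.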

crates/integrations/datafusion/Cargo.toml

Lines changed: 2 additions & 0 deletions
@@ -34,7 +34,9 @@ async-trait = { workspace = true }
 datafusion = { workspace = true }
 futures = { workspace = true }
 iceberg = { workspace = true }
+parquet = { workspace = true }
 tokio = { workspace = true }
+uuid = { workspace = true }

 [dev-dependencies]
 expect-test = { workspace = true }

crates/integrations/datafusion/src/physical_plan/mod.rs

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ pub(crate) mod commit;
 pub(crate) mod expr_to_predicate;
 pub(crate) mod metadata_scan;
 pub(crate) mod scan;
+pub(crate) mod write;

 pub(crate) const DATA_FILES_COL_NAME: &str = "data_files";
