Skip to content

Commit 227c1c1

Browse files
committed
feat(datafusion): implement repartition node for DataFusion, setting the best partition strategy for Iceberg for writing
- Implement hash partitioning for partitioned/bucketed tables - Use round-robin partitioning for unpartitioned tables - Support range distribution mode approximation via sort columns
1 parent bc469c3 commit 227c1c1

File tree

3 files changed

+914
-1
lines changed

3 files changed

+914
-1
lines changed

crates/iceberg/src/table.rs

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@ use crate::inspect::MetadataTable;
2424
use crate::io::FileIO;
2525
use crate::io::object_cache::ObjectCache;
2626
use crate::scan::TableScanBuilder;
27-
use crate::spec::{TableMetadata, TableMetadataRef};
27+
use crate::spec::{SchemaRef, TableMetadata, TableMetadataRef};
2828
use crate::{Error, ErrorKind, Result, TableIdent};
2929

3030
/// Builder to create table scan.
@@ -235,6 +235,11 @@ impl Table {
235235
self.readonly
236236
}
237237

238+
/// Returns the current schema as a shared reference.
239+
pub fn schema_ref(&self) -> SchemaRef {
240+
self.metadata.current_schema().clone()
241+
}
242+
238243
/// Create a reader for the table.
239244
pub fn reader_builder(&self) -> ArrowReaderBuilder {
240245
ArrowReaderBuilder::new(self.file_io.clone())

crates/integrations/datafusion/src/physical_plan/mod.rs

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -18,9 +18,11 @@
1818
pub(crate) mod commit;
1919
pub(crate) mod expr_to_predicate;
2020
pub(crate) mod metadata_scan;
21+
pub(crate) mod repartition;
2122
pub(crate) mod scan;
2223
pub(crate) mod write;
2324

2425
pub(crate) const DATA_FILES_COL_NAME: &str = "data_files";
2526

27+
pub use repartition::IcebergRepartitionExec;
2628
pub use scan::IcebergTableScan;

0 commit comments

Comments
 (0)