@sgrebnov sgrebnov commented Oct 3, 2025

Which issue does this PR close?

This PR adds a required_input_distribution setting for IcebergWriteExec so that DataFusion automatically coalesces input partitions before the commit. This fixes partial writes caused by partitioning issues in Iceberg write operations. Closes spiceai/spiceai#7407

The code below explicitly wraps the write plan in CoalescePartitionsExec. Without the proper IcebergWriteExec settings, however, the DataFusion optimizer rewrites the plan and produces the physical plan shown here. With that plan, sometimes only the first partition is handled.

Physical plan (before the fix):
IcebergCommitExec: table=test_namespace.test_table_partitioning
  RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=3
    IcebergWriteExec: table=test_namespace.test_table_partitioning
      DataSourceExec: partitions=3, partition_sizes=[1, 1, 1]

https://github.com/spiceai/iceberg-rust/blob/spiceai-0.7.0-rc1/crates/integrations/datafusion/src/table/mod.rs#L212-L225

        let write_plan = Arc::new(IcebergWriteExec::new(
            self.table.clone(),
            input,
            self.schema.clone(),
        ));

        // Merge the outputs of write_plan into one so we can commit all files together
        let coalesce_partitions = Arc::new(CoalescePartitionsExec::new(write_plan));

        Ok(Arc::new(IcebergCommitExec::new(
            self.table.clone(),
            catalog,
            coalesce_partitions,
            self.schema.clone(),
        )))
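
To make the failure mode concrete, here is a minimal toy sketch of why a commit step that only reads its first input partition loses data files once the coalescing step is rewritten away. It is a standalone model, not DataFusion or iceberg-rust code: the `Partitions` alias and the `commit_first_partition` and `coalesce` functions are hypothetical names introduced only for illustration, and the file names are made up.

```rust
// Toy model, not DataFusion's API: each inner Vec is one output partition
// of the write step, holding the names of the data files it produced.
type Partitions = Vec<Vec<String>>;

/// Hypothetical commit step that reads only partition 0 of its input,
/// as a node that expects a single input partition would.
fn commit_first_partition(input: &Partitions) -> Vec<String> {
    input.first().cloned().unwrap_or_default()
}

/// Hypothetical coalesce step: merges every partition into a single one.
fn coalesce(input: &Partitions) -> Partitions {
    vec![input.iter().flatten().cloned().collect()]
}

fn main() {
    // Three write partitions, one (made-up) data file each.
    let written: Partitions = vec![
        vec!["file-a.parquet".to_string()],
        vec!["file-b.parquet".to_string()],
        vec!["file-c.parquet".to_string()],
    ];

    // Without coalescing, only the file from partition 0 is committed.
    let partial = commit_first_partition(&written);
    // With coalescing, all files reach the commit step.
    let complete = commit_first_partition(&coalesce(&written));

    println!("without coalescing: {} file(s) committed", partial.len());
    println!("with coalescing: {} file(s) committed", complete.len());
}
```

In this model the first line reports 1 file and the second reports 3, which mirrors the partial-write symptom described above.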

Physical plan (after the fix):

IcebergCommitExec: table=test_namespace.test_table
  CoalescePartitionsExec
    IcebergWriteExec: table=test_namespace.test_table
      DataSourceExec: partitions=3, partition_sizes=[1, 1, 1]
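
The effect of declaring an input distribution requirement can be sketched with a small toy optimizer pass. This is only an illustration under stated assumptions: the `Distribution`, `Node`, `enforce_distribution`, and `plan_names` items below are hypothetical and merely borrow DataFusion's terminology; they are not DataFusion's actual types. The sketch also assumes the single-partition requirement sits on the node that consumes the write output, since that is where the plan above shows CoalescePartitionsExec.

```rust
// Toy model of distribution enforcement. The names mirror DataFusion's
// concepts, but none of this is DataFusion's actual API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Distribution {
    Unspecified,
    SinglePartition,
}

#[derive(Debug, Clone, PartialEq)]
struct Node {
    name: &'static str,
    output_partitions: usize,
    required_input: Distribution,
    child: Option<Box<Node>>,
}

/// Walks the plan bottom-up and inserts a coalesce node wherever a parent
/// requires a single input partition but its child produces more than one.
fn enforce_distribution(mut node: Node) -> Node {
    if let Some(child) = node.child.take() {
        let child = enforce_distribution(*child);
        let child = if node.required_input == Distribution::SinglePartition
            && child.output_partitions > 1
        {
            Node {
                name: "CoalescePartitionsExec",
                output_partitions: 1,
                required_input: Distribution::Unspecified,
                child: Some(Box::new(child)),
            }
        } else {
            child
        };
        node.child = Some(Box::new(child));
    }
    node
}

/// Collects node names from the root down, for easy inspection.
fn plan_names(node: &Node) -> Vec<&'static str> {
    let mut names = vec![node.name];
    let mut current = node.child.as_deref();
    while let Some(n) = current {
        names.push(n.name);
        current = n.child.as_deref();
    }
    names
}

fn main() {
    let source = Node {
        name: "DataSourceExec",
        output_partitions: 3,
        required_input: Distribution::Unspecified,
        child: None,
    };
    let write = Node {
        name: "IcebergWriteExec",
        output_partitions: 3,
        required_input: Distribution::Unspecified,
        child: Some(Box::new(source)),
    };
    // Assumption: the consumer of the write output declares the requirement.
    let commit = Node {
        name: "IcebergCommitExec",
        output_partitions: 1,
        required_input: Distribution::SinglePartition,
        child: Some(Box::new(write)),
    };

    let optimized = enforce_distribution(commit);
    println!("{:?}", plan_names(&optimized));
}
```

The design point is that the requirement is declared on the node rather than hard-coded into plan construction, so the optimizer reinserts the coalescing step itself instead of being free to replace it.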

What changes are included in this PR?

Are these changes tested?

@sgrebnov sgrebnov force-pushed the sgrebnov/1002-fix-partitioned-source-write branch from f6e5ebe to 69fe87a Compare October 3, 2025 06:24
@sgrebnov sgrebnov force-pushed the sgrebnov/1002-fix-partitioned-source-write branch from 69fe87a to 128acc8 Compare October 3, 2025 06:28
sgrebnov commented Oct 3, 2025

Upstream PR created: apache#1723

@sgrebnov sgrebnov self-assigned this Oct 3, 2025
@sgrebnov sgrebnov merged commit fabc471 into spiceai-0.7.0-rc1 Oct 3, 2025
3 checks passed