
Contributor

@CTTY CTTY commented Aug 6, 2025

Which issue does this PR close?

What changes are included in this PR?

  • Added IcebergWriteExec, which writes the output of the input execution plan to Parquet files and returns the serialized data files

Are these changes tested?

Added unit tests.

@CTTY CTTY marked this pull request as ready for review August 7, 2025 00:23
Contributor

@liurenjie1024 liurenjie1024 left a comment


Thanks @CTTY for this PR, it generally looks good! Just one minor nit.

}

impl IcebergWriteExec {
pub fn new(table: Table, input: Arc<dyn ExecutionPlan>, schema: ArrowSchemaRef) -> Self {
Contributor


Another point is that we should ensure that the input schema matches the table's schema; otherwise we are doing schema evolution during the write.

Contributor Author


Column nullability and field types are checked within execute_input_stream when it binds the Iceberg table schema to the input RecordBatch, so we don't need to worry about that for now.

This may prevent us from doing any form of schema evolution, but I think that's a separate issue.
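
For readers following the thread, the kind of check being discussed (field type and nullability of the input compared against the table schema) can be sketched roughly as below. This is only an illustration under stated assumptions: `DataType`, `Field`, and `check_schema_compatible` are hypothetical, simplified stand-ins, not the iceberg-rust or Arrow API, and matching fields by position is an assumption made for brevity.

```rust
// Hypothetical, simplified stand-ins for real schema types.
// They are NOT the Arrow or iceberg-rust types used in the PR.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

/// Returns Ok(()) if `input` could be written into a table whose schema is
/// `table` without changing that schema (i.e. without schema evolution).
/// Assumption: fields are compared positionally.
fn check_schema_compatible(table: &[Field], input: &[Field]) -> Result<(), String> {
    if table.len() != input.len() {
        return Err(format!(
            "expected {} fields, found {}",
            table.len(),
            input.len()
        ));
    }
    for (expected, actual) in table.iter().zip(input) {
        if expected.name != actual.name {
            return Err(format!(
                "field name mismatch: expected '{}', found '{}'",
                expected.name, actual.name
            ));
        }
        if expected.data_type != actual.data_type {
            return Err(format!(
                "type mismatch for '{}': expected {:?}, found {:?}",
                expected.name, expected.data_type, actual.data_type
            ));
        }
        // A required (non-nullable) table column cannot accept nullable input;
        // the reverse direction is harmless.
        if !expected.nullable && actual.nullable {
            return Err(format!(
                "field '{}' is required in the table but nullable in the input",
                expected.name
            ));
        }
    }
    Ok(())
}
```

Calling `check_schema_compatible(&table_fields, &input_fields)` returns an `Err` describing the first mismatch. Rejecting any mismatch outright is what rules out schema evolution at write time, which matches the trade-off noted in the reply above.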

Contributor

@liurenjie1024 liurenjie1024 left a comment


Thanks @CTTY for this PR, LGTM!

@liurenjie1024 liurenjie1024 merged commit bc469c3 into apache:main Aug 12, 2025
18 checks passed
@CTTY CTTY deleted the ctty/df-write-node branch August 12, 2025 16:34
Yiyang-C pushed a commit to Yiyang-C/iceberg-rust that referenced this pull request Aug 26, 2025
…port (apache#1585)

## Which issue does this PR close?

- Closes apache#1545
- See the original draft PR: apache#1511 

## What changes are included in this PR?
- Added `IcebergWriteExec`, which writes the output of the input execution
plan to Parquet files and returns the serialized data files


## Are these changes tested?
Added unit tests.
Development

Successfully merging this pull request may close these issues.

Implement Writer Node: Spawn Iceberg writers and write the input data
2 participants