[Bug]: Improve performance of the Iceberg AddFiles transform

### What happened?

The Iceberg AddFiles transform currently has performance bottlenecks when writing a large number of files. For example, it fully reads the Parquet files being written [1], which can significantly slow down the process. We should look into improving single-VM performance without compromising the consistency guarantees of the sink.

[1] https://github.com/apache/beam/blob/e08e9d56e5ee8ece43cc15967d0edff107651554/sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/AddFiles.java#L685
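
As an illustration of why a full read may be avoidable, here is a minimal sketch of one possible direction, not a confirmed plan for this issue. It relies on the Parquet file layout, in which a file ends with a 4-byte little-endian footer length followed by the `PAR1` magic, so the metadata can be located with a small tail read. `ParquetFooterProbe` and `footerLength` are hypothetical names and are not part of Beam or Iceberg. Whether footer statistics alone are enough for what AddFiles needs is an assumption that would have to be checked against the code at [1].

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of Beam or Iceberg. */
public class ParquetFooterProbe {
  private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);
  // The file tail is a 4-byte little-endian footer length followed by the 4-byte magic.
  private static final int TAIL_LENGTH = 8;

  /** Returns the footer metadata length in bytes, reading only the last 8 bytes of the file. */
  public static int footerLength(Path file) throws IOException {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      long size = channel.size();
      // A valid file has at least the leading magic plus the 8-byte tail.
      if (size < MAGIC.length + TAIL_LENGTH) {
        throw new IOException("File too small to be Parquet: " + file);
      }
      ByteBuffer tail = ByteBuffer.allocate(TAIL_LENGTH).order(ByteOrder.LITTLE_ENDIAN);
      long tailStart = size - TAIL_LENGTH;
      // Positional reads may return fewer bytes than requested, so loop until full.
      while (tail.hasRemaining()) {
        if (channel.read(tail, tailStart + tail.position()) < 0) {
          throw new IOException("Unexpected end of file: " + file);
        }
      }
      tail.flip();
      int length = tail.getInt();
      byte[] magic = new byte[MAGIC.length];
      tail.get(magic);
      if (!Arrays.equals(magic, MAGIC)) {
        throw new IOException("Missing PAR1 trailer: " + file);
      }
      // The footer must fit between the leading magic and the 8-byte tail.
      if (length < 0 || length > size - TAIL_LENGTH - MAGIC.length) {
        throw new IOException("Invalid footer length " + length + ": " + file);
      }
      return length;
    }
  }
}
```

With the footer length known, a reader can fetch just the footer bytes (which hold the schema, row counts, and column statistics) instead of streaming every row group. On object stores this turns a full download into one or two small ranged reads per file.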

### Issue Priority

Priority: 1 (data loss / total loss of function)

### Issue Components

- [ ] Component: Python SDK
- [x] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Infrastructure
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner

[Bug]: Improve performance of the Iceberg AddFiles transform #38012