fix: [iceberg] Remove IcebergFileStream and use iceberg-rust's parallelization #3051

mbutrovich · 2026-01-06T22:40:13Z

Which issue does this PR close?

N/A

Rationale for this change

Profiling Iceberg native scans revealed significant overhead in async stream polling, particularly:

tokio::drop_waker and tokio::park::clone consuming substantial time in IcebergStreamWrapper::poll_next
futures_util::stream::flatten_unordered::SharedPollState::{start_polling,stop_polling} showing lock contention

I think this is due to:

Per-batch schema adapter allocation: the scan created SparkParquetOptions, a SparkSchemaAdapterFactory, and a schema adapter for every single batch inside an .and_then() combinator.
Competing parallelization logic: IcebergFileStream passed one FileScanTask at a time to iceberg-rust, so flatten_unordered coordinated parallelism across a single task, which is pure overhead. The nested streams also created excessive waker churn.
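To illustrate the first point, here is a minimal std-only sketch of per-batch versus cached adapter construction. `SchemaAdapter`, `adapt_per_batch`, and `adapt_cached` are hypothetical stand-ins invented for this example, not the Comet, DataFusion, or iceberg-rust types, and a "batch" is reduced to a list of column names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for a schema adapter (assumption: not a real Comet type).
struct SchemaAdapter {
    target_columns: Vec<String>,
}

impl SchemaAdapter {
    /// Builds an adapter and records the construction in `counter`.
    fn new(target_columns: &[&str], counter: &AtomicUsize) -> Self {
        counter.fetch_add(1, Ordering::Relaxed);
        SchemaAdapter {
            target_columns: target_columns.iter().map(|c| c.to_string()).collect(),
        }
    }

    /// Keeps only the target columns that are present in the batch.
    fn adapt(&self, batch: &[String]) -> Vec<String> {
        self.target_columns
            .iter()
            .filter(|c| batch.contains(*c))
            .cloned()
            .collect()
    }
}

/// Per-batch construction: a new adapter is built for every batch.
fn adapt_per_batch(
    batches: &[Vec<String>],
    target: &[&str],
    counter: &AtomicUsize,
) -> Vec<Vec<String>> {
    batches
        .iter()
        .map(|b| SchemaAdapter::new(target, counter).adapt(b))
        .collect()
}

/// Cached construction: one adapter is built up front and reused for all batches.
fn adapt_cached(
    batches: &[Vec<String>],
    target: &[&str],
    counter: &AtomicUsize,
) -> Vec<Vec<String>> {
    let adapter = SchemaAdapter::new(target, counter);
    batches.iter().map(|b| adapter.adapt(b)).collect()
}
```

Both functions return the same adapted batches; the only difference is that the per-batch variant increments the counter once per batch, while the cached variant increments it once per scan.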

What changes are included in this PR?

Cache schema adapters and Parquet options
Remove IcebergFileStream and pass all FileScanTasks directly to iceberg-rust. I tried this in the past but I can't remember why I abandoned it. Let's try again.

How are these changes tested?

Existing tests.

codecov-commenter · 2026-01-06T23:31:33Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.54%. Comparing base (f09f8af) to head (fe72369).
⚠️ Report is 830 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #3051      +/-   ##
============================================
+ Coverage     56.12%   59.54%   +3.42%     
- Complexity      976     1379     +403     
============================================
  Files           119      167      +48     
  Lines         11743    15485    +3742     
  Branches       2251     2573     +322     
============================================
+ Hits           6591     9221    +2630     
- Misses         4012     4966     +954     
- Partials       1140     1298     +158


mbutrovich · 2026-01-07T14:40:39Z

The remaining failures have to do with ordering issues on queries that don't have an ORDER BY. iceberg-rust does a flatten_unordered on the scan tasks, so you can get different ordering on scans than what Iceberg Java planned. The resulting behavior is ultimately correct since SELECT results without an ORDER BY are non-deterministic, but I'll wait to see performance improvements before we continue down this path.
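The ordering effect can be sketched without iceberg-rust. The example below is an assumption-laden stand-in: it uses plain threads and a channel rather than flatten_unordered, and `scan_unordered` and `same_rows` are hypothetical names. It shows why a test on a query without an ORDER BY should compare rows as an unordered collection rather than by position.

```rust
use std::sync::mpsc;
use std::thread;

/// Reads each hypothetical scan task on its own thread and returns rows in
/// arrival order, which may differ from the planned task order.
fn scan_unordered(tasks: Vec<Vec<i64>>) -> Vec<i64> {
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = tasks
        .into_iter()
        .map(|rows| {
            let tx = tx.clone();
            thread::spawn(move || {
                for row in rows {
                    tx.send(row).expect("receiver is still alive");
                }
            })
        })
        .collect();
    // Drop the original sender so the channel closes once all workers finish.
    drop(tx);
    let out: Vec<i64> = rx.iter().collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    out
}

/// Order-insensitive comparison: sorts both sides before comparing.
fn same_rows(mut a: Vec<i64>, mut b: Vec<i64>) -> bool {
    a.sort_unstable();
    b.sort_unstable();
    a == b
}
```

Calling `scan_unordered` on the same tasks twice may return the rows in different orders, but `same_rows` against the planned order holds either way, which is the sense in which the results are still correct.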

mbutrovich added 3 commits January 6, 2026 16:26

Fix granularity of metrics updates in IcebergFileStream.

ef9af62

Stash caching the schema adapter.

01738dd

Rely on iceberg-rust parallelization and get rid of IcebergFileStream. Cache SchemaAdapter in IcebergScan.

1a3066c

mbutrovich marked this pull request as draft January 6, 2026 22:41

Fix machete

62dc943

mbutrovich added 2 commits January 6, 2026 19:26

Update metrics test.

a5fcd0e

Merge branch 'main' into more_more_iceberg_file_stream

63a0b72

# Conflicts: # native/core/src/execution/operators/iceberg_scan.rs

mbutrovich changed the title ~~fix: [WIP] [iceberg] Remove IcebergFileStream and use iceberg-rust's parallelization~~ fix: [iceberg] Remove IcebergFileStream and use iceberg-rust's parallelization Jan 7, 2026

Fix schema adapter caching issue.

fe72369
