
ArrowReader enhancements for Apache DataFusion Comet #1749

@mbutrovich

Description


What's the feature you are trying to implement?

Apache DataFusion Comet is an Apache Spark accelerator with Apache Iceberg support. We would like to enhance that support by leveraging Iceberg-Rust. You can find the details of this effort in the POC PR apache/datafusion-comet#2528 and in slides presented at the 10/9/25 Iceberg-Rust community call.

The short version is that Comet will rely on Apache Iceberg's Java integration with Apache Spark for planning, and then pass the generated FileScanTasks to Iceberg-Rust through a new DataFusion IcebergScan operator in Comet. Because we bypass the Table interface to avoid redundant (and possibly incorrectly partitioned) planning, we need many new (or newly public) APIs in the ArrowReader. I will track those efforts here.
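The division of labor described above can be sketched in std-only Rust: planning produces a list of file scan tasks, and the native reader consumes them directly without re-planning the table. This is an illustration under assumptions, not Iceberg-Rust or Comet code. `FileScanTask` here is a simplified stand-in that keeps only a few fields, and `tasks_from_jvm_planning` and `TaskReader` are hypothetical names. In the real design the tasks arrive from the JVM through Comet's IcebergScan operator, which this sketch does not model.

```rust
/// Simplified, hypothetical stand-in for Iceberg-Rust's `FileScanTask`;
/// only a few illustrative fields are kept.
#[derive(Debug, Clone)]
struct FileScanTask {
    data_file_path: String,
    start: u64,
    length: u64,
    project_field_ids: Vec<i32>,
}

/// Hypothetical stand-in for the tasks that Iceberg Java's Spark integration
/// would produce during planning and hand to Comet's native side.
fn tasks_from_jvm_planning() -> Vec<FileScanTask> {
    vec![
        FileScanTask {
            data_file_path: "s3://bucket/t/data/a.parquet".to_string(),
            start: 0,
            length: 1024,
            project_field_ids: vec![1, 2],
        },
        FileScanTask {
            data_file_path: "s3://bucket/t/data/b.parquet".to_string(),
            start: 0,
            length: 2048,
            project_field_ids: vec![1, 2],
        },
    ]
}

/// Hypothetical stand-in for the Arrow reader: it consumes pre-planned tasks
/// directly and never performs table-level planning of its own.
struct TaskReader;

impl TaskReader {
    /// Describes the byte range and projected field IDs read for each task.
    fn read<I: IntoIterator<Item = FileScanTask>>(&self, tasks: I) -> Vec<String> {
        tasks
            .into_iter()
            .map(|t| {
                format!(
                    "{}[{}..{}] fields={:?}",
                    t.data_file_path,
                    t.start,
                    t.start + t.length,
                    t.project_field_ids
                )
            })
            .collect()
    }
}

fn main() {
    // Planning happened on the JVM side; the native side only reads.
    let tasks = tasks_from_jvm_planning();
    for line in TaskReader.read(tasks) {
        println!("{line}");
    }
}
```

The key property is that the reader's input is a plain list of tasks rather than a table handle, so nothing on the native side can re-derive (or disagree with) the partitioning that Spark already planned.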

  • Make ArrowReaderBuilder::new pub instead of pub(crate).
  • Expose decryption options in ArrowReaderBuilder. This likely requires a new Iceberg-Rust Cargo feature, as in DataFusion, to enable the Parquet crate's encryption feature.
  • Expose ArrowReaderOptions in ArrowReaderBuilder.
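The three requested changes can be illustrated with a small std-only mock builder. This is not Iceberg-Rust's code and the signatures are assumptions: `ReaderOptions` stands in for the Parquet crate's `ArrowReaderOptions`, `DecryptionKeys` stands in for its decryption settings, and the method names `with_reader_options` and `with_decryption` are hypothetical. What the sketch shows is the shape of the API: a public `new`, a pass-through for reader options, and a decryption setter that only compiles when a (hypothetical) `encryption` Cargo feature is enabled.

```rust
/// Hypothetical stand-in for the Parquet crate's `ArrowReaderOptions`.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ReaderOptions {
    /// Whether to load the Parquet page index.
    pub page_index: bool,
}

/// Hypothetical stand-in for Parquet decryption settings.
#[derive(Debug, Clone, PartialEq)]
pub struct DecryptionKeys {
    pub footer_key: Vec<u8>,
}

/// The configuration a reader ends up with (mock, not the real type).
#[derive(Debug)]
pub struct ArrowReader {
    pub batch_size: Option<usize>,
    pub options: ReaderOptions,
    pub decryption: Option<DecryptionKeys>,
}

/// Simplified mock of a reader builder.
#[derive(Debug, Default)]
pub struct ArrowReaderBuilder {
    batch_size: Option<usize>,
    options: ReaderOptions,
    decryption: Option<DecryptionKeys>,
}

impl ArrowReaderBuilder {
    /// `pub` rather than `pub(crate)`, so callers outside the crate (such as
    /// Comet) can construct a builder without going through a `Table`.
    pub fn new() -> Self {
        Self::default()
    }

    pub fn with_batch_size(mut self, batch_size: usize) -> Self {
        self.batch_size = Some(batch_size);
        self
    }

    /// Passes Parquet-level reader options through unchanged.
    pub fn with_reader_options(mut self, options: ReaderOptions) -> Self {
        self.options = options;
        self
    }

    /// Compiled only when the hypothetical `encryption` feature is enabled;
    /// without it, this method does not exist at all.
    #[cfg(feature = "encryption")]
    pub fn with_decryption(mut self, keys: DecryptionKeys) -> Self {
        self.decryption = Some(keys);
        self
    }

    pub fn build(self) -> ArrowReader {
        ArrowReader {
            batch_size: self.batch_size,
            options: self.options,
            decryption: self.decryption,
        }
    }
}

fn main() {
    let reader = ArrowReaderBuilder::new()
        .with_batch_size(8192)
        .with_reader_options(ReaderOptions { page_index: true })
        .build();
    println!("{reader:?}");
}
```

Gating the setter with `#[cfg(feature = "encryption")]` keeps Parquet encryption support opt-in for downstream users who don't need it. In `Cargo.toml`, such a feature would presumably just forward to the Parquet crate's own feature, for example `encryption = ["parquet/encryption"]` under `[features]`.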

Willingness to contribute

I can contribute to this feature independently

Metadata


Assignees: No one assigned
Labels: epic (Epic issue)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
