Datafusion with IcebergTableScan's performance is much slower than DeltaLake #1864

@Smith-Cruise

Is your feature request related to a problem or challenge?

I'm using DataFusion with both iceberg-rust and delta-rs. However, I found that iceberg-rust performs noticeably worse.

I ran a simple benchmark on my machine: a full scan of the TPC-H SF100 lineitem table stored as Parquet.

select * from catalog.tpch_sf100.lineitem order by 1 limit 1;

iceberg-rust: 104s
delta-rs: 40s

With iceberg-rust, the scan does not appear to run in parallel. CPU utilization is very low, and at times only one core is at 100%.
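
One way to check whether the scan is actually planned in parallel, assuming the table is queried through DataFusion's SQL interface as in the benchmark above, is to look at the plan with EXPLAIN. This is only a sketch; the exact plan text depends on the DataFusion and iceberg-rust versions in use.

```sql
-- Print the logical and physical plans without executing the query.
EXPLAIN select * from catalog.tpch_sf100.lineitem order by 1 limit 1;

-- Execute the query and annotate each physical operator with runtime metrics
-- (for example, rows produced and elapsed compute time).
EXPLAIN ANALYZE select * from catalog.tpch_sf100.lineitem order by 1 limit 1;
```

If the physical plan shows the table scan feeding the sort through a single partition, that would be consistent with the low CPU utilization described above.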

Could anyone help me find the root cause?

Describe the solution you'd like

No response

Willingness to contribute

I would be willing to contribute to this feature with guidance from the Iceberg Rust community
