feat(datafusion): Implement insert_into for IcebergTableProvider #1600

Open: wants to merge 6 commits into base: main


Contributor

@CTTY CTTY commented Aug 13, 2025

Which issue does this PR close?

What changes are included in this PR?

  • Added an optional catalog to IcebergTableProvider
  • Added table refresh logic in IcebergTableProvider::scan
  • Implemented insert_into for IcebergTableProvider using a write node and a commit node, for non-partitioned tables only
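
The first two changes can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `Table`, `Catalog`, `Provider`, `FixedCatalog`, and `table_for_scan` are hypothetical stand-ins, and the catalog call is synchronous here for brevity, whereas the real iceberg-rs catalog API is async. The point is only the shape of the logic: reload through the catalog when one is present, otherwise fall back to the cached table.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Iceberg table handle.
#[derive(Clone, Debug, PartialEq)]
pub struct Table {
    pub name: String,
    pub snapshot_id: u64,
}

/// Hypothetical, synchronous stand-in for a catalog.
pub trait Catalog {
    fn load_table(&self, name: &str) -> Result<Table, String>;
}

/// Provider that caches a table and optionally holds a catalog.
pub struct Provider {
    table: Table,
    catalog: Option<Arc<dyn Catalog>>,
}

impl Provider {
    pub fn new(table: Table, catalog: Option<Arc<dyn Catalog>>) -> Self {
        Self { table, catalog }
    }

    /// Table to use for a scan: reloaded when a catalog is available,
    /// otherwise the cached copy, which may be stale.
    pub fn table_for_scan(&self) -> Result<Table, String> {
        if let Some(catalog) = &self.catalog {
            catalog.load_table(&self.table.name)
        } else {
            Ok(self.table.clone())
        }
    }
}

/// Test catalog that always reports one fixed snapshot id.
pub struct FixedCatalog {
    pub latest_snapshot_id: u64,
}

impl Catalog for FixedCatalog {
    fn load_table(&self, name: &str) -> Result<Table, String> {
        Ok(Table {
            name: name.to_string(),
            snapshot_id: self.latest_snapshot_id,
        })
    }
}
```

With this shape, a provider built without a catalog keeps returning whatever snapshot it was created with, while a provider built with one sees the latest snapshot on every scan.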

Are these changes tested?

Added tests

    } else {
        self.table.clone()
    };

Contributor Author

@CTTY CTTY Aug 14, 2025

This makes me wonder if storing the table directly in the IcebergTableProvider is correct... We could get a stale table if the provider doesn't have a catalog table.

Iceberg-java has a refresh() interface which uses TableOperations to refresh metadata. In iceberg-rs we don't have TableOperations and need to rely on the catalog to refresh.

Contributor

I have a PR to add that functionality (including the refresh): #1297

Contributor Author

Hi @phillipleblanc, thanks for pointing me to your change! Your change makes sense to me, but I was thinking of adding something like this to impl Table directly:

pub async fn refresh(&mut self, catalog: &dyn Catalog) -> Result<Self>
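
As a rough illustration of that idea, here is a self-contained sketch of a catalog-driven refresh. Everything in it is an assumption made for the example: `Table`, `Catalog`, and `InMemoryCatalog` are hypothetical stand-ins rather than iceberg-rs types, the method is synchronous, and it updates the table in place and returns `()` rather than `Self` as in the signature above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an Iceberg table handle.
#[derive(Clone, Debug, PartialEq)]
pub struct Table {
    pub name: String,
    pub snapshot_id: u64,
}

/// Hypothetical, synchronous stand-in for a catalog.
pub trait Catalog {
    fn load_table(&self, name: &str) -> Result<Table, String>;
}

impl Table {
    /// Reload this table through the catalog. On success `self` is
    /// replaced with the latest state; on failure it is left unchanged.
    pub fn refresh(&mut self, catalog: &dyn Catalog) -> Result<(), String> {
        let latest = catalog.load_table(&self.name)?;
        *self = latest;
        Ok(())
    }
}

/// Test catalog backed by a map from table name to table state.
pub struct InMemoryCatalog {
    pub tables: HashMap<String, Table>,
}

impl Catalog for InMemoryCatalog {
    fn load_table(&self, name: &str) -> Result<Table, String> {
        self.tables
            .get(name)
            .cloned()
            .ok_or_else(|| format!("table not found: {}", name))
    }
}
```

Loading the new state before assigning to `*self` means a failed catalog lookup cannot leave the table half-updated.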

@CTTY CTTY marked this pull request as ready for review August 14, 2025 19:29

    _insert_op: InsertOp,
    ) -> DFResult<Arc<dyn ExecutionPlan>> {
        if !self
            .table

Contributor Author

@CTTY CTTY Aug 14, 2025

Ideally we should refresh the table here and at every other self.table usage in IcebergTableProvider, but I think we should fix that in a separate PR if needed.
