Remove field id constraint on add files #2662

jeroko · 2025-10-27T09:51:03Z

Rationale for this change

The PR relaxes the constraint that prevented adding any file with field IDs, and replaces it with a constraint that prevents adding files whose field IDs are inconsistent with the field IDs of the table. If the field IDs are compatible, the files can be added safely; if not, they are rejected.
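Before any consistency check, `add_files` has to decide whether a file carries field IDs at all. The sketch below shows one way to make that decision. It is dependency-free and assumes each top-level Arrow field is modeled as a `(name, metadata)` pair. PyArrow stores a Parquet field ID in the Arrow field metadata under the key `b"PARQUET:field_id"`. The helper `file_field_ids` is hypothetical and not part of PyIceberg. Rejecting files where only some fields carry IDs is also an assumption of this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Key under which PyArrow stores a Parquet field ID in Arrow field metadata.
PARQUET_FIELD_ID_KEY = b"PARQUET:field_id"

# Hypothetical, dependency-free stand-in for a top-level Arrow field: (name, metadata).
Field = Tuple[str, Optional[Dict[bytes, bytes]]]


def file_field_ids(fields: List[Field]) -> Optional[Dict[int, str]]:
    """Return {field_id: name} if every field carries an ID, or None if none do.

    Nested fields are not inspected. A file where only some fields carry IDs,
    or where an ID repeats, is treated as an error (an assumption of this sketch).
    """
    ids: Dict[int, str] = {}
    missing: List[str] = []
    for name, metadata in fields:
        raw = (metadata or {}).get(PARQUET_FIELD_ID_KEY)
        if raw is None:
            missing.append(name)
            continue
        field_id = int(raw)  # int() accepts bytes such as b"1"
        if field_id in ids:
            raise ValueError(f"Duplicate field ID {field_id}")
        ids[field_id] = name
    if not ids:
        return None  # no field IDs: the table's name mapping must be used instead
    if missing:
        raise ValueError(f"Fields without field IDs: {missing}")
    return ids
```

A `None` result routes the file to the name-mapping path. A dict result feeds the ID-based compatibility check.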

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

Fokko · 2025-10-27T15:41:23Z

pyiceberg/io/pyarrow.py

+        requested_id_to_name = requested_schema._lazy_id_to_name
+        provided_id_to_name = provided_schema._lazy_id_to_name


Hey @jeroko, thanks for working on this and for adding this check.

However, I don't think we really care about the names; it is not a problem when they differ. What does matter is that adding a file with a different schema can brick the table because of type mismatches. Should we instead check that the file contains the expected type for each of the IDs?

@Fokko Right, we should not care about the names if the IDs are provided. The mapping between IDs and types is already checked by the call to `_check_schema_compatible` at the end of this function, so no extra check is needed. I only added a new test to verify that files with matching field IDs but incompatible types are rejected.
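The rule agreed on here is to compare types per field ID and ignore names. A simplified sketch of it follows. Schemas are modeled as plain `{field_id: (name, type_name)}` dicts, and `check_types_by_id` is a hypothetical helper. The sketch requires exact type equality. PyIceberg's actual `_check_schema_compatible` may be more permissive, for example by accepting allowed type promotions.

```python
from typing import Dict, Tuple

# Hypothetical schema model for this sketch: field_id -> (column name, type name).
Schema = Dict[int, Tuple[str, str]]


def check_types_by_id(table: Schema, file: Schema) -> None:
    """Raise ValueError if a shared field ID has a different type in the file.

    Names are ignored on purpose: a renamed column with the same ID and type passes.
    IDs the table does not know about are left to a separate alignment check.
    """
    mismatches = []
    for field_id in sorted(table.keys() & file.keys()):
        table_type = table[field_id][1]
        file_type = file[field_id][1]
        if table_type != file_type:
            mismatches.append(f"field ID {field_id}: table has {table_type}, file has {file_type}")
    if mismatches:
        raise ValueError("; ".join(mismatches))
```

The scenario from the new test maps onto this sketch directly. A file whose field ID 1 is a `string` is rejected against a table where ID 1 is a `long`. Renaming the column alone does not cause a rejection.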


kevinjqliu

Thanks for the PR! This is a great addition. Added a few comments

kevinjqliu · 2025-11-04T19:01:57Z

pyiceberg/io/pyarrow.py

I think we should at least check that the Parquet field IDs align with the Iceberg field IDs.
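A minimal version of the alignment check suggested here is a set difference. Every field ID found in the Parquet file must be defined by the table schema. The function name below is hypothetical. The sketch does not decide whether table IDs missing from the file are acceptable, since that depends on field nullability.

```python
from typing import Iterable, List


def check_field_ids_align(table_ids: Iterable[int], file_ids: Iterable[int]) -> None:
    """Raise ValueError if the Parquet file uses field IDs the table does not define.

    Table IDs absent from the file are not checked here; whether that is allowed
    depends on the fields being optional, which this sketch does not model.
    """
    unknown: List[int] = sorted(set(file_ids) - set(table_ids))
    if unknown:
        raise ValueError(f"Parquet field IDs not found in the table schema: {unknown}")
```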

kevinjqliu · 2025-11-04T19:03:31Z

mkdocs/docs/api.md

+    `add_files` can work with Parquet files both with and without field IDs in their metadata:
+    - **Files with field IDs**: When field IDs are present in the Parquet metadata, they must match the corresponding field IDs in the Iceberg table schema. This is common for files generated by tools like Spark or by other libraries that write explicit field ID metadata.
+    - **Files without field IDs**: When field IDs are absent, the table must have a [Name Mapping](https://iceberg.apache.org/spec/?h=name+mapping#name-mapping-serialization) to map field names to Iceberg field IDs. `add_files` will automatically create a Name Mapping based on the table's current schema if one doesn't already exist.
+    In both cases, a Name Mapping is created if the table doesn't have one, ensuring compatibility with various readers.
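The Iceberg spec serializes a Name Mapping as a JSON list of objects with `"field-id"` and `"names"` keys, plus an optional `"fields"` list for nested structs. The mapping is commonly stored in the `schema.name-mapping.default` table property. The sketch below builds a flat mapping from a table's top-level `{field_id: name}` pairs and resolves a column name through it. Both helper names are hypothetical, and nested fields are omitted.

```python
import json
from typing import Dict, List, Optional


def name_mapping_from_schema(id_to_name: Dict[int, str]) -> List[dict]:
    """Build a flat Name Mapping in the spec's JSON layout from top-level fields."""
    return [{"field-id": field_id, "names": [name]} for field_id, name in sorted(id_to_name.items())]


def resolve_field_id(mapping: List[dict], column_name: str) -> Optional[int]:
    """Return the field ID mapped to a Parquet column name, or None if unmapped."""
    for entry in mapping:
        if column_name in entry["names"]:
            return entry["field-id"]
    return None


mapping = name_mapping_from_schema({1: "id", 2: "data"})
print(json.dumps(mapping))
# [{"field-id": 1, "names": ["id"]}, {"field-id": 2, "names": ["data"]}]
```

Readers consult a mapping like this only for files without embedded field IDs. Files whose IDs already align with the table schema can be resolved by ID directly.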


For Parquet files with field IDs, I don't think we necessarily need the name mapping if the IDs are aligned with the table schema's field IDs. But we can address this separately.

Remove constraint for adding files with field IDs

677f196

jeroko force-pushed the remove-field_id-constraint-on-add_files branch 2 times, most recently from 0b599c6 to d580102 on October 27, 2025 14:00

Add constraint to avoid adding files with conflicting field IDs

1addf60

jeroko force-pushed the remove-field_id-constraint-on-add_files branch from d580102 to 1addf60 on October 27, 2025 14:31

jeroko marked this pull request as ready for review October 27, 2025 14:57

jeroko mentioned this pull request Oct 27, 2025

Add files support for parquet field_ids #2131

Open

Fokko reviewed Oct 27, 2025


remove name checks and rely only on ids when available to check compatibility

7e39896

jeroko requested a review from Fokko October 29, 2025 08:19

kevinjqliu reviewed Nov 4, 2025
