Skip to content

partition field names validation against schema field conflicts #2305

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

rutb327
Copy link

@rutb327 rutb327 commented Aug 11, 2025

Closes #2272
Collaborator: @geruh

Rationale for this change

Implements the validation logic described in #2272 to match Java and Rust behavior for partition field name conflicts with schema fields.
This mirrors the method in Java checkAndAddPartitionName():
https://github.com/apache/iceberg/blob/4dbc7f578eee7ceb9def35ebfa1a4cc236fb598f/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L392-L416

Identity transforms (sourceColumnID != null)- Allow schema field name conflicts only when sourced form the same field
Non-identity (sourceColumnID == null)- Disallow any schema field name conflicts.

In this PR isinstance(transform, (IdentityTransform, VoidTransform)) is used to achieve the same logic as Java’s sourceColumnID check.

Are these changes tested?

Yes, all existing tests pass and added a test covering validation scenarios.

Are there any user-facing changes?

Yes. Non-identity transforms can no longer use schema field names as partition field names.

@rutb327
Copy link
Author

rutb327 commented Aug 12, 2025

In Java all partition-schema validation goes through https://github.com/apache/iceberg/blob/4dbc7f578eee7ceb9def35ebfa1a4cc236fb598f/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L392-L416 during table creation with partition specs, partition spec updates and also during schema evolution.
In Python the validation in https://github.com/apache/iceberg-python/blob/d1c6005ad05166ab0fb08d3c15ccdfd7568e8013/pyiceberg/table/update/spec.py only covered partition spec updates
So, I've added the validation to:

Are these the correct locations for the validation logic, or should they be placed elsewhere?

Copy link
Contributor

@dingo4dev dingo4dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this!

To improve readability and keep related code together, what are your thoughts on placing all the partition validation logic inside the partitioning.py file? Centralizing it there could make the validation process easier for future contributors to find and understand.

Let me know what you think! @kevinjqliu

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature request] disallow creating partition field with name that conflicts with schema field when its not identity transform
2 participants