-
Notifications
You must be signed in to change notification settings - Fork 388
fix: Reserved sort order ID cannot contain any fields #1978
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Reserved sort order ID cannot contain any fields #1978
Conversation
|
@CTTY: Please review this PR when you get a chance, thanks. |
| if let Some(sort_order) = self.sort_order_by_id(SortOrder::UNSORTED_ORDER_ID) | ||
| && !sort_order.fields.is_empty() | ||
| { | ||
| return Err(Error::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to fail, or do we want to drop the fields and emit a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we don't allow non-empty fields for reserved sort order ID: https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/SortOrder.java#L269
IMO it feels safer to just report an error to the client that there's something "fishy" about their metadata. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 I think failing here is better and more consistent to Java's behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, I also think it's more reasonable to fail it.
CTTY
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I'm not very familiar with the code path and was wondering why we need to use try_normalize to do the validation when parsing TableMetadata. Looking at iceberg-java's TableMetadataParser, it just delegates the validation to sub-parsers like PartitionSpecParser and SortOrderBuilder, which I think is a much cleaner pattern so we don't have to rewrite the validation logic. In rust, we probably need to implement deserializer for SortOrder, PartitionSpec and so on individually to adopt this pattern.
This concern should not block this PR tho
| && !sort_order.fields.is_empty() | ||
| { | ||
| return Err(Error::new( | ||
| ErrorKind::DataInvalid, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This should be ErrorKind::Unexpected according to https://github.com/apache/iceberg-rust/blob/main/crates/iceberg/src/spec/sort.rs#L157
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, updated in a new commit as part of the PR
| if let Some(sort_order) = self.sort_order_by_id(SortOrder::UNSORTED_ORDER_ID) | ||
| && !sort_order.fields.is_empty() | ||
| { | ||
| return Err(Error::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 I think failing here is better and more consistent to Java's behavior
b878844 to
3fdc6ca
Compare
- Closes apache#1963. This change validates that table metadata with reserved sort order ID (0) cannot contain fields associated with it. If this is found, we error out instead of silently parsing arbitrary field values. Added the unit test described in the issue and verified that the check is now enforced.
Addresses a PR review comment to use the right error type that is aligned with the sort spec implementation
214e93a to
e0ce273
Compare
|
Hi @Fokko , please review when you get a chance. Thanks! |
liurenjie1024
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aditya-subrahmanyan for this fix!
| if let Some(sort_order) = self.sort_order_by_id(SortOrder::UNSORTED_ORDER_ID) | ||
| && !sort_order.fields.is_empty() | ||
| { | ||
| return Err(Error::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, I also think it's more reasonable to fail it.
## Which issue does this PR close? - Closes apache#1963. ## What changes are included in this PR? This change validates that table metadata with reserved sort order ID (0) cannot contain fields associated with it. If this is found, we error out instead of silently parsing arbitrary field values. ## Are these changes tested? Added the unit test described in the issue and verified that the check is now enforced.
#30) * Merge remote-tracking branch 'upstream/main' into gb/merge-upstream-arrow-57.1 * Fix merge mistakes * . * . * . * . * . * . * . * . * . * . * format * Disable python bindings again * Fix clippy errors * Enable tests * Fix merge mistake * . * . * . * Clippy fix * fix: Reserved sort order ID cannot contain any fields (apache#1978) ## Which issue does this PR close? - Closes apache#1963. ## What changes are included in this PR? This change validates that table metadata with reserved sort order ID (0) cannot contain fields associated with it. If this is found, we error out instead of silently parsing arbitrary field values. ## Are these changes tested? Added the unit test described in the issue and verified that the check is now enforced. * feat(datafusion): Add LIMIT pushdown support (apache#2006) Implement LIMIT pushdown to optimize queries with LIMIT clauses by stopping data processing once the limit is reached. This reduces unnecessary I/O and computation for queries that only need a subset of rows. Changes: - Add limit field to IcebergTableScan to track row limit - Apply limit at stream level by filtering/slicing record batches - Update IcebergTableProvider and IcebergStaticTableProvider to pass limit parameter to scan - Add comprehensive tests for limit pushdown functionality ## Which issue does this PR close? - Closes #. ## What changes are included in this PR? ## Are these changes tested? Co-authored-by: Claude Sonnet 4.5 <[email protected]> * Redo comment * Add case-sensitive attribute to incremental scan for consistency --------- Co-authored-by: Aditya Subrahmanyan <[email protected]> Co-authored-by: Liang-Chi Hsieh <[email protected]> Co-authored-by: Claude Sonnet 4.5 <[email protected]>
Which issue does this PR close?
What changes are included in this PR?
This change validates that table metadata with reserved sort order ID (0) cannot contain fields associated with it. If this is found, we error out instead of silently parsing arbitrary field values.
Are these changes tested?
Added the unit test described in the issue and verified that the check is now enforced.