Skip to content

Conversation

@aditya-subrahmanyan
Copy link
Contributor

Which issue does this PR close?

What changes are included in this PR?

This change validates that table metadata with reserved sort order ID (0) cannot contain fields associated with it. If this is found, we error out instead of silently parsing arbitrary field values.

Are these changes tested?

Added the unit test described in the issue and verified that the check is now enforced.

@aditya-subrahmanyan aditya-subrahmanyan marked this pull request as ready for review January 2, 2026 19:51
@aditya-subrahmanyan
Copy link
Contributor Author

@CTTY: Please review this PR when you get a chance, thanks.

if let Some(sort_order) = self.sort_order_by_id(SortOrder::UNSORTED_ORDER_ID)
&& !sort_order.fields.is_empty()
{
return Err(Error::new(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to fail, or do we want to drop the fields and emit a warning?

Copy link
Contributor Author

@aditya-subrahmanyan aditya-subrahmanyan Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we don't allow non-empty fields for reserved sort order ID: https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/SortOrder.java#L269

IMO it feels safer to just report an error to the client that there's something "fishy" about their metadata. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I think failing here is better and more consistent to Java's behavior

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I also think it's more reasonable to fail it.

Copy link
Collaborator

@CTTY CTTY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I'm not very familiar with the code path and was wondering why we need to use try_normalize to do the validation when parsing TableMetadata. Looking at iceberg-java's TableMetadataParser, it just delegates the validation to sub-parsers like PartitionSpecParser and SortOrderBuilder, which I think is a much cleaner pattern so we don't have to rewrite the validation logic. In rust, we probably need to implement deserializer for SortOrder, PartitionSpec and so on individually to adopt this pattern.

This concern should not block this PR tho

&& !sort_order.fields.is_empty()
{
return Err(Error::new(
ErrorKind::DataInvalid,
Copy link
Collaborator

@CTTY CTTY Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This should be ErrorKind::Unexpected according to https://github.com/apache/iceberg-rust/blob/main/crates/iceberg/src/spec/sort.rs#L157

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, updated in a new commit as part of the PR

if let Some(sort_order) = self.sort_order_by_id(SortOrder::UNSORTED_ORDER_ID)
&& !sort_order.fields.is_empty()
{
return Err(Error::new(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I think failing here is better and more consistent to Java's behavior

@aditya-subrahmanyan aditya-subrahmanyan force-pushed the adityasn/reserved-sort-order-id-validation branch from b878844 to 3fdc6ca Compare January 5, 2026 21:07
- Closes apache#1963.

This change validates that table metadata with reserved sort order ID
(0) cannot contain fields associated with it. If this is found, we error
out instead of silently parsing arbitrary field values.

Added the unit test described in the issue and verified that the check
is now enforced.
Addresses a PR review comment to use the right error type that is aligned with the sort spec implementation
@aditya-subrahmanyan aditya-subrahmanyan force-pushed the adityasn/reserved-sort-order-id-validation branch from 214e93a to e0ce273 Compare January 6, 2026 04:43
@aditya-subrahmanyan
Copy link
Contributor Author

Hi @Fokko , please review when you get a chance. Thanks!

Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aditya-subrahmanyan for this fix!

if let Some(sort_order) = self.sort_order_by_id(SortOrder::UNSORTED_ORDER_ID)
&& !sort_order.fields.is_empty()
{
return Err(Error::new(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I also think it's more reasonable to fail it.

@liurenjie1024 liurenjie1024 merged commit 43f1ed8 into apache:main Jan 9, 2026
17 checks passed
@aditya-subrahmanyan aditya-subrahmanyan deleted the adityasn/reserved-sort-order-id-validation branch January 9, 2026 04:54
gbrgr pushed a commit to RelationalAI/iceberg-rust that referenced this pull request Jan 9, 2026
## Which issue does this PR close?

- Closes apache#1963.

## What changes are included in this PR?

This change validates that table metadata with reserved sort order ID
(0) cannot contain fields associated with it. If this is found, we error
out instead of silently parsing arbitrary field values.

## Are these changes tested?

Added the unit test described in the issue and verified that the check
is now enforced.
gbrgr added a commit to RelationalAI/iceberg-rust that referenced this pull request Jan 9, 2026
#30)

* Merge remote-tracking branch 'upstream/main' into gb/merge-upstream-arrow-57.1

* Fix merge mistakes

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* format

* Disable python bindings again

* Fix clippy errors

* Enable tests

* Fix merge mistake

* .

* .

* .

* Clippy fix

* fix: Reserved sort order ID cannot contain any fields (apache#1978)

## Which issue does this PR close?

- Closes apache#1963.

## What changes are included in this PR?

This change validates that table metadata with reserved sort order ID
(0) cannot contain fields associated with it. If this is found, we error
out instead of silently parsing arbitrary field values.

## Are these changes tested?

Added the unit test described in the issue and verified that the check
is now enforced.

* feat(datafusion): Add LIMIT pushdown support (apache#2006)

Implement LIMIT pushdown to optimize queries with LIMIT clauses by
stopping data processing once the limit is reached. This reduces
unnecessary I/O and computation for queries that only need a subset of
rows.

Changes:
- Add limit field to IcebergTableScan to track row limit
- Apply limit at stream level by filtering/slicing record batches
- Update IcebergTableProvider and IcebergStaticTableProvider to pass
limit parameter to scan
- Add comprehensive tests for limit pushdown functionality

## Which issue does this PR close?


- Closes #.

## What changes are included in this PR?


## Are these changes tested?

Co-authored-by: Claude Sonnet 4.5 <[email protected]>

* Redo comment

* Add case-sensitive attribute to incremental scan for consistency

---------

Co-authored-by: Aditya Subrahmanyan <[email protected]>
Co-authored-by: Liang-Chi Hsieh <[email protected]>
Co-authored-by: Claude Sonnet 4.5 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Missing validation for reserved sort order ID 0 during metadata JSON parsing

4 participants