fix: Fallback to field.name to get field position when PARQUET:field_id is unavailable #1561

CTTY · 2025-07-30T00:00:08Z

Which issue does this PR close?

Closes Need to use field.name to determine arrow field's position when PARQUET:field_id is unavailable #1560

What changes are included in this PR?

Fall back to field.name to get the field position
Minor typo fixes
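For illustration, here is a minimal sketch of the fallback idea, not the code in this PR. It assumes simplified stand-in types (`ArrowFieldLite`, `IcebergFieldLite`) and a hypothetical helper `field_position`; only the `PARQUET:field_id` metadata key comes from the PR itself. A field that carries an ID is matched by ID, and a field without one is matched by name.

```rust
use std::collections::HashMap;

/// Metadata key under which the Parquet field ID is stored on an Arrow field.
const PARQUET_FIELD_ID_KEY: &str = "PARQUET:field_id";

/// Hypothetical, simplified stand-in for an Arrow field (name plus metadata).
struct ArrowFieldLite {
    name: String,
    metadata: HashMap<String, String>,
}

/// Hypothetical, simplified stand-in for an Iceberg nested field.
struct IcebergFieldLite {
    id: i32,
    name: String,
}

/// Returns the index of the Arrow field that corresponds to `target`.
/// If a field has a parseable field ID, it is compared by ID;
/// otherwise the comparison falls back to the field name.
fn field_position(arrow_fields: &[ArrowFieldLite], target: &IcebergFieldLite) -> Option<usize> {
    arrow_fields.iter().position(|arrow_field| {
        let parsed_id = arrow_field
            .metadata
            .get(PARQUET_FIELD_ID_KEY)
            .and_then(|raw| raw.parse::<i32>().ok());
        match parsed_id {
            Some(id) => id == target.id,
            None => arrow_field.name == target.name,
        }
    })
}
```

With this shape, a file written with field IDs still resolves renamed columns by ID, while a file without IDs resolves them by name only.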

Are these changes tested?

Yes, added unit tests.

crates/iceberg/src/arrow/value.rs

liurenjie1024 · 2025-08-01T09:58:59Z

Let's discuss in this issue: #1560 (comment)

liurenjie1024 · 2025-08-06T10:32:58Z

crates/iceberg/src/arrow/value.rs

-                    .map(|id| id == field.id)
-                    .unwrap_or(false)
-            })
+            .position(|arrow_field| self.arrow_field_matches_id(arrow_field, field.id))


I think we still have a misunderstanding here. As I said in #1560 (comment), for the nan_val_cnt case we should match a field to its arrow array using the field's position (that is, the rank of the field in the struct) rather than by id or name. That's not easy to accomplish with the current interface, so we should change the interface of PartnerAccessor; see the comments below.
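To make the positional idea concrete, here is a rough sketch under simplifying assumptions. It is not the PartnerAccessor interface: `FieldLite` and `nan_counts_by_position` are hypothetical names, and columns are plain `Vec<f64>` values rather than Arrow arrays. Each schema field is paired with the column at the same rank, so neither IDs nor names on the Arrow side are consulted.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for an Iceberg struct field.
struct FieldLite {
    id: i32,
    name: &'static str,
}

/// Counts NaN values per field ID, pairing the i-th schema field
/// with the i-th column. Returns an error if the ranks cannot line up.
fn nan_counts_by_position(
    fields: &[FieldLite],
    columns: &[Vec<f64>],
) -> Result<HashMap<i32, u64>, String> {
    if fields.len() != columns.len() {
        return Err(format!(
            "schema has {} fields but batch has {} columns",
            fields.len(),
            columns.len()
        ));
    }
    let counts = fields
        .iter()
        .zip(columns)
        .map(|(field, column)| {
            // Positional pairing: `field.name` plays no role in the lookup.
            let nan_count = column.iter().filter(|v| v.is_nan()).count() as u64;
            (field.id, nan_count)
        })
        .collect();
    Ok(counts)
}
```

The length check is the main design choice in this sketch: positional matching is only sound when the record batch was built from the same schema, so a mismatch is reported instead of being silently truncated by `zip`.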

crates/iceberg/src/spec/schema/visitor.rs

Fokko · 2025-08-06T13:23:01Z

@CTTY Thanks for raising this PR. As suggested in #1560, I think we need to rely on name-mapping here.

Iceberg is designed to do operations lazily, meaning that we typically don't rewrite data unless it is necessary. For example, if you rename a field, it will still find the field with the original name using the field-IDs. This is the case when you have a new table that has the field-IDs correctly set.

However, in the age of big data, and to make it easier for users to migrate to Iceberg, we also support importing existing Parquet files where the field-IDs are missing. In this case, we'll use name-mapping to map names to a field-ID. In the case of a rename, the old and the new name will map to the same ID, so we can still look up the field after a rename.
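As a rough illustration of the name-mapping idea described above, the sketch below builds a name-to-ID index in which several names can point at one field-ID. The types and functions (`MappedFieldLite`, `build_name_index`, `resolve_field_id`) are hypothetical and are not the iceberg-rust API, and the column names are made up for the example.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified name-mapping entry: one field-ID
/// together with every name that has been used for that field.
struct MappedFieldLite {
    field_id: i32,
    names: Vec<String>,
}

/// Flattens the mapping into a lookup table from column name to field-ID.
fn build_name_index(mapping: &[MappedFieldLite]) -> HashMap<String, i32> {
    mapping
        .iter()
        .flat_map(|entry| {
            entry
                .names
                .iter()
                .map(move |name| (name.clone(), entry.field_id))
        })
        .collect()
}

/// Resolves the field-ID for a column found in a file without field-IDs.
fn resolve_field_id(index: &HashMap<String, i32>, column_name: &str) -> Option<i32> {
    index.get(column_name).copied()
}
```

If a field was renamed from `cust_id` to `customer_id` and both names are listed for the same entry, a file written under either name resolves to the same field-ID.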

CTTY mentioned this pull request Jul 30, 2025

feat(arrow): Use field name for lookup when field_id in parquet is unavailable #1566

Closed

fvaleye reviewed Jul 31, 2025

crates/iceberg/src/arrow/value.rs (outdated)

CTTY mentioned this pull request Aug 4, 2025

Need to use field.name to determine arrow field's position when PARQUET:field_id is unavailable #1560

Open

liurenjie1024 reviewed Aug 6, 2025

CTTY and others added 11 commits August 12, 2025 09:55

use field name to find field pos when field id is unavailable

72d04a7

add ut

0918040

lol

c6bc506

Update crates/iceberg/src/arrow/value.rs

8f79616

Co-authored-by: Florian Valeye <[email protected]>

having fun with schema

1b2dbcb

clippy is strict

3dc59c2

I write bugs, I fix bugs

7818126

Update crates/iceberg/src/spec/schema/visitor.rs

8729155

Co-authored-by: Renjie Liu <[email protected]>

Revert "Update crates/iceberg/src/spec/schema/visitor.rs"

54bc5df

This reverts commit 8729155.

match mode

a33869d

enters match mode

3928ac5

CTTY force-pushed the ctty/name-pos branch from 8a49c96 to 3928ac5 Compare August 12, 2025 19:54

name matching in write exec

7636cd5
