Fix struct fieldid if missing in fileschema, read from expected(which… by lk255018 · Pull Request #17 · Teradata/iceberg

lk255018 · 2025-08-28T17:23:14Z

… is projected schema.

…f-1902

vimpan

Here is a code review of the changes from the pull request:

1. Changes in `TypeWithSchemaVisitor.java`

Summary:
The logic for determining the field ID when visiting fields in a Parquet GroupType has been improved to handle cases where field IDs are absent.

Comments:

The previous implementation defaulted to -1 if the field's ID was null.
The new implementation attempts to fetch the field ID from the corresponding Iceberg struct if available, falling back to -1 only if neither is present.
fieldIdFromStruct is a positional index into struct.fields(), so the fallback is correct only when the Parquet group's fields and the Iceberg struct's fields appear in the same order.

The logic:

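```java
// Prefer the ID stored in the Parquet file; otherwise fall back to the expected struct, then to -1.
int id = field.getId() != null
    ? field.getId().intValue()
    : (struct != null) ? struct.fields().get(fieldIdFromStruct).fieldId() : -1;
```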

is clear and improves robustness for schemas without explicit IDs.

Incrementing fieldIdFromStruct on each iteration ensures the mapping progresses correctly.
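To make the fallback order concrete, here is a minimal, self-contained sketch of the same idea outside of Iceberg. `FileField`, `ExpectedField`, and `resolveIds` are hypothetical stand-ins invented for illustration; they are not the Parquet or Iceberg types touched by the PR, and the sketch mirrors the positional lookup rather than reproducing the actual visitor code.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldIdFallbackSketch {
  /** Hypothetical stand-in for a file-schema field whose ID may be absent (null). */
  public static final class FileField {
    final String name;
    final Integer id;

    public FileField(String name, Integer id) {
      this.name = name;
      this.id = id;
    }
  }

  /** Hypothetical stand-in for a field of the expected (projected) struct. */
  public static final class ExpectedField {
    final String name;
    final int fieldId;

    public ExpectedField(String name, int fieldId) {
      this.name = name;
      this.fieldId = fieldId;
    }
  }

  /**
   * Resolves one ID per file field: the file's own ID if present, otherwise the ID of the
   * expected field at the same position, otherwise -1 when no expected struct is given.
   * Assumes both lists are in the same order, as the positional lookup in the review does.
   */
  public static List<Integer> resolveIds(List<FileField> fileFields, List<ExpectedField> expected) {
    List<Integer> ids = new ArrayList<>();
    int position = 0;
    for (FileField field : fileFields) {
      int id;
      if (field.id != null) {
        id = field.id;
      } else if (expected != null) {
        // Throws IndexOutOfBoundsException if expected has fewer fields than the file.
        id = expected.get(position).fieldId;
      } else {
        id = -1;
      }
      ids.add(id);
      position++;
    }
    return ids;
  }
}
```

With a file field carrying ID 5 followed by one with no ID, and expected fields with IDs 1 and 2, the sketch resolves the IDs 5 and 2: the file's own ID wins where it exists, and the position-matched expected ID fills the gap.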

Suggestions:

Consider adding a comment explaining the correspondence assumption between group.getFields() and struct.fields(); this is implicit and could be a source of subtle bugs if the ordering ever diverges.
Unit tests for edge cases (e.g., mismatched field counts, unusual struct arrangements) would be valuable.
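One way to make the mismatched-field-count case explicit is a bounds-checked lookup. The sketch below is only an illustration of that idea: `GuardedPositionalLookup` and `idAtOrUnknown` are hypothetical names, a plain list of integers stands in for the expected struct's field IDs, and returning -1 for a missing position is an assumption, not behavior confirmed by the PR.

```java
import java.util.List;

public class GuardedPositionalLookup {
  /**
   * Returns the expected ID at the given position, or -1 when there is no expected
   * list or the position falls outside it (for example, the file has more fields
   * than the expected struct).
   */
  public static int idAtOrUnknown(List<Integer> expectedIds, int position) {
    if (expectedIds == null || position < 0 || position >= expectedIds.size()) {
      return -1;
    }
    return expectedIds.get(position);
  }
}
```

A unit test for the edge case could then assert that a lookup past the end of the expected list yields -1 instead of throwing, which documents the ordering and length assumption in executable form.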

2. Changes in `TestParquet.java`

Summary:
A comprehensive test (testReadNestedStructWithoutId) has been added to verify reading nested struct data from a Parquet file when the schema lacks field IDs.

Comments:

The test builds a deeply nested Iceberg schema and a matching Avro schema (without IDs).
Helper methods for schema creation (createAvroSchemaWithoutIds), writing Parquet files, and building nested records are well-structured.
The test writes a single record to Parquet, reads it back using Iceberg, and validates the nested struct values.
Use of assertions with helpful fail messages makes the test readable and debuggable.
The test covers the scenario where field IDs are missing, which is the case addressed by the code change in TypeWithSchemaVisitor.java.

Suggestions:

Exception handling is good, but consider adding failure assertions if no record is read.
The utility methods for record/schemas could be moved to a shared test utility class if reused elsewhere.
The test currently writes and reads only a single record; consider testing multiple records and more complex nullability cases for completeness.
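The first suggestion, failing when nothing is read, can be sketched with a small helper. `ReadAssertions` and `requireNonEmpty` are hypothetical names used only for illustration; the helper works on any `Iterable`, so it does not assume anything about how the actual test obtains its records.

```java
import java.util.ArrayList;
import java.util.List;

public class ReadAssertions {
  /**
   * Drains the iterable into a list and fails loudly if it was empty, so a test
   * cannot pass vacuously when the reader returns no records.
   */
  public static <T> List<T> requireNonEmpty(Iterable<T> records) {
    List<T> result = new ArrayList<>();
    for (T record : records) {
      result.add(record);
    }
    if (result.isEmpty()) {
      throw new AssertionError("Expected at least one record, but the reader returned none");
    }
    return result;
  }
}
```

Returning the collected list lets the same call feed later per-record assertions, which also makes it straightforward to extend the test to multiple records.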

Overall Assessment

Correctness: The code change addresses a real edge case and the test verifies it effectively.
Readability: The changes are clear, well-organized, and easy to follow. Test code is verbose but justified due to the complexity of the scenario.
Coverage: The new test covers the main new code path.
Performance: No significant performance impact anticipated.

Approval Recommendation

LGTM (Looks Good To Me): approve, pending minor documentation comments if desired.

If you have any specific areas of concern or want a deeper review of edge cases, let me know!


Fix struct fieldid if missing in fileschema, read from expected(which…

ddf18ab

… is projected schema.

github-actions bot added the PARQUET label Aug 28, 2025

lk255018 added 2 commits August 28, 2025 10:33

fix checkstyle

4507dab

Merge branch 'feature/teradata-apache-iceberg-1.9.0' into bugfixes/ot…

f52fe36

…f-1902

jb185048 approved these changes Aug 28, 2025


vimpan reviewed Aug 28, 2025


lk255018 added 2 commits August 28, 2025 13:11

fix checkstyle

2ac75c6

fix checkstyle

2e2b788

lk255018 merged commit af0a9c1 into feature/teradata-apache-iceberg-1.9.0 Aug 28, 2025
42 checks passed

lk255018 deleted the bugfixes/otf-1902 branch August 28, 2025 21:57

lk255018 merged 5 commits into feature/teradata-apache-iceberg-1.9.0 from bugfixes/otf-1902

3 participants

Conversation

lk255018 commented Aug 28, 2025

Uh oh!

vimpan left a comment

Choose a reason for hiding this comment

1. Changes in TypeWithSchemaVisitor.java

2. Changes in TestParquet.java

Overall Assessment

Approval Recommendation

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

1. Changes in `TypeWithSchemaVisitor.java`

2. Changes in `TestParquet.java`