
Fix struct fieldid if missing in fileschema, read from expected(which…#17

Merged
lk255018 merged 5 commits into feature/teradata-apache-iceberg-1.9.0 from bugfixes/otf-1902
Aug 28, 2025

Conversation

@lk255018

… is projected schema.


@vimpan vimpan left a comment


Here is a code review of the changes from the pull request:


1. Changes in TypeWithSchemaVisitor.java

Summary:
The logic for determining the field ID when visiting fields in a Parquet GroupType has been improved to handle cases where field IDs are absent.

Comments:

  • The previous implementation defaulted to -1 if the field's ID was null.
  • The new implementation attempts to fetch the field ID from the corresponding Iceberg struct if available, falling back to -1 only if neither is present.
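  • fieldIdFromStruct is introduced as a positional index into struct.fields(). The lookup is correct as long as group.getFields() and struct.fields() list the same fields in the same order, since that is what ties each Parquet field to its Iceberg struct field.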
  • The logic:
    int id = field.getId() != null
        ? field.getId().intValue()
        : (struct != null) ? struct.fields().get(fieldIdFromStruct).fieldId() : -1;
    is clear and improves robustness for schemas without explicit IDs.
  • Incrementing fieldIdFromStruct on each iteration ensures the mapping progresses correctly.
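
To make the fallback order concrete, here is a minimal, self-contained sketch of the same logic. It does not use the Parquet or Iceberg classes: FileField, ExpectedField, and resolveIds are hypothetical stand-ins invented for this illustration, and the sample field names and IDs are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FieldIdFallback {

  /** Hypothetical stand-in for a file-schema field whose ID may be absent. */
  static final class FileField {
    final String name;
    final Integer id; // null when the file was written without field IDs

    FileField(String name, Integer id) {
      this.name = name;
      this.id = id;
    }
  }

  /** Hypothetical stand-in for an expected-schema field, which always has an ID. */
  static final class ExpectedField {
    final String name;
    final int fieldId;

    ExpectedField(String name, int fieldId) {
      this.name = name;
      this.fieldId = fieldId;
    }
  }

  /**
   * Resolves one ID per file field: the file's own ID if present, otherwise the
   * ID of the expected field at the same position, otherwise -1.
   * Assumes both lists are in the same order; like the quoted ternary, it throws
   * IndexOutOfBoundsException if the expected list is shorter than the file list.
   */
  static List<Integer> resolveIds(List<FileField> fileFields, List<ExpectedField> expected) {
    List<Integer> ids = new ArrayList<>();
    int position = 0; // plays the role of fieldIdFromStruct
    for (FileField field : fileFields) {
      int id;
      if (field.id != null) {
        id = field.id;
      } else if (expected != null) {
        id = expected.get(position).fieldId;
      } else {
        id = -1;
      }
      ids.add(id);
      position++;
    }
    return ids;
  }

  public static void main(String[] args) {
    List<FileField> file = Arrays.asList(
        new FileField("a", null), new FileField("b", 7), new FileField("c", null));
    List<ExpectedField> expected = Arrays.asList(
        new ExpectedField("a", 1), new ExpectedField("b", 2), new ExpectedField("c", 3));

    System.out.println(resolveIds(file, expected)); // prints [1, 7, 3]
    System.out.println(resolveIds(file, null));     // prints [-1, 7, -1]
  }
}
```

Note that a field with its own ID (b above) keeps it even when the expected schema assigns a different ID at that position.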

Suggestions:

  • Consider adding a comment explaining the correspondence assumption between group.getFields() and struct.fields(); this is implicit and could be a source of subtle bugs if the ordering ever diverges.
  • Unit tests for edge cases (e.g., mismatched field counts, unusual struct arrangements) would be valuable.
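
As an illustration of the mismatched-field-count case, the sketch below shows one possible bounds guard around the positional lookup. This is not what the PR does: GuardedFieldIdLookup and idAt are hypothetical names, and falling back to -1 when the position is out of range is an assumption about the desired behavior, not a confirmed design choice.

```java
import java.util.Arrays;
import java.util.List;

final class GuardedFieldIdLookup {

  /**
   * Returns the file's own ID if present; otherwise the expected ID at the given
   * position if that position exists; otherwise -1.
   * Hypothetical helper: the -1 fallback for an out-of-range position is an assumption.
   */
  static int idAt(Integer fileId, List<Integer> expectedIds, int position) {
    if (fileId != null) {
      return fileId;
    }
    if (expectedIds != null && position >= 0 && position < expectedIds.size()) {
      return expectedIds.get(position);
    }
    return -1;
  }

  public static void main(String[] args) {
    List<Integer> expectedIds = Arrays.asList(10, 11);
    System.out.println(idAt(null, expectedIds, 1)); // prints 11
    System.out.println(idAt(null, expectedIds, 2)); // prints -1 (file has more fields than expected)
  }
}
```

A test for mismatched field counts could then assert either this fallback or an explicit exception, depending on which behavior the project prefers.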

2. Changes in TestParquet.java

Summary:
A comprehensive test (testReadNestedStructWithoutId) has been added to verify reading nested struct data from a Parquet file when the schema lacks field IDs.

Comments:

  • The test builds a deeply nested Iceberg schema and a matching Avro schema (without IDs).
  • Helper methods for schema creation (createAvroSchemaWithoutIds), writing Parquet files, and building nested records are well-structured.
  • The test writes a single record to Parquet, reads it back using Iceberg, and validates the nested struct values.
  • Use of assertions with helpful fail messages makes the test readable and debuggable.
  • The test covers the scenario where field IDs are missing, which is the case addressed by the code change in TypeWithSchemaVisitor.java.

Suggestions:

  • Exception handling is good, but consider adding failure assertions if no record is read.
  • The utility methods for record/schemas could be moved to a shared test utility class if reused elsewhere.
  • The test currently writes and reads only a single record; consider testing multiple records and more complex nullability cases for completeness.
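
The "fail if no record is read" suggestion could look roughly like the helper below. It is a plain-Java sketch with no test-framework dependency; ReadAssertions and firstOrFail are hypothetical names, and the actual test would more likely use the assertion library already in use in TestParquet.java.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

final class ReadAssertions {

  /** Returns the first record, failing loudly if the reader produced none. */
  static <T> T firstOrFail(Iterable<T> records) {
    Iterator<T> it = records.iterator();
    if (!it.hasNext()) {
      throw new AssertionError("Expected at least one record, but the reader returned none");
    }
    return it.next();
  }

  public static void main(String[] args) {
    List<String> oneRecord = Collections.singletonList("record-0");
    System.out.println(firstOrFail(oneRecord)); // prints record-0

    try {
      firstOrFail(Collections.<String>emptyList());
    } catch (AssertionError e) {
      System.out.println("failed as intended: " + e.getMessage());
    }
  }
}
```

Without such a check, a loop that validates each record passes vacuously when the reader returns nothing.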

Overall Assessment

  • Correctness: The code change addresses a real edge case and the test verifies it effectively.
  • Readability: The changes are clear, well-organized, and easy to follow. Test code is verbose but justified due to the complexity of the scenario.
  • Coverage: The new test covers the main new code path.
  • Performance: No significant performance impact anticipated.

Approval Recommendation

LGTM (Looks Good To Me): approve, pending minor documentation comments if desired.

If you have any specific areas of concern or want a deeper review of edge cases, let me know!


@lk255018 lk255018 merged commit af0a9c1 into feature/teradata-apache-iceberg-1.9.0 Aug 28, 2025
42 checks passed
@lk255018 lk255018 deleted the bugfixes/otf-1902 branch August 28, 2025 21:57

3 participants