Fix struct fieldid if missing in fileschema, read from expected(which…#17
Merged
lk255018 merged 5 commits intofeature/teradata-apache-iceberg-1.9.0from Aug 28, 2025
Merged
Conversation
… is projected schema.
jb185048
approved these changes
Aug 28, 2025
vimpan
reviewed
Aug 28, 2025
vimpan
left a comment
There was a problem hiding this comment.
Here is a code review of the changes from the pull request:
1. Changes in TypeWithSchemaVisitor.java
Summary:
The logic for determining the field ID when visiting fields in a Parquet GroupType has been improved to handle cases where field IDs are absent.
Comments:
- The previous implementation defaulted to
-1if the field's ID wasnull. - The new implementation attempts to fetch the field ID from the corresponding Iceberg struct if available, falling back to
-1only if neither is present. - The introduction of
fieldIdFromStructas an index intostruct.fields()is correct and ensures correspondence between Parquet fields and Iceberg struct fields. - The logic:
is clear and improves robustness for schemas without explicit IDs.
int id = field.getId() != null ? field.getId().intValue() : (struct != null) ? struct.fields().get(fieldIdFromStruct).fieldId() : -1;
- Incrementing
fieldIdFromStructon each iteration ensures the mapping progresses correctly.
Suggestions:
- Consider adding a comment explaining the correspondence assumption between
group.getFields()andstruct.fields(); this is implicit and could be a source of subtle bugs if the ordering ever diverges. - Unit tests for edge cases (e.g., mismatched field counts, unusual struct arrangements) would be valuable.
2. Changes in TestParquet.java
Summary:
A comprehensive test (testReadNestedStructWithoutId) has been added to verify reading nested struct data from a Parquet file when the schema lacks field IDs.
Comments:
- The test builds a deeply nested Iceberg schema and a matching Avro schema (without IDs).
- Helper methods for schema creation (
createAvroSchemaWithoutIds), writing Parquet files, and building nested records are well-structured. - The test writes a single record to Parquet, reads it back using Iceberg, and validates the nested struct values.
- Use of assertions with helpful fail messages makes the test readable and debuggable.
- The test covers the scenario where field IDs are missing, which is the case addressed by the code change in
TypeWithSchemaVisitor.java.
Suggestions:
- Exception handling is good, but consider adding failure assertions if no record is read.
- The utility methods for record/schemas could be moved to a shared test utility class if reused elsewhere.
- The test currently writes and reads only a single record; consider testing multiple records and more complex nullability cases for completeness.
Overall Assessment
- Correctness: The code change addresses a real edge case and the test verifies it effectively.
- Readability: The changes are clear, well-organized, and easy to follow. Test code is verbose but justified due to the complexity of the scenario.
- Coverage: The new test covers the main new code path.
- Performance: No significant performance impact anticipated.
Approval Recommendation
LGTM (Looks Good To Me)
— Approve, pending minor documentation comments if desired.
If you have any specific areas of concern or want a deeper review of edge cases, let me know!
Jot something down
af0a9c1
into
feature/teradata-apache-iceberg-1.9.0
42 checks passed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
… is projected schema.