[Test] fix(bigquery): handle field named "f" in tableRowFromMessage #36397
Use setF method when field name is "f" to avoid IllegalArgumentException with internal fields. Add test case to verify the fix.
Summary of Changes
Hello @liferoad, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request resolves a specific issue within the BigQuery I/O connector where a field named "f" in a ...
Assigning reviewers: R: @kennknowles for label java.
The PR bot will only process comments in the main thread (not review comments).
Abacn
left a comment
We probably cannot use "setF" unless we know the order of the fields at this point. We either need to use setF for all field values, or rename the field "f" to something like "__f" to work around the TableRow limitation (since this method is used for the DLQ, it might be fine to change the field name here).
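The renaming option mentioned in this comment could look roughly like the sketch below. It is an illustration only, not code from this PR: `ReservedFieldNames` and `escapeReserved` are hypothetical names, plain `java.util.Map` stands in for `TableRow`, and it assumes "f" is the only key that needs escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Beam): renames a reserved key before a row is built. */
final class ReservedFieldNames {
  // Assumption: "f" is the only field name that collides with the row's internal cell list.
  static final String RESERVED = "f";
  static final String REPLACEMENT = "__f";

  /** Returns a copy of {@code fields} with every "f" key renamed to "__f", including nested maps. */
  static Map<String, Object> escapeReserved(Map<String, Object> fields) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, Object> entry : fields.entrySet()) {
      String name = RESERVED.equals(entry.getKey()) ? REPLACEMENT : entry.getKey();
      Object value = entry.getValue();
      if (value instanceof Map) {
        // Nested structures can contain an "f" field too, so recurse into them.
        @SuppressWarnings("unchecked")
        Map<String, Object> nested = (Map<String, Object>) value;
        value = escapeReserved(nested);
      }
      out.put(name, value);
    }
    return out;
  }
}
```

The trade-off is that the DLQ output no longer carries the original field name, which is the concern the comment raises.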
...ud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java
Add integration test verifying BigQuery Storage API write with nested structures containing 'f' field. Tests fix for IllegalArgumentException when setting List field to Double in nested TableRow structures.
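For readers unfamiliar with the failure, here is a minimal, self-contained sketch of how setting a list-typed field to a Double can raise IllegalArgumentException. It assumes the row class stores its cells in a field named `f` and that values are assigned reflectively by name; `ReflectiveSetDemo` and `Row` are stand-ins written for this illustration, not Beam or BigQuery classes.

```java
import java.lang.reflect.Field;
import java.util.List;

/** Illustrative stand-in: shows a reflective set on a list-typed field rejecting a Double. */
final class ReflectiveSetDemo {
  /** Stand-in for a row class with an internal list field named "f" (assumption). */
  static final class Row {
    List<Object> f;
  }

  /** Returns true if assigning {@code value} to the named field fails with IllegalArgumentException. */
  static boolean rejects(String fieldName, Object value) {
    try {
      Field field = Row.class.getDeclaredField(fieldName);
      field.set(new Row(), value); // type mismatch surfaces here
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    } catch (ReflectiveOperationException e) {
      // Unknown field or inaccessible field: a different problem than the one shown here.
      throw new IllegalStateException(e);
    }
  }
}
```

Calling `ReflectiveSetDemo.rejects("f", 1.5)` reports a rejection, while passing a list is accepted, which matches the "List field to Double" wording in the commit message.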
Add PAssert to validate failed Storage API inserts in BigQueryNestedFFieldIT test.
Modify test to generate multiple test cases with different payload sizes.
Remove redundant test case for multiple nested structures.
...cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryNestedFFieldIT.java
R: @baeminbo
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control.
Consolidate field processing in TableRowToStorageApiProto to always use setF with TableCells. This removes special handling for field 'f' and ensures consistent field ordering based on descriptor
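As a rough sketch of what "consistent field ordering based on descriptor" means, the positional cell list can be built by walking the schema's field names in order rather than the keys of the input. The code below is an assumption-laden illustration: `DescriptorOrderCells` and `toCells` are hypothetical, a `List<String>` stands in for the proto descriptor, and plain objects stand in for TableCell values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: build positional cells in schema order instead of keyed entries. */
final class DescriptorOrderCells {
  /**
   * Returns one cell per field in {@code fieldOrder}; absent fields become null cells.
   * Because the values are positional, a field named "f" needs no special handling.
   */
  static List<Object> toCells(List<String> fieldOrder, Map<String, Object> values) {
    List<Object> cells = new ArrayList<>(fieldOrder.size());
    for (String name : fieldOrder) {
      cells.add(values.get(name));
    }
    return cells;
  }
}
```

With this layout the consumer must know the field order to interpret the cells, which is the ordering requirement raised earlier in the review.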
Introduce a new flag to enable enhanced table row conversion in BigQuery Storage API writes. This provides better handling of nested fields and complex data types, particularly for cases involving fields named 'f'. The enhanced conversion avoids special handling of 'f' fields that could cause conflicts in nested structures. The change maintains backward compatibility by defaulting to the legacy behavior. When enabled, the new conversion logic processes fields in descriptor order and uses the F list format consistently. This addresses issues with nested structures containing 'f' fields while preserving existing functionality for other cases. Added integration tests to verify the enhanced conversion behavior with nested 'f' fields and backward compatibility tests to ensure existing pipelines continue to work as expected.
 * behavior
 * @return the updated Write transform
 */
public Write<T> withEnhancedTableRowConversion(boolean useEnhancedTableRowConversion) {
Users on a new Beam version could still trigger the bug if they do not opt in to this PTransform configuration. Also, this is only used for the DLQ in the storage_write_api write methods. Personally I prefer #36425 as the fix; the new tests added in this PR are valuable and could be checked in after the fix is merged.
Sounds good. Thanks!
#36425 continues the work.
Addresses #33531