Fix TypeError in avro_union_type_to_beam_type with complex union types#35459
Fix TypeError in avro_union_type_to_beam_type with complex union types#35459damccorm merged 1 commit intoapache:masterfrom
Conversation
sdks/python/apache_beam/io/avroio.py
Outdated
| if len(union_type) == 2 and "null" in union_type: | ||
| for avro_type in union_type: | ||
| if avro_type in AVRO_PRIMITIVES_TO_BEAM_PRIMITIVES: | ||
| if isinstance(avro_type, |
There was a problem hiding this comment.
Maybe someone familiar with the Python SDK can answer what the expected behavior should be for an annotated string?
for example, our datasets have some types produced by avro SDKs, that look like:
{
"name": "myField",
"type": ["null",
{
"avro.java.string": "String",
"type": "string"
}
]
}currently these are unreadable in PyBeam; they throw the error:
INFO: File "/usr/src/app/.venv/lib/python3.12/site-packages/apache_beam/io/avroio.py", line 170, in __init__
INFO: beam_schema = avro_schema_to_beam_schema(avro_schema)
INFO: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO: File "/usr/src/app/.venv/lib/python3.12/site-packages/apache_beam/io/avroio.py", line 607, in avro_schema_to_beam_schema
INFO: beam_type = avro_type_to_beam_type(avro_schema)
INFO: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO: File "/usr/src/app/.venv/lib/python3.12/site-packages/apache_beam/io/avroio.py", line 598, in avro_type_to_beam_type
INFO: f['name'], avro_type_to_beam_type(f['type']))
INFO: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO: File "/usr/src/app/.venv/lib/python3.12/site-packages/apache_beam/io/avroio.py", line 576, in avro_type_to_beam_type
INFO: return avro_union_type_to_beam_type(avro_type)
INFO: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO: File "/usr/src/app/.venv/lib/python3.12/site-packages/apache_beam/io/avroio.py", line 563, in avro_union_type_to_beam_type
INFO: if avro_type in AVRO_PRIMITIVES_TO_BEAM_PRIMITIVES:
INFO: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO: TypeError: unhashable type: 'dict'
should these types resolve to being a primitive String, or an Any?
There was a problem hiding this comment.
Ideally, I think this would resolve to an optional string. An improved fix might be to call avro_type_to_beam_type instead of returning a FieldType here
beam/sdks/python/apache_beam/io/avroio.py
Line 559 in dc9aaba
we would just need to pass through an additional optional field through that function to make things function well.
beam/sdks/python/apache_beam/io/avroio.py
Lines 572 to 574 in dc9aaba
should take care of the rest at that point.
With that said, this PR is an improvement over the current state of things. So @RRap0so if you don't want to do the work needed for that piece we can take it as a future improvement, just let me know what you would prefer
There was a problem hiding this comment.
Hey @damccorm,
Thanks for the comment. I took a simpler approach but I think it solve the issue.
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
bf2606c to
dc9aaba
Compare
97d1656 to
151cb8c
Compare
|
Assigning reviewers: R: @tvalentyn for label python. Note: If you would like to opt out of this review, comment Available commands:
The PR bot will only process comments in the main thread (not review comments). |
151cb8c to
cd292aa
Compare
damccorm
left a comment
There was a problem hiding this comment.
Thanks for working through this!
sdks/python/apache_beam/io/avroio.py
Outdated
| if beam_type.WhichOneof("type_info") == "atomic_type": | ||
| return schema_pb2.FieldType( | ||
| atomic_type=beam_type.atomic_type, nullable=True) |
There was a problem hiding this comment.
I think we probably just don't need this if branch at this point, right?
The tests look good, so if removing this causes them to fail we can leave it for now, but I'd expect the logic after the if to do the same thing
There was a problem hiding this comment.
Oh, completely slipped. I've removed and tests seem happy locally. Lets see what CI says.
cd292aa to
340249c
Compare
|
There are some test failures, but they look like known quota issues with our test suites which are being separately worked on. |
damccorm
left a comment
There was a problem hiding this comment.
Thanks, this looks perfect!
Fix TypeError in avro_union_type_to_beam_type with complex union types
Fixes: #35462
What
The
avro_union_type_to_beam_typefunction fails withTypeError: unhashable type: 'dict'when processing union types containing complex Avro types (like records) with null.The function attempts to check
if avro_type in AVRO_PRIMITIVES_TO_BEAM_PRIMITIVESwithout verifying thatavro_typeis hashable. Whenavro_typeis a record (dict), this causes a TypeError since dicts cannot be used as dictionary keys.Added
isinstance(avro_type, str)check before the dictionary lookup to ensure only string primitive types are checked againstAVRO_PRIMITIVES_TO_BEAM_PRIMITIVES. Complex types now correctly fall through to returnAnytype.Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.