
Fix TypeError in avro_union_type_to_beam_type with complex union types #35459

Merged
damccorm merged 1 commit into apache:master from RRap0so:fix-dict-support-in-avroio on Jul 1, 2025

Contributor

@RRap0so RRap0so commented Jun 27, 2025

Fix TypeError in avro_union_type_to_beam_type with complex union types

Fixes: #35462

What

The avro_union_type_to_beam_type function fails with TypeError: unhashable type: 'dict' when processing union types containing complex Avro types (like records) with null.

The function attempts to check if avro_type in AVRO_PRIMITIVES_TO_BEAM_PRIMITIVES without verifying that avro_type is hashable. When avro_type is a record (dict), this causes a TypeError since dicts cannot be used as dictionary keys.

Added an isinstance(avro_type, str) check before the dictionary lookup so that only string primitive types are checked against AVRO_PRIMITIVES_TO_BEAM_PRIMITIVES. Complex types now correctly fall through and return the Any type.
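
The failure mode described above can be reproduced without Beam installed. The following is a minimal sketch, assuming a small stand-in dictionary in place of AVRO_PRIMITIVES_TO_BEAM_PRIMITIVES (its keys and placeholder values are illustrative, not the real mapping) and two hypothetical helper functions that isolate the membership test:

```python
# Stand-in for AVRO_PRIMITIVES_TO_BEAM_PRIMITIVES. The values are placeholder
# strings chosen for illustration, not the real schema_pb2 enum values.
PRIMITIVES = {"string": "STRING", "int": "INT32", "boolean": "BOOLEAN"}

# An Avro record type is represented as a dict once the schema is parsed.
record_type = {"type": "record", "name": "Inner", "fields": []}


def unguarded_lookup(avro_type):
    """Membership test as it was before the fix: hashes avro_type."""
    return avro_type in PRIMITIVES


def guarded_lookup(avro_type):
    """Membership test with the isinstance guard: dicts are never hashed."""
    return isinstance(avro_type, str) and avro_type in PRIMITIVES


try:
    unguarded_lookup(record_type)
    error_message = None
except TypeError as exc:
    # The exact wording varies by Python version, but it mentions
    # "unhashable type: 'dict'".
    error_message = str(exc)

print(error_message)
print(guarded_lookup(record_type))  # False: complex types skip the lookup
print(guarded_lookup("string"))     # True: primitives are still found
```

Because `and` short-circuits, the guarded version never evaluates the `in` test for a dict, which is why the TypeError cannot occur there.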



@RRap0so RRap0so changed the title from "Fix dict type" to "Fix TypeError in avro_union_type_to_beam_type with complex union types" Jun 27, 2025
@RRap0so RRap0so marked this pull request as ready for review June 27, 2025 14:53
   if len(union_type) == 2 and "null" in union_type:
     for avro_type in union_type:
-      if avro_type in AVRO_PRIMITIVES_TO_BEAM_PRIMITIVES:
+      if isinstance(avro_type,
Contributor

Maybe someone familiar with the Python SDK can answer what the expected behavior should be for an annotated string?

For example, our datasets contain some types produced by Avro SDKs that look like:

{
  "name": "myField",
  "type": ["null",
      {
        "avro.java.string": "String",
        "type": "string"
      }
   ]
}

Currently these are unreadable in PyBeam; they throw this error:

INFO:   File "/usr/src/app/.venv/lib/python3.12/site-packages/apache_beam/io/avroio.py", line 170, in __init__
INFO:     beam_schema = avro_schema_to_beam_schema(avro_schema)
INFO:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO:   File "/usr/src/app/.venv/lib/python3.12/site-packages/apache_beam/io/avroio.py", line 607, in avro_schema_to_beam_schema
INFO:     beam_type = avro_type_to_beam_type(avro_schema)
INFO:                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO:   File "/usr/src/app/.venv/lib/python3.12/site-packages/apache_beam/io/avroio.py", line 598, in avro_type_to_beam_type
INFO:     f['name'], avro_type_to_beam_type(f['type']))
INFO:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO:   File "/usr/src/app/.venv/lib/python3.12/site-packages/apache_beam/io/avroio.py", line 576, in avro_type_to_beam_type
INFO:     return avro_union_type_to_beam_type(avro_type)
INFO:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO:   File "/usr/src/app/.venv/lib/python3.12/site-packages/apache_beam/io/avroio.py", line 563, in avro_union_type_to_beam_type
INFO:     if avro_type in AVRO_PRIMITIVES_TO_BEAM_PRIMITIVES:
INFO:        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO: TypeError: unhashable type: 'dict'

Should these types resolve to a primitive String, or to Any?

Contributor

Ideally, I think this would resolve to an optional string. An improved fix might be to call avro_type_to_beam_type instead of returning a FieldType here:

return schema_pb2.FieldType(

We would just need to pass an additional optional field through that function to make things work well.

type_name = avro_type['type']
if type_name in AVRO_PRIMITIVES_TO_BEAM_PRIMITIVES:
  return schema_pb2.FieldType(

should take care of the rest at that point.

With that said, this PR is an improvement over the current state of things. So @RRap0so, if you don't want to do the work needed for that piece, we can take it as a future improvement. Just let me know what you would prefer.
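
The suggestion in this comment (resolve the non-null member of the union through the general type converter, then mark the result as nullable) can be sketched roughly as follows. This is not the code that was merged: FieldType here is a hypothetical frozen dataclass standing in for schema_pb2.FieldType, PRIMITIVES is an illustrative stand-in for AVRO_PRIMITIVES_TO_BEAM_PRIMITIVES, and to_field_type and union_to_field_type are made-up names for the two converters.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FieldType:
    """Hypothetical stand-in for schema_pb2.FieldType."""
    name: str
    nullable: bool = False


# Illustrative stand-in for AVRO_PRIMITIVES_TO_BEAM_PRIMITIVES.
PRIMITIVES = {"string": "STRING", "int": "INT32", "boolean": "BOOLEAN"}


def to_field_type(avro_type):
    """Convert a parsed Avro type (str, list, or dict) to a FieldType."""
    if isinstance(avro_type, str):
        return FieldType(PRIMITIVES.get(avro_type, "ANY"))
    if isinstance(avro_type, list):
        return union_to_field_type(avro_type)
    if isinstance(avro_type, dict):
        # Annotated primitives such as
        # {"avro.java.string": "String", "type": "string"} carry the
        # primitive name under the "type" key.
        type_name = avro_type.get("type")
        if isinstance(type_name, str) and type_name in PRIMITIVES:
            return FieldType(PRIMITIVES[type_name])
    return FieldType("ANY")


def union_to_field_type(union_type):
    """Treat a two-member union containing "null" as an optional type."""
    non_null = [t for t in union_type if t != "null"]
    if len(union_type) == 2 and len(non_null) == 1:
        # Delegate to the general converter, then mark the result nullable.
        return replace(to_field_type(non_null[0]), nullable=True)
    return FieldType("ANY")


annotated_string = ["null", {"avro.java.string": "String", "type": "string"}]
print(union_to_field_type(annotated_string))
```

Under these assumptions the annotated string from the earlier comment resolves to a nullable string rather than to Any, and no dict is ever used as a dictionary key.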

Contributor Author

Hey @damccorm,

Thanks for the comment. I took a simpler approach, but I think it solves the issue.

@github-actions
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@RRap0so RRap0so force-pushed the fix-dict-support-in-avroio branch 2 times, most recently from bf2606c to dc9aaba Compare June 27, 2025 22:49
@liferoad liferoad requested review from damccorm and tvalentyn June 29, 2025 00:37
@RRap0so RRap0so force-pushed the fix-dict-support-in-avroio branch 2 times, most recently from 97d1656 to 151cb8c Compare July 1, 2025 09:30
@github-actions
Contributor

github-actions bot commented Jul 1, 2025

Assigning reviewers:

R: @tvalentyn for label python.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

Contributor

@damccorm damccorm left a comment

Thanks for working through this!

Comment on lines +559 to +561
if beam_type.WhichOneof("type_info") == "atomic_type":
  return schema_pb2.FieldType(
      atomic_type=beam_type.atomic_type, nullable=True)
Contributor

I think we probably just don't need this if branch at this point, right?

The tests look good, so if removing this causes them to fail we can leave it for now, but I'd expect the logic after the if to do the same thing
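
One way to read this comment: if the code after the branch already copies the resolved type and sets nullable on the copy, then the atomic special case produces an identical result and can be dropped. The sketch below illustrates that reasoning only. It assumes that "copy and set nullable" is what the following logic does, and it uses a hypothetical dataclass in place of schema_pb2.FieldType; the function names are made up for the comparison.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FieldType:
    """Hypothetical stand-in for schema_pb2.FieldType with two type_info kinds."""
    atomic_type: Optional[str] = None
    row_type: Optional[str] = None
    nullable: bool = False


def with_special_case(beam_type):
    """Variant that keeps the dedicated branch for atomic types."""
    if beam_type.atomic_type is not None:
        return FieldType(atomic_type=beam_type.atomic_type, nullable=True)
    # Assumed general path: copy the resolved type and mark it nullable.
    return replace(beam_type, nullable=True)


def without_special_case(beam_type):
    """Variant with the branch removed: only the general path remains."""
    return replace(beam_type, nullable=True)


samples = [FieldType(atomic_type="STRING"), FieldType(row_type="Inner")]
results_match = all(
    with_special_case(t) == without_special_case(t) for t in samples)
print(results_match)
```

If the real code after the branch behaves differently from this assumption, the equivalence would not necessarily hold, which is why the reviewer suggests letting the tests decide.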

Contributor Author

Oh, that completely slipped by me. I've removed it and the tests seem happy locally. Let's see what CI says.

@RRap0so RRap0so force-pushed the fix-dict-support-in-avroio branch from cd292aa to 340249c Compare July 1, 2025 18:24
@damccorm
Contributor

damccorm commented Jul 1, 2025

There are some test failures, but they look like known quota issues with our test suites which are being separately worked on.

Contributor

@damccorm damccorm left a comment

Thanks, this looks perfect!

@damccorm damccorm merged commit 27085de into apache:master Jul 1, 2025
89 of 91 checks passed
@RRap0so RRap0so deleted the fix-dict-support-in-avroio branch July 1, 2025 19:35
Development

Successfully merging this pull request may close these issues.

[Bug]: TypeError: unhashable type: 'dict' in avroio.py

3 participants