Make @doc annotation Serializable for Beam DoFn compatibility #5881
Open
spkrka wants to merge 3 commits into spotify:main from
The @doc annotation is used to add documentation metadata to Avro schemas generated via AvroType.toSchema. However, when AvroType instances are captured in Beam DoFn closures (e.g., when converting case classes to GenericRecord inside a transform), the @doc annotation must be serializable.

This change adds 'with Serializable' to the doc annotation class, making it compatible with Apache Beam's DoFn serialization requirements.

Use case: SMBCollection API needs to convert case classes to GenericRecord inside transform DoFns to preserve shuffle-free execution. Without this fix, any case class using @doc annotations fails with:

java.io.NotSerializableException: com.spotify.scio.avro.types.package$doc

This is a minimal, backward-compatible change that has no impact on existing functionality; it only enables new use cases that require serializing AvroType instances.
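For reviewers who want to see the failure mode in isolation, here is a minimal sketch that does not depend on Scio, Magnolify, or Beam. The names docBefore, docAfter, FieldMeta, and SerializationCheck are hypothetical stand-ins, not the real scio-avro definitions; FieldMeta only plays the role of an object that retains annotation instances and later gets Java-serialized, as a DoFn closure would be.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import scala.annotation.StaticAnnotation

// Hypothetical stand-ins for the annotation before and after this change.
class docBefore(val value: String) extends StaticAnnotation
class docAfter(val value: String) extends StaticAnnotation with Serializable

// Hypothetical holder that keeps annotation instances alive, as a derived
// type class might. Case classes are Serializable, but only if their fields are.
final case class FieldMeta(name: String, annotations: List[Any])

object SerializationCheck {
  // Attempts plain Java serialization, the same mechanism that raises
  // NotSerializableException. Returns the byte count on success, or the
  // exception message on failure.
  def trySerialize(obj: AnyRef): Either[String, Int] =
    try {
      val bytes = new ByteArrayOutputStream()
      val out = new ObjectOutputStream(bytes)
      out.writeObject(obj)
      out.close()
      Right(bytes.size())
    } catch {
      case e: NotSerializableException => Left(e.getMessage)
    }
}
```

With these definitions, SerializationCheck.trySerialize(FieldMeta("id", List(new docAfter("user id")))) should return a Right, while the same call with new docBefore("user id") should return a Left, because the annotation instance is reachable from the serialized object graph.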
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff
              main    #5881      +/-
Coverage    61.53%   61.52%   -0.02%
Files          317      317
Lines        11663    11662       -1
Branches       850      869      +19
Hits          7177     7175       -2
Misses        4486     4487       +1
This test verifies that the @doc annotation from scio-avro is Serializable and can be used with Magnolify's AvroType in map operations that get serialized as part of Beam DoFns.

The test includes:
- DocAnnotationMapJob: Uses @AvroType.toSchema + @doc, creates records in .map(), converts with avroType.to. This would fail without the Serializable fix.
- NoDocAnnotationMapJob: Control test without @doc; always works.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The standard Scio saveAsAvroFile API uses AvroMagnolifyTypedIO internally, which also uses .map(avroT.to). This captures the AvroType (including @doc annotations) in a closure that gets serialized.

This test verifies that the standard API also requires @doc to be Serializable, confirming that making @doc Serializable is the correct fix.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>