[#37209] Enhance serialization error messages for better developer experience #37298
Improved error messages when user code fails to serialize (pickle) for distributed execution. The original error was too technical and didn't explain the cause or suggest fixes.

Changes:
- Enhanced RuntimeError message with clear explanation of why serialization is required
- Added common causes (lambdas capturing file handles, DB connections, thread locks)
- Provided three concrete fixes: module-level functions, setup() methods, checking closure captures
- Broadened exception catching to include TypeError and other pickling failures (not just RuntimeError)
- Added exception chaining (from e) to preserve original stack trace
- Added test case to verify the new error message content

This significantly improves developer experience when debugging serialization issues, especially for new Apache Beam users.

Fixes apache#37209

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
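The exception-handling changes listed above (broader catching plus `from e` chaining) can be sketched roughly as follows. This is a minimal illustration using the standard-library `pickle`, not the actual code in ptransform.py: the helper name `pickle_or_explain` and the message wording are hypothetical, and Beam's own pickler is assumed to fail in a comparable way.

```python
import pickle
import threading


def pickle_or_explain(obj):
    """Hypothetical helper: serialize obj or raise a descriptive RuntimeError."""
    try:
        return pickle.dumps(obj)
    except Exception as e:  # also covers RuntimeError and TypeError
        # Placeholder wording; the real message in the PR is not reproduced here.
        raise RuntimeError(
            "Failed to serialize %r for distributed execution. "
            "Check for captured file handles, connections, or locks." % (obj,)
        ) from e  # chaining keeps the original exception as __cause__


# A thread lock stands in for a non-serializable object captured by user code.
try:
    pickle_or_explain(threading.Lock())
except RuntimeError as err:
    caught = err

print(type(caught.__cause__).__name__)
```

Note that in Python, `Exception` is a base class of both `RuntimeError` and `TypeError`, so a handler for `Exception` alone already covers all three; the sketch catches only `Exception` for that reason.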
Summary of Changes
Hello @PDGGK, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the developer experience in Apache Beam by transforming cryptic serialization error messages into clear, actionable guidance. By providing detailed explanations of why serialization is required, common pitfalls like capturing non-serializable objects, and concrete solutions, it aims to drastically reduce the debugging time for users encountering these issues, particularly those new to distributed execution paradigms.
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
Fix Python formatter precommit check by applying yapf v0.43.0 formatting rules to modified files.
Assigning reviewers: R: @claudevdm for label python. Note: If you would like to opt out of this review, comment Available commands:
The PR bot will only process comments in the main thread (not review comments).
What changes are being proposed in this pull request?
This PR addresses issue #37209 by significantly improving error messages when user code fails to serialize (pickle) for distributed execution.
Why are these changes needed?
Currently, when users pass non-serializable lambdas or closures (e.g., capturing a file handle or database connection), they get cryptic low-level errors like:
This doesn't explain:
This is especially frustrating for new Apache Beam users who don't understand distributed execution requirements.
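To illustrate the kind of failure described above, the following sketch tries to pickle two resources of the sort a lambda or closure might capture. It uses the standard-library `pickle` rather than Beam's pickler (an assumption made to keep the example self-contained), and the helper `pickling_error` is hypothetical.

```python
import os
import pickle
import threading


def pickling_error(obj):
    """Return the exception raised by pickle.dumps(obj), or None on success."""
    try:
        pickle.dumps(obj)
    except Exception as e:  # failures are not limited to one exception type
        return e
    return None


# Resources of the kind a lambda or closure might capture by accident.
lock_error = pickling_error(threading.Lock())
with open(os.devnull) as handle:
    file_error = pickling_error(handle)

# With the standard pickler, both failures surface as TypeError.
print(type(lock_error).__name__, "|", type(file_error).__name__)
```

Neither failure is a `RuntimeError`, which is consistent with the PR's motivation for catching more than that one exception type.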
Changes made:
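1. Enhanced error message (`ptransform.py`)
The new error message includes a clear explanation of why serialization is required, common causes (lambdas capturing file handles, DB connections, thread locks), and three concrete fixes (module-level functions, setup() methods, checking closure captures).
2. Broader exception handling
Changed from catching only `RuntimeError` to `(RuntimeError, TypeError, Exception)` because pickling failures can also surface as `TypeError` or `PicklingError`.
3. Exception chaining
Added `from e` to preserve the original exception context and stack trace for debugging.
4. Test coverage
Added `test_callable_non_serializable_error_message()` to verify the new error message content.
Testing
`ptransform_test.py`
Impact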
Example
Before:
After:
Fixes #37209
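One of the fixes the new message recommends, creating the resource in a setup() method instead of capturing it, might look like the sketch below. `LockingFn` is a hypothetical stand-in that mirrors the shape of a DoFn-style class with `setup()` and `process()` methods; it does not import Beam, and the standard-library `pickle` stands in for the pipeline's serialization step.

```python
import pickle
import threading


class LockingFn:
    """Hypothetical stand-in for a DoFn-style class; not a Beam API."""

    def __init__(self):
        # Nothing unpicklable is stored at construction time.
        self.lock = None

    def setup(self):
        # Runs after deserialization, so the lock is never pickled.
        self.lock = threading.Lock()

    def process(self, x):
        with self.lock:
            return x + 1


fn = LockingFn()
# Serialization succeeds because the instance state is just {"lock": None}.
restored = pickle.loads(pickle.dumps(fn))
restored.setup()
result = restored.process(41)
print(result)
```

The design point is that the object shipped to workers carries only plain data, and anything tied to a process (locks, file handles, connections) is created on the worker after it has been deserialized.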