@PDGGK PDGGK commented Jan 18, 2026

Description

Fixes #37176

The parseAndThrow method in Call.java was wrapping retryable exceptions (UserCodeTimeoutException, UserCodeRemoteSystemException) in a generic UserCodeExecutionException, which breaks the retry logic that depends on exception.shouldRepeat() returning true.

Problem

When user code throws a UserCodeTimeoutException or UserCodeRemoteSystemException (which have shouldRepeat() = true), the old implementation would wrap these in a generic UserCodeExecutionException (which has shouldRepeat() = false), causing the Repeater to not retry the operation as intended.
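
The effect on retries can be shown with a minimal sketch. The classes below are simplified stand-ins for the rrio exception types, and `attemptsMade` is a hypothetical retry loop, not Beam's Repeater; only the `shouldRepeat()` contract is taken from the description above.

```java
class RetryMaskingSketch {
  // Simplified stand-ins for the rrio exception types (assumption: not the Beam classes).
  static class UserCodeExecutionException extends Exception {
    UserCodeExecutionException(String message) { super(message); }
    UserCodeExecutionException(Throwable cause) { super(cause); }
    boolean shouldRepeat() { return false; }
  }

  static class UserCodeTimeoutException extends UserCodeExecutionException {
    UserCodeTimeoutException(String message) { super(message); }
    @Override boolean shouldRepeat() { return true; }
  }

  interface Attempt {
    void run() throws UserCodeExecutionException;
  }

  // Hypothetical retry loop: retries only while shouldRepeat() is true.
  // Returns the number of attempts that were made.
  static int attemptsMade(Attempt attempt, int maxAttempts) {
    int attempts = 0;
    while (attempts < maxAttempts) {
      attempts++;
      try {
        attempt.run();
        return attempts; // success
      } catch (UserCodeExecutionException e) {
        if (!e.shouldRepeat()) {
          return attempts; // non-retryable: give up immediately
        }
      }
    }
    return attempts;
  }

  public static void main(String[] args) {
    // Old behavior: the timeout is wrapped in the generic type.
    int wrapped = attemptsMade(() -> {
      throw new UserCodeExecutionException(new UserCodeTimeoutException("timeout"));
    }, 3);
    // Fixed behavior: the timeout reaches the retry loop unchanged.
    int preserved = attemptsMade(() -> {
      throw new UserCodeTimeoutException("timeout");
    }, 3);
    System.out.println("wrapped: " + wrapped + ", preserved: " + preserved);
    // prints "wrapped: 1, preserved: 3"
  }
}
```

The wrapped case stops after one attempt because the outer exception reports `shouldRepeat() == false`, even though the cause is retryable.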

Solution

  • Scan the full causal chain using Guava's Throwables.getCausalChain()
  • Preserve all specific retryable exception types (Quota/Timeout/RemoteSystem)
  • Prefer specific types over generic UserCodeExecutionException when both exist in the chain to prevent masking of retryable exceptions
  • Handle circular causal chains gracefully by catching IllegalArgumentException
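
The selection rule in the bullets above can be sketched as follows. This is an illustration under stated assumptions, not the code in Call.java: the exception classes are simplified stand-ins, `select` is a hypothetical helper name, and `causalChain` is a plain-JDK substitute for Guava's `Throwables.getCausalChain()` that throws `IllegalArgumentException` on a loop so the sketch has no external dependency.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class CausalChainSketch {
  // Simplified stand-ins for the rrio exception types (assumption: not the Beam classes).
  static class UserCodeExecutionException extends Exception {
    UserCodeExecutionException(String message) { super(message); }
    UserCodeExecutionException(Throwable cause) { super(cause); }
    boolean shouldRepeat() { return false; }
  }

  static class UserCodeQuotaException extends UserCodeExecutionException {
    UserCodeQuotaException(String message) { super(message); }
    @Override boolean shouldRepeat() { return true; }
  }

  static class UserCodeTimeoutException extends UserCodeExecutionException {
    UserCodeTimeoutException(String message) { super(message); }
    @Override boolean shouldRepeat() { return true; }
  }

  static class UserCodeRemoteSystemException extends UserCodeExecutionException {
    UserCodeRemoteSystemException(String message) { super(message); }
    @Override boolean shouldRepeat() { return true; }
  }

  // Plain-JDK substitute for Throwables.getCausalChain(): outermost first,
  // IllegalArgumentException if the chain loops back on itself.
  static List<Throwable> causalChain(Throwable throwable) {
    Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    List<Throwable> chain = new ArrayList<>();
    for (Throwable cur = throwable; cur != null; cur = cur.getCause()) {
      if (!seen.add(cur)) {
        throw new IllegalArgumentException("Loop in causal chain detected");
      }
      chain.add(cur);
    }
    return chain;
  }

  // Hypothetical helper: picks the exception that should be rethrown.
  static UserCodeExecutionException select(Throwable throwable) {
    List<Throwable> chain;
    try {
      chain = causalChain(throwable);
    } catch (IllegalArgumentException e) {
      // Circular chain: fall back to a generic, non-retryable exception.
      return new UserCodeExecutionException("circular causal chain: " + e.getMessage());
    }
    UserCodeExecutionException generic = null;
    for (Throwable cause : chain) {
      // Specific types are checked first because they also extend the generic type.
      if (cause instanceof UserCodeQuotaException
          || cause instanceof UserCodeTimeoutException
          || cause instanceof UserCodeRemoteSystemException) {
        return (UserCodeExecutionException) cause;
      }
      if (generic == null && cause instanceof UserCodeExecutionException) {
        generic = (UserCodeExecutionException) cause;
      }
    }
    return generic != null ? generic : new UserCodeExecutionException(throwable);
  }

  public static void main(String[] args) {
    Throwable masked =
        new UserCodeExecutionException(
            new RuntimeException(new UserCodeTimeoutException("timeout")));
    UserCodeExecutionException chosen = select(masked);
    System.out.println(chosen.getClass().getSimpleName() + " shouldRepeat=" + chosen.shouldRepeat());
    // prints "UserCodeTimeoutException shouldRepeat=true"
  }
}
```

Returning the first specific type found, and remembering a generic match only as a fallback, is what keeps an outer generic wrapper from masking a retryable cause deeper in the chain.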

Changes

Modified Files

sdks/java/io/rrio/src/main/java/org/apache/beam/io/requestresponse/Call.java (+31/-7 lines)

  • Added Throwables import for causal chain traversal
  • Rewrote parseAndThrow method to scan full exception chain
  • Added logic to prefer specific exception types over generic ones
  • Added circular reference handling

sdks/java/io/rrio/src/test/java/org/apache/beam/io/requestresponse/CallTest.java (+264/-2 lines)

  • Added 10 new unit tests covering various exception scenarios

Testing

Added comprehensive test coverage for:

  • ✅ Direct retryable exceptions (Timeout, RemoteSystem, Quota)
  • ✅ Nested exceptions wrapped in UncheckedExecutionException
  • ✅ Generic UserCodeExecutionException masking specific types (3 scenarios)
  • ✅ Triple-nested exceptions
  • ✅ Circular reference in causal chain
  • ✅ Non-UserCode exceptions (RuntimeException)
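
For reference, a circular causal chain like the one in the circular-reference scenario can be built with the JDK alone. The sketch below is illustrative and is not taken from CallTest.java; `circular` and `hasLoop` are hypothetical helper names.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class CircularChainSketch {
  // Builds outer -> inner -> outer. initCause is allowed here because
  // "outer" was created without a cause.
  static RuntimeException circular() {
    RuntimeException outer = new RuntimeException("outer");
    RuntimeException inner = new RuntimeException("inner", outer);
    outer.initCause(inner);
    return outer;
  }

  // Walks the chain and reports whether any throwable is visited twice.
  static boolean hasLoop(Throwable throwable) {
    Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    for (Throwable cur = throwable; cur != null; cur = cur.getCause()) {
      if (!seen.add(cur)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(hasLoop(circular())); // prints "true"
    System.out.println(hasLoop(new RuntimeException("plain"))); // prints "false"
  }
}
```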

Test Results:

  • CallTest: All tests passing
  • Full rrio test suite: 90 tasks passing ✅
  • Code formatting: spotlessCheck passing ✅

Impact

Behavior Change: Code that previously saw a generic UserCodeExecutionException may now see the specific subtype (UserCodeTimeoutException/UserCodeRemoteSystemException). This is the intended fix to restore proper retry behavior.

Performance: Minimal impact - exception chain traversal only occurs on error paths.

Backwards Compatibility: The change improves correctness. Any code that relied on exceptions being wrapped was working around a bug.

Example

Before:

// User code throws UserCodeTimeoutException
throw new UserCodeTimeoutException("timeout");

// parseAndThrow wraps it
throw new UserCodeExecutionException(cause); // shouldRepeat() = false ❌

// Repeater sees generic exception and doesn't retry

After:

// User code throws UserCodeTimeoutException
throw new UserCodeTimeoutException("timeout");

// parseAndThrow preserves it
throw (UserCodeTimeoutException) throwable; // shouldRepeat() = true ✅

// Repeater sees timeout exception and retries

PDGGK and others added 3 commits January 14, 2026 11:23
Improved error messages when user code fails to serialize (pickle)
for distributed execution. The original error was too technical and
didn't explain the cause or suggest fixes.

Changes:
- Enhanced RuntimeError message with clear explanation of why
  serialization is required
- Added common causes (lambdas capturing file handles, DB connections,
  thread locks)
- Provided three concrete fixes: module-level functions, setup()
  methods, checking closure captures
- Broadened exception catching to include TypeError and other
  pickling failures (not just RuntimeError)
- Added exception chaining (from e) to preserve original stack trace
- Added test case to verify the new error message content

This significantly improves developer experience when debugging
serialization issues, especially for new Apache Beam users.

Fixes apache#37209

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Fix Python formatter precommit check by applying yapf v0.43.0
formatting rules to modified files.

…ypes

Problem:
The parseAndThrow method in Call.java was wrapping retryable exceptions
(UserCodeTimeoutException, UserCodeRemoteSystemException) in a generic
UserCodeExecutionException, which breaks the retry logic that depends on
exception.shouldRepeat() returning true.

Solution:
- Scan the full causal chain using Guava's Throwables.getCausalChain()
- Preserve all specific retryable exception types (Quota/Timeout/RemoteSystem)
- Prefer specific types over generic UserCodeExecutionException when both exist
  in the chain to prevent masking of retryable exceptions
- Handle circular causal chains gracefully by catching IllegalArgumentException

Testing:
- Added 10 new unit tests covering:
  * Direct retryable exceptions (Timeout, RemoteSystem)
  * Nested exceptions (UncheckedExecutionException wrapping)
  * Generic UserCodeExecutionException masking specific types
  * Triple-nested exceptions
  * Circular reference in causal chain
  * Non-UserCode exceptions (RuntimeException)
- All existing tests pass
- Full rrio test suite passes (90 tasks)

Fixes apache#37176

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@gemini-code-assist
Contributor

Summary of Changes

Hello @PDGGK, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug in the RequestResponseIO component where retryable exceptions were being masked by a generic exception type, preventing the intended retry mechanism from functioning correctly. The changes ensure that the original, specific retryable exception types are preserved and propagated, thereby restoring proper retry behavior and improving the reliability of user code execution within the RRIO framework. Additionally, the Python SDK's error messages for non-serializable user code have been enhanced to provide clearer guidance for developers.

Highlights

  • Exception Preservation in RequestResponseIO: The parseAndThrow method in Call.java has been updated to correctly preserve retryable exception types (e.g., UserCodeTimeoutException, UserCodeRemoteSystemException, UserCodeQuotaException) by scanning the full causal chain. Previously, these were incorrectly wrapped in a generic UserCodeExecutionException, which prevented proper retry logic.
  • Causal Chain Traversal and Prioritization: The updated logic uses Guava's Throwables.getCausalChain() to inspect all nested exceptions. It prioritizes re-throwing specific retryable exceptions over generic UserCodeExecutionException when both are present in the chain, ensuring that the most specific and actionable exception is exposed.
  • Circular Reference Handling: The parseAndThrow method now gracefully handles circular references within an exception's causal chain by catching IllegalArgumentException and wrapping it in a UserCodeExecutionException with an informative message.
  • Comprehensive Test Coverage: Ten new unit tests have been added to CallTest.java to cover various exception scenarios, including direct retryable exceptions, nested exceptions, generic wrapping, and circular references, ensuring the robustness of the fix.
  • Improved Python Pickling Error Messages: The Python SDK's ptransform.py now provides a more detailed and helpful error message when user code fails to pickle. The message includes common causes for non-serializable objects and suggestions for resolution, such as using module-level functions or initializing resources in setup() methods.

@github-actions
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment "assign set of reviewers".

@github-actions
Contributor

Assigning reviewers:

R: @jrmccluskey for label python.
R: @Abacn for label java.

Note: If you would like to opt out of this review, comment "assign to next reviewer".

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@PDGGK
Author

PDGGK commented Jan 19, 2026

Hi! I notice that 2 Python PreCommit checks failed (Python 3.10 and 3.11), but Python 3.12 and 3.13 passed successfully.

Since this PR only modifies Java files (Call.java and CallTest.java) in sdks/java/io/rrio/, and all Java checks (including beam_PreCommit_Java_RequestResponse_IO_Direct) passed, these Python failures appear to be unrelated to the changes.

The failure pattern (only 3.10/3.11, not 3.12/3.13) suggests this might be a flaky test or infrastructure issue specific to those Python versions.

Could someone please re-run the failed Python PreCommit checks? Thank you!

Failed checks:

  • beam_PreCommit_Python (Run Python PreCommit 3.10)
  • beam_PreCommit_Python (Run Python PreCommit 3.11)

@PDGGK
Author

PDGGK commented Jan 19, 2026

Closing this PR as it was created from the wrong branch (fix-issue-37176-exception-wrapping) which contained unrelated Python changes from PR #37298.

Created a new PR #37342 from a clean branch (java-fix-requestresponseio) that contains only the Java changes for this fix. This should resolve the Python CI failures that were appearing here.

@PDGGK PDGGK closed this Jan 19, 2026

Successfully merging this pull request may close these issues.

[Bug]: RequestResponseIO: Call wraps retryable exceptions in UserCodeExecutionException, preventing retry/backoff
