Conversation
|
Running the timing out tests locally, I got: This isn't showing up in CI because stdout is only logged when a suite fails, but we're timing out before that can happen. |
|
And iin a different test (which runs first) I see the following (looks like same root cause): |
|
Run Yaml_Xlang_Direct PreCommit |
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
|
R: @shunping |
|
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment |
|
|
||
| IdGenerator idGenerator = IdGenerators.decrementingLongs(); | ||
| ShortIdMap metricsShortIds = new ShortIdMap(); | ||
| ExecutorService executorService = |
There was a problem hiding this comment.
Curious why this is needed now. Does this suggest current ScheduledExecutorServiceFactory not working as expected?
There was a problem hiding this comment.
Thanks for calling this out - was in the PR description originally, and then I lost it when I updated to include information for folks coming from CHANGES. From the original description:
This fixes a thread safety issue which was causing tests to time out and fail.
The race condition comes from spinning up Java environments - when we do this today (before this PR), we:
- Get a singleton ExecutorService object -
- Execute a pipeline
- Shutdown the ExecutorService -
When this is all handled in different processes (as the current FnApiRunner implementation does), its not a problem since the ExecutorService doesn't need to outlive the pipeline. In multi-pipeline processes, however, the same ExecutorService object gets reused and teardown can get called while it is executing or before it starts. This leads to a long execution and an eventual timeout (see comments in PR below for details on error thrown).
This PR fixes the issue by creating a new ExecutorService object per worker.
There was a problem hiding this comment.
I see #35621 (comment), could be due to main() get called multiple times, and the Executor embedded in pipeline option already shutdown
There was a problem hiding this comment.
Oh racing condition comment. This attracted attention at first glance because there was a thread leak bug due to Executor created every time spin up a runner - #32272 . As long as shutdown is called when Executor no longer used I think it's fine. Thanks again for the details!
* Reenable prism as default * Fix race condition * Spotless * Portable test * Fix a few new test issues * Update CHANGES
Changes the default Python runner to be prism (when prism is supported).
There are a bunch of use cases automatically excluded, but this also required updating a bunch of tests. The 2 main breaking changes this introduces are:
Instead of raising Python-specific exceptions, when Prism jobs fail a RuntimeError is always raised. To fix any tests relying on the old error, change your exception catching logic to look for broader errors and use regexes to verify the contents of the error. There are lots of examples of this in this PR.
Prism names its metrics differently than the local runner. For this PR in Beam, I just kept all of those on the old FnApiRunner for now, but you could also just change the metric name you're looking for (this will be the eventual fix for these tests in this repo).
If this change breaks you in unexpected ways, please:
Part of #34549
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.