Conversation
Note: Gemini is unable to generate a summary for this pull request because the file types involved are not currently supported.
Assigning reviewers: R: @Abacn for label build. Note: If you would like to opt out of this review, comment. Available commands:
The PR bot will only process comments in the main thread (not review comments).
Force-pushed 2bd7b78 to f611cf8
Force-pushed f611cf8 to 0f85fec
  TOX_TESTENV_PASSENV: "DOCKER_*,TESTCONTAINERS_*,TC_*,BEAM_*,GRPC_*,OMP_*,OPENBLAS_*,PYTHONHASHSEED,PYTEST_*"
  # Aggressive retry and timeout settings for flaky CI
- PYTEST_ADDOPTS: "-v --tb=short --maxfail=5 --durations=30 --reruns=5 --reruns-delay=15 --timeout=600 --disable-warnings"
+ PYTEST_ADDOPTS: "-v --tb=short --maxfail=5 --durations=30 --reruns=5 --reruns-delay=15 --timeout=900 --disable-warnings"
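As a minimal sketch (not the actual Beam workflow), the diff above works by feeding retry/timeout flags to every pytest invocation through `PYTEST_ADDOPTS` rather than editing each tox command; the values below are copied from the diff, with the timeout kept at 600s since the 900s bump was reverted later in this thread:

```python
import os
import shlex

# Hypothetical repro of the CI setting; values copied from the diff above.
ADDOPTS = ("-v --tb=short --maxfail=5 --durations=30 "
           "--reruns=5 --reruns-delay=15 --timeout=600 --disable-warnings")
os.environ["PYTEST_ADDOPTS"] = ADDOPTS

# pytest splits this variable shell-style and prepends the flags to its
# command line; --reruns requires the pytest-rerunfailures plugin and
# --timeout requires pytest-timeout.
flags = shlex.split(os.environ["PYTEST_ADDOPTS"])
```

With pytest-timeout's default method, `--timeout` kills a hung test via a signal-based alarm, which is exactly why (as noted below) raising it can mask hangs rather than fix them.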
I'm not sure further increasing the timeout and/or reruns will help. We already see the test suite stuck for >5h, e.g. https://github.com/apache/beam/actions/runs/23779768061
Ahh, I didn't see this run. Increasing the pytest timeout can mask hangs and extend stuck runs, so I'll revert the timeout bump and keep this PR focused on the gRPC stability env vars only, as a short-term mitigation. @Abacn
  TC_MAX_TRIES: "15"
  TC_SLEEP_TIME: "5"
  # Additional gRPC stability for flaky environment
  GRPC_ARG_KEEPALIVE_TIME_MS: "60000"
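For a local repro attempt, the same variables can be set programmatically. A minimal sketch (not Beam code), assuming testcontainers-python reads the `TC_*` values for its retry logic; note that gRPC core normally takes keepalive as a channel option (`grpc.keepalive_time_ms`) rather than an environment variable, so client code may need to forward the value explicitly:

```python
import os

# Stability env vars mirroring the workflow change above (a sketch,
# not the Beam workflow file itself).
STABILITY_ENV = {
    "TC_MAX_TRIES": "15",    # testcontainers connection retries
    "TC_SLEEP_TIME": "5",    # seconds between retries
    "GRPC_ARG_KEEPALIVE_TIME_MS": "60000",
}
os.environ.update(STABILITY_ENV)

# Code that creates a gRPC channel would pass this as the channel
# option ("grpc.keepalive_time_ms", keepalive_ms).
keepalive_ms = int(os.environ["GRPC_ARG_KEEPALIVE_TIME_MS"])
```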
Wondering if there is some race condition for tests run in parallel cc: @tvalentyn. Nevertheless, we can add these environment variables for the moment.
If we can repro this stuckness locally, or SSH into a stuck GHA worker, then we could spy on the Python process with tools like pystack and examine the stack traces.
that could give some clues.
so should I run with SSH access on a stuck self-hosted runner and collect pystack traces there, or add temporary CI instrumentation (faulthandler + periodic pystack dumps as artifacts) so the next stuck run captures stack traces automatically? @tvalentyn
> so should I run with SSH access on a stuck self-hosted runner
is it an option? is it easy?
how often does this issue reproduce?
overall, i'd pick whatever option is more straightforward.
From my side, SSH into a stuck self-hosted runner is probably only possible if we have runner access granted, so it's not always the easiest path to start with. Locally I couldn't reproduce it yet due to WSL network limits (can't reach plugins.gradle.org / PyPI), so I don't have a solid repro rate from local runs. Given that, the most straightforward option is to add temporary CI instrumentation (faulthandler + periodic pystack dumps uploaded as artifacts) so every stuck run automatically gives us stack traces.
Sure, we can try; it might also be possible to run that instrumentation on a separate fork, if you can repro reliably on a fork and for whatever reason we don't want to merge the change (e.g. it makes tests much slower or unstable).
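The in-process half of the instrumentation discussed above could look like the following hypothetical `conftest.py` snippet (a sketch, not Beam's actual code):

```python
import faulthandler
import signal
import sys

# Dump tracebacks for all threads if the interpreter crashes hard.
faulthandler.enable()

# Let an operator SSH'd into a stuck runner trigger a dump with
# `kill -USR1 <pid>` (SIGUSR1 is POSIX-only, hence the guard).
if hasattr(signal, "SIGUSR1"):
    faulthandler.register(signal.SIGUSR1, all_threads=True)

# Watchdog: write a stack snapshot to stderr every 5 minutes until
# cancelled, so a hung run leaves periodic traces in the CI log.
faulthandler.dump_traceback_later(300, repeat=True, file=sys.stderr)
```

pystack complements this from outside the process (attaching to the stuck PID without any in-process cooperation), which is what a periodic-dump artifact step would wrap.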