increase timeout via posargs #37989
Merged: tvalentyn merged 4 commits into apache:master from aIbrahiim:fix-34739-precommit-python-job on Apr 3, 2026 (+5 −0)
Commits (4):

- 0f85fec raise pytest/grpc timeout (aIbrahiim)
- 70df463 revert timeout bump (aIbrahiim)
- f1a2b73 Stabilize Python PreCommit with gRPC fork env and YAML CI venv clone fix (aIbrahiim)
- 6f5cc71 Set GRPC_ENABLE_FORK_SUPPORT=0 in Python PreCommit with grpc#37710 TODO (aIbrahiim)
Conversations
Wondering if there is some race condition for tests run in parallel, cc: @tvalentyn. Nevertheless, we can add these environment variables for now.
if we can repro this stuckness locally, or SSH into a GHA worker that is stuck, then we could spy on the Python process with tools like pystack, and then examine stack traces. That could give some clues.
so should I run with SSH access on a stuck self-hosted runner and collect pystack traces there, or add temporary CI instrumentation (faulthandler + periodic pystack dumps as artifacts) so the next stuck run captures stack traces automatically? @tvalentyn
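The instrumentation option mentioned above could look roughly like the sketch below: a helper that arms `faulthandler` to dump all thread stacks on a timer, so a hung leg leaves stack traces in the CI log without SSH access. The function name, interval, and output target are placeholders, not anything currently in Beam:

```python
import faulthandler
import sys


def enable_hang_instrumentation(interval_s: float = 300.0, out=sys.stderr):
    """Periodically dump all thread stack traces while the suite runs.

    If a test leg hangs until the CI timeout, the last few dumps in the
    captured log show where each thread was stuck, without needing SSH
    access to the runner. Disarm with
    faulthandler.cancel_dump_traceback_later().
    """
    # repeat=True re-arms the timer, so a dump is written every
    # interval_s seconds until cancelled, not just once.
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=out)
```

This could be wired into a `conftest.py` for the stuck suites only, and the log file uploaded as a workflow artifact; periodic pystack dumps would need a separate watcher process, since pystack attaches from outside.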
Is it an option? Is it easy?
How often does this issue reproduce?
Overall, I'd pick whatever option is more straightforward.
Good findings!
re: 1 -- is there something specific to the YAML suite? Do you have suggestions (e.g. skip copying tmp/)?
re: 2 -- Could you try setting this environment variable and seeing if it helps:
beam/.github/workflows/beam_PreCommit_GoPortable.yml
Line 42 in ee02db8
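The referenced workflow line is not reproduced in this page; judging from the commit list and later comments, it presumably sets the gRPC fork-support variable, roughly:

```yaml
env:
  # Disable gRPC's fork handlers; see the grpc#37710 TODO mentioned in
  # the commit messages. Exact placement in the workflow is assumed here.
  GRPC_ENABLE_FORK_SUPPORT: "0"
```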
re: 1 -- The issue ties to YAML external providers: they clone a venv in yaml_provider via clonevirtualenv, so the source is either a fresh mini venv or, in .dev, the whole tox venv (path from base_python). In CI that tree is live, so paths like tmp/ can disappear mid-copy, which fits what we saw. I suggest ignoring tmp/, __pycache__/, .pytest_cache/, and pip build temps (or replacing clonevirtualenv with a copy that supports ignore patterns, or copies only the needed dirs), and for .dev/CI, preferring _create_venv_from_scratch (or a template venv) instead of cloning the full tox env: slower, but stabler. Optionally, add a unit test with a fake venv plus a volatile tmp/ to guard against regressions.
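A minimal sketch of the ignore-based copy suggested above, assuming `shutil.copytree` as the replacement mechanism; the function name and the exact ignore patterns are illustrative, not Beam's actual yaml_provider code:

```python
import shutil

# Directories that can change or vanish while a live tox venv is being
# copied in CI; ignoring them avoids a mid-copy FileNotFoundError.
VOLATILE = shutil.ignore_patterns(
    "tmp", "__pycache__", ".pytest_cache", "pip-build-*",
)


def clone_venv_with_ignore(src_venv: str, dst_venv: str) -> str:
    """Copy a virtualenv tree, skipping volatile directories.

    Unlike virtualenv-clone, shutil.copytree accepts an ignore callback,
    so transient files created by a concurrently running test suite are
    simply not copied instead of crashing the clone.
    """
    shutil.copytree(src_venv, dst_venv, ignore=VOLATILE, symlinks=True)
    return dst_venv
```

Note that, unlike virtualenv-clone, this does not rewrite activation scripts or shebang paths in the new venv, so it only illustrates the ignore mechanism, not a drop-in replacement.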
re: 2 -- subprocess_server waits on gRPC channel-ready for expansion_service_main until pytest-timeout fires. faulthandler sometimes showed xdist/execnet, but turning xdist off for one Python version didn't fix the other legs, so parallelism isn't the whole story. I suggest that on child exit or a long wait we surface the exit code plus stdout/stderr (and maybe fail faster with a clear error).
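The re: 2 suggestion could be sketched roughly as below: fail fast when the child dies instead of waiting for the outer pytest-timeout, and include the exit code and captured output in the error. The function, the `ready_check` callback, and the message format are placeholders, not Beam's subprocess_server implementation:

```python
import subprocess
import time


def start_and_watch(cmd, ready_check, timeout=60.0, poll=0.5):
    """Start a child process and wait for it to become ready.

    Instead of blocking on channel-ready until the suite-level timeout
    fires, detect a dead child early and surface why it died.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            # Child exited before becoming ready: report code and output
            # rather than hanging until an opaque timeout.
            out = proc.stdout.read()
            raise RuntimeError(
                f"service exited with code {proc.returncode} before "
                f"becoming ready; output:\n{out}")
        if ready_check():  # e.g. a gRPC channel-ready probe
            return proc
        time.sleep(poll)
    proc.kill()
    raise TimeoutError(f"service not ready after {timeout}s")
```

A real version would also need to keep draining the child's stdout after startup so the pipe never fills; this sketch only covers the startup window.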
For ML YAML, preinstall/pin the heavy deps in tox so that slow pip installs, imports, or model startup don't look like a hang.
I will add the same env var to Python PreCommit and see if it changes the expansion/xdist behavior; will update you on whether it helps.
@tvalentyn I added GRPC_ENABLE_FORK_SUPPORT: "0" in Python PreCommit and kept the YAML-side stabilization in yaml_provider, then reran the workflow and it passed: https://github.com/apache/beam/actions/runs/23892113364
cc: @sergiitk FYI - we might be seeing stuckness issues in other, pure Python test suites, where there are processes and subprocesses communicating over gRPC.