t5410: avoid hangs in CI runs in the win+Meson test jobs #1932
In the GitHub workflow used in Git's CI builds, the `vs test` jobs use a subset of a specific revision of Git for Windows' SDK to run Git's test suite. This revision is validated by another CI workflow to ensure that said revision _can_ run Git's test suite successfully, skipping buggy updates in Git for Windows' SDK.

The `win+Meson test` jobs do things differently, quite differently. They use the Bash of the Git for Windows version that is installed on the runners to run Git's test suite.

This difference has consequences.

When 68cb0b5 (builtin/receive-pack: add option to skip connectivity check, 2025-05-20) introduced a test case that uses `tee <file> | git receive-pack` as `--receive-pack` parameter (imitating an existing pattern in the same test script), it hit just the sweet spot to trigger a bug in the MSYS2 runtime shipped in Git for Windows v2.49.0. This version is the one currently installed on GitHub's runners.

The problem is that the `git receive-pack` process finishes while the `tee` process does not need to write anything anymore and therefore does not receive an EOF. Instead, it should receive a SIGPIPE, but the bug in the MSYS2 runtime prevents that from working as intended. As a consequence, the `tee` process waits for more input from the `git.exe send-pack` process but none is coming, and the test script patiently waits until the 6h timeout hits.

Only every once in a while, the `git receive-pack` process manages to send an EOF to the `tee` process and no hang occurs. Therefore, the problem can be worked around by cancelling the clearly-hanging job after twenty or so minutes and re-running it, repeating the process about half a dozen times, until the hang was successfully avoided.

This bug in the MSYS2 runtime has been fixed in the meantime, which is the reason why the same test case causes no problems in the `win test` and the `vs test` jobs.

This will continue to be the case until the Git for Windows version on the GitHub runners is upgraded to a version that distributes a newer MSYS2 runtime version. However, as of time of writing, this _is_ the latest Git for Windows version, and will be for another 1.5 weeks, until Git v2.50.0 is scheduled to appear (and shortly thereafter Git for Windows v2.50.0). Traditionally it takes a while before the runners pick up the new version.

We could just wait it out, six hours at a time.

Here, I opt for an alternative: Detect the buggy MSYS2 runtime and simply skip the test case. It's not like the `receive-pack` test cases are specific to Windows, and even then, to my chagrin the CI runs in git-for-windows/git spend around ten hours of compute time each and every time to run the entire test suite on all the platforms, even the tests that cover cross-platform code, and for Windows alone we do that three times: with GCC, with MSVC, and with MSVC via Meson. Therefore, I deem it more than acceptable to skip this test case in one of those matrices.

For good luck, also the preceding test case is skipped in that scenario, as it uses the same `--receive-pack=tee <file> | git receive-pack` pattern, even though I never observed that test case to hang in practice.

Signed-off-by: Johannes Schindelin <[email protected]>
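
As a rough illustration of the pipe behaviour the commit message relies on (not a reproduction of the MSYS2 bug, and not taken from the patch), the sketch below assumes a POSIX system with a working runtime and the common `yes` and `head` utilities. A writer that would otherwise run forever is stopped once its reader goes away. The variable name `first_line` and the string `pack data` are made up for the example.

```shell
# `yes` would keep writing forever; `head` exits after one line, which closes
# the read end of the pipe. The next write by `yes` then fails with SIGPIPE
# (or with EPIPE if the signal is ignored), so the pipeline finishes.
first_line=$(yes "pack data" 2>/dev/null | head -n 1)
printf 'pipeline finished, read: %s\n' "$first_line"
```

With the faulty runtime described above, the analogous notification never reaches `tee`, which is why the job sits there until the timeout.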
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
On the Git mailing list, Patrick Steinhardt wrote:

On Thu, Jun 05, 2025 at 10:16:45AM +0000, Johannes Schindelin via GitGitGadget wrote:
> diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
> index f76a22943ef..09d6bfd2a10 100755
> --- a/t/t5410-receive-pack.sh
> +++ b/t/t5410-receive-pack.sh
> @@ -41,7 +41,19 @@ test_expect_success 'with core.alternateRefsPrefixes' '
> test_cmp expect actual.haves
> '
>
> -test_expect_success 'receive-pack missing objects fails connectivity check' '
> +# The `tee.exe` shipped in Git for Windows v2.49.0 is known to hang frequently
> +# when spawned from `git.exe` and piping its output to `git.exe`. This seems
> +# related to MSYS2 runtime bug fixes regarding the signal handling; Let's just
> +# skip the tests that need to exercise this when the faulty MSYS2 runtime is
> +# detected; The test cases are exercised enough in other matrix jobs of the CI
> +# runs.
> +test_lazy_prereq TEE_DOES_NOT_HANG '
> + test_have_prereq !MINGW &&
> + case "$(uname -a)" in *3.5.7-463ebcdc.x86_64*) false;; esac
> +'
> +
> +test_expect_success TEE_DOES_NOT_HANG \
> + 'receive-pack missing objects fails connectivity check' '
> test_when_finished rm -rf repo remote.git setup.git &&
>
> git init repo &&
Quite interesting. In any case, I think this is a sensible fix for now.
It's a known bug, we know it's fixed, we just have to wait. And the fact
that this prereq will basically auto-disarm itself once we have the new
version is nice.
I did wonder whether we can maybe rewrite the test so that we compute
our own packfile instead of intercepting the one from git-send-pack(1).
But I'm not sure whether that's really worth it.
Thanks!
Patrick |
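
The "auto-disarm" behaviour mentioned above follows from how the `case` statement in the patch works: it returns failure only for the one runtime build string, and a `case` with no matching pattern exits with status 0. A minimal stand-alone sketch of just that part, outside Git's test framework, might look like this. The function name `runtime_is_not_known_buggy` is hypothetical, the `uname -a` output is passed in as an argument so the check can be tried on any machine, and the `test_have_prereq !MINGW` half of the real prerequisite is deliberately left out.

```shell
# Hypothetical stand-alone version of the version check; the real
# prerequisite is declared with test_lazy_prereq in t5410-receive-pack.sh.
runtime_is_not_known_buggy () {
	# $1: the output of `uname -a`
	case "$1" in
	*3.5.7-463ebcdc.x86_64*) return 1 ;;	# the one runtime build known to hang
	esac
	# No pattern matched: `case` exits 0, so the function reports success.
}

if runtime_is_not_known_buggy "$(uname -a)"
then
	echo "would run the tee-based test cases"
else
	echo "would skip the tee-based test cases"
fi
```

Any other runtime version string, including that of a future fixed release, simply fails to match the pattern, so no follow-up change is needed to re-enable the check.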
On the Git mailing list, Junio C Hamano wrote:

"Johannes Schindelin via GitGitGadget" <[email protected]> writes:
> ...
> This bug in the MSYS2 runtime has been fixed in the meantime, which is
> the reason why the same test case causes no problems in the `win test`
> and the `vs test` jobs.
>
> This will continue to be the case until the Git for Windows version on
> the GitHub runners is upgraded to a version that distributes a newer
> MSYS2 runtime version. However, as of time of writing, this _is_ the
> latest Git for Windows version, and will be for another 1.5 weeks, until
> Git v2.50.0 is scheduled to appear (and shortly thereafter Git for
> Windows v2.50.0). Traditionally it takes a while before the runners pick
> up the new version.
> ...
> I finally had a chance to look more closely at this problem. Here is my
> alternative to what Patrick proposed in
> https://lore.kernel.org/git/[email protected]/.
Superb. It must have taken a truly heroic effort.
Thanks and congratulations for finally solving the puzzle.
I do agree that Patrick's "wrap the same in a script" smelled like
shifting a timing issue and not truly a solution.
> +# The `tee.exe` shipped in Git for Windows v2.49.0 is known to hang frequently
> +# when spawned from `git.exe` and piping its output to `git.exe`. This seems
> +# related to MSYS2 runtime bug fixes regarding the signal handling; Let's just
> +# skip the tests that need to exercise this when the faulty MSYS2 runtime is
> +# detected; The test cases are exercised enough in other matrix jobs of the CI
> +# runs.
> +test_lazy_prereq TEE_DOES_NOT_HANG '
> + test_have_prereq !MINGW &&
> + case "$(uname -a)" in *3.5.7-463ebcdc.x86_64*) false;; esac
> +'
That's very specific ;-).
As this is not in a library-ish part, it does not have to be lazy.
Anybody running this test script needs to tell if their environment
satisfies the prerequisite, but lazy one does have a documentation
value, I guess, and a bit of extra indirection does not hurt.
Will queue. Thanks again. |
This patch series was integrated into seen via git@14de3eb. |
This patch series was integrated into master via git@14de3eb. |
This patch series was integrated into next via git@fc6ec28. |
Closed via 14de3eb. |
I finally had a chance to look more closely at this problem. Here is my alternative to what Patrick proposed in https://lore.kernel.org/git/[email protected]/.
Cc: Patrick Steinhardt [email protected]