Skip to content

Conversation

dscho
Copy link
Member

@dscho dscho commented Jun 5, 2025

I finally had a chance to look more closely at this problem. Here is my alternative to what Patrick proposed in https://lore.kernel.org/git/[email protected]/.

Cc: Patrick Steinhardt [email protected]

In the GitHub workflow used in Git's CI builds, the `vs test` jobs use a
subset of a specific revision of Git for Windows' SDK to run Git's test
suite. This revision is validated by another CI workflow to ensure that
said revision _can_ run Git's test suite successfully, skipping buggy
updates in Git for Windows' SDK.

The `win+Meson test` jobs do things differently, quite differently. They
use the Bash of the Git for Windows version that is installed on the
runners to run Git's test suite.

This difference has consequences.

When 68cb0b5 (builtin/receive-pack: add option to skip connectivity
check, 2025-05-20) introduced a test case that uses `tee <file> | git
receive-pack` as `--receive-pack` parameter (imitating an existing
pattern in the same test script), it hit just the sweet spot to trigger
a bug in the MSYS2 runtime shipped in Git for Windows v2.49.0. This
version is the one currently installed on GitHub's runners.

The problem is that the `git receive-pack` process finishes while the
`tee` process does not need to write anything anymore and therefore does
not receive an EOF. Instead, it should receive a SIGPIPE, but the bug in
the MSYS2 runtime prevents that from working as intended. As a
consequence, the `tee` process waits for more input from the `git.exe
send-pack` process but none is coming, and the test script patiently
waits until the 6h timeout hits.

Only every once in a while, the `git receive-pack` process manages to
send an EOF to the `tee` process and no hang occurs. Therefore, the
problem can be worked around by cancelling the clearly-hanging job after
twenty or so minutes and re-running it, repeating the process about half
a dozen times, until the hang was successfully avoided.

This bug in the MSYS2 runtime has been fixed in the meantime, which is
the reason why the same test case causes no problems in the `win test`
and the `vs test` jobs.

This will continue to be the case until the Git for Windows version on
the GitHub runners is upgraded to a version that distributes a newer
MSYS2 runtime version. However, as of time of writing, this _is_ the
latest Git for Windows version, and will be for another 1.5 weeks, until
Git v2.50.0 is scheduled to appear (and shortly thereafter Git for
Windows v2.50.0). Traditionally it takes a while before the runners pick
up the new version.

We could just wait it out, six hours at a time.

Here, I opt for an alternative: Detect the buggy MSYS2 runtime and
simply skip the test case. It's not like the `receive-pack` test cases
are specific to Windows, and even then, to my chagrin the CI runs in
git-for-windows/git spend around ten hours of compute time each and
every time to run the entire test suite on all the platforms, even the
tests that cover cross-platform code, and for Windows alone we do that
three times: with GCC, with MSVC, and with MSVC via Meson. Therefore, I
deem it more than acceptable to skip this test case in one of those
matrices.

For good luck, also the preceding test case is skipped in that scenario,
as it uses the same `--receive-pack=tee <file> | git receive-pack`
pattern, even though I never observed that test case to hang in
practice.

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho dscho self-assigned this Jun 5, 2025
@dscho
Copy link
Member Author

dscho commented Jun 5, 2025

/submit

Copy link

gitgitgadget bot commented Jun 5, 2025

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1932/dscho/t5410-hangs-in-win+Meson-v1

To fetch this version to local tag pr-1932/dscho/t5410-hangs-in-win+Meson-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1932/dscho/t5410-hangs-in-win+Meson-v1

Copy link

gitgitgadget bot commented Jun 5, 2025

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Thu, Jun 05, 2025 at 10:16:45AM +0000, Johannes Schindelin via GitGitGadget wrote:
> diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
> index f76a22943ef..09d6bfd2a10 100755
> --- a/t/t5410-receive-pack.sh
> +++ b/t/t5410-receive-pack.sh
> @@ -41,7 +41,19 @@ test_expect_success 'with core.alternateRefsPrefixes' '
>  	test_cmp expect actual.haves
>  '
>  
> -test_expect_success 'receive-pack missing objects fails connectivity check' '
> +# The `tee.exe` shipped in Git for Windows v2.49.0 is known to hang frequently
> +# when spawned from `git.exe` and piping its output to `git.exe`. This seems
> +# related to MSYS2 runtime bug fixes regarding the signal handling; Let's just
> +# skip the tests that need to exercise this when the faulty MSYS2 runtime is
> +# detected; The test cases are exercised enough in other matrix jobs of the CI
> +# runs.
> +test_lazy_prereq TEE_DOES_NOT_HANG '
> +	test_have_prereq !MINGW &&
> +	case "$(uname -a)" in *3.5.7-463ebcdc.x86_64*) false;; esac
> +'
> +
> +test_expect_success TEE_DOES_NOT_HANG \
> +	'receive-pack missing objects fails connectivity check' '
>  	test_when_finished rm -rf repo remote.git setup.git &&
>  
>  	git init repo &&

Quite interesting. I any case, I think this is a sensible fix for now.
It's a known bug, we know it's fixed, we just have to wait. And the fact
that this prereq will basically auto-disarm itself once we have the new
version is nice.

I did wonder whether we can maybe rewrite the test so that we compute
our own packfile instead of intercepting the one from git-send-pack(1).
But I'm not sure whether that's really worth it.

Thanks!

Patrick

Copy link

gitgitgadget bot commented Jun 5, 2025

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <[email protected]>
writes:

> ...
> This bug in the MSYS2 runtime has been fixed in the meantime, which is
> the reason why the same test case causes no problems in the `win test`
> and the `vs test` jobs.
>
> This will continue to be the case until the Git for Windows version on
> the GitHub runners is upgraded to a version that distributes a newer
> MSYS2 runtime version. However, as of time of writing, this _is_ the
> latest Git for Windows version, and will be for another 1.5 weeks, until
> Git v2.50.0 is scheduled to appear (and shortly thereafter Git for
> Windows v2.50.0). Traditionally it takes a while before the runners pick
> up the new version.
> ...
>     I finally had a chance to look more closely at this problem. Here is my
>     alternative to what Patrick proposed in
>     https://lore.kernel.org/git/[email protected]/.

Superb.  It must have taken a truly heroic effort.

Thanks and congratulations for finally solving the puzzle.

I do agree that Patrick's "wrap the same in a script" smelled like
shifting a timing issue and not truly a solution.

> +# The `tee.exe` shipped in Git for Windows v2.49.0 is known to hang frequently
> +# when spawned from `git.exe` and piping its output to `git.exe`. This seems
> +# related to MSYS2 runtime bug fixes regarding the signal handling; Let's just
> +# skip the tests that need to exercise this when the faulty MSYS2 runtime is
> +# detected; The test cases are exercised enough in other matrix jobs of the CI
> +# runs.
> +test_lazy_prereq TEE_DOES_NOT_HANG '
> +	test_have_prereq !MINGW &&
> +	case "$(uname -a)" in *3.5.7-463ebcdc.x86_64*) false;; esac
> +'

That's very specific ;-).

As this is not in a library-ish part, it does not have to be lazy.
Anybody running this test script need to tell if their environment
satisfies the prerequisite, but lazy one does have a documentation
value, I guess, and a bit of extra indirection does not hurt.

Will queue.  Thanks again.

Copy link

gitgitgadget bot commented Jun 5, 2025

This patch series was integrated into seen via git@14de3eb.

Copy link

gitgitgadget bot commented Jun 5, 2025

This patch series was integrated into master via git@14de3eb.

Copy link

gitgitgadget bot commented Jun 5, 2025

This patch series was integrated into next via git@fc6ec28.

Copy link

gitgitgadget bot commented Jun 5, 2025

Closed via 14de3eb.

@dscho dscho deleted the t5410-hangs-in-win+Meson branch June 6, 2025 05:10
@dscho dscho restored the t5410-hangs-in-win+Meson branch June 6, 2025 12:10
@dscho dscho deleted the t5410-hangs-in-win+Meson branch June 11, 2025 09:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant