Skip to content

Commit 25bc0aa

Browse files
authored
Merge pull request #240 from git-for-windows/msys2-runtime-cb9b2d04699ca0a5d1f32f6e79447b0823444fb4
msys2-runtime: update to 3.6.3-3
2 parents 6f5b35c + 362dddc commit 25bc0aa

File tree

3 files changed

+215
-6
lines changed

3 files changed

+215
-6
lines changed
Lines changed: 206 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,206 @@
1+
From cb9b2d04699ca0a5d1f32f6e79447b0823444fb4 Mon Sep 17 00:00:00 2001
2+
From: Johannes Schindelin <[email protected]>
3+
Date: Mon, 23 Jun 2025 13:06:02 +0200
4+
Subject: [PATCH 50/N] pipe: fix SSH hang (again)
5+
MIME-Version: 1.0
6+
Content-Type: text/plain; charset=UTF-8
7+
Content-Transfer-Encoding: 8bit
8+
9+
When cbfaeba4f7 (Cygwin: pipe: Fix incorrect write length in
10+
raw_write(), 2024-11-06) fixed a bug where too-long writes could cause
11+
segmentation faults, it not only left out crucial details from the
12+
commit message, it also introduced a bug.
13+
14+
This manifests e.g. in the symptom where cloning large repositories in
15+
Git for Windows via SSH "freeze" at random stages, as has been reported
16+
in https://github.com/git-for-windows/git/issues/5688 and in
17+
https://github.com/git-for-windows/git/issues/5682.
18+
19+
The bug in question was to use `is_nonblocking ()` instead of
20+
`real_non_blocking_mode` in a newly introduced condition. If the commit
21+
message had filled in those details, it would most likely have been much
22+
easier to spot the bug before the patch was committed.
23+
24+
Granted, the patch _sort of_ moved this `is_nonblocking ()` condition
25+
from another place, which means that the bug was present before, even if
26+
it did not have the same detrimental symptom of hanging a clone of a
27+
large repository via SSH.
28+
29+
The _original_ bug was introduced in the equally under-explained
30+
7ed9adb356 (Cygwin: pipe: Switch pipe mode to blocking mode by default,
31+
2024-09-05).
32+
33+
What is ironic is that this here patch is the latest (and hopefully
34+
last) commit in a _long_ chain of bug fixes that fix bugs introduced by
35+
preceding bug fixes:
36+
37+
- 9e4d308cd5 (Cygwin: pipe: Adopt FILE_SYNCHRONOUS_IO_NONALERT flag for
38+
read pipe., 2021-11-10) fixed a bug where Cygwin hung by mistake while
39+
piping output from one .NET program as input to another .NET program
40+
(potentially introduced by 365199090c (Cygwin: pipe: Avoid false EOF
41+
while reading output of C# programs., 2021-11-07), which was itself a
42+
bug fix). It introduced a bug that was fixed by...
43+
- fc691d0246 (Cygwin: pipe: Make sure to set read pipe non-blocking for
44+
cygwin apps., 2024-03-11). Which introduced a bug that was purportedly
45+
fixed by...
46+
- 7ed9adb356 (Cygwin: pipe: Switch pipe mode to blocking mode by
47+
default, 2024-09-05). Which introduced a bug that was fixed by...
48+
- cbfaeba4f7 (Cygwin: pipe: Fix incorrect write length in raw_write(),
49+
2024-11-06). Which introduced a bug that was fixed by... this here
50+
patch.
51+
52+
There is not only the common thread here that each of these bug fixes
53+
introduced a new bug, but also the common thread of commit messages that
54+
leave the reader puzzled and the problem — as well as the solution —
55+
under-explained. It is quite likely that, had the time and care been
56+
spent to fully explain what is going on in the commit messages, this
57+
would quite likely have triggered a re-review of the diff by the author.
58+
In other words, writing a complete commit message could have increased
59+
the insight and understanding of the problem and thereby prevented this
60+
series of "fixes that introduce new bugs".
61+
62+
To avoid making the same mistake once again, here is my attempt at a
63+
thorough explanation of what's going on, what is the problem, what
64+
solutions were considered, and why the "winner" was chosen.
65+
66+
So let's start with an overview of the logic of the
67+
`fhandler_pipe_fifo::raw_write ()` method.
68+
69+
This method implements the fundamental logic behind writing to any pipe
70+
in Cygwin, whether it be blocking or not blocking, and takes as input
71+
two variables, `ptr` and `len`, pointing to the data to be written.
72+
73+
First, this method determines a couple of initial conditions (such as
74+
maybe waiting on a specific mutex, or determining whether the pipe had
75+
been closed already, or whether it is in blocking or in non-blocking
76+
mode). Then, this method determines how many bytes of data should be
77+
written in each attempt (assigned to the variable `chunk`).
78+
79+
The amount of bytes already written is tracked as `nbytes`. While this
80+
value is smaller than `len`, the central loop of this method is
81+
executed.
82+
83+
Counterintuitively, this loop is necessary even in non-blocking mode, to
84+
handle conditions like `STATUS_PENDING`.
85+
86+
Inside this loop body, a variety of conditions are handled, including
87+
some that break out of that loop before `nbytes >= len`, e.g. when the
88+
pipe is closed on the other end.
89+
90+
This loop body also contains special, complicated logic to handle the
91+
case where the pipe buffer is not large enough to handle the current
92+
`chunk`, trying to write even less bytes (this is guarded by the flag
93+
`short_write_once`).
94+
95+
Unfortunately, the recent re-design of that logic decided to always set
96+
the pipe to blocking (cf. the diff of ed9adb356 (Cygwin: pipe: Switch
97+
pipe mode to blocking mode by default, 2024-09-05)). The rationale for
98+
that decision was the desire to simplify the code, which came at the
99+
expense of working well together with pipes that were created outside of
100+
Cygwin and that are set to non-blocking mode.
101+
102+
To this end, the current implementation now _emulates_ non-blocking
103+
behavior while keeping the actual pipe in blocking mode. This requires
104+
knowing how much data can be written to the pipe without blocking at any
105+
given moment. Because by knowing that amount, even in blocking mode a
106+
write operation can be performed that does not block. This information
107+
is obtained via `NtQueryInformationFile(FilePipeLocalInformation)` and
108+
stored in the variable `avail`.
109+
110+
However, if for some reason it cannot be determined how much space is
111+
available in the pipe, this non-blocking mode emulation would not work.
112+
Therefore the pipe is set to genuine non-blocking mode in such cases,
113+
and only in such cases. This minimizes the time window (but does not
114+
eliminate it completely!) during which Cygwin does not work well
115+
together with non-Cygwin processes via pipes.
116+
117+
It also shows that the original desire to simplify the code can not be
118+
fulfilled using the current strategy, as there is now both code to
119+
emulate non-blocking mode as well as code that handles genuinely
120+
non-blocking mode.
121+
122+
This information whether non-blocking mode is merely emulated or
123+
genuinely enabled, is tracked by the Boolean variable
124+
`real_non_blocking_mode`.
125+
126+
At the beginning of the loop body, after determining how many bytes to
127+
write (stored in `len1`), the `NtWriteFile()` is called. Since the
128+
re-design to always use blocking mode (at least insofar possible), this
129+
`NtWriteFile()` call is guarded by a condition so that it is called only
130+
once in non-blocking mode.
131+
132+
Now, the logic for that is quite complex. First, `len1` is clamped to
133+
`chunk` in blocking mode, otherwise to however many bytes are left to
134+
write. Then, an _inner_ `while` loop runs as long as `len1` is greater
135+
than 0. Inside that loop, `NtWriteFile()` is called _unless_ we'd try to
136+
write more bytes than are available in emulated non-blocking mode.
137+
138+
It is the complex interplay between that clamping to `chunk` and the
139+
condition when `NtWriteFile()` is skipped that is at the heart of this
140+
bug.
141+
142+
But let's first see what happens in the case described above, where the
143+
(Cygwin) SSH process gets stuck writing to the (non-Cygwin) Git process.
144+
In the instances I observed, the big `while (nbytes < len)` loop is
145+
started with the following initial conditions: `len` is 2097152, `chunk`
146+
and `avail` are 1 and `real_non_blocking_mode` is 0.
147+
148+
When the outer loop is entered, `len1` is therefore also set to 2097152,
149+
and at the start of the inner loop the condition `len1 > avail` holds
150+
true. Therefore, the `NtWriteFile()` call is skipped already during the
151+
first iteration, we run into the code that tries a short write (setting
152+
`short_write_once` to 1), and ultimately fail and return from
153+
`raw_write()` with an error.
154+
155+
Since not a single byte was written, the caller will try again at some
156+
stage, failing again, over and over, and we're in an infinite loop.
157+
158+
The fix chosen in this here commit is to apply that clamping to `chunk`
159+
_also_ in the emulated non-blocking mode. This lets the `NtWriteFile()`
160+
call _not_ be skipped in non-emulated non-blocking mode, as was clearly
161+
the intention.
162+
163+
An alternative patch that would have also addressed the symptom would be
164+
to partially revert this hunk of cbfaeba4f7 (Cygwin: pipe: Fix incorrect
165+
write length in raw_write(), 2024-11-06):
166+
167+
chunk = len;
168+
- else if (is_nonblocking ())
169+
- chunk = len = pipe_buf_size;
170+
else
171+
chunk = avail;
172+
173+
The logic changed such that `len` would no longer be clamped to the same
174+
value as `chunk` in non-blocking mode. Reinstating that clamp in
175+
non-blocking mode would also work around the issue, but it would be a
176+
bit more heavy-handed than the chosen fix.
177+
178+
Yet another potential fix that was suggested (in
179+
https://github.com/git-for-windows/git/issues/5688#issuecomment-2995952882)
180+
would be to modify the condition when `NtWriteFile()` is skipped so as
181+
to force it to _not_ be skipped when attempting a "short write", i.e.
182+
when `short_write_once != 0`. This fix would have worked around the bug
183+
successfully, but is fixing the bug at the wrong layer (even if it hints
184+
at the fact that the `short_write_once` logic is ill-prepared for
185+
non-blocking mode, but that's a topic for another patch).
186+
187+
Fixes: cbfaeba4f7 (Cygwin: pipe: Fix incorrect write length in raw_write(), 2024-11-06)
188+
Co-authored-by: Takashi Yano <[email protected]>
189+
Signed-off-by: Johannes Schindelin <[email protected]>
190+
---
191+
winsup/cygwin/fhandler/pipe.cc | 2 +-
192+
1 file changed, 1 insertion(+), 1 deletion(-)
193+
194+
diff --git a/winsup/cygwin/fhandler/pipe.cc b/winsup/cygwin/fhandler/pipe.cc
195+
index 506dd09..169c0bb 100644
196+
--- a/winsup/cygwin/fhandler/pipe.cc
197+
+++ b/winsup/cygwin/fhandler/pipe.cc
198+
@@ -561,7 +561,7 @@ fhandler_pipe_fifo::raw_write (const void *ptr, size_t len)
199+
ULONG len1;
200+
DWORD waitret = WAIT_OBJECT_0;
201+
202+
- if (left > chunk && !is_nonblocking ())
203+
+ if (left > chunk && !real_non_blocking_mode)
204+
len1 = chunk;
205+
else
206+
len1 = (ULONG) left;

msys2-runtime/PKGBUILD

Lines changed: 8 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@
44
pkgbase=msys2-runtime
55
pkgname=('msys2-runtime' 'msys2-runtime-devel')
66
pkgver=3.6.3
7-
pkgrel=2
7+
pkgrel=3
88
pkgdesc="Cygwin POSIX emulation engine"
99
arch=('x86_64')
1010
url="https://www.cygwin.com/"
@@ -75,9 +75,10 @@ source=('msys2-runtime'::git+https://github.com/cygwin/cygwin#tag=cygwin-${pkgve
7575
0046-Change-the-default-base-address-for-x86_64.patch
7676
0047-msys2-runtime-restore-fast-path-for-current-user-pri.patch
7777
0048-Cygwin-Fix-compatibility-with-w32api-headers-v13.patch
78-
0049-symlink_native-allow-linking-to.patch)
78+
0049-symlink_native-allow-linking-to.patch
79+
0050-pipe-fix-SSH-hang-again.patch)
7980
sha256sums=('53d2be8a2dcf58e7eae7823ef01a5b8ce0eba704b132d19167396eb162db378c'
80-
'24a6defda6fd286c96e18f8ee8c7f36fd3c010f2df8e3b619e1ebb61bb61b1df'
81+
'f61169068c110e6b5dfc5deed7fc2572a890c59ea3b4da7a0c6e171530a8f7b5'
8182
'5538db757661949423563bab5e4f383dfa4ef0cf3301cf43f23b482392554b3b'
8283
'2fb9b2b297797d20cd901eaee2de735e8cdda1c1e5836e9ff77856c0d1216860'
8384
'780977d71e35bbe4c0dcda5272895267d68d635f311e224f7981858f7192a85e'
@@ -126,7 +127,8 @@ sha256sums=('53d2be8a2dcf58e7eae7823ef01a5b8ce0eba704b132d19167396eb162db378c'
126127
'8fbb38dc064d11599b2314311abfa632ec5a8da4786c716299e276abd29d7326'
127128
'6db2e2a7e45125974de355fd685d3a652c1f17e5bd2a94606cbba33520d96624'
128129
'37f4dcaab659da6f7730a7fdd855c91e51b642bd42820a7f35af514a866db7fd'
129-
'e2b046866637090bd9123fd2fcd00faac5d903e4b322959b9c13ea1161e786a2')
130+
'e2b046866637090bd9123fd2fcd00faac5d903e4b322959b9c13ea1161e786a2'
131+
'eeca52961120ff4d258045913faf9e482a44a4e058336b8206ad76002bbf0a65')
130132

131133
# Helper macros to help make tasks easier #
132134
apply_patch_with_msg() {
@@ -229,7 +231,8 @@ prepare() {
229231
0046-Change-the-default-base-address-for-x86_64.patch \
230232
0047-msys2-runtime-restore-fast-path-for-current-user-pri.patch \
231233
0048-Cygwin-Fix-compatibility-with-w32api-headers-v13.patch \
232-
0049-symlink_native-allow-linking-to.patch
234+
0049-symlink_native-allow-linking-to.patch \
235+
0050-pipe-fix-SSH-hang-again.patch
233236
}
234237

235238
build() {

msys2-runtime/msys2-runtime.commit

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
d973094841109fda63a3e79742dcd06584227e06
1+
cb9b2d04699ca0a5d1f32f6e79447b0823444fb4

0 commit comments

Comments
 (0)