Skip to content

Commit 9f4fcc7

Browse files
jeremyd2019lazka
authored andcommitted
cygthread: suspend thread before terminating.
This addresses an extremely difficult to debug deadlock when running under emulation on ARM64. A relatively easy way to trigger this bug is to call `fork()`, then within the child process immediately call another `fork()` and then `exit()` the intermediate process. It would seem that there is a "code emulation" lock on the wait thread at this stage, and if the thread is terminated too early, that lock still exists albeit without a thread, and nothing moves forward. It seems that a `SuspendThread()` combined with a `GetThreadContext()` (to force the thread to _actually_ be suspended, for more details see https://devblogs.microsoft.com/oldnewthing/20150205-00/?p=44743) makes sure the thread is "booted" from emulation before it is suspended. Hopefully this means it won't be holding any locks or otherwise leave emulation in a bad state when the thread is terminated. Also, attempt to use `CancelSynchonousIo()` (as seen in `flock.cc`) to avoid the need for `TerminateThread()` altogether. This doesn't always work, however, so was not a complete fix for the deadlock issue. Addresses: https://cygwin.com/pipermail/cygwin-developers/2024-May/012694.html Signed-off-by: Jeremy Drake <[email protected]>
1 parent b879775 commit 9f4fcc7

File tree

3 files changed

+31
-5
lines changed

3 files changed

+31
-5
lines changed

winsup/cygwin/cygthread.cc

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -302,6 +302,20 @@ cygthread::terminate_thread ()
302302
if (!inuse)
303303
goto force_notterminated;
304304

305+
if (_my_tls._ctinfo != this)
306+
{
307+
CONTEXT context;
308+
context.ContextFlags = CONTEXT_CONTROL;
309+
/* SuspendThread makes sure a thread is "booted" from emulation before
310+
it is suspended. As such, the emulator hopefully won't be in a bad
311+
state (aka, holding any locks) when the thread is terminated. */
312+
SuspendThread (h);
313+
/* We need to call GetThreadContext, even though we don't care about the
314+
context, because SuspendThread is asynchronous and GetThreadContext
315+
will make sure the thread is *really* suspended before returning */
316+
GetThreadContext (h, &context);
317+
}
318+
305319
TerminateThread (h, 0);
306320
WaitForSingleObject (h, INFINITE);
307321
CloseHandle (h);

winsup/cygwin/pinfo.cc

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1262,13 +1262,17 @@ proc_waiter (void *arg)
12621262

12631263
for (;;)
12641264
{
1265-
DWORD nb;
1265+
DWORD nb, err;
12661266
char buf = '\0';
12671267

12681268
if (!ReadFile (vchild.rd_proc_pipe, &buf, 1, &nb, NULL)
1269-
&& GetLastError () != ERROR_BROKEN_PIPE)
1269+
&& (err = GetLastError ()) != ERROR_BROKEN_PIPE)
12701270
{
1271-
system_printf ("error on read of child wait pipe %p, %E", vchild.rd_proc_pipe);
1271+
/* ERROR_OPERATION_ABORTED is expected due to the possibility that
1272+
CancelSynchronousIo interruped the ReadFile call, so don't output
1273+
that error */
1274+
if (err != ERROR_OPERATION_ABORTED)
1275+
system_printf ("error on read of child wait pipe %p, %E", vchild.rd_proc_pipe);
12721276
break;
12731277
}
12741278

winsup/cygwin/sigproc.cc

Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -413,7 +413,11 @@ proc_terminate ()
413413
to 1 iff it is a Cygwin process. */
414414
if (!have_execed || !have_execed_cygwin)
415415
chld_procs[i]->ppid = 1;
416-
if (chld_procs[i].wait_thread)
416+
/* Attempt to exit the wait_thread cleanly via CancelSynchronousIo
417+
before falling back to the (explicitly dangerous) cross-thread
418+
termination */
419+
if (chld_procs[i].wait_thread
420+
&& !CancelSynchronousIo (chld_procs[i].wait_thread->thread_handle ()))
417421
chld_procs[i].wait_thread->terminate_thread ();
418422
/* Release memory associated with this process unless it is 'myself'.
419423
'myself' is only in the chld_procs table when we've execed. We
@@ -1198,7 +1202,11 @@ remove_proc (int ci)
11981202
{
11991203
if (have_execed)
12001204
{
1201-
if (_my_tls._ctinfo != chld_procs[ci].wait_thread)
1205+
/* Attempt to exit the wait_thread cleanly via CancelSynchronousIo
1206+
before falling back to the (explicitly dangerous) cross-thread
1207+
termination */
1208+
if (_my_tls._ctinfo != chld_procs[ci].wait_thread
1209+
&& !CancelSynchronousIo (chld_procs[ci].wait_thread->thread_handle ()))
12021210
chld_procs[ci].wait_thread->terminate_thread ();
12031211
}
12041212
else if (chld_procs[ci] && chld_procs[ci]->exists ())

0 commit comments

Comments
 (0)