Commit 3328b49

WIP add timeouts to the gotchas doc
1 parent 2faca4a commit 3328b49

1 file changed: +44 -5 lines changed

gotchas.md

Lines changed: 44 additions & 5 deletions
@@ -27,6 +27,7 @@ affect the implementation.
 * [Matching platform case-sensitivity for environment variables](#matching-platform-case-sensitivity-for-environment-variables)
 * [Using IO threads to avoid blocking children](#using-io-threads-to-avoid-blocking-children)
 * [Killing grandchild processes?](#killing-grandchild-processes)
+* [Waiting with a timeout](#waiting-with-a-timeout)

 ## Reporting errors by default

@@ -57,9 +58,10 @@ reading all of its input. Most standard libraries get this right.

 Notably on Unix, this requires the process to suppress `SIGPIPE`.
 Implementations in languages that don't suppress `SIGPIPE` by default (C/C++?)
-have no choice but to set a signal handler from library code, which might
-conflict with application code or other libraries. There is no good solution to
-this problem.
+have to configure signal handling from library code, which might conflict with
+application code or other libraries in the rare case that something does want
+to receive that signal. (See [Waiting with a timeout](#waiting-with-a-timeout)
+below for more on handling signals from library code.)

 ## Cleaning up zombie children

@@ -101,8 +103,7 @@ PID. It's not likely, but all of that could happen just before the call to
 is why the Rust standard library [doesn't allow shared access to child
 processes](https://doc.rust-lang.org/std/process/struct.Child.html#method.kill).

-It's possible to avoid this race using a newer POSIX API called
-[`waitid`](https://pubs.opengroup.org/onlinepubs/9699919799/functions/waitid.html).
+It's possible to avoid this race using a newer POSIX API called [`waitid`].
 That function has a `WNOWAIT` flag that leaves the child in its zombie state,
 so that its PID isn't freed for reuse. That gives the waiting thread a chance
 to set a flag to block further kills, before reaping the child. Duct uses this
@@ -345,3 +346,41 @@ objects](https://docs.microsoft.com/en-us/windows/win32/procthread/job-objects))
 but even there it sounds like some important features aren't supported on
 Windows 7. Realistically, there won't be good techniques for Duct to use to
 solve this problem for many years.
+
+## Waiting with a timeout
+
+The Windows [`WaitForSingleObject`] function has a timeout argument, but the
+Unix [`waitpid`], [`waitid`], and [`pthread_join`] functions do not. That makes
+it complicated to do any sort of waiting with a timeout on Unix.
+
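As a point of comparison for the problem described above, the simplest portable workaround is to poll a non-blocking wait until a deadline passes. The following is a minimal sketch using only the Rust standard library. `wait_timeout` is a hypothetical helper name, not part of `std` or Duct, and the backoff values are arbitrary assumptions. It does not address the signal-based approaches discussed in this section.

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper (not a std or Duct API): poll `Child::try_wait`
/// until the child exits or `timeout` elapses.
/// Returns `Ok(None)` on timeout and leaves the child running.
fn wait_timeout(child: &mut Child, timeout: Duration) -> io::Result<Option<ExitStatus>> {
    let deadline = Instant::now() + timeout;
    // Assumed backoff: start at 1 ms and double up to a 50 ms cap.
    let mut delay = Duration::from_millis(1);
    loop {
        // `try_wait` is a non-blocking check that reaps the child if it has exited.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        let now = Instant::now();
        if now >= deadline {
            return Ok(None);
        }
        // Never sleep past the deadline.
        thread::sleep(delay.min(deadline - now));
        delay = (delay * 2).min(Duration::from_millis(50));
    }
}
```

Polling trades wakeups and up to one backoff interval of latency for simplicity. It needs exclusive access to the `Child`, so it sidesteps rather than solves the question of waiting from several threads at once.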
+- for threads we can add code
+  - what does Python do?
+    - PyEvent on thread exit
+    - also their locks are actually all condvars on the inside and support
+      waiting with a timeout
+- for children we need to handle `SIGCHLD`
+  - what does Python do? (timeout?)
+    - OMG does Python have the wait/try_wait race?!
+  - signal_hook_registry race condition
+    - can we just have sigaction() write back to the global?
+  - does Rust Child::wait fail to handle `EINTR`?
+
+
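For the thread case in the notes above, "adding code" could mean having the thread announce its own completion through a condition variable, since condvars do support timed waits. This is a sketch under that assumption, using only the Rust standard library. `DoneFlag`, `SetOnDrop`, `TimedHandle`, and `spawn_timed` are all hypothetical names, and this is not a description of what Python or Duct actually does.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical completion flag shared between the thread and its waiters.
struct DoneFlag {
    done: Mutex<bool>,
    cond: Condvar,
}

/// Sets the flag when dropped, so it is also set if the closure panics.
struct SetOnDrop(Arc<DoneFlag>);

impl Drop for SetOnDrop {
    fn drop(&mut self) {
        *self.0.done.lock().unwrap() = true;
        self.0.cond.notify_all();
    }
}

/// Hypothetical wrapper around a `JoinHandle` that supports a timed wait.
struct TimedHandle {
    flag: Arc<DoneFlag>,
    handle: thread::JoinHandle<()>,
}

fn spawn_timed<F>(f: F) -> TimedHandle
where
    F: FnOnce() + Send + 'static,
{
    let flag = Arc::new(DoneFlag { done: Mutex::new(false), cond: Condvar::new() });
    let guard = SetOnDrop(Arc::clone(&flag));
    let handle = thread::spawn(move || {
        // Dropped when the closure returns or unwinds, which sets the flag.
        let _guard = guard;
        f();
    });
    TimedHandle { flag, handle }
}

impl TimedHandle {
    /// Returns true if the closure finished within `timeout`.
    fn wait_done(&self, timeout: Duration) -> bool {
        let guard = self.flag.done.lock().unwrap();
        // Wait while the flag is still false, up to `timeout` in total.
        let (guard, _timed_out) = self
            .flag
            .cond
            .wait_timeout_while(guard, timeout, |done| !*done)
            .unwrap();
        *guard
    }

    /// Blocking join. The flag is set just before the thread exits,
    /// so this may still block briefly after `wait_done` returns true.
    fn join(self) -> thread::Result<()> {
        self.handle.join()
    }
}
```

The same pattern does not carry over directly to child processes, because nothing runs inside the child to set the flag. That is why the child case in the notes falls back on `SIGCHLD` or a dedicated waiter.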
+I want `try_wait`/`poll` to always actually check, because you might be calling it
+in response to `SIGCHLD` or something like that, and in that case it's
+unacceptable for a race against the blocking thread to cause you to return
+`None`.
+
+That means that all reaping should actually be done by calling into `wait()`.
+
+That means that double locking is viable, like the old Python implementation.
+Maybe we don't need a condvar?
+- But what does that mean for timeouts? What's cleaner? Unix has `SIGCHLD`, and
+  Windows doesn't have to worry about reaping, but what would we need to do if
+  there were a timeout parameter to the Unix `waitid`? In that case we would need
+  the condvar.
+
+[`WaitForSingleObject`]: https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject
+[`waitpid`]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/waitpid.html
+[`waitid`]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/waitid.html
+[`pthread_join`]: https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_join.html
