Skip to content

transports: use a thread instead of SafeChildWatcher#22913

Merged
allisonkarlitskaya merged 2 commits intocockpit-project:mainfrom
allisonkarlitskaya:threaded-watcher
Feb 24, 2026
Merged

transports: use a thread instead of SafeChildWatcher#22913
allisonkarlitskaya merged 2 commits intocockpit-project:mainfrom
allisonkarlitskaya:threaded-watcher

Conversation

@allisonkarlitskaya
Copy link
Member

SafeChildWatcher was removed in 3.14. We used to use this as a fallback for when we don't have pidfd support in the kernel (or Python stdlib) under the assumption that we'd never see Python 3.14 on a system with such an old kernel, but of course this happens with newer Python versions in containers running on RHEL 8 hosts.

Let's divorce ourselves from the entire concept of ChildWatchers, dropping our 'qdata' cache for them. Instead, we just try to create a pidfd each time, and if that fails, start up a thread to call waitpid() and notify the event loop via call_soon_threadsafe(). This approach is a simplified form of asyncio.ThreadedChildWatcher from the standard library, which is what gets used anywhere pidfds aren't available.

We had a test that caused pidfd to return ENOSYS that we started skipping on newer Python versions to avoid trying to use SafeChildWatcher in what I imagined to be an impossible configuration. Bring that back unconditionally (and rename it to remove the reference to "safe watcher").

martinpitt
martinpitt previously approved these changes Feb 23, 2026
Copy link
Member

@martinpitt martinpitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Given the circumstances of this moving target (asyncio), this is actually a nice solution. 💯

SafeChildWatcher was removed in 3.14.  We used to use this as a fallback
for when we don't have pidfd support in the kernel (or Python stdlib)
under the assumption that we'd never see Python 3.14 on a system with
such an old kernel, but of course this happens with newer Python
versions in containers running on RHEL 8 hosts.

Let's divorce ourselves from the entire concept of ChildWatchers,
dropping our 'qdata' cache for them.  Instead, we just try to create a
pidfd each time, and if that fails, start up a thread to call waitpid()
and notify the event loop via call_soon_threadsafe().  This approach is
a simplified form of asyncio.ThreadedChildWatcher from the standard
library, which is what gets used anywhere pidfds aren't available.

We had a test that caused pidfd to return ENOSYS that we started
skipping on newer Python versions to avoid trying to use
SafeChildWatcher in what I imagined to be an impossible configuration.
Bring that back unconditionally (and rename it to remove the reference
to "safe watcher").
Add tests to verify that non-zero exit codes and signal termination
are correctly reported when using the threaded waitpid() fallback
path (when pidfd is unavailable).

Includes a concurrent test that starts multiple processes, terminates
them in a controlled order, and verifies that the correct exit status
is attributed to each process, and at the expected time.

Assisted-by: Claude Opus 4.5 <noreply@anthropic.com>
@allisonkarlitskaya
Copy link
Member Author

Python 3.6 pain made obscure by the errors of the bridge on the host being sent over ssh to the container and discarded. We really ought to fix the logging story one of these days...

Copy link
Member

@martinpitt martinpitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Let's do one round of review, but I agree this is urgent.

def flag_exit() -> None:
def child_exited(pid: int, status: int) -> None:
assert pid == process.pid
# os.waitstatus_to_exitcode() is only available since Python 3.9
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could become a polyfill, for easier future clean-up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. This would fit nicely in polyfills.py with the rest of them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@allisonkarlitskaya
Copy link
Member Author

Agreed to do the polyfill of os.waitstatus_to_exitcode() as a separate issue so that we can land this and unbreak main.

@allisonkarlitskaya allisonkarlitskaya merged commit 7157811 into cockpit-project:main Feb 24, 2026
93 of 94 checks passed
@allisonkarlitskaya allisonkarlitskaya deleted the threaded-watcher branch February 24, 2026 07:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants