Skip to content

Conversation

@ioquatix
Copy link
Member

@ioquatix ioquatix commented Dec 14, 2024

It looks like EPoll can hang when executing system. If process_wait is called in rapid succession and hits the same file descriptor, the internal state can be reused, which may assume the epoll is currently armed when it is not, leading to a hang.

  1. IO_Event_Selector_EPoll_Waiting_register(..., 0, pidfd, ...)
  2. epoll_ctl ADD.
  3. epoll_wait -> status
  4. IO_Event_Selector_EPoll_Waiting_register(..., 0, pidfd, ...) but we hit epoll_descriptor->registered_events == epoll_descriptor->waiting_events branch (return).
  5. Hang

This is because the argument for VALUE io is 0 in both cases, so we assume the IO is the same object, and thus the registration has not changed. By providing _pid as the io object, we prevent this from happening. I'll think about a better solution, as this feels like a hack; we can probably clear the state after process_wait instead, on all cases, which prevents any re-use. _pid is unlikely to wrap around rapidly so the chance of hitting this bug is greatly reduced, but not zero, with the current proposed change.

Fixes socketry/async#351.

Types of Changes

  • Bug fix.

Contribution

@ioquatix ioquatix changed the title Reproduction. process_wait can hang if reusing FD in rapid succession. Dec 15, 2024
@ioquatix ioquatix merged commit 8a7b717 into main Dec 15, 2024
60 of 70 checks passed
@ioquatix ioquatix deleted the system-hang branch December 15, 2024 10:51
@mezbahalam
Copy link

You are an extraordinary wizard @ioquatix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

system hangs if file does not exist

2 participants