Conversation
i.e. if the socket isn't closed but is inoperable because the context was shut down.

The poller will notify each socket's `FDWatcher` with a `WAKEUP` event to ensure that waiter tasks wake up when calling `wait(socket)`. These events were previously cleared (incorrectly) by calling `notify(socket)`, with the expectation that the new flag would completely override the `FDWatcher` state, but that is not the case. Now we explicitly clear the `WAKEUP` flag on the internal `_FDWatcher` struct. Also fixed `close(::Poller)` so that it wakes up any waiting tasks; previously it would hang if some tasks were executing `wait(socket)`.
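For reference, here is a minimal, self-contained sketch of the flag-based watcher pattern being described. It is not ZMQ.jl's actual implementation: the `ToyWatcher` type, its field names, and the flag values are illustrative stand-ins for `_FDWatcher`. The point is that notifying ORs a flag into the pending set rather than overwriting it, so clearing has to be done explicitly by whoever consumes the event:

```julia
# Illustrative only: a tiny flag-based watcher, not ZMQ.jl's _FDWatcher.
const READABLE = 0x01
const WAKEUP   = 0x02  # dummy event used only to wake a blocked waiter

mutable struct ToyWatcher
    events::UInt8
    notify::Threads.Condition
    ToyWatcher() = new(0x00, Threads.Condition())
end

# OR a flag into the pending set and wake any waiter; assigning to `events`
# instead would silently drop flags set by other tasks.
function notify_event(w::ToyWatcher, flag::UInt8)
    lock(w.notify) do
        w.events |= flag
        notify(w.notify)
    end
end

# Block until something is pending, return it, and clear it under the lock.
function wait_event(w::ToyWatcher)
    lock(w.notify) do
        while w.events == 0x00
            wait(w.notify)
        end
        ev = w.events
        w.events = 0x00
        return ev
    end
end
```

With this sketch, `notify_event(w, WAKEUP)` followed by `wait_event(w)` returns `WAKEUP` and leaves `w.events == 0`, which mirrors the behaviour discussed in the reproduction further down.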
```julia
if (event.events & WAKEUP) == WAKEUP
    # If it was a dummy event from the poller then do nothing
    continue
```
I think it would make more sense to handle (i.e. clear) the WAKEUP event immediately? The wakeup doesn't need to exist/persist except to wake up from the `wait`/`zmq_poll` loop and then `continue` here (and it does need to be cleared before the next wait). The later control flow should still be correct whether the poller remains open or is closed:
- Poller open: `poller.channel` is still open (the loop continues) and we wait at `waiter_wait(poller.barrier)` to rearm.
- Poller closed: `poller.channel` is closed (the loop exits).
```julia
pollfd = getfield(item.socket, :pollfd) # don't need to check if socket was closed because that would be caught by the socket error check just above
pollfd.watcher.events &= ~WAKEUP
continue
```
I had a go, but it was quite tricky because of how `wait()` and `close()` wake up all the sockets regardless, and clearing it directly before the polling calls is a race condition. Clearing it in the same task that does the wakeups is the simplest way, I think.
So my suggestion was definitely wrong, but I think the current state of this PR might be treating a symptom and not the cause?
```julia
using ZMQ

pull1 = ZMQ.Socket(ZMQ.PULL)
bind(pull1, "inproc://pull1")
notify(pull1, ZMQ.WAKEUP)
wait(pull1) # returns WAKEUP and clears the event in the fdwatcher
getfield(pull1, :pollfd).watcher.events == 0
wait(pull1) # would block because WAKEUP was cleared
```

So the presence of any leftover WAKEUPs means that some sockets are being notified/woken after they've already synchronized. I played around with this a bit and could not figure out the right logic to avoid the redundant WAKEUPs. In any case, I noticed from the FileWatching source that we should make sure to grab the `fdwatcher.notify` lock before clearing events:

```julia
@lock pollfd.watcher.notify pollfd.watcher.events &= ~WAKEUP
```
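To make the locking point concrete, here is a hedged sketch reusing the illustrative `ToyWatcher` from above (again, not ZMQ.jl's real types): the clear is a read-modify-write, so doing it while holding the watcher's lock prevents it from interleaving with a concurrent notifier that is ORing in new flags:

```julia
# Illustrative: clear just the WAKEUP bit while holding the lock; any other
# pending flags (e.g. READABLE) set concurrently by a notifier are preserved.
function clear_wakeup!(w::ToyWatcher)
    lock(w.notify) do
        w.events &= ~WAKEUP
    end
end
```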
I don't think that line quite explains the behaviour though: https://github.com/JuliaLang/julia/blob/5735163b340af5047919b31e3857bf8b866731c6/stdlib/FileWatching/src/FileWatching.jl#L644
Yeah that's done explicitly by both
I considered that but figured it wasn't necessary, since an invariant is that at those points there will be no other task acting on the socket or its watcher. But I'll add it, can't hurt anyway 👍
Don't forget that all sockets now also include the WAKEUP flag in their mask: Line 25 in 04fcdc1
Ah yes you're right, it's the wakeup event that's being caught rather than a readable event.
See commit messages for details.