
Commit 4e52bca

Author: Andreas Molzer
Fix memory ordering in needs_wake (#281)
* Fix memory ordering in needs_wake

  A memory fence is required to ensure that the decision about whether the kernel needs waking is consistent between user-space and the kernel. The kernel must not go to sleep without us reliably observing the wake flag; conversely, the kernel must see our updates to the submission head if our load does not observe the updated wake flag. The fence is paired with one in the kernel, which ensures that at least one of the two stores is observed by the corresponding load of the other party.

* Relax memory order of needs_wake

  There is no other state that is acquired by this load. The barrier that the library or its callers are required to provide already supplies the necessary synchronization with regard to kernel state.

* Fix Submit use of NEED_WAKEUP flag
1 parent 501ee78 commit 4e52bca
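
The pairing described above is the classic store-buffering pattern: each party stores to one shared location, then loads the other's. The following is a minimal sketch of that reasoning in Rust, assuming stand-in statics HEAD and NEED_WAKEUP in place of the real mapped ring fields, with one thread merely modeling the kernel's documented store/smp_mb/load sequence; it is an illustration, not this crate's API. Under the C++20 memory model that Rust adopts, the two SeqCst fences forbid the outcome in which neither side observes the other's store.

    use std::sync::atomic::{fence, AtomicU32, Ordering};
    use std::thread;

    // Illustrative stand-ins for the shared submission-queue head and flags.
    static HEAD: AtomicU32 = AtomicU32::new(0);
    static NEED_WAKEUP: AtomicU32 = AtomicU32::new(0);

    fn main() {
        let user = thread::spawn(|| {
            HEAD.store(1, Ordering::Relaxed);   // publish a new submission
            fence(Ordering::SeqCst);            // the fence this commit adds
            NEED_WAKEUP.load(Ordering::Relaxed) // then decide whether to wake
        });
        let kernel = thread::spawn(|| {
            NEED_WAKEUP.store(1, Ordering::Relaxed); // announce intent to sleep
            fence(Ordering::SeqCst);                 // models the kernel's smp_mb
            HEAD.load(Ordering::Relaxed)             // re-check for pending work
        });
        let (saw_flag, saw_head) = (user.join().unwrap(), kernel.join().unwrap());
        // Forbidden outcome: saw_flag == 0 && saw_head == 0. At least one side
        // observes the other's store, so either user-space wakes the kernel or
        // the kernel sees the new head before sleeping.
        assert!(saw_flag == 1 || saw_head == 1);
    }

The previous Acquire load provided no such guarantee: acquire only prevents later operations from moving before the load and synchronizes with a release store to the same location, neither of which forces the head store to become visible to the kernel before it commits to sleeping.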

File tree

2 files changed: +44 -2 lines

  src/squeue.rs
  src/submit.rs


src/squeue.rs

Lines changed: 36 additions & 1 deletion
@@ -178,10 +178,45 @@ impl<E: EntryMarker> SubmissionQueue<'_, E> {
 
     /// When [`is_setup_sqpoll`](crate::Parameters::is_setup_sqpoll) is set, whether the kernel
     /// threads has gone to sleep and requires a system call to wake it up.
+    ///
+    /// A result of `false` is only meaningful if the function was called after the latest update
+    /// to the queue head. Other interpretations could lead to a race condition where the kernel
+    /// concurrently put the device to sleep and no further progress is made.
     #[inline]
     pub fn need_wakeup(&self) -> bool {
+        // See discussions that happened in [#197] and its linked threads in liburing. We need to
+        // ensure that writes to the head have been visible _to the kernel_ if this load results in
+        // decision to sleep. This is solved with a SeqCst fence. There is no common modified
+        // memory location that would provide alternative synchronization.
+        //
+        // The kernel, from its sequencing, first writes the wake flag, then performs a full
+        // barrier (`smp_mb`, or `smp_mb__after_atomic`), then reads the head. We assume that our
+        // user first writes the head and then reads the `need_wakeup` flag as documented. It is
+        // necessary to ensure that at least one observes the other write. By establishing a point
+        // of sequential consistency on both sides between their respective write and read, at
+        // least one coherency order holds. With regards to the interpretation of the atomic memory
+        // model of Rust (that is, that of C++20) we're assuming that an `smp_mb` provides at least
+        // the effect of a `fence(SeqCst)`.
+        //
+        // [#197]: https://github.com/tokio-rs/io-uring/issues/197
+        atomic::fence(atomic::Ordering::SeqCst);
+        unsafe {
+            (*self.queue.flags).load(atomic::Ordering::Relaxed) & sys::IORING_SQ_NEED_WAKEUP != 0
+        }
+    }
+
+    /// The effect of [`Self::need_wakeup`], after synchronization work performed by the caller.
+    ///
+    /// This function should only be called if the caller can guarantee that a `SeqCst` fence has
+    /// been inserted after the last write to the queue's head. The function is then a little more
+    /// efficient by avoiding to perform one itself.
+    ///
+    /// Failure to uphold the precondition can result in an effective dead-lock due to a sleeping
+    /// device.
+    #[inline]
+    pub fn need_wakeup_after_intermittent_seqcst(&self) -> bool {
         unsafe {
-            (*self.queue.flags).load(atomic::Ordering::Acquire) & sys::IORING_SQ_NEED_WAKEUP != 0
+            (*self.queue.flags).load(atomic::Ordering::Relaxed) & sys::IORING_SQ_NEED_WAKEUP != 0
         }
     }
 
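As a usage illustration of the precondition documented above, here is a hedged sketch against the crate API; the submit_nop helper and its control flow are assumptions for the example, not code from this commit. The head is published with sync() before need_wakeup() is consulted, which is the ordering the new doc comment requires.

    use io_uring::{opcode, IoUring};

    // Hypothetical helper: push one no-op, then wake the SQPOLL thread if needed.
    fn submit_nop(ring: &mut IoUring) -> std::io::Result<()> {
        let nop = opcode::Nop::new().build();
        let needs_enter = {
            let mut sq = ring.submission();
            unsafe { sq.push(&nop).expect("submission queue full") };
            sq.sync(); // publish the new head to the shared ring first
            // `need_wakeup` issues the SeqCst fence itself, so a `false` here
            // reliably means the SQPOLL thread is still awake.
            sq.need_wakeup()
        };
        if needs_enter {
            // Enter the kernel; as src/submit.rs below shows, `submit` re-checks
            // the flag after its own fence and sets IORING_ENTER_SQ_WAKEUP.
            ring.submit()?;
        }
        Ok(())
    }

A plain submit() call without the manual check would also be correct, since Submitter performs the same fence-then-test internally; the check merely avoids the syscall while the poll thread is awake.
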
src/submit.rs

Lines changed: 8 additions & 1 deletion
@@ -59,7 +59,7 @@ impl<'a> Submitter<'a> {
     #[inline]
     fn sq_need_wakeup(&self) -> bool {
         unsafe {
-            (*self.sq_flags).load(atomic::Ordering::Acquire) & sys::IORING_SQ_NEED_WAKEUP != 0
+            (*self.sq_flags).load(atomic::Ordering::Relaxed) & sys::IORING_SQ_NEED_WAKEUP != 0
         }
     }
 
@@ -119,12 +119,17 @@ impl<'a> Submitter<'a> {
         // each cause an atomic load of the same variable, self.sq_flags.
         // In the hottest paths, when a server is running with sqpoll,
         // this is going to be hit twice, when once would be sufficient.
+        // However, consider that the `SeqCst` barrier required for interpreting
+        // the IORING_ENTER_SQ_WAKEUP bit is required in all paths where sqpoll
+        // is setup when consolidating the reads.
 
         if want > 0 || self.params.is_setup_iopoll() || self.sq_cq_overflow() {
             flags |= sys::IORING_ENTER_GETEVENTS;
         }
 
         if self.params.is_setup_sqpoll() {
+            // See discussion in [`SubmissionQueue::need_wakeup`].
+            atomic::fence(atomic::Ordering::SeqCst);
             if self.sq_need_wakeup() {
                 flags |= sys::IORING_ENTER_SQ_WAKEUP;
             } else if want == 0 {
@@ -150,6 +155,8 @@ impl<'a> Submitter<'a> {
         }
 
         if self.params.is_setup_sqpoll() {
+            // See discussion in [`SubmissionQueue::need_wakeup`].
+            atomic::fence(atomic::Ordering::SeqCst);
             if self.sq_need_wakeup() {
                 flags |= sys::IORING_ENTER_SQ_WAKEUP;
             } else if want == 0 {
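
Callers that can batch work behind a single barrier are the audience for the new need_wakeup_after_intermittent_seqcst. A small sketch of that pattern, assuming a hypothetical flush_batch helper that is not part of this commit:

    use std::sync::atomic::{fence, Ordering};

    // Hypothetical helper: check the wake flag once per batch, providing the
    // SeqCst fence that the `_after_intermittent_seqcst` variant requires.
    fn flush_batch(sq: &mut io_uring::squeue::SubmissionQueue<'_>) -> bool {
        sq.sync(); // write out the head for every entry pushed so far
        // One explicit fence discharges the documented precondition ...
        fence(Ordering::SeqCst);
        // ... so the cheaper, fence-free variant of the check is sound here.
        sq.need_wakeup_after_intermittent_seqcst()
    }

As the doc comment warns, skipping the fence is a liveness bug rather than a performance one: a missed wakeup can leave the SQPOLL thread asleep while submissions sit unprocessed.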

0 commit comments
