
Commit 69a2a01

kkdwvd authored and Kernel Patches Daemon committed
rqspinlock: Enclose lock/unlock within lock entry acquisitions
We currently have a tiny window between the fast-path cmpxchg and the grabbing of the lock entry where an NMI could land, attempt the same lock that was just acquired, and end up timing out. This is not ideal. Instead, move the lock entry acquisition from the fast path to before the cmpxchg, and remove the grabbing of the lock entry in the slow path, assuming it was already taken by the fast path.

There is a similar case when unlocking the lock. If the NMI lands between the WRITE_ONCE and the smp_store_release, it is possible that the NMI fails to diagnose the AA condition, leading to a timeout.

The TAS fallback is invoked directly, without being preceded by the typical fast path, so we must continue to grab the deadlock detection entry in that case.

Note the changes to the comments in release_held_lock_entry and res_spin_unlock. They discuss prevention of the following scenario, which is introduced by this commit and was previously avoided by placing the smp_store_release after the WRITE_ONCE (the ordering before this commit):

  CPU 0                                        CPU 1
  grab entry A
  lock A
  grab entry B
  lock B
  unlock B
  smp_store_release(B->locked, 0)
                                               grab entry B
                                               lock B
                                               grab entry A
                                               lock A  ! <detect ABBA>
  WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL)

If the store release were placed after the WRITE_ONCE, the other CPU would not observe B in the table of the CPU unlocking lock B. Avoiding this while it was convenient was a prudent choice, but since it leads to missed diagnosis of AA deadlocks in the presence of NMIs, it no longer makes sense to keep that ordering. Moreover, while this particular schedule is a misdiagnosis, the CPUs are otherwise clearly participating in an ABBA deadlock, and were previously only avoiding the error by luck, due to the aforementioned race.

Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
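For context, a minimal before/after sketch of the res_spin_lock fast path affected by this change, condensed from the diff below; the comments marking the NMI window are annotations for this page, not kernel source:

/* Before: the held-lock entry is created only after the cmpxchg succeeds. */
static __always_inline int res_spin_lock(rqspinlock_t *lock)
{
	int val = 0;

	if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL))) {
		/*
		 * An NMI landing here sees no held-lock entry for 'lock',
		 * re-attempts the same lock, and can only time out.
		 */
		grab_held_lock_entry(lock);
		return 0;
	}
	return resilient_queued_spin_lock_slowpath(lock, val);
}

/* After: the entry is published before the acquisition attempt, so a
 * reentrant NMI trying the same lock can detect the AA case instead.
 */
static __always_inline int res_spin_lock(rqspinlock_t *lock)
{
	int val = 0;

	grab_held_lock_entry(lock);

	if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL)))
		return 0;
	return resilient_queued_spin_lock_slowpath(lock, val);
}

The unlock side is reworked analogously in the diff below: the lock word is released before the held-lock entry is cleared, closing the window where an NMI would see the lock still held but find no entry for it.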
1 parent 2b84d4f commit 69a2a01

2 files changed: +43 −38 lines

include/asm-generic/rqspinlock.h

Lines changed: 37 additions & 29 deletions
@@ -129,8 +129,8 @@ static __always_inline void release_held_lock_entry(void)
 	 * <error> for lock B
 	 * release_held_lock_entry
 	 *
-	 * try_cmpxchg_acquire for lock A
 	 * grab_held_lock_entry
+	 * try_cmpxchg_acquire for lock A
 	 *
 	 * Lack of any ordering means reordering may occur such that dec, inc
 	 * are done before entry is overwritten. This permits a remote lock
@@ -139,13 +139,8 @@ static __always_inline void release_held_lock_entry(void)
 	 * CPU holds a lock it is attempting to acquire, leading to false ABBA
 	 * diagnosis).
 	 *
-	 * In case of unlock, we will always do a release on the lock word after
-	 * releasing the entry, ensuring that other CPUs cannot hold the lock
-	 * (and make conclusions about deadlocks) until the entry has been
-	 * cleared on the local CPU, preventing any anomalies. Reordering is
-	 * still possible there, but a remote CPU cannot observe a lock in our
-	 * table which it is already holding, since visibility entails our
-	 * release store for the said lock has not retired.
+	 * The case of unlock is treated differently due to NMI reentrancy, see
+	 * comments in res_spin_unlock.
 	 *
 	 * In theory we don't have a problem if the dec and WRITE_ONCE above get
 	 * reordered with each other, we either notice an empty NULL entry on
@@ -175,10 +170,16 @@ static __always_inline int res_spin_lock(rqspinlock_t *lock)
 {
 	int val = 0;
 
-	if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL))) {
-		grab_held_lock_entry(lock);
+	/*
+	 * Grab the deadlock detection entry before doing the cmpxchg, so that
+	 * reentrancy due to NMIs between the succeeding cmpxchg and creation of
+	 * held lock entry can correctly detect an acquisition attempt in the
+	 * interrupted context.
+	 */
+	grab_held_lock_entry(lock);
+
+	if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL)))
 		return 0;
-	}
 	return resilient_queued_spin_lock_slowpath(lock, val);
 }
 
@@ -192,28 +193,35 @@ static __always_inline void res_spin_unlock(rqspinlock_t *lock)
 {
 	struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
 
-	if (unlikely(rqh->cnt > RES_NR_HELD))
-		goto unlock;
-	WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL);
-unlock:
 	/*
-	 * Release barrier, ensures correct ordering. See release_held_lock_entry
-	 * for details. Perform release store instead of queued_spin_unlock,
-	 * since we use this function for test-and-set fallback as well. When we
-	 * have CONFIG_QUEUED_SPINLOCKS=n, we clear the full 4-byte lockword.
+	 * Release barrier, ensures correct ordering. Perform release store
+	 * instead of queued_spin_unlock, since we use this function for the TAS
+	 * fallback as well. When we have CONFIG_QUEUED_SPINLOCKS=n, we clear
+	 * the full 4-byte lockword.
+	 */
+	smp_store_release(&lock->locked, 0);
+	if (likely(rqh->cnt <= RES_NR_HELD))
+		WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL);
+	/*
+	 * Unlike release_held_lock_entry, we do the lock word release before
+	 * rewriting the entry back to NULL, and place no ordering between the
+	 * WRITE_ONCE and dec, and possible reordering with grabbing an entry.
+	 *
+	 * This opens up a window where another CPU could acquire this lock, and
+	 * then observe it in our table on the current CPU, leading to possible
+	 * misdiagnosis of ABBA when we get reordered with a
+	 * grab_held_lock_entry's writes (see the case described in
+	 * release_held_lock_entry comments).
 	 *
-	 * Like release_held_lock_entry, we can do the release before the dec.
-	 * We simply care about not seeing the 'lock' in our table from a remote
-	 * CPU once the lock has been released, which doesn't rely on the dec.
+	 * This could be avoided if we did the smp_store_release right before
+	 * the dec, ensuring that the remote CPU could only acquire this lock
+	 * and never observe this lock in our table.
 	 *
-	 * Unlike smp_wmb(), release is not a two way fence, hence it is
-	 * possible for a inc to move up and reorder with our clearing of the
-	 * entry. This isn't a problem however, as for a misdiagnosis of ABBA,
-	 * the remote CPU needs to hold this lock, which won't be released until
-	 * the store below is done, which would ensure the entry is overwritten
-	 * to NULL, etc.
+	 * However, that opens up a window where reentrant NMIs on this same
+	 * CPU could have their AA heuristics fail to fire if they land between
+	 * the WRITE_ONCE and unlock release store, which would result in a
+	 * timeout.
 	 */
-	smp_store_release(&lock->locked, 0);
 	this_cpu_dec(rqspinlock_held_locks.cnt);
 }
 
kernel/bpf/rqspinlock.c

Lines changed: 6 additions & 9 deletions
@@ -278,6 +278,10 @@ int __lockfunc resilient_tas_spin_lock(rqspinlock_t *lock)
 	int val, ret = 0;
 
 	RES_INIT_TIMEOUT(ts);
+	/*
+	 * The fast path is not invoked for the TAS fallback, so we must grab
+	 * the deadlock detection entry here.
+	 */
 	grab_held_lock_entry(lock);
 
 	/*
@@ -400,10 +404,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
 		goto queue;
 	}
 
-	/*
-	 * Grab an entry in the held locks array, to enable deadlock detection.
-	 */
-	grab_held_lock_entry(lock);
+	/* Deadlock detection entry already held after failing fast path. */
 
 	/*
 	 * We're pending, wait for the owner to go away.
@@ -451,11 +452,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
 	 */
 queue:
 	lockevent_inc(lock_slowpath);
-	/*
-	 * Grab deadlock detection entry for the queue path.
-	 */
-	grab_held_lock_entry(lock);
-
+	/* Deadlock detection entry already held after failing fast path. */
 	node = this_cpu_ptr(&rqnodes[0].mcs);
 	idx = node->count++;
 	tail = encode_tail(smp_processor_id(), idx);