
Commit f5bfdc8

Waiman-Long authored and Peter Zijlstra committed
locking/osq: Use optimized spinning loop for arm64
Arm64 has a more optimized spinning loop (atomic_cond_read_acquire) using wfe for spinlock that can boost performance of sibling threads by putting the current cpu to a wait state that is broken only when the monitored variable changes or an external event happens.

OSQ has a more complicated spinning loop. Besides the lock value, it also checks for need_resched() and vcpu_is_preempted(). The check for need_resched() is not a problem as it is only set by the tick interrupt handler. That will be detected by the spinning cpu right after iret.

The vcpu_is_preempted() check, however, is a problem as changes to the preempt state of the previous node will not affect the wait state. For ARM64, vcpu_is_preempted is not currently defined and so is a no-op. Will has indicated that he is planning to para-virtualize wfe instead of defining vcpu_is_preempted for PV support. So just add a comment in arch/arm64/include/asm/spinlock.h to indicate that vcpu_is_preempted() should not be defined as suggested.

On a 2-socket 56-core 224-thread ARM64 system, a kernel mutex locking microbenchmark was run for 10s with and without the patch. The performance numbers before patch were:

Running locktest with mutex [runtime = 10s, load = 1]
Threads = 224, Min/Mean/Max = 316/123,143/2,121,269
Threads = 224, Total Rate = 2,757 kop/s; Percpu Rate = 12 kop/s

After patch, the numbers were:

Running locktest with mutex [runtime = 10s, load = 1]
Threads = 224, Min/Mean/Max = 334/147,836/1,304,787
Threads = 224, Total Rate = 3,311 kop/s; Percpu Rate = 15 kop/s

So there was about 20% performance improvement.

Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Will Deacon <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
1 parent 5709712 commit f5bfdc8
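
For context on the mechanism the patch relies on: smp_cond_load_relaxed(ptr, cond_expr) spins until cond_expr becomes true, with VAL naming the value just loaded from ptr. The sketch below is a simplified paraphrase (from memory) of the generic fallback in include/asm-generic/barrier.h; the real macro also deals with type qualifiers. On arm64, the cpu_relax() step is replaced by an ldxr/wfe based wait (__cmpwait_relaxed), so the CPU sleeps until the watched cache line is written or another event or interrupt arrives. That is why a condition that changes without a store to the watched variable, such as the preempt state checked by vcpu_is_preempted(), cannot by itself wake the waiter.

/*
 * Simplified paraphrase of the generic smp_cond_load_relaxed() fallback.
 * arm64 substitutes an ldxr/wfe based __cmpwait_relaxed(__PTR, VAL) for
 * the cpu_relax() below, putting the CPU into a low-power wait state.
 */
#define smp_cond_load_relaxed(ptr, cond_expr) ({	\
	typeof(ptr) __PTR = (ptr);			\
	typeof(*ptr) VAL;				\
	for (;;) {					\
		VAL = READ_ONCE(*__PTR);		\
		if (cond_expr)				\
			break;				\
		cpu_relax();				\
	}						\
	VAL;						\
})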

2 files changed: +19, -13 lines

arch/arm64/include/asm/spinlock.h

Lines changed: 9 additions & 0 deletions
@@ -11,4 +11,13 @@
 /* See include/linux/spinlock.h */
 #define smp_mb__after_spinlock()	smp_mb()
 
+/*
+ * Changing this will break osq_lock() thanks to the call inside
+ * smp_cond_load_relaxed().
+ *
+ * See:
+ * https://lore.kernel.org/lkml/[email protected]
+ */
+#define vcpu_is_preempted(cpu)	false
+
 #endif /* __ASM_SPINLOCK_H */
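
A note on why this is a #define rather than relying on the default: when an architecture does not provide vcpu_is_preempted(), the core code falls back to a no-op returning false, roughly as paraphrased below (from memory, believed to live in include/linux/sched.h). The new macro pins arm64 to that same no-op and, via the comment, records that a real implementation must not be added here, since osq_lock() now evaluates vcpu_is_preempted() inside smp_cond_load_relaxed().

/*
 * Roughly the generic fallback (paraphrased): used when an architecture
 * does not define vcpu_is_preempted() itself.
 */
#ifndef vcpu_is_preempted
static inline bool vcpu_is_preempted(int cpu)
{
	return false;
}
#endif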

kernel/locking/osq_lock.c

Lines changed: 10 additions & 13 deletions
@@ -134,20 +134,17 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	 * cmpxchg in an attempt to undo our queueing.
 	 */
 
-	while (!READ_ONCE(node->locked)) {
-		/*
-		 * If we need to reschedule bail... so we can block.
-		 * Use vcpu_is_preempted() to avoid waiting for a preempted
-		 * lock holder:
-		 */
-		if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
-			goto unqueue;
-
-		cpu_relax();
-	}
-	return true;
+	/*
+	 * Wait to acquire the lock or cancelation. Note that need_resched()
+	 * will come with an IPI, which will wake smp_cond_load_relaxed() if it
+	 * is implemented with a monitor-wait. vcpu_is_preempted() relies on
+	 * polling, be careful.
+	 */
+	if (smp_cond_load_relaxed(&node->locked, VAL || need_resched() ||
+				  vcpu_is_preempted(node_cpu(node->prev))))
+		return true;
 
-unqueue:
+	/* unqueue */
 	/*
 	 * Step - A -- stabilize @prev
 	 *
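
With a generic polling fallback like the one sketched earlier, the new wait behaves like the loop it replaces. The illustrative expansion below (not actual kernel code) also makes the return-value semantics explicit: smp_cond_load_relaxed() returns the last value loaded from node->locked, so the if only succeeds when the lock was actually handed over; a wakeup caused purely by need_resched() or vcpu_is_preempted() falls through to the unqueue path.

/*
 * Illustrative expansion of the new call with the generic (polling)
 * fallback; not the actual generated code.
 */
int locked;

for (;;) {
	locked = READ_ONCE(node->locked);
	if (locked || need_resched() ||
	    vcpu_is_preempted(node_cpu(node->prev)))
		break;
	cpu_relax();
}
if (locked)
	return true;
/* otherwise fall through to the unqueue path */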
