Optimize virt_spin_lock() to use the simpler and faster:
atomic_try_cmpxchg(*ptr, &val, new)
instead of:
atomic_cmpxchg(*ptr, val, new) == val
The x86 CMPXCHG instruction returns success in the ZF flag, so
this change saves a compare after the CMPXCHG.
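
For illustration only, the two idioms can be compared in a userspace sketch built on C11 <stdatomic.h>. This is not the kernel code: cmpxchg_old(), lock_with_cmpxchg() and lock_with_try_cmpxchg() are hypothetical names, and atomic_compare_exchange_strong() is assumed here as a rough analogue of atomic_try_cmpxchg().

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical emulation of a value-returning cmpxchg: returns the value
 * that was observed in *ptr. On success that value equals `old`; on
 * failure C11 writes the observed value back into `old`.
 */
static int cmpxchg_old(atomic_int *ptr, int old, int new_val)
{
	atomic_compare_exchange_strong(ptr, &old, new_val);
	return old;
}

/* Old idiom: the caller compares the returned value itself. */
static bool lock_with_cmpxchg(atomic_int *ptr)
{
	return cmpxchg_old(ptr, 0, 1) == 0;
}

/* New idiom: the primitive reports success directly as a boolean. */
static bool lock_with_try_cmpxchg(atomic_int *ptr)
{
	int val = 0;

	return atomic_compare_exchange_strong(ptr, &val, 1);
}
```

Both helpers take the lock only when it was free. The second form hands the compiler a boolean result, which on x86 can come straight from ZF.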
Also optimize the retry loop a bit. atomic_try_cmpxchg() fails iff
the value at &lock->val is nonzero, so there is no need to load and
compare the lock value again - cpu_relax() can be called
unconditionally in this case. This allows us to generate optimized:
  1f:   ba 01 00 00 00          mov    $0x1,%edx
  24:   8b 03                   mov    (%rbx),%eax
  26:   85 c0                   test   %eax,%eax
  28:   75 63                   jne    8d <...>
  2a:   f0 0f b1 13             lock cmpxchg %edx,(%rbx)
  2e:   75 5d                   jne    8d <...>
  ...
  8d:   f3 90                   pause
  8f:   eb 93                   jmp    24 <...>
instead of:
  1f:   ba 01 00 00 00          mov    $0x1,%edx
  24:   8b 03                   mov    (%rbx),%eax
  26:   85 c0                   test   %eax,%eax
  28:   75 13                   jne    3d <...>
  2a:   f0 0f b1 13             lock cmpxchg %edx,(%rbx)
  2e:   85 c0                   test   %eax,%eax
  30:   75 f2                   jne    24 <...>
  ...
  3d:   f3 90                   pause
  3f:   eb e3                   jmp    24 <...>
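
The loop shape described above might look roughly like the following userspace sketch. It is an approximation under stated assumptions, not the actual virt_spin_lock() body: LOCKED_VAL, relax(), spin_lock_sketch() and spin_unlock_sketch() are hypothetical names, relax() is an empty stand-in for cpu_relax(), and C11 atomics replace the kernel atomic_t API.

```c
#include <stdatomic.h>

#define LOCKED_VAL 1	/* hypothetical stand-in for the locked value */

/* Empty stand-in for cpu_relax(); on x86 the kernel emits PAUSE here. */
static void relax(void)
{
}

static void spin_lock_sketch(atomic_int *lock)
{
	int val;

	for (;;) {
		/* Cheap read first, so a held lock is not hammered with CMPXCHG. */
		val = atomic_load_explicit(lock, memory_order_relaxed);

		/*
		 * val is 0 when the exchange is attempted, so a failed
		 * exchange means the lock word became nonzero. Either way
		 * the next step is the same: relax, then retry from the load.
		 */
		if (val == 0 &&
		    atomic_compare_exchange_strong(lock, &val, LOCKED_VAL))
			return;

		relax();
	}
}

/* Release the lock so that a spinning caller can take it. */
static void spin_unlock_sketch(atomic_int *lock)
{
	atomic_store_explicit(lock, 0, memory_order_release);
}
```

Both failure paths (lock already held, or the exchange lost a race) funnel into the same relax-and-retry step, which matches the two jne instructions sharing one target in the optimized listing.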
Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]