
Commit d0dd066

Christoph Lameter (Ampere) authored and torvalds committed

seqcount: replace smp_rmb() in read_seqcount() with load acquire
Many architectures support load acquire, which can replace a memory barrier and save some cycles. A typical sequence

	do {
		seq = read_seqcount_begin(&s);
		<something>
	} while (read_seqcount_retry(&s, seq));

requires 13 cycles on an N1 Neoverse arm64 core (Ampere Altra, to be specific) for an empty loop. Two read memory barriers are needed, one for each of the seqcount_* functions. We can replace the first read barrier with a load acquire of the seqcount, which saves us one barrier.

On the Altra, doing so reduces the cycle count from 13 to 8.

According to ARM, this is a general improvement for the ARM64 architecture and not specific to a certain processor. See

  https://developer.arm.com/documentation/102336/0100/Load-Acquire-and-Store-Release-instructions

	"Weaker ordering requirements that are imposed by Load-Acquire and
	Store-Release instructions allow for micro-architectural
	optimizations, which could reduce some of the performance impacts
	that are otherwise imposed by an explicit memory barrier.

	If the ordering requirement is satisfied using either a Load-Acquire
	or Store-Release, then it would be preferable to use these
	instructions instead of a DMB"

[ NOTE! This is my original minimal patch that unconditionally switches over to using smp_load_acquire(), instead of the much more involved and subtle patch that Christoph Lameter wrote that made it conditional.

But Christoph gets authorship credit because I had initially thought that we needed the more complex model, and Christoph ran with it and did the work. Only after looking at code generation for all the relevant architectures did I come to the conclusion that nobody actually really needs the old "smp_rmb()" model.

Even architectures without load-acquire support generally do as well or better with smp_load_acquire(). So credit to Christoph, but if this then causes issues on other architectures, put the blame solidly on me.

Also note that as part of the ruthless simplification, this gets rid of the overly subtle optimization where some code uses a non-barrier version of the sequence count (see the __read_seqcount_begin() users in fs/namei.c). They then play games with their own barriers and/or with nested sequence counts.

Those optimizations are literally meaningless on x86, and questionable elsewhere. If somebody can show that they matter, we need to re-do them more cleanly than "use an internal helper".

       - Linus ]

Signed-off-by: Christoph Lameter (Ampere) <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Linus Torvalds <[email protected]>
1 parent de5cb0d commit d0dd066

File tree

1 file changed: +5 −20 lines


include/linux/seqlock.h

Lines changed: 5 additions & 20 deletions
@@ -157,7 +157,7 @@ __seqprop_##lockname##_const_ptr(const seqcount_##lockname##_t *s) \
 static __always_inline unsigned						\
 __seqprop_##lockname##_sequence(const seqcount_##lockname##_t *s)	\
 {									\
-	unsigned seq = READ_ONCE(s->seqcount.sequence);			\
+	unsigned seq = smp_load_acquire(&s->seqcount.sequence);		\
 									\
 	if (!IS_ENABLED(CONFIG_PREEMPT_RT))				\
 		return seq;						\
@@ -170,7 +170,7 @@ __seqprop_##lockname##_sequence(const seqcount_##lockname##_t *s) \
 		 * Re-read the sequence counter since the (possibly	\
 		 * preempted) writer made progress.			\
 		 */							\
-		seq = READ_ONCE(s->seqcount.sequence);			\
+		seq = smp_load_acquire(&s->seqcount.sequence);		\
 	}								\
 									\
 	return seq;							\
@@ -208,7 +208,7 @@ static inline const seqcount_t *__seqprop_const_ptr(const seqcount_t *s)
 
 static inline unsigned __seqprop_sequence(const seqcount_t *s)
 {
-	return READ_ONCE(s->sequence);
+	return smp_load_acquire(&s->sequence);
 }
 
 static inline bool __seqprop_preemptible(const seqcount_t *s)
@@ -263,17 +263,9 @@ SEQCOUNT_LOCKNAME(mutex, struct mutex, true, mutex)
 #define seqprop_assert(s)	__seqprop(s, assert)(s)
 
 /**
- * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
+ * __read_seqcount_begin() - begin a seqcount_t read section
  * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
  *
- * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
- * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
- * provided before actually loading any of the variables that are to be
- * protected in this critical section.
- *
- * Use carefully, only in critical code, and comment how the barrier is
- * provided.
- *
  * Return: count to be passed to read_seqcount_retry()
  */
 #define __read_seqcount_begin(s)					\
@@ -293,13 +285,7 @@ SEQCOUNT_LOCKNAME(mutex, struct mutex, true, mutex)
  *
  * Return: count to be passed to read_seqcount_retry()
  */
-#define raw_read_seqcount_begin(s)					\
-({									\
-	unsigned _seq = __read_seqcount_begin(s);			\
-									\
-	smp_rmb();							\
-	_seq;								\
-})
+#define raw_read_seqcount_begin(s)	__read_seqcount_begin(s)
@@ -328,7 +314,6 @@ SEQCOUNT_LOCKNAME(mutex, struct mutex, true, mutex)
 ({									\
 	unsigned __seq = seqprop_sequence(s);				\
 									\
-	smp_rmb();							\
 	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);			\
 	__seq;								\
 })
