
Commit 8a578a7

mrutland-arm authored and ctmarinas committed
arm64: atomics: lse: improve constraints for simple ops
We have overly conservative assembly constraints for the basic FEAT_LSE
atomic instructions, and using more accurate and permissive constraints
will allow for better code generation.

The FEAT_LSE basic atomic instructions have come in two forms:

	LD{op}{order}{size} <Rs>, <Rt>, [<Rn>]
	ST{op}{order}{size} <Rs>, [<Rn>]

The ST* forms are aliases of the LD* forms where:

	ST{op}{order}{size} <Rs>, [<Rn>]
Is:
	LD{op}{order}{size} <Rs>, XZR, [<Rn>]

For either form, both <Rs> and <Rn> are read but not written back to,
and <Rt> is written with the original value of the memory location.
Where (<Rt> == <Rs>) or (<Rt> == <Rn>), <Rt> is written *after* the
other register value(s) are consumed. There are no UNPREDICTABLE or
CONSTRAINED UNPREDICTABLE behaviours when any pair of <Rs>, <Rt>, or
<Rn> are the same register.

Our current inline assembly always uses <Rs> == <Rt>, treating this
register as both an input and an output (using a '+r' constraint). This
forces the compiler to do some unnecessary register shuffling and/or
redundant value generation.

For example, the compiler cannot reuse the <Rs> value, and currently
GCC 11.1.0 will compile:

	__lse_atomic_add(1, a);
	__lse_atomic_add(1, b);
	__lse_atomic_add(1, c);

As:

	mov     w3, #0x1
	mov     w4, w3
	stadd   w4, [x0]
	mov     w0, w3
	stadd   w0, [x1]
	stadd   w3, [x2]

We can improve this with more accurate constraints, separating <Rs> and
<Rt>, where <Rs> is an input-only register ('r'), and <Rt> is an
output-only value ('=r'). As <Rt> is written back after <Rs> is
consumed, it does not need to be earlyclobber ('=&r'), leaving the
compiler free to use the same register for both <Rs> and <Rt> where
this is desirable.

At the same time, the redundant 'r' constraint for `v` is removed, as
the `+Q` constraint is sufficient.

With this change, the above example becomes:

	mov     w3, #0x1
	stadd   w3, [x0]
	stadd   w3, [x1]
	stadd   w3, [x2]

I've made this change for the non-value-returning and FETCH ops. The
RETURN ops have a multi-instruction sequence for which we cannot use
the same constraints, and a subsequent patch will rewrite the RETURN
ops in terms of the FETCH ops, relying on the ability for the compiler
to reuse the <Rs> value.

This is intended as an optimization. There should be no functional
change as a result of this patch.

Signed-off-by: Mark Rutland <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Acked-by: Will Deacon <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
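To see the constraint change in isolation from the kernel macros, consider the minimal sketch below: the same fetch-add written first with the old combined '+r' operand and then with the split input/output operands described above. The wrapper names are made up for illustration, and building the sketch assumes an AArch64 toolchain with FEAT_LSE available (e.g. -march=armv8.1-a+lse).

	/* Old style: <Rs> == <Rt>, so 'i' is declared "+r" (input and
	 * output). The compiler must assume 'i' is clobbered and cannot
	 * reuse its value after the call. */
	static inline int fetch_add_old(int i, int *p)
	{
		asm volatile("ldadd	%w[i], %w[i], %[v]"
			     : [i] "+r" (i), [v] "+Q" (*p)
			     : "r" (p));
		return i;
	}

	/* New style: <Rs> is input-only ("r") and <Rt> has its own
	 * output ("=r"). No earlyclobber is needed because <Rt> is
	 * written after <Rs> is consumed, so the compiler remains free
	 * to pick the same register for both when that helps. */
	static inline int fetch_add_new(int i, int *p)
	{
		int old;

		asm volatile("ldadd	%w[i], %w[old], %[v]"
			     : [v] "+Q" (*p), [old] "=r" (old)
			     : [i] "r" (i));
		return old;
	}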
1 parent 5e9e43c commit 8a578a7

1 file changed: +18 −12 lines changed

arch/arm64/include/asm/atomic_lse.h

Lines changed: 18 additions & 12 deletions
@@ -16,8 +16,8 @@ static inline void __lse_atomic_##op(int i, atomic_t *v) \
 	asm volatile( \
 	__LSE_PREAMBLE \
 	" " #asm_op " %w[i], %[v]\n" \
-	: [i] "+r" (i), [v] "+Q" (v->counter) \
-	: "r" (v)); \
+	: [v] "+Q" (v->counter) \
+	: [i] "r" (i)); \
 }
 
 ATOMIC_OP(andnot, stclr)
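For reference, with this hunk applied, ATOMIC_OP(add, stadd) expands to roughly the following (a hand-written sketch with whitespace simplified, not verbatim preprocessor output). Because 'i' is now only an input, the compiler can keep the constant in one register across consecutive calls, which is exactly what the stadd example in the commit message shows.

	static inline void __lse_atomic_add(int i, atomic_t *v)
	{
		asm volatile(
		__LSE_PREAMBLE
		"	stadd	%w[i], %[v]\n"
		: [v] "+Q" (v->counter)
		: [i] "r" (i));
	}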
@@ -35,14 +35,17 @@ static inline void __lse_atomic_sub(int i, atomic_t *v)
 #define ATOMIC_FETCH_OP(name, mb, op, asm_op, cl...) \
 static inline int __lse_atomic_fetch_##op##name(int i, atomic_t *v) \
 { \
+	int old; \
+	\
 	asm volatile( \
 	__LSE_PREAMBLE \
-	" " #asm_op #mb " %w[i], %w[i], %[v]" \
-	: [i] "+r" (i), [v] "+Q" (v->counter) \
-	: "r" (v) \
+	" " #asm_op #mb " %w[i], %w[old], %[v]" \
+	: [v] "+Q" (v->counter), \
+	  [old] "=r" (old) \
+	: [i] "r" (i) \
 	: cl); \
 	\
-	return i; \
+	return old; \
 }
 
 #define ATOMIC_FETCH_OPS(op, asm_op) \
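Similarly, ATOMIC_FETCH_OPS(add, ldadd) instantiates this macro, so __lse_atomic_fetch_add_relaxed ends up looking roughly like the sketch below (whitespace simplified, empty clobber list omitted). The previous memory value now comes back through the dedicated 'old' output rather than by overwriting 'i'.

	static inline int __lse_atomic_fetch_add_relaxed(int i, atomic_t *v)
	{
		int old;

		asm volatile(
		__LSE_PREAMBLE
		"	ldadd	%w[i], %w[old], %[v]"
		: [v] "+Q" (v->counter),
		  [old] "=r" (old)
		: [i] "r" (i));

		return old;
	}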
@@ -124,8 +127,8 @@ static inline void __lse_atomic64_##op(s64 i, atomic64_t *v) \
 	asm volatile( \
 	__LSE_PREAMBLE \
 	" " #asm_op " %[i], %[v]\n" \
-	: [i] "+r" (i), [v] "+Q" (v->counter) \
-	: "r" (v)); \
+	: [v] "+Q" (v->counter) \
+	: [i] "r" (i)); \
 }
 
 ATOMIC64_OP(andnot, stclr)
@@ -143,14 +146,17 @@ static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
 #define ATOMIC64_FETCH_OP(name, mb, op, asm_op, cl...) \
 static inline long __lse_atomic64_fetch_##op##name(s64 i, atomic64_t *v)\
 { \
+	s64 old; \
+	\
 	asm volatile( \
 	__LSE_PREAMBLE \
-	" " #asm_op #mb " %[i], %[i], %[v]" \
-	: [i] "+r" (i), [v] "+Q" (v->counter) \
-	: "r" (v) \
+	" " #asm_op #mb " %[i], %[old], %[v]" \
+	: [v] "+Q" (v->counter), \
+	  [old] "=r" (old) \
+	: [i] "r" (i) \
 	: cl); \
 	\
-	return i; \
+	return old; \
 }
 
 #define ATOMIC64_FETCH_OPS(op, asm_op) \
