Commit 146034f

vincent-mailhol authored and suryasaimadhu committed
x86/asm/bitops: Use __builtin_ffs() to evaluate constant expressions

For x86_64, the current ffs() implementation does not produce optimized
code when called with a constant expression. In contrast, the
__builtin_ffs() functions of both GCC and clang are able to fold the
expression into a single instruction.

** Example **

Consider two dummy functions foo() and bar() as below:

  #include <linux/bitops.h>
  #define CONST 0x01000000

  unsigned int foo(void)
  {
          return ffs(CONST);
  }

  unsigned int bar(void)
  {
          return __builtin_ffs(CONST);
  }

GCC would produce the assembly code below:

  0000000000000000 <foo>:
     0:   ba 00 00 00 01          mov    $0x1000000,%edx
     5:   b8 ff ff ff ff          mov    $0xffffffff,%eax
     a:   0f bc c2                bsf    %edx,%eax
     d:   83 c0 01                add    $0x1,%eax
    10:   c3                      ret
  <Instructions after ret and before next function were redacted>

  0000000000000020 <bar>:
    20:   b8 19 00 00 00          mov    $0x19,%eax
    25:   c3                      ret

And clang would produce:

  0000000000000000 <foo>:
     0:   b8 ff ff ff ff          mov    $0xffffffff,%eax
     5:   0f bc 05 00 00 00 00    bsf    0x0(%rip),%eax        # c <foo+0xc>
     c:   83 c0 01                add    $0x1,%eax
     f:   c3                      ret

  0000000000000010 <bar>:
    10:   b8 19 00 00 00          mov    $0x19,%eax
    15:   c3                      ret

Both examples clearly demonstrate the benefit of using __builtin_ffs()
instead of the kernel's asm implementation for constant expressions.

However, for non-constant expressions, the kernel's ffs() asm version
remains better for x86_64 because, contrary to GCC, it doesn't emit the
CMOV assembly instruction, cf. [1] (notably, clang is able to optimize
out the CMOV instruction).

Use __builtin_constant_p() to select between the kernel's ffs() and
__builtin_ffs() depending on whether the argument is constant or not.

As a side benefit, replacing the ffs() function declaration with a macro
also removes the -Wshadow warning below:

  ./arch/x86/include/asm/bitops.h:283:28: warning: declaration of 'ffs' shadows a built-in function [-Wshadow]
    283 | static __always_inline int ffs(int x)

** Statistics **

On an allyesconfig, before...:

  $ objdump -d vmlinux.o | grep bsf | wc -l
  1081

...and after:

  $ objdump -d vmlinux.o | grep bsf | wc -l
  792

So, roughly 26.7% of the calls to ffs() were using constant expressions
and could be optimized out.

(tests done on linux v5.18-rc5 x86_64 using GCC 11.2.1)

[1] commit ca3d30c ("x86_64, asm: Optimise fls(), ffs() and fls64()")

  [ bp: Massage commit message. ]

Signed-off-by: Vincent Mailhol <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Reviewed-by: Yury Norov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
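
The 26.7% figure in the statistics follows directly from the two bsf counts reported by objdump. A minimal sketch of that arithmetic is below; the function name reduction_percent() is hypothetical and is not part of the commit.

```c
/*
 * Hypothetical helper (not from the commit): relative reduction, in
 * percent, between two instruction counts.
 */
static double reduction_percent(unsigned int before, unsigned int after)
{
	if (before == 0)
		return 0.0;	/* avoid dividing by zero */

	/* (1081 - 792) / 1081 is about 0.267, i.e. roughly 26.7% */
	return 100.0 * ((double)before - (double)after) / (double)before;
}
```

With the counts from the commit message, reduction_percent(1081, 792) evaluates to a value between 26.7 and 26.8.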
1 parent 521a547 commit 146034f

1 file changed, +14 -12 lines changed


arch/x86/include/asm/bitops.h

Lines changed: 14 additions & 12 deletions
@@ -292,18 +292,7 @@ static __always_inline unsigned long __fls(unsigned long word)
 #undef ADDR
 
 #ifdef __KERNEL__
-/**
- * ffs - find first set bit in word
- * @x: the word to search
- *
- * This is defined the same way as the libc and compiler builtin ffs
- * routines, therefore differs in spirit from the other bitops.
- *
- * ffs(value) returns 0 if value is 0 or the position of the first
- * set bit if value is nonzero. The first (least significant) bit
- * is at position 1.
- */
-static __always_inline int ffs(int x)
+static __always_inline int variable_ffs(int x)
 {
 	int r;
 
@@ -333,6 +322,19 @@ static __always_inline int ffs(int x)
 	return r + 1;
 }
 
+/**
+ * ffs - find first set bit in word
+ * @x: the word to search
+ *
+ * This is defined the same way as the libc and compiler builtin ffs
+ * routines, therefore differs in spirit from the other bitops.
+ *
+ * ffs(value) returns 0 if value is 0 or the position of the first
+ * set bit if value is nonzero. The first (least significant) bit
+ * is at position 1.
+ */
+#define ffs(x) (__builtin_constant_p(x) ? __builtin_ffs(x) : variable_ffs(x))
+
 /**
  * fls - find last set bit in word
  * @x: the word to search
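
The dispatch pattern used by the new ffs() macro can be tried outside the kernel. The sketch below is an illustration under stated assumptions, not the kernel code: the names ffs_sketch() and variable_ffs_sketch() are hypothetical, and the helper uses a portable loop in place of the kernel's inline asm, whose body is elided in the diff. It assumes GCC or clang, since __builtin_constant_p() and __builtin_ffs() are compiler builtins.

```c
/*
 * Hypothetical stand-in for the kernel's variable_ffs(). The real helper
 * uses inline asm (bsf); this portable loop only mirrors its contract:
 * return 0 for 0, otherwise the 1-based position of the lowest set bit.
 */
static inline int variable_ffs_sketch(int x)
{
	unsigned int v = (unsigned int)x;
	int pos = 1;

	if (v == 0)
		return 0;

	/* Shift right until the least significant bit is set. */
	while ((v & 1u) == 0) {
		v >>= 1;
		pos++;
	}
	return pos;
}

/*
 * Same shape as the macro added by the commit: a compile-time constant
 * argument goes to the builtin so the compiler can fold it, anything
 * else goes to the out-of-line helper. __builtin_constant_p() does not
 * evaluate its argument, so x is evaluated only once at run time.
 */
#define ffs_sketch(x) \
	(__builtin_constant_p(x) ? __builtin_ffs(x) : variable_ffs_sketch(x))
```

For the constant from the commit message, ffs_sketch(0x01000000) yields 25, which matches the mov $0x19,%eax seen in the bar() disassembly. Whether a given call actually takes the builtin branch depends on the compiler and optimization level.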
