https://godbolt.org/z/Wh9sE4754
https://alive2.llvm.org/ce/z/N6uVzT
In the 32-bit and 64-bit cases, tgt is obviously better than src, since it is one instruction shorter:
src_u32:
sub w8, w1, w0
subs w9, w0, w1
csel w0, w9, w8, hi
ret
tgt_u32:
subs w8, w0, w1
cneg w0, w8, lo
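ret
In the 8-bit and 16-bit cases, src and tgt have the same number of instructions, but tgt has more instruction-level parallelism (ILP) than src. In src_u8 every instruction consumes the result of the previous one, whereas in tgt_u8 the sub does not depend on the and or the cmp: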
src_u8:
and w8, w0, #0xff
sub w8, w8, w1, uxtb
cmp w8, #0
cneg w0, w8, mi
ret
tgt_u8:
and w8, w0, #0xff
sub w9, w0, w1
cmp w8, w1, uxtb
cneg w0, w9, ls
ret
I suspect the code generated for tgt in the 128-bit cases is not optimal either.
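For readers who do not want to open the links, here is a rough C sketch of what the listings above appear to compute. It is only my reading of the assembly (an unsigned absolute difference written two ways), not the IR from the Alive2 link, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed reading of src_u32: compute both differences, select one (sub, subs, csel hi). */
static uint32_t absdiff_select_u32(uint32_t a, uint32_t b) {
    return a > b ? a - b : b - a;
}

/* Assumed reading of tgt_u32: subtract once, negate when a < b (subs, cneg lo). */
static uint32_t absdiff_negate_u32(uint32_t a, uint32_t b) {
    uint32_t d = a - b;                 /* wraps modulo 2^32 */
    return a < b ? 0u - d : d;
}

/* Assumed reading of src_u8: zero-extend, subtract, then take the absolute value. */
static uint8_t absdiff_widen_u8(uint8_t a, uint8_t b) {
    int32_t d = (int32_t)a - (int32_t)b;
    return (uint8_t)(d < 0 ? -d : d);
}

/* Assumed reading of tgt_u8: narrow subtract, negate when a <= b (cmp, cneg ls). */
static uint8_t absdiff_negate_u8(uint8_t a, uint8_t b) {
    uint8_t d = (uint8_t)(a - b);       /* low 8 bits of the difference */
    return a <= b ? (uint8_t)(0 - d) : d;
}

/* Exhaustively compare the 8-bit forms and spot-check the 32-bit forms. */
static void check_absdiff_forms(void) {
    for (unsigned a = 0; a < 256; ++a)
        for (unsigned b = 0; b < 256; ++b)
            assert(absdiff_widen_u8((uint8_t)a, (uint8_t)b) ==
                   absdiff_negate_u8((uint8_t)a, (uint8_t)b));

    const uint32_t samples[] = {0u, 1u, 7u, 0x7fffffffu, 0x80000000u, UINT32_MAX};
    const unsigned n = sizeof samples / sizeof samples[0];
    for (unsigned i = 0; i < n; ++i)
        for (unsigned j = 0; j < n; ++j)
            assert(absdiff_select_u32(samples[i], samples[j]) ==
                   absdiff_negate_u32(samples[i], samples[j]));
}
```

Calling check_absdiff_forms() from a test harness exercises both pairs. The negate form needs only one subtraction, which matches the shorter 32-bit sequence shown above.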