[AArch64] Suboptimal abs-diff codegen #118413

@Kmeakin

https://godbolt.org/z/Wh9sE4754
https://alive2.llvm.org/ce/z/N6uVzT

In the 32-bit and 64-bit cases, tgt is obviously better than src, since it is one instruction shorter:

src_u32:
        sub     w8, w1, w0
        subs    w9, w0, w1
        csel    w0, w9, w8, hi
        ret

tgt_u32:
        subs    w8, w0, w1
        cneg    w0, w8, lo
        ret
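For readers without the Godbolt link open, here is a minimal C sketch of the two shapes the assembly above appears to correspond to. The function names and exact source forms are assumptions for illustration; they are not copied from the linked reproducer.

```c
#include <stdint.h>

/* Hypothetical "src" shape: compute both differences and select one.
 * This matches the sub + subs + csel sequence. */
uint32_t absdiff_src_u32(uint32_t a, uint32_t b) {
    return a > b ? a - b : b - a;
}

/* Hypothetical "tgt" shape: subtract once, then negate if a < b.
 * This matches the subs + cneg sequence. */
uint32_t absdiff_tgt_u32(uint32_t a, uint32_t b) {
    uint32_t d = a - b;         /* wraps modulo 2^32 when a < b */
    return a < b ? 0u - d : d;  /* 0 - (a - b) == b - a modulo 2^32 */
}
```

Both functions return the same value for every input, because unsigned negation of the wrapped difference yields `b - a`.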

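In the 8-bit and 16-bit cases, src and tgt have the same number of instructions, but tgt has more ILP than src. In src, and, sub, cmp and cneg form a single dependency chain of four instructions. In tgt, the sub does not depend on the and, so the critical path is only three instructions (and, cmp, cneg):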

src_u8:
        and     w8, w0, #0xff
        sub     w8, w8, w1, uxtb
        cmp     w8, #0
        cneg    w0, w8, mi
        ret

tgt_u8:
        and     w8, w0, #0xff
        sub     w9, w0, w1
        cmp     w8, w1, uxtb
        cneg    w0, w9, ls
        ret
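The 8-bit case can be sketched in C the same way. The first function mirrors the widen, subtract, absolute-value shape of src_u8. The second mirrors tgt_u8, where the comparison and the subtraction are independent. All names here are hypothetical, and the source forms are an assumption rather than the code behind the links. Since there are only 65536 input pairs, a small helper checks the two forms against each other exhaustively.

```c
#include <stdint.h>

/* Widen to int, subtract, take the absolute value (src_u8 shape). */
uint8_t absdiff_widened_u8(uint8_t a, uint8_t b) {
    int d = (int)a - (int)b;  /* in [-255, 255], cannot overflow */
    return (uint8_t)(d < 0 ? -d : d);
}

/* Compare and subtract independently, negate if a <= b (tgt_u8 shape). */
uint8_t absdiff_cmp_u8(uint8_t a, uint8_t b) {
    uint8_t d = (uint8_t)(a - b);  /* wraps modulo 256 */
    return a <= b ? (uint8_t)(0 - d) : d;
}

/* Count the input pairs on which the two forms disagree. */
int absdiff_u8_mismatches(void) {
    int bad = 0;
    for (int a = 0; a < 256; ++a) {
        for (int b = 0; b < 256; ++b) {
            if (absdiff_widened_u8((uint8_t)a, (uint8_t)b) !=
                absdiff_cmp_u8((uint8_t)a, (uint8_t)b)) {
                ++bad;
            }
        }
    }
    return bad;
}
```

Negating when `a == b` is harmless because the difference is zero, which is why the `ls` condition in tgt_u8 gives the same result as a strict less-than would.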

I suspect the code generated for tgt in the 128-bit cases is not optimal either.
