Missed use of movk on aarch64 when performing disjoint OR with shifted 16 bit immediate #130376

@neildhar

I've observed cases where clang emits a suboptimal mov + orr sequence when a single movk would suffice.

The most straightforward case is as follows:

#include <stdint.h>

uint64_t foo(uint32_t* raw){
    // load
    uint64_t res = *raw;
    // movk
    res |= (uint64_t)0xfffd << 48;
    // ret
    return res;
}

For which clang emits:

        ldr     w8, [x0]
        mov     x9, #-844424930131968
        orr     x0, x8, x9
        ret

The 32-bit load zero-extends into x8, so bits 48-63 are known to be zero and the OR is disjoint. The mov + orr could therefore be a single movk.
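To illustrate why the two forms agree, here is a minimal sketch that models movk in plain C and compares it against the OR from foo. The names movk_model, foo_with_or, foo_with_movk and check_foo_samples are hypothetical helpers introduced for this illustration; they are not part of the original report or of LLVM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of `movk xd, #imm16, lsl #shift`:
   overwrite one 16-bit field and keep every other bit of the register. */
uint64_t movk_model(uint64_t reg, uint16_t imm16, unsigned shift) {
    uint64_t field_mask = (uint64_t)0xffff << shift;
    return (reg & ~field_mask) | ((uint64_t)imm16 << shift);
}

/* The computation from foo, taking the loaded value directly. */
uint64_t foo_with_or(uint32_t raw) {
    uint64_t res = raw;              /* zero-extended: bits 32-63 are 0 */
    res |= (uint64_t)0xfffd << 48;   /* disjoint OR into bits 48-63 */
    return res;
}

/* The same result expressed as a single movk on the loaded value. */
uint64_t foo_with_movk(uint32_t raw) {
    return movk_model(raw, 0xfffd, 48);
}

/* Spot-check a few inputs; both forms should agree for any 32-bit value,
   because the field that movk overwrites is already zero. */
void check_foo_samples(void) {
    const uint32_t samples[] = {0u, 1u, 0x80000000u, 0xffffffffu};
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        assert(foo_with_or(samples[i]) == foo_with_movk(samples[i]));
    }
}
```

The equivalence relies only on the upper bits being zero after the load. If the input were a full 64-bit value with unknown high bits, the OR and the movk would differ.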

A related (but as far as I can tell, distinct) case:

uint64_t bar(uint32_t* raw){
    // load
    uint64_t res = *raw;
    // movk
    res |= (uint64_t)0xfffd << 48;
    // asr
    res = (int64_t)res >> 3;
    // ret
    return res;
}

For which clang emits:

        ldr     w8, [x0]
        mov     x9, #175921860444160
        movk    x9, #65535, lsl #48
        orr     x0, x9, x8, lsr #3
        ret

Based on a quick scan of the resulting IR, the additional consideration here is that the arithmetic shift is turned into a logical shift (and the constant is made correspondingly larger) before getting to ISel.
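The folding described above can be sketched in C as follows. This is an illustration of the arithmetic only, not the actual IR transform. The function names bar_source and bar_folded are hypothetical, and the code assumes that right-shifting a negative signed value is an arithmetic shift, which holds for clang and gcc but is implementation-defined in older C standards.

```c
#include <stdint.h>

/* bar as written in the source: OR in the tag, then arithmetic shift.
   Assumption: >> on a negative int64_t shifts in sign bits. */
uint64_t bar_source(uint32_t raw) {
    uint64_t res = raw;
    res |= (uint64_t)0xfffd << 48;
    res = (uint64_t)((int64_t)res >> 3);
    return res;
}

/* The form matching the emitted assembly: a logical shift of the loaded
   value, ORed with the pre-shifted constant.
   0xffffa00000000000 is 0xfffd000000000000 arithmetically shifted right
   by 3, i.e. mov #175921860444160 (0xa000 << 32) plus
   movk #65535, lsl #48. */
uint64_t bar_folded(uint32_t raw) {
    return ((uint64_t)raw >> 3) | 0xffffa00000000000ULL;
}
```

Since the pre-shifted constant no longer fits in a single 16-bit field, materializing it takes mov + movk. Keeping the original order, as the comments in bar suggest (load, movk, asr), would need one fewer instruction.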

https://godbolt.org/z/h95E3zhhd
