Open
Description
I've observed cases where clang emits a suboptimal mov + orr sequence on AArch64 where a single movk would suffice.
The most straightforward case is as follows:
uint64_t foo(uint32_t* raw){
    // load
    uint64_t res = *raw;
    // movk
    res |= (uint64_t)0xfffd << 48;
    // ret
    return res;
}
For which clang emits:
ldr w8, [x0]
mov x9, #-844424930131968
orr x0, x8, x9
ret
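The mov + orr could instead be a single movk: the 32-bit load zero-extends, so bits 48-63 of the register are known to be zero, and writing 0xfffd into that 16-bit field gives the same result as the OR.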
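To make the equivalence concrete, here is a minimal C sketch that models the two sequences. The helper names (`movk_lsl48`, `foo_orr`, `foo_movk`, `check_foo_equivalence`) are made up for illustration and are not LLVM code; `movk_lsl48` is only my model of what `movk xN, #imm16, lsl #48` does to a register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of `movk xN, #imm16, lsl #48`:
   replace bits 48..63 with imm16 and keep bits 0..47 unchanged. */
static uint64_t movk_lsl48(uint64_t reg, uint16_t imm16) {
    return (reg & 0x0000ffffffffffffULL) | ((uint64_t)imm16 << 48);
}

/* What clang currently emits: materialize the constant, then orr. */
static uint64_t foo_orr(uint32_t raw) {
    uint64_t constant = (uint64_t)0xfffd << 48; /* mov x9, #-844424930131968 */
    return (uint64_t)raw | constant;            /* orr x0, x8, x9 */
}

/* The proposed form: a single movk on the zero-extended load. */
static uint64_t foo_movk(uint32_t raw) {
    return movk_lsl48((uint64_t)raw, 0xfffd);
}

/* Spot check on a few values; this is not an exhaustive proof. */
static void check_foo_equivalence(void) {
    const uint32_t samples[] = {0u, 1u, 0x12345678u, 0x80000000u, 0xffffffffu};
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        assert(foo_orr(samples[i]) == foo_movk(samples[i]));
    }
}
```

The two only agree because the upper 32 bits are zero after the `ldr w8`; for an arbitrary 64-bit input the OR and the movk would differ whenever bits 48-63 are already set.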
A related (but as far as I can tell, distinct) case:
uint64_t bar(uint32_t* raw){
    // load
    uint64_t res = *raw;
    // movk
    res |= (uint64_t)0xfffd << 48;
    // asr
    res = (int64_t)res >> 3;
    // ret
    return res;
}
For which clang emits:
ldr w8, [x0]
mov x9, #175921860444160
movk x9, #65535, lsl #48
orr x0, x9, x8, lsr #3
ret
Based on a quick scan of the resulting IR, the additional consideration here is that the arithmetic shift is turned into a logical shift (with the constant widened correspondingly to cover the sign-fill bits) before getting to ISel, so the backend never sees the or-then-asr pattern from the source.
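As a sanity check on that reading, the sketch below compares the source-level computation with the shape of the emitted code. The function names are hypothetical, and `asr64` is a portable model of an arithmetic shift that I wrote to avoid relying on implementation-defined signed shifts in C. The constant 0xffffa00000000000 is what the `mov` + `movk` pair in the output above builds, and it equals 0xfffd << 48 arithmetically shifted right by 3.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of a 64-bit arithmetic shift right (assumes n < 64). */
static uint64_t asr64(uint64_t x, unsigned n) {
    uint64_t fill = (x >> 63) ? ~(UINT64_MAX >> n) : 0; /* sign-fill bits */
    return (x >> n) | fill;
}

/* The source as written: or in the tag, then arithmetic shift. */
static uint64_t bar_source(uint32_t raw) {
    uint64_t res = (uint64_t)raw | ((uint64_t)0xfffd << 48);
    return asr64(res, 3);
}

/* The shape of the emitted code: logical shift of the load, or'ed
   with the pre-shifted constant (orr x0, x9, x8, lsr #3). */
static uint64_t bar_emitted(uint32_t raw) {
    const uint64_t shifted_constant = 0xffffa00000000000ULL;
    return ((uint64_t)raw >> 3) | shifted_constant;
}

/* Spot check on a few values; this is not an exhaustive proof. */
static void check_bar_equivalence(void) {
    const uint32_t samples[] = {0u, 7u, 8u, 0x12345678u, 0xffffffffu};
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        assert(bar_source(samples[i]) == bar_emitted(samples[i]));
    }
}
```

The two forms agree, so the emitted code is correct. The cost is that the widened constant no longer fits a single move-immediate, whereas the source-level form would presumably map to a movk followed by an asr.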