Description
Repro:
#include <arm_neon.h>
void partialWrite(uint8_t* p, uint16x8_t vec) {
  vst1q_lane_u16(reinterpret_cast<uint16_t*>(p), vec, 0);
  vst1q_lane_u8(p + 2, vreinterpretq_u8_u16(vec), 2);
}
Currently produces:
$ clang++ -target aarch64-redhat-linux-gnu -march=armv9-a+sve2+fp16 repro.cpp
...
add x8, x0, #2
str h0, [x0]
st1 { v0.b }[2], [x8]
ret
This misses the opportunity to use post-indexed addressing on the first str, which would make the separate add unnecessary. It could instead be the following (as produced by GCC):
str h0, [x0], 2
st1 {v0.b}[2], [x0]
ret
(This mirrors Meta T222168293.)
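
For reference, here is a portable scalar sketch of why the two sequences should be interchangeable. It is only a model, not compiler code: `Vec128`, `storeWithAdd`, and `storeWithPostInc` are hypothetical names, and it assumes a little-endian target where u16 lane 0 occupies bytes 0-1 of the register and u8 lane 2 is byte 2.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar stand-in for the 128-bit vector register v0.
struct Vec128 {
    std::uint8_t bytes[16];
};

// Models the current clang output: the second store uses a separately
// computed address, so the base pointer is left unchanged.
void storeWithAdd(std::uint8_t* p, const Vec128& v) {
    std::uint8_t* q = p + 2;     // add x8, x0, #2
    std::memcpy(p, v.bytes, 2);  // str h0, [x0]
    *q = v.bytes[2];             // st1 { v0.b }[2], [x8]
}

// Models the GCC output: the first store post-increments the base, and the
// second store reuses the updated base.
void storeWithPostInc(std::uint8_t* p, const Vec128& v) {
    std::memcpy(p, v.bytes, 2);  // str h0, [x0], 2 (store, then base += 2)
    p += 2;
    *p = v.bytes[2];             // st1 {v0.b}[2], [x0]
}
```

Both functions write the same three bytes. The post-increment form is only valid because the original value of the base pointer is not needed after the second store, which is the case in the repro.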