Description
After the changes introduced in #136091, we started experiencing a miscompile on AArch64 in our local testing. Here is a reduced LLVM IR example:
```llvm
; ModuleID = 'Test.ll'
target triple = "aarch64-none-linux-gnu"

define i32 @main(ptr addrspace(1) %p) {
  %1 = load <2 x i32>, ptr addrspace(1) %p, align 4
  %2 = extractelement <2 x i32> %1, i64 0
  %3 = call i32 @llvm.ctpop.i32(i32 %2)
  %4 = insertelement <2 x i32> <i32 -1, i32 poison>, i32 %3, i64 1
  %5 = sub <2 x i32> %1, %4
  store <2 x i32> %5, ptr addrspace(1) %p, align 4
  ret i32 0
}

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i32 @llvm.ctpop.i32(i32)
```

Here is the llc (trunk) output:
```asm
main:
  ldr d0, [x0]
  movi v2.2d, #0xffffffffffffffff
  mov x8, x0
  mov w0, wzr
  fmov w9, s0
  fmov s1, w9
  cnt v1.8b, v1.8b
  addv b1, v1.8b
  mov v2.b[4], v1.b[0]
  sub v0.2s, v0.2s, v2.2s
  str d0, [x8]
  ret
```

https://godbolt.org/z/c5YP7rYbE
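For reference, here is a minimal C sketch of what the reduced IR computes, assuming the two `i32` lanes are treated as unsigned 32-bit integers so that `sub` wraps modulo 2^32 as in IR. The names `popcount32`, `reference_main` and `example_reference` are hypothetical and only for illustration: lane 0 becomes `p[0] - (-1)`, i.e. `p[0] + 1`, and lane 1 becomes `p[1] - ctpop(p[0])`.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 32-bit population count, standing in for @llvm.ctpop.i32. */
static uint32_t popcount32(uint32_t x) {
    uint32_t n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical model of the reduced @main on the two lanes stored at p.
   Unsigned arithmetic reproduces the wrap-around of IR `sub`. */
static void reference_main(uint32_t p[2]) {
    uint32_t rhs[2];
    rhs[0] = UINT32_MAX;        /* lane 0 of <i32 -1, i32 poison> */
    rhs[1] = popcount32(p[0]);  /* insertelement ..., i32 %3, i64 1 */
    p[0] = p[0] - rhs[0];       /* same as p[0] + 1 (mod 2^32) */
    p[1] = p[1] - rhs[1];
}

/* Worked example: ctpop(0x0F) = 4, so {0x0F, 100} becomes {16, 96}. */
void example_reference(void) {
    uint32_t p[2] = {0x0Fu, 100u};
    reference_main(p);
    assert(p[0] == 16u && p[1] == 96u);
}
```

Any correct lowering must produce these lane values for every input.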
The current transformation appears to be incorrect because:
- The instruction `mov v2.b[4], v1.b[0]` updates only a single byte (byte 4) of the `v2` vector register.
- However, the LLVM IR expects a full 32-bit insertion into the second element (`insertelement` at index 1).
- The remaining three bytes of the 32-bit lane `v2.s[1]` are still 0xFF (from `movi v2.2d, #0xffffffffffffffff`), so the lane holds `0xFFFFFF00 | ctpop` instead of `ctpop`, and the subtraction computes an incorrect value.
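The effect on lane 1 can be illustrated with a simplified byte-level C model of `v2`. This is a sketch under stated assumptions: only the low 64 bits (the D register) are modeled, since `sub v0.2s` reads nothing else; lanes are laid out little-endian, so `s[1]` occupies bytes 4..7; and all helper names (`dreg`, `ins_byte4`, `ins_lane1`, `lane1_trunk`, `lane1_llc20`) are hypothetical. The popcount of an `i32` is at most 32, so it always fits in the inserted byte; the problem is the three bytes that are left untouched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 64-bit NEON D register as 8 bytes. */
typedef struct { uint8_t b[8]; } dreg;

/* Read lane s[1] (bytes 4..7, little-endian). */
static uint32_t read_s1(const dreg *v) {
    return (uint32_t)v->b[4] | ((uint32_t)v->b[5] << 8) |
           ((uint32_t)v->b[6] << 16) | ((uint32_t)v->b[7] << 24);
}

/* movi v.2d, #0xffffffffffffffff: every byte becomes 0xFF. */
static void movi_all_ones(dreg *v) { memset(v->b, 0xFF, sizeof v->b); }

/* mov v.b[4], src: writes ONLY byte 4 (the trunk sequence). */
static void ins_byte4(dreg *v, uint8_t src) { v->b[4] = src; }

/* mov v.s[1], w: writes all four bytes of lane 1 (the 20.1.0 sequence). */
static void ins_lane1(dreg *v, uint32_t w) {
    for (int i = 0; i < 4; i++) v->b[4 + i] = (uint8_t)(w >> (8 * i));
}

/* Lane 1 of the final sub, x1 - v2.s[1], as produced by trunk. */
uint32_t lane1_trunk(uint32_t x1, uint32_t cnt) {
    dreg v2;
    movi_all_ones(&v2);
    ins_byte4(&v2, (uint8_t)cnt);
    return x1 - read_s1(&v2);
}

/* Lane 1 of the final sub, as produced by llc 20.1.0. */
uint32_t lane1_llc20(uint32_t x1, uint32_t cnt) {
    dreg v2;
    movi_all_ones(&v2);
    ins_lane1(&v2, cnt);
    return x1 - read_s1(&v2);
}

void example_lane1(void) {
    /* p = {0x0F, 100}: ctpop(0x0F) = 4, so lane 1 should be 100 - 4 = 96. */
    assert(lane1_llc20(100u, 4u) == 96u);
    /* Trunk subtracts 0xFFFFFF04 instead, i.e. adds 0xFC: 100 + 252 = 352. */
    assert(lane1_trunk(100u, 4u) == 352u);
}
```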
llc 20.1.0 output (before applying #136091):

```asm
main: // @main
  ldr d1, [x0]
  movi v0.2d, #0xffffffffffffffff
  mov x8, x0
  mov w0, wzr
  fmov w9, s1
  fmov s2, w9
  cnt v2.8b, v2.8b
  addv b2, v2.8b
  fmov w9, s2
  mov v0.s[1], w9
  sub v0.2s, v1.2s, v0.2s
  str d0, [x8]
  ret
```

https://godbolt.org/z/qqvM1b3hc
Why is this correct?
- `mov v0.s[1], w9` overwrites the entire 32-bit lane `v0.s[1]`, matching the semantics of LLVM IR's `insertelement`.
- This avoids leftover bytes from the earlier `movi` initialization, ensuring the result is correct.
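Put differently, a byte insert can only stand in for a full-lane insert of a value below 256 when the upper 24 bits of the destination lane are already known to be zero; here `movi` makes them all ones. The small C check below sketches that condition. The function names are hypothetical, and this is not a claim about where in LLVM the byte-sized insert is chosen.

```c
#include <assert.h>
#include <stdint.h>

/* Lane value after writing x into the lowest byte of a 32-bit lane:
   the upper 24 bits of `old` survive. */
uint32_t after_byte_insert(uint32_t old, uint8_t x) {
    return (old & 0xFFFFFF00u) | x;
}

/* Lane value after a full 32-bit insert of x. */
uint32_t after_lane_insert(uint32_t old, uint32_t x) {
    (void)old; /* previous lane contents are fully overwritten */
    return x;
}

/* For x < 256, the two agree exactly when the old lane's upper 24 bits are zero. */
void check_equivalence(void) {
    const uint32_t olds[] = {0x00000000u, 0x000000FFu, 0xFFFFFFFFu, 0x00010000u};
    for (unsigned i = 0; i < sizeof olds / sizeof olds[0]; i++) {
        for (uint32_t x = 0; x < 256; x++) {
            int same = after_byte_insert(olds[i], (uint8_t)x) ==
                       after_lane_insert(olds[i], x);
            assert(same == ((olds[i] & 0xFFFFFF00u) == 0));
        }
    }
}
```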
@davemgreen David, could you please take a look when you have a moment?