[AIE2P] Large struct phi with zeroinitializer fails to zero ACC2048 registers; crashes at -O0 #805
Open
Description
Summary
A phi node carrying a large struct ({i32, i64 x 128}, 129 fields) with zeroinitializer as the entry value has two issues on aie2p-none-unknown-elf:
- -O0 crash: LLVM ERROR: unable to legalize instruction: G_INSERT_VECTOR_ELT
- -O2 silent miscompile: the generated code fails to zero the ACC2048 hardware registers, causing alternating correct/incorrect results across repeated kernel invocations on the NPU.
A smaller struct ({i32, i64 x 32}, 33 fields) with the same pattern works correctly at both -O0 and -O2.
Environment
- llvm-aie: 20.0.0.2025090701+8c084497 (commit 8c084497)
- Target: aie2p-none-unknown-elf
- Hardware (for runtime bug): AMD Ryzen AI NPU (Krackan Point, AIE2P)
Reproducer
See attached peano_bug_repro.ll.
The struct packs a loop counter + 4 ACC2048 vectors (32 × i64 each = 128 i64 fields total):
%acc_state = type { i32, i64, i64, ..., i64 } ; 129 fields
k_loop:
%state = phi %acc_state [ zeroinitializer, %entry ], [ %state_next, %k_continue ]
; extract 4 groups of 32 × i64 → <32 x i64>, call mac.conf, insertvalue back
Reproducing the -O0 crash (no hardware needed)
llc peano_bug_repro.ll --mtriple=aie2p-none-unknown-elf -O0 --filetype=obj -o /dev/null
Expected: compiles successfully.
Actual: LLVM ERROR: unable to legalize instruction: G_INSERT_VECTOR_ELT
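For anyone who wants to script the compile check (for example to bisect llvm-aie builds), the command above can be wrapped roughly as follows. This is a minimal sketch, not part of the attached reproducer: the helper names `llc_command`, `classify`, and `check` are hypothetical, it assumes `llc` from llvm-aie is on PATH, and it writes the object to /dev/null at every optimization level.

```python
import subprocess

TRIPLE = "aie2p-none-unknown-elf"
# Error text copied from this report; other llvm-aie builds may word it differently.
LEGALIZER_ERROR = "unable to legalize instruction: G_INSERT_VECTOR_ELT"


def llc_command(ir_path, opt_level, out_path="/dev/null", llc="llc"):
    """Build the llc command line used in this report for a given -O level."""
    return [
        llc, ir_path,
        f"--mtriple={TRIPLE}",
        f"-{opt_level}",
        "--filetype=obj",
        "-o", out_path,
    ]


def classify(returncode, stderr):
    """Map an llc result to 'ok', 'legalizer-crash', or 'other-failure'."""
    if returncode == 0:
        return "ok"
    if LEGALIZER_ERROR in stderr:
        return "legalizer-crash"
    return "other-failure"


def check(ir_path, opt_levels=("O0", "O2"), llc="llc"):
    """Compile at each optimization level and return {level: classification}."""
    results = {}
    for opt in opt_levels:
        proc = subprocess.run(
            llc_command(ir_path, opt, llc=llc),
            capture_output=True, text=True,
        )
        results[opt] = classify(proc.returncode, proc.stderr)
    return results
```

Calling `check("peano_bug_repro.ll")` on an affected build should report the -O0 entry as a legalizer crash. A result of "ok" at -O2 only means compilation succeeded; it says nothing about the runtime miscompile, which still needs hardware.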
Reproducing the -O2 runtime bug (requires AIE2P NPU)
llc peano_bug_repro.ll --mtriple=aie2p-none-unknown-elf -O2 --filetype=obj -o repro.o
Compiles successfully, but when linked into an AIE2P kernel and invoked repeatedly:
- Even invocations: wrong results (accumulator retains stale values from previous invocation, e.g. 68 instead of expected 32)
- Odd invocations: correct results (all 32)
- Only fields 1–64 (first two ACC2048 groups) are affected; fields 65–128 (last two) are always correct
- Pattern is 100% deterministic
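To see which ACC2048 groups are affected in a given run, the 128 i64 outputs can be bucketed by accumulator. The sketch below is illustrative only: `mismatched_groups` is a hypothetical helper, the input is assumed to be the 128 i64 fields read back from the kernel (struct fields 1..128, without the i32 counter), and the sample data is synthetic, modeled on the 68-vs-32 example above.

```python
FIELDS_PER_ACC = 32  # one ACC2048 group = 32 x i64, as described in this report


def mismatched_groups(values, expected):
    """Return {group_index: [1-based struct field numbers]} for wrong values.

    values[0] corresponds to struct field 1, so group 0 covers fields 1-32,
    group 1 covers fields 33-64, and so on.
    """
    bad = {}
    for i, v in enumerate(values):
        if v != expected:
            bad.setdefault(i // FIELDS_PER_ACC, []).append(i + 1)
    return bad


# Synthetic "even invocation": first two groups stale, last two correct.
stale_run = [68] * 64 + [32] * 64
affected = mismatched_groups(stale_run, expected=32)
print(sorted(affected))  # group indices that contain at least one wrong field
```

With the synthetic input, only groups 0 and 1 (fields 1-64) are reported, which matches the pattern described above.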
Working case
The same phi pattern with a 33-field struct (1 accumulator instead of 4) works correctly:
%acc_state_small = type { i32, i64, i64, ..., i64 } ; 33 fields
; → correct ACC zeroing on every invocation, no crash at -O0
Workaround
Restructure the kernel to carry only one <32 x i64> accumulator per loop phi node instead of four.
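To narrow down where the behavior changes between the working 33-field struct and the failing 129-field one, intermediate struct sizes can be generated mechanically. The following is a rough sketch under the layout described in this report (one i32 counter followed by 32 i64 fields per accumulator); `acc_state_type` is a hypothetical helper and only emits the type definition, not the full reproducer.

```python
def acc_state_type(num_accumulators, name="acc_state", lanes=32):
    """Return (LLVM IR type definition, field count) for the state struct.

    Layout assumed from this report: one i32 loop counter followed by
    `lanes` i64 fields per accumulator.
    """
    fields = ["i32"] + ["i64"] * (num_accumulators * lanes)
    return f"%{name} = type {{ {', '.join(fields)} }}", len(fields)


for n in (1, 2, 3, 4):
    _, count = acc_state_type(n)
    print(n, "accumulator(s):", count, "fields")
```

One accumulator gives the 33-field struct that works and four give the 129-field struct that fails; the two- and three-accumulator variants (65 and 97 fields) have not been tested in this report.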