Commit 1cea4a0

[AMDGPU][NPM] Fix CFG invalidation detection in insertSimulatedTrap (#169290)
When SIMULATED_TRAP is at the end of a block with no successors, insertSimulatedTrap incorrectly returns the original MBB despite adding HaltLoopBB to the CFG.

EmitInstrWithCustomInserter detects CFG changes by comparing the returned MBB with the original. When they match, it assumes no modification occurred and skips MachineLoopInfo invalidation. This causes stale loop information in subsequent passes, particularly when using the NPM which relies on accurate invalidation signals.

Fix: Return HaltLoopBB to properly signal the CFG modification.
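
The return-value convention described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Block`, `insertTrapToy`, and `callerInvalidates` are hypothetical stand-ins invented for illustration, not LLVM's `MachineBasicBlock`, `insertSimulatedTrap`, or `EmitInstrWithCustomInserter`. It shows how a caller that compares pointers misses a CFG change when the inserter hands back the original block.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a basic block; NOT LLVM's MachineBasicBlock.
struct Block {
  std::vector<Block *> Succs;
};

// Toy inserter (assumption, for illustration only): it always adds HaltLoop
// as a successor of MBB, which is a CFG change. With SignalChange == false it
// mimics the pre-fix behaviour and returns the original block anyway.
Block *insertTrapToy(Block &MBB, Block &HaltLoop, bool SignalChange) {
  MBB.Succs.push_back(&HaltLoop);
  return SignalChange ? &HaltLoop : &MBB;
}

// Toy caller: treats "returned block differs from the original" as the only
// evidence that the CFG changed, and reports whether it would invalidate
// cached loop information.
bool callerInvalidates(Block &MBB, Block &HaltLoop, bool SignalChange) {
  Block *Ret = insertTrapToy(MBB, HaltLoop, SignalChange);
  return Ret != &MBB;
}

// Small self-check that exercises both behaviours.
void demo() {
  Block A, HaltA;
  // CFG changed, but the caller sees the same pointer and skips invalidation.
  assert(!callerInvalidates(A, HaltA, /*SignalChange=*/false));
  Block B, HaltB;
  // Returning a different block makes the change visible to the caller.
  assert(callerInvalidates(B, HaltB, /*SignalChange=*/true));
}
```

In the toy, both calls add a successor edge, yet only the second one is reported as a change. That is the mismatch the commit message describes.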
1 parent bd0769e commit 1cea4a0

1 file changed

+4
-0
lines changed
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

Lines changed: 4 additions & 0 deletions
@@ -1963,6 +1963,10 @@ MachineBasicBlock *SIInstrInfo::insertSimulatedTrap(MachineRegisterInfo &MRI,
     BuildMI(MBB, MI, DL, get(AMDGPU::S_CBRANCH_EXECNZ)).addMBB(TrapBB);
     MF->push_back(TrapBB);
     MBB.addSuccessor(TrapBB);
+  } else {
+    // Since we're adding HaltLoopBB and modifying the CFG, we must return a
+    // different block to signal the change.
+    ContBB = HaltLoopBB;
   }
 
   // Start with a `s_trap 2`, if we're in PRIV=1 and we need the workaround this
