GuardMaskedDivRem Transformation
Tiotto, Ettore edited this page Apr 20, 2026 · 1 revision
📎 Slides: guard-masked-divrem-slides.pptx
Triton masked loads lower to conditional blocks that produce phi nodes with a zero default on the false (masked-off) path:

```llvm
  br i1 %mask, label %load_bb, label %merge
load_bb:
  %val = load i32, ptr %p
  br label %merge
merge:
  %phi = phi i32 [%val, %load_bb], [0, %entry]
  %res = sdiv i32 %x, %phi ; ← UB when mask=false!
```

When `%mask` is false, `%phi` is 0, making `sdiv %x, 0` undefined behavior. Although the result is never used on the false path, LLVM is free to exploit this UB. Specifically:
- SimplifyCFG sees `sdiv x, 0` on the false path
- Inserts `llvm.assume(%mask == true)` (since UB means the false path "can't happen")
- The assumption propagates to unrelated branches in the function
- Predicated stores become unconditional → silent data corruption
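
The end effect of that chain can be pictured with a toy Python model. This is only a sketch of the observable behavior, not of LLVM internals: `store_intended`, `store_after_assume`, and the buffer layout are hypothetical names introduced for illustration.

```python
def store_intended(mask, x, loaded, buf, idx):
    """Intended semantics: the store only happens when the mask is true."""
    if mask:
        buf[idx] = x // loaded


def store_after_assume(mask, x, loaded, buf, idx):
    """Toy model of the code after the optimizer has assumed mask == true.

    The mask check is gone, so the store runs for masked-off lanes too.
    """
    buf[idx] = x // loaded


def demo():
    """Run both variants on a masked-off lane and return the two buffers."""
    intended = [99]
    corrupted = [99]
    store_intended(False, 8, 2, intended, 0)       # lane is masked off
    store_after_assume(False, 8, 2, corrupted, 0)  # mask no longer consulted
    return intended, corrupted
```

In this model the masked-off lane leaves the first buffer untouched, while the second buffer is overwritten, which is the kind of silent corruption described above.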
The original pass (`FreezeMaskedDivRemPass`) inserted `freeze` before the divisor:

```llvm
%frozen = freeze i8 %phi
%z = sdiv i8 %x, %frozen
```

`freeze` tells LLVM: "pick an arbitrary but fixed value for poison/undef inputs." However:
- LLVM proves both phi incoming values are well-defined (a loaded value and a constant 0)
- Since neither is poison/undef, `freeze` is a no-op
- LLVM removes it before SimplifyCFG gets to exploit the UB
- The corruption returns
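
A minimal Python sketch of why `freeze` cannot help here: it only changes poison/undef inputs, and a masked-off divisor is a perfectly well-defined 0. The `POISON` sentinel and the `arbitrary` parameter are assumptions made for this model, not part of any LLVM API.

```python
# Hypothetical sentinel standing in for an LLVM poison/undef value.
POISON = object()


def freeze(value, arbitrary=0):
    """Toy model of LLVM's freeze.

    Poison is replaced by some fixed value (here the caller-supplied
    `arbitrary`); every well-defined value passes through unchanged.
    """
    return arbitrary if value is POISON else value


def frozen_divisor(mask, loaded):
    """Divisor seen by the division after freeze, for one lane."""
    phi = loaded if mask else 0  # zero default on the masked-off path
    return freeze(phi)
```

With `mask=False` the divisor is still 0 after `freeze`, so the division by zero (and the UB it implies in the IR) is still there.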
Replace the divisor with `select(divisor == 0, 1, divisor)`:

```llvm
%is_zero = icmp eq i8 %phi, 0
%safe = select i1 %is_zero, i8 1, i8 %phi
%z = sdiv i8 %x, %safe
```

Why this works:
| Case | What happens | Correct? |
|---|---|---|
| `mask=true` (loaded value ≠ 0) | `select` passes through the real divisor | ✅ Unchanged behavior |
| `mask=false` (phi = 0) | `sdiv x, 0` becomes `sdiv x, 1` = `x` | ✅ Well-defined, result unused |
- LLVM cannot remove the `select`: it genuinely changes the value from 0 to 1
- No UB exists for LLVM to exploit
- Now covers all division ops: `sdiv`, `srem`, `udiv`, `urem`
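
The guard can be modelled in a few lines of Python. This is a sketch of the arithmetic only, assuming unbounded integers (so it ignores bit-width effects such as signed overflow); the helper names `guard_divisor`, `guarded_sdiv`, and `guarded_srem` are hypothetical.

```python
def guard_divisor(divisor: int) -> int:
    """Model of select(divisor == 0, 1, divisor)."""
    return 1 if divisor == 0 else divisor


def sdiv(x: int, d: int) -> int:
    """Signed division truncating toward zero, like LLVM's sdiv.

    Division by zero is UB in LLVM IR; this model raises instead.
    """
    if d == 0:
        raise ZeroDivisionError("sdiv by zero")
    q = abs(x) // abs(d)
    return q if (x >= 0) == (d >= 0) else -q


def srem(x: int, d: int) -> int:
    """Signed remainder whose sign follows the dividend, like LLVM's srem."""
    return x - d * sdiv(x, d)


def guarded_sdiv(x: int, d: int) -> int:
    """sdiv with the select guard applied to the divisor."""
    return sdiv(x, guard_divisor(d))


def guarded_srem(x: int, d: int) -> int:
    """srem with the select guard applied to the divisor."""
    return srem(x, guard_divisor(d))
```

For a nonzero divisor the guarded and unguarded results agree; for a zero divisor the guarded division evaluates `x / 1` and the guarded remainder evaluates `x % 1`, both of which are well-defined.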
The pass was also moved from inside the O3 pipeline to before it:

| | Old | New |
|---|---|---|
| Registration | `PeepholeEPCallback` (inside O3) | Standalone FPM before `buildPerModuleDefaultPipeline` |
| Problem | SimplifyCFG runs first, folds conditional blocks, eliminates the phi nodes the pass needs to match | n/a |
| Result | Pass arrives too late | Pass runs before any O3 pass can destroy the pattern |
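
Why the ordering matters can be illustrated with a toy pass pipeline in Python. Everything here is a simplified stand-in: instructions are `(result, opcode, operands)` tuples, `fold_conditional_blocks` merely mimics an O3 pass that rewrites phi nodes away, and `guard_masked_divrem` is a hypothetical matcher that only fires when a division's divisor is a phi.

```python
DIV_OPS = {"sdiv", "srem", "udiv", "urem"}


def fold_conditional_blocks(func):
    """Stand-in for an O3 pass that turns phi nodes into selects."""
    return [(res, "select" if op == "phi" else op, args)
            for res, op, args in func]


def guard_masked_divrem(func):
    """Insert a guard before every division whose divisor is a phi."""
    phis = {res for res, op, _ in func if op == "phi"}
    out = []
    for res, op, args in func:
        if op in DIV_OPS and args[1] in phis:
            safe = args[1] + ".safe"
            out.append((safe, "guard", (args[1],)))
            args = (args[0], safe)
        out.append((res, op, args))
    return out


def run_pipeline(func, passes):
    """Apply each pass in order and return the final instruction list."""
    for p in passes:
        func = p(func)
    return func


FUNC = [
    ("%phi", "phi", ("%val", "0")),
    ("%res", "sdiv", ("%x", "%phi")),
]

# Old placement: the pattern is already gone when the guard pass runs.
old_order = run_pipeline(FUNC, [fold_conditional_blocks, guard_masked_divrem])
# New placement: the guard pass runs first and still sees the phi.
new_order = run_pipeline(FUNC, [guard_masked_divrem, fold_conditional_blocks])
```

In this model `old_order` contains no guard instruction, while `new_order` does, mirroring the "Pass arrives too late" row of the table.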

| File | Change |
|---|---|
| `LLVMIRGuardMaskedDivRem.cpp` | New implementation with select guard (replaces `LLVMIRFreezeMaskedDivRem.cpp`) |
| `LLVMPasses.h` | Renamed `FreezeMaskedDivRemPass` → `GuardMaskedDivRemPass` |
| `triton_xpu.cc` | Moved pass before O3 pipeline |
| `guard-masked-div-rem.ll` | Updated FileCheck patterns for `icmp`+`select` instead of `freeze` |
| `test_divide.py` | Removed `TRITON_INTEL_PREDICATED_LOAD=1` fixture (pass works regardless) |