GuardMaskedDivRem Transformation

GuardMaskedDivRem: Fixing Masked Div/Rem UB Exploitation by LLVM O3

PR: #6675 — Closes #6674

📎 Slides: guard-masked-divrem-slides.pptx

Problem

Triton masked loads lower to conditional blocks that produce phi nodes with a zero default on the false (masked-off) path:

br i1 %mask, label %load_bb, label %merge

load_bb:
  %val = load i32, ptr %p
  br label %merge

merge:
  %phi = phi i32 [%val, %load_bb], [0, %entry]
  %res = sdiv i32 %x, %phi          ; ← UB when mask=false!

When %mask is false, %phi is 0, making sdiv %x, 0 undefined behavior. Although the result is never used on the false path, LLVM is free to exploit this UB. Specifically:

1. SimplifyCFG sees `sdiv x, 0` on the false path.
2. It inserts `llvm.assume(%mask == true)`, since UB means the false path "can't happen".
3. The assumption propagates to unrelated branches in the function.
4. Predicated stores become unconditional, which causes silent data corruption.
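
To make the zero divisor concrete, here is a minimal Python model of the pattern above. It is not Triton or LLVM code: the helper names `masked_load` and `lanes_dividing_by_zero` are hypothetical, and plain lists stand in for per-lane values. It only shows which lanes end up with a zero divisor because they were masked off.

```python
def masked_load(memory, mask, other=0):
    """Model a masked load: masked-off lanes receive the default `other`."""
    return [value if keep else other for value, keep in zip(memory, mask)]


def lanes_dividing_by_zero(divisors):
    """Return the lane indices whose divisor is zero."""
    return [lane for lane, d in enumerate(divisors) if d == 0]


memory = [4, 9, 5, 3]
mask = [True, False, True, False]

divisors = masked_load(memory, mask)
bad_lanes = lanes_dividing_by_zero(divisors)
print(divisors)   # [4, 0, 5, 0]
print(bad_lanes)  # [1, 3]
```

In Python a zero divisor would raise `ZeroDivisionError`. In LLVM IR it is undefined behavior, which is what allows the optimizer to assume the masked-off lanes are never executed.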

Why `freeze` Didn't Work (Old Approach)

The original pass (FreezeMaskedDivRemPass) inserted freeze before the divisor:

%frozen = freeze i8 %phi
%z = sdiv i8 %x, %frozen

freeze tells LLVM: "pick an arbitrary but fixed value for poison/undef inputs." However:

1. LLVM proves both phi incoming values are well-defined (a loaded value and a constant 0).
2. Since neither is poison/undef, `freeze` is a no-op: a well-defined 0 stays 0.
3. LLVM removes the `freeze`, so SimplifyCFG still sees the division by zero and exploits the UB.
4. The corruption returns.
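
The reasoning above can be sketched with a toy Python model of `freeze`. This is an illustration only: `POISON` is a hypothetical sentinel standing in for a poison/undef value, and the `freeze` function here is a simplified stand-in, not LLVM's implementation. The point is that a defined zero passes through unchanged.

```python
import random

POISON = object()  # hypothetical sentinel standing in for a poison/undef value


def freeze(value, bits=8):
    """Model `freeze`: defined values pass through, poison becomes some integer."""
    if value is POISON:
        # Arbitrary choice; the caller sees one fixed value from here on.
        return random.randrange(2 ** bits)
    return value


phi_masked_off = 0  # the phi's constant default on the masked-off path
frozen = freeze(phi_masked_off)
print(frozen)  # 0
```

Because the frozen divisor is still 0, the division that follows remains undefined behavior.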

The Fix: `select` Guard (New Approach)

Replace the divisor with select(divisor == 0, 1, divisor):

%is_zero = icmp eq i8 %phi, 0
%safe    = select i1 %is_zero, i8 1, i8 %phi
%z       = sdiv i8 %x, %safe

Why this works:

| Case | What happens | Correct? |
|---|---|---|
| `mask=true` (loaded value ≠ 0) | `select` passes through the real divisor | ✅ Unchanged behavior |
| `mask=false` (phi = 0) | `sdiv x, 0` becomes `sdiv x, 1 = x` | ✅ Well-defined, result unused |

- LLVM cannot remove the `select`: it genuinely changes the value from 0 to 1.
- No division-by-zero UB remains for LLVM to exploit.
- The pass now covers all division ops: `sdiv`, `srem`, `udiv`, `urem`.
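
As a rough illustration of the guard's semantics, the following Python sketch models `select(divisor == 0, 1, divisor)` followed by a signed division. The function names are hypothetical, and `sdiv` here only models rounding toward zero on unbounded Python integers; it ignores fixed-width overflow.

```python
def guard_divisor(divisor):
    """Model `select(divisor == 0, 1, divisor)`."""
    return 1 if divisor == 0 else divisor


def sdiv(x, d):
    """Model `sdiv`: signed division that rounds toward zero."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q


def guarded_sdiv(x, divisor):
    """Divide by the guarded divisor, so a zero divisor never reaches `sdiv`."""
    return sdiv(x, guard_divisor(divisor))


print(guarded_sdiv(7, 0))   # 7  (masked-off lane: divides by 1 instead of 0)
print(guarded_sdiv(-7, 2))  # -3 (non-zero divisor: unchanged behavior)
print(guarded_sdiv(7, -2))  # -3
```

The masked-off lane yields a well-defined but meaningless value, which is acceptable because that result is never used.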

Pass Scheduling Fix

The pass was also moved from inside the O3 pipeline to before it:

| | Old | New |
|---|---|---|
| Registration | `PeepholeEPCallback` (inside O3) | Standalone FPM before `buildPerModuleDefaultPipeline` |
| Problem | SimplifyCFG runs first, folds conditional blocks, eliminates the phi nodes the pass needs to match | n/a |
| Result | Pass arrives too late | Pass runs before any O3 pass can destroy the pattern |

Files Changed

| File | Change |
|---|---|
| `LLVMIRGuardMaskedDivRem.cpp` | New implementation with `select` guard (replaces `LLVMIRFreezeMaskedDivRem.cpp`) |
| `LLVMPasses.h` | Renamed `FreezeMaskedDivRemPass` → `GuardMaskedDivRemPass` |
| `triton_xpu.cc` | Moved pass before O3 pipeline |
| `guard-masked-div-rem.ll` | Updated FileCheck patterns for `icmp`+`select` instead of `freeze` |
| `test_divide.py` | Removed `TRITON_INTEL_PREDICATED_LOAD=1` fixture (pass works regardless) |
