
GuardMaskedDivRem Transformation

Tiotto, Ettore edited this page Apr 20, 2026 · 1 revision

GuardMaskedDivRem: Fixing Masked Div/Rem UB Exploitation by LLVM O3

PR: #6675 — Closes #6674

📎 Slides: guard-masked-divrem-slides.pptx


Problem

Triton masked loads lower to conditional blocks that produce phi nodes with a zero default on the false (masked-off) path:

br i1 %mask, label %load_bb, label %merge

load_bb:
  %val = load i32, ptr %p
  br label %merge

merge:
  %phi = phi i32 [%val, %load_bb], [0, %entry]
  %res = sdiv i32 %x, %phi          ; ← UB when mask=false!

When %mask is false, %phi is 0, making sdiv %x, 0 undefined behavior. Although the result is never used on the false path, LLVM is free to exploit this UB. Specifically:

  1. SimplifyCFG sees sdiv x, 0 on the false path
  2. Inserts llvm.assume(%mask == true) (since UB means false path "can't happen")
  3. The assumption propagates to unrelated branches in the function
  4. Predicated stores become unconditional → silent data corruption
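The shape of the problem can be modeled at source level. The sketch below is a C++ analogy, not Triton or LLVM code; `masked_load` and `division_would_be_ub` are hypothetical names introduced only for illustration. It shows that a masked-off lane always sees a divisor of exactly 0, which is what makes the division UB even though its result is discarded.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a masked load: returns *p when the mask is set and
// the zero default otherwise (the phi's [0, %entry] operand).
inline int32_t masked_load(bool mask, const int32_t *p) {
    return mask ? *p : 0;  // p is never dereferenced when mask is false
}

// Returns true when dividing by the loaded value would be undefined
// behavior, i.e. when the divisor is zero.
inline bool division_would_be_ub(bool mask, const int32_t *p) {
    return masked_load(mask, p) == 0;
}
```

With a nonzero value behind the pointer, only the masked-off case reports UB, which matches the `mask=false` path in the IR above.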

Why freeze Didn't Work (Old Approach)

The original pass (FreezeMaskedDivRemPass) inserted freeze before the divisor:

%frozen = freeze i8 %phi
%z = sdiv i8 %x, %frozen

freeze tells LLVM: "pick an arbitrary but fixed value for poison/undef inputs." However:

  • LLVM proves both phi incoming values are well-defined (a loaded value and a constant 0)
  • Since neither is poison/undef, freeze is a no-op
  • LLVM removes the freeze, so SimplifyCFG still sees the division by zero and exploits the UB
  • The corruption returns
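A toy model may make the no-op behavior easier to see. This is a sketch under stated assumptions: `ToyValue` and `toy_freeze` are hypothetical names, a flag stands in for poison/undef, and the replacement value 1 is an arbitrary choice for the illustration, not something LLVM guarantees.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for an LLVM value: the flag marks poison/undef.
struct ToyValue {
    bool is_poison;
    int8_t value;  // meaningful only when !is_poison
};

// Toy model of `freeze`: a well-defined input passes through unchanged;
// only a poison/undef input is replaced by some fixed value (1 here,
// chosen arbitrarily for this sketch).
inline int8_t toy_freeze(ToyValue v) {
    return v.is_poison ? int8_t{1} : v.value;
}
```

Because the phi's zero default is a well-defined constant, freezing it still yields 0, so the divisor stays zero and the division stays UB.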

The Fix: select Guard (New Approach)

Replace the divisor with select(divisor == 0, 1, divisor):

%is_zero = icmp eq i8 %phi, 0
%safe    = select i1 %is_zero, i8 1, i8 %phi
%z       = sdiv i8 %x, %safe

Why this works:

| Case | What happens | Correct? |
|---|---|---|
| mask=true (loaded value ≠ 0) | select passes through the real divisor | ✅ Unchanged behavior |
| mask=false (phi = 0) | sdiv x, 0 becomes sdiv x, 1 = x | ✅ Well-defined, result unused |

  • LLVM cannot remove the select because it genuinely changes the value from 0 to 1
  • No UB exists for LLVM to exploit
  • Now covers all division ops: sdiv, srem, udiv, urem
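The guard's semantics can be sketched in plain C++. This models the value computation only and is not the pass implementation; the `guard_divisor*` and `guarded_*` helper names are hypothetical. Only the zero-divisor case is guarded, matching the description above.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors `select (icmp eq d, 0), 1, d`: a zero divisor becomes 1,
// every other divisor passes through unchanged.
inline int32_t guard_divisor(int32_t d) { return d == 0 ? 1 : d; }
inline uint32_t guard_divisor_u(uint32_t d) { return d == 0 ? 1u : d; }

// Guarded counterparts of sdiv, srem, udiv and urem.
inline int32_t guarded_sdiv(int32_t x, int32_t d) { return x / guard_divisor(d); }
inline int32_t guarded_srem(int32_t x, int32_t d) { return x % guard_divisor(d); }
inline uint32_t guarded_udiv(uint32_t x, uint32_t d) { return x / guard_divisor_u(d); }
inline uint32_t guarded_urem(uint32_t x, uint32_t d) { return x % guard_divisor_u(d); }
```

For a nonzero divisor the helpers behave exactly like the unguarded operators; for a zero divisor the division yields `x` and the remainder yields 0, both well-defined values that the masked-off path discards.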

Pass Scheduling Fix

The pass was also moved from inside the O3 pipeline to before it:

| | Old | New |
|---|---|---|
| Registration | PeepholeEPCallback (inside O3) | Standalone FPM before buildPerModuleDefaultPipeline |
| Problem | SimplifyCFG runs first, folds conditional blocks, eliminates the phi nodes the pass needs to match | |
| Result | Pass arrives too late | Pass runs before any O3 pass can destroy the pattern |
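The ordering effect can be illustrated with a deliberately simplified model. Nothing here uses the LLVM pass manager API; `ToyIR`, `toy_guard_pass`, `toy_simplify_cfg`, and `run_pipeline` are hypothetical names, and the IR is reduced to two flags.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a function's IR: just two facts about it.
struct ToyIR {
    bool has_phi_divisor = true;  // phi-with-zero-default pattern still visible
    bool guarded = false;         // select guard has been inserted
};

using ToyPass = void (*)(ToyIR &);

// Models the guard pass: it only fires while the phi pattern is present.
inline void toy_guard_pass(ToyIR &ir) {
    if (ir.has_phi_divisor) ir.guarded = true;
}

// Models an O3 pass folding the conditional block, which removes the
// phi node the guard pass matches on.
inline void toy_simplify_cfg(ToyIR &ir) { ir.has_phi_divisor = false; }

// Runs the passes in order on a fresh ToyIR and returns the result.
inline ToyIR run_pipeline(const std::vector<ToyPass> &passes) {
    ToyIR ir;
    for (ToyPass pass : passes) pass(ir);
    return ir;
}
```

Running the guard after the folding pass leaves the division unguarded, while running it first inserts the guard before the pattern can disappear.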

Files Changed

| File | Change |
|---|---|
| LLVMIRGuardMaskedDivRem.cpp | New implementation with select guard (replaces LLVMIRFreezeMaskedDivRem.cpp) |
| LLVMPasses.h | Renamed FreezeMaskedDivRemPass → GuardMaskedDivRemPass |
| triton_xpu.cc | Moved pass before O3 pipeline |
| guard-masked-div-rem.ll | Updated FileCheck patterns for icmp+select instead of freeze |
| test_divide.py | Removed TRITON_INTEL_PREDICATED_LOAD=1 fixture (pass works regardless) |
