Skip to content

Commit 8b46b55

Browse files
author
Yonghong Song
committed
[RFC][Transforms] Prefer unreachable insn over optimizaiton with undef
In some cases, the code has uninitialized variable and the uninitialized variable actually has an impact on the code. For example, $ cat t.c __attribute__((always_inline)) int bar(int a, int *b) { if ((*b) == 0) return a; else return 2 * a; } void tar(int); int foo(int a) { int b; return bar(a, &b); } In the above variable 'b' is uninitialized. With the following compilation flag: clang -O2 -S -emit-llvm -Wall -Werror t.c -mllvm -print-after-all The EarlyCSEPass changes the input IR define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr llvm#1 { entry: %cmp.i = icmp eq i32 undef, 0 %mul.i = shl nsw i32 %a, 1 %retval.0.i = select i1 %cmp.i, i32 %a, i32 %mul.i ret i32 %retval.0.i } to output IR define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr llvm#1 { entry: %mul.i = shl nsw i32 %a, 1 ret i32 %a } In the above, the llvm generates code without any 'undef' values. On the other hand, with the following compilation flag: clang -O2 -S -emit-llvm -Wall -Werror t.c -fsanitize=undefined -mllvm -print-after-all The SCCPPass changes the input IR define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr llvm#3 !func_sanitize !12 { entry: br i1 undef, label %bar.exit, label %if.else.i if.else.i: ; preds = %entry %0 = shl i32 %a, 1 %1 = add i32 %a, 1073741824 %2 = icmp sgt i32 %1, -1 br i1 %2, label %bar.exit, label %handler.mul_overflow.i, !prof !7, !nosanitize !6 handler.mul_overflow.i: ; preds = %if.else.i %3 = zext i32 %a to i64, !nosanitize !6 tail call void @__ubsan_handle_mul_overflow(ptr nonnull @2, i64 2, i64 %3) llvm#5, !nosanitize !6 br label %bar.exit, !nosanitize !6 bar.exit: ; preds = %entry, %if.else.i, %handler.mul_overflow.i %retval.0.i = phi i32 [ %a, %entry ], [ %0, %handler.mul_overflow.i ], [ %0, %if.else.i ] ret i32 %retval.0.i } to output IR define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr llvm#3 !func_sanitize !12 { entry: unreachable } Note that in the above example, the unitialized variable is used cross function boundary which makes frontend harder to emit error messages. And bpf prog needs to pass verifier which makes runtime sanitize not suitable. On the other hand, bpf prog typically is not that big and bpf prog also tends have quite some inlining for performance reason or to avoid kernel verification failure. So detecting and reporting impactful uninit var can help improve bpf developer productivity greatly. There are more discussion in [1] and [2]. To maximize chances to report proper unit var warnings (beyond -Wall -Werror), I would like to discuss two things related to the above example: 1. Avoid generating legal code from 'undef' code. This is needed so the 'undef' code can be carried through entire compilation. And in many cases, 'undef' is eventually transformed to 'unreachable' insn. Generating legal code (without 'undef') will prevent later catching 'undef/unreachable' cases. 2. As in discussions in [2], looks like clang-format does not like BPF Backend to check undef values. So if possible, it would be great to convert 'undef' related code to 'unreachable', e.g. in the above SCCPPass. This RFC intends to have some upstream discussion on how to achieve the above two goals. With this patch, for the following compilation flag: clang -O2 -S -emit-llvm -Wall -Werror t.c -mllvm -print-after-all The EarlyCSEPass changes the input IR define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr llvm#1 { entry: %cmp.i = icmp eq i32 undef, 0 %mul.i = shl nsw i32 %a, 1 %retval.0.i = select i1 %cmp.i, i32 %a, i32 %mul.i ret i32 %retval.0.i } to output IR define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr llvm#1 { entry: %mul.i = shl nsw i32 %a, 1 unreachable } And '%mul.i = shl nsw i32 %a, 1' is removed in SimplifyCFGPass. ; *** IR Dump After CorrelatedValuePropagationPass on foo *** ; Function Attrs: nounwind uwtable define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr llvm#1 { entry: %mul.i = shl nsw i32 %a, 1 unreachable } ; *** IR Dump After SimplifyCFGPass on foo *** ; Function Attrs: nounwind uwtable define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr llvm#1 { entry: unreachable } [1] https://discourse.llvm.org/t/detect-undefined-behavior-due-to-uninitialized-variables-in-bpf-programs/84116/4?u=yonghong-song [2] llvm#126858
1 parent 606e9fa commit 8b46b55

File tree

2 files changed

+13
-2
lines changed

2 files changed

+13
-2
lines changed

llvm/lib/Analysis/InstructionSimplify.cpp

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -4829,9 +4829,9 @@ static Value *simplifySelectInst(Value *Cond, Value *TrueVal, Value *FalseVal,
48294829
if (isa<PoisonValue>(CondC))
48304830
return PoisonValue::get(TrueVal->getType());
48314831

4832-
// select undef, X, Y -> X or Y
4832+
// select undef, X, Y -> undef
48334833
if (Q.isUndefValue(CondC))
4834-
return isa<Constant>(FalseVal) ? FalseVal : TrueVal;
4834+
return UndefValue::get(TrueVal->getType());
48354835

48364836
// select true, X, Y --> X
48374837
// select false, X, Y --> Y

llvm/lib/Transforms/Scalar/EarlyCSE.cpp

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1780,6 +1780,17 @@ bool EarlyCSE::processNode(DomTreeNode *Node) {
17801780
LastStore = nullptr;
17811781
}
17821782
}
1783+
1784+
// If the ret insn returns an undefined (but not a poinson) value,
1785+
// change it to unreachable.
1786+
if (Inst.getOpcode() == Instruction::Ret) {
1787+
auto *RI = cast<ReturnInst>(&Inst);
1788+
auto *RetVal = RI->getReturnValue();
1789+
if (isa<UndefValue>(RetVal) && !isa<PoisonValue>(RetVal)) {
1790+
changeToUnreachable(&Inst, false, NULL, &*MSSAUpdater);
1791+
Changed = true;
1792+
}
1793+
}
17831794
}
17841795

17851796
return Changed;

0 commit comments

Comments
 (0)