Skip to content

Commit 23b2737

Browse files
committed
[analyzer] Suppress out of bounds reports after weak loop assumptions
The checker alpha.security.ArrayBoundV2 produced lots of false positives in situations where loop modeling of the engine fed it with unfounded assumptions. This commit introduces a heuristic that discards ArrayBoundV2 reports when the execution path introduces an assumption that is questionable. More precisely, two kinds of assumptions are categorized as "weak": (1) When the analyzer assumes that the first evaluation of the loop condition returns false and the loop body is completely skipped. (2) When the analyzer assumes that the loop condition is true in a situation where it already executed (at least) two iterations. For examples and more explanation, see the new tests. The actual implementation uses some approximations (it uses the BlockCount instead of the iteration count) because that seems to be "good enough" for this heuristical suppression. Note that I used minor state updates instead of bug reporter visitors because the number of finished iterations is not visible in the visitor which "walks backwards in time". As a very minor unrelated change, this commit removes the "Bin" part from the method name "evalEagerlyAssumeBinOpBifurcation" because this method is also used for the unary logical not operator.
1 parent 622ae7f commit 23b2737

File tree

6 files changed

+233
-30
lines changed

6 files changed

+233
-30
lines changed

clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h

Lines changed: 32 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -121,6 +121,25 @@ struct EvalCallOptions {
121121
EvalCallOptions() {}
122122
};
123123

124+
/// Simple control flow statements like `if` only produce a single state split,
125+
/// so the fact that they are included in the source code implies that both
126+
/// branches are possible (at least under some conditions) and the analyzer can
127+
/// freely assume either of them. (This is not entirely true, because there may
128+
/// be unmarked logical correlations between `if` statements, but is a good
129+
/// enough heuristic and the analyzer strongly relies on it.)
130+
/// On the other hand, in a loop the state may be split repeatedly at each
131+
/// evaluation of the loop condition, and this can lead to following "weak"
132+
/// assumptions even though the code does not imply that they're valid and the
133+
/// programmer intended to cover them.
134+
/// This function is called to mark the `State` when the engine makes an
135+
/// assumption which is weak. Checkers may use this heuristical mark to discard
136+
/// result and reduce the amount of false positives.
137+
ProgramStateRef recordWeakLoopAssumption(ProgramStateRef State);
138+
139+
/// Returns true if `recordWeakLoopAssumption()` was called on the execution
140+
/// path which produced `State`.
141+
bool seenWeakLoopAssumption(ProgramStateRef State);
142+
124143
class ExprEngine {
125144
void anchor();
126145

@@ -323,12 +342,13 @@ class ExprEngine {
323342

324343
/// ProcessBranch - Called by CoreEngine. Used to generate successor
325344
/// nodes by processing the 'effects' of a branch condition.
326-
void processBranch(const Stmt *Condition,
327-
NodeBuilderContext& BuilderCtx,
328-
ExplodedNode *Pred,
329-
ExplodedNodeSet &Dst,
330-
const CFGBlock *DstT,
331-
const CFGBlock *DstF);
345+
/// If the branch condition is a loop condition, IterationsFinishedInLoop is
346+
/// the number of already finished iterations (0, 1, 2...); otherwise it's
347+
/// std::nullopt.
348+
void processBranch(const Stmt *Condition, NodeBuilderContext &BuilderCtx,
349+
ExplodedNode *Pred, ExplodedNodeSet &Dst,
350+
const CFGBlock *DstT, const CFGBlock *DstF,
351+
std::optional<unsigned> IterationsFinishedInLoop);
332352

333353
/// Called by CoreEngine.
334354
/// Used to generate successor nodes for temporary destructors depending
@@ -583,11 +603,12 @@ class ExprEngine {
583603
ExplodedNode *Pred,
584604
ExplodedNodeSet &Dst);
585605

586-
/// evalEagerlyAssumeBinOpBifurcation - Given the nodes in 'Src', eagerly assume symbolic
587-
/// expressions of the form 'x != 0' and generate new nodes (stored in Dst)
588-
/// with those assumptions.
589-
void evalEagerlyAssumeBinOpBifurcation(ExplodedNodeSet &Dst, ExplodedNodeSet &Src,
590-
const Expr *Ex);
606+
/// evalEagerlyAssumeOpBifurcation - Given the nodes in 'Src', eagerly assume
607+
/// symbolic
608+
/// expressions of the form 'x != 0' or '!x' and generate new nodes (stored
609+
/// in Dst) with those assumptions.
610+
void evalEagerlyAssumeOpBifurcation(ExplodedNodeSet &Dst,
611+
ExplodedNodeSet &Src, const Expr *Ex);
591612

592613
static std::pair<const ProgramPointTag *, const ProgramPointTag *>
593614
geteagerlyAssumeBinOpBifurcationTags();

clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -697,6 +697,11 @@ void ArrayBoundCheckerV2::reportOOB(CheckerContext &C,
697697
ProgramStateRef ErrorState, Messages Msgs,
698698
NonLoc Offset, std::optional<NonLoc> Extent,
699699
bool IsTaintBug /*=false*/) const {
700+
// Suppress results found through execution paths where in some loop the
701+
// analyzer arbitrarily assumed either that the loop is skipped (0 iterations)
702+
// or that 3 or more iterations are executed.
703+
if (seenWeakLoopAssumption(ErrorState))
704+
return;
700705

701706
ExplodedNode *ErrorNode = C.generateErrorNode(ErrorState);
702707
if (!ErrorNode)

clang/lib/StaticAnalyzer/Core/CoreEngine.cpp

Lines changed: 24 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -441,10 +441,33 @@ void CoreEngine::HandleCallEnter(const CallEnter &CE, ExplodedNode *Pred) {
441441
void CoreEngine::HandleBranch(const Stmt *Cond, const Stmt *Term,
442442
const CFGBlock * B, ExplodedNode *Pred) {
443443
assert(B->succ_size() == 2);
444+
445+
const LocationContext *LC = Pred->getLocationContext();
446+
BlockCounter Counter = WList->getBlockCounter();
447+
unsigned BlockCount =
448+
Counter.getNumVisited(LC->getStackFrame(), B->getBlockID());
449+
std::optional<unsigned> IterationsFinishedInLoop = std::nullopt;
450+
if (isa<ForStmt, WhileStmt, CXXForRangeStmt>(Term)) {
451+
// FIXME: This code approximates the number of finished iteration based on
452+
// the block count, i.e. the number of evaluations of the terminator block
453+
// on the current execution path (which includes the current evaluation, so
454+
// is always at least 1). This is probably acceptable for the
455+
// checker-specific false positive suppression that currently uses this
456+
// value, but it would be better to calcuate an accurate count of
457+
// iterations.
458+
assert(BlockCount >= 1);
459+
IterationsFinishedInLoop = BlockCount - 1;
460+
} else if (isa<DoStmt>(Term)) {
461+
// FIXME: The fixme note in the previous branch also applies here.
462+
// In a do-while loop one iteration happens before the first evaluation of
463+
// the loop condition, so we don't subtract one from the block count.
464+
IterationsFinishedInLoop = BlockCount;
465+
}
466+
444467
NodeBuilderContext Ctx(*this, B, Pred);
445468
ExplodedNodeSet Dst;
446469
ExprEng.processBranch(Cond, Ctx, Pred, Dst, *(B->succ_begin()),
447-
*(B->succ_begin() + 1));
470+
*(B->succ_begin() + 1), IterationsFinishedInLoop);
448471
// Enqueue the new frontier onto the worklist.
449472
enqueue(Dst);
450473
}

clang/lib/StaticAnalyzer/Core/ExprEngine.cpp

Lines changed: 68 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -212,6 +212,20 @@ typedef llvm::ImmutableMap<const LocationContext *, unsigned>
212212
REGISTER_TRAIT_WITH_PROGRAMSTATE(PendingArrayDestruction,
213213
PendingArrayDestructionMap)
214214

215+
// This trait is used to heuristically filter out results produced from
216+
// execution paths that took "weak" assumptions within a loop.
217+
REGISTER_TRAIT_WITH_PROGRAMSTATE(SeenWeakLoopAssumption, bool)
218+
219+
ProgramStateRef clang::ento::recordWeakLoopAssumption(ProgramStateRef State) {
220+
return State->set<SeenWeakLoopAssumption>(true);
221+
}
222+
223+
bool clang::ento::seenWeakLoopAssumption(ProgramStateRef State) {
224+
return State->get<SeenWeakLoopAssumption>();
225+
}
226+
227+
REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerlyAssumeAssumptionAt, const Expr *)
228+
215229
//===----------------------------------------------------------------------===//
216230
// Engine construction and deletion.
217231
//===----------------------------------------------------------------------===//
@@ -2128,7 +2142,7 @@ void ExprEngine::Visit(const Stmt *S, ExplodedNode *Pred,
21282142
(B->isRelationalOp() || B->isEqualityOp())) {
21292143
ExplodedNodeSet Tmp;
21302144
VisitBinaryOperator(cast<BinaryOperator>(S), Pred, Tmp);
2131-
evalEagerlyAssumeBinOpBifurcation(Dst, Tmp, cast<Expr>(S));
2145+
evalEagerlyAssumeOpBifurcation(Dst, Tmp, cast<Expr>(S));
21322146
}
21332147
else
21342148
VisitBinaryOperator(cast<BinaryOperator>(S), Pred, Dst);
@@ -2401,7 +2415,7 @@ void ExprEngine::Visit(const Stmt *S, ExplodedNode *Pred,
24012415
if (AMgr.options.ShouldEagerlyAssume && (U->getOpcode() == UO_LNot)) {
24022416
ExplodedNodeSet Tmp;
24032417
VisitUnaryOperator(U, Pred, Tmp);
2404-
evalEagerlyAssumeBinOpBifurcation(Dst, Tmp, U);
2418+
evalEagerlyAssumeOpBifurcation(Dst, Tmp, U);
24052419
}
24062420
else
24072421
VisitUnaryOperator(U, Pred, Dst);
@@ -2761,12 +2775,10 @@ assumeCondition(const Stmt *Condition, ExplodedNode *N) {
27612775
return State->assume(V);
27622776
}
27632777

2764-
void ExprEngine::processBranch(const Stmt *Condition,
2765-
NodeBuilderContext& BldCtx,
2766-
ExplodedNode *Pred,
2767-
ExplodedNodeSet &Dst,
2768-
const CFGBlock *DstT,
2769-
const CFGBlock *DstF) {
2778+
void ExprEngine::processBranch(
2779+
const Stmt *Condition, NodeBuilderContext &BldCtx, ExplodedNode *Pred,
2780+
ExplodedNodeSet &Dst, const CFGBlock *DstT, const CFGBlock *DstF,
2781+
std::optional<unsigned> IterationsFinishedInLoop) {
27702782
assert((!Condition || !isa<CXXBindTemporaryExpr>(Condition)) &&
27712783
"CXXBindTemporaryExprs are handled by processBindTemporary.");
27722784
const LocationContext *LCtx = Pred->getLocationContext();
@@ -2808,27 +2820,63 @@ void ExprEngine::processBranch(const Stmt *Condition,
28082820
std::tie(StTrue, StFalse) = *KnownCondValueAssumption;
28092821
else {
28102822
assert(!isa<ObjCForCollectionStmt>(Condition));
2823+
// TODO: instead of this shortcut perhaps it would be better to "rejoin"
2824+
// the common execution path with
2825+
// StTrue = StFalse = PrevState;
28112826
builder.generateNode(PrevState, true, PredN);
28122827
builder.generateNode(PrevState, false, PredN);
28132828
continue;
28142829
}
28152830
if (StTrue && StFalse)
28162831
assert(!isa<ObjCForCollectionStmt>(Condition));
28172832

2833+
const Expr *EagerlyAssumeExpr =
2834+
PrevState->get<LastEagerlyAssumeAssumptionAt>();
2835+
const Expr *ConditionExpr = dyn_cast<Expr>(Condition);
2836+
if (ConditionExpr)
2837+
ConditionExpr = ConditionExpr->IgnoreParenCasts();
2838+
bool DidEagerlyAssume = EagerlyAssumeExpr == ConditionExpr;
2839+
bool BothFeasible = (DidEagerlyAssume || (StTrue && StFalse)) &&
2840+
builder.isFeasible(true) && builder.isFeasible(false);
2841+
28182842
// Process the true branch.
28192843
if (builder.isFeasible(true)) {
2820-
if (StTrue)
2844+
if (StTrue) {
2845+
if (BothFeasible && IterationsFinishedInLoop &&
2846+
*IterationsFinishedInLoop >= 2) {
2847+
// When programmers write a loop, they imply that at least two
2848+
// iterations are possible (otherwise they would just write an `if`),
2849+
// but the third iteration is not implied: there are situations where
2850+
// the programmer knows that there won't be a third iteration (e.g.
2851+
// they iterate over a structure that has <= 2 elements) but this is
2852+
// not marked in the source code.
2853+
// Checkers may use this heuristic mark to discard results found on
2854+
// branches that contain this "weak" assumption.
2855+
StTrue = recordWeakLoopAssumption(StTrue);
2856+
}
28212857
builder.generateNode(StTrue, true, PredN);
2822-
else
2858+
} else {
28232859
builder.markInfeasible(true);
2860+
}
28242861
}
28252862

28262863
// Process the false branch.
28272864
if (builder.isFeasible(false)) {
2828-
if (StFalse)
2865+
if (StFalse) {
2866+
if (BothFeasible && IterationsFinishedInLoop &&
2867+
*IterationsFinishedInLoop == 0) {
2868+
// There are many situations where the programmers know that there
2869+
// will be at least one iteration in a loop (e.g. a structure is not
2870+
// empty) but the analyzer cannot deduce this and reports false
2871+
// positives after skipping the loop.
2872+
// Checkers may use this heuristic mark to discard results found on
2873+
// branches that contain this "weak" assumption.
2874+
StFalse = recordWeakLoopAssumption(StFalse);
2875+
}
28292876
builder.generateNode(StFalse, false, PredN);
2830-
else
2877+
} else {
28312878
builder.markInfeasible(false);
2879+
}
28322880
}
28332881
}
28342882
currBldrCtx = nullptr;
@@ -3752,9 +3800,9 @@ ExprEngine::geteagerlyAssumeBinOpBifurcationTags() {
37523800
&eagerlyAssumeBinOpBifurcationFalse);
37533801
}
37543802

3755-
void ExprEngine::evalEagerlyAssumeBinOpBifurcation(ExplodedNodeSet &Dst,
3756-
ExplodedNodeSet &Src,
3757-
const Expr *Ex) {
3803+
void ExprEngine::evalEagerlyAssumeOpBifurcation(ExplodedNodeSet &Dst,
3804+
ExplodedNodeSet &Src,
3805+
const Expr *Ex) {
37583806
StmtNodeBuilder Bldr(Src, Dst, *currBldrCtx);
37593807

37603808
for (const auto Pred : Src) {
@@ -3776,6 +3824,11 @@ void ExprEngine::evalEagerlyAssumeBinOpBifurcation(ExplodedNodeSet &Dst,
37763824
ProgramStateRef StateTrue, StateFalse;
37773825
std::tie(StateTrue, StateFalse) = state->assume(*SEV);
37783826

3827+
if (StateTrue && StateFalse) {
3828+
StateTrue = StateTrue->set<LastEagerlyAssumeAssumptionAt>(Ex);
3829+
StateFalse = StateFalse->set<LastEagerlyAssumeAssumptionAt>(Ex);
3830+
}
3831+
37793832
// First assume that the condition is true.
37803833
if (StateTrue) {
37813834
SVal Val = svalBuilder.makeIntVal(1U, Ex->getType());

clang/test/Analysis/loop-unrolling.cpp

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -349,7 +349,7 @@ int simple_unknown_bound_loop() {
349349
#ifdef DFS
350350
clang_analyzer_numTimesReached(); // expected-warning {{16}}
351351
#else
352-
clang_analyzer_numTimesReached(); // expected-warning {{8}}
352+
clang_analyzer_numTimesReached(); // expected-warning {{10}}
353353
#endif
354354
}
355355
return 0;
@@ -369,9 +369,9 @@ int nested_inlined_no_unroll1() {
369369
int k;
370370
for (int i = 0; i < 9; i++) {
371371
#ifdef DFS
372-
clang_analyzer_numTimesReached(); // expected-warning {{18}}
372+
clang_analyzer_numTimesReached(); // expected-warning {{20}}
373373
#else
374-
clang_analyzer_numTimesReached(); // expected-warning {{14}}
374+
clang_analyzer_numTimesReached(); // expected-warning {{18}}
375375
#endif
376376
k = simple_unknown_bound_loop(); // reevaluation without inlining, splits the state as well
377377
}

clang/test/Analysis/out-of-bounds.c

Lines changed: 101 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,9 @@
11
// RUN: %clang_analyze_cc1 -Wno-array-bounds -analyzer-checker=core,alpha.security.ArrayBoundV2,debug.ExprInspection -verify %s
2+
// RUN: %clang_analyze_cc1 -Wno-array-bounds -analyzer-checker=core,alpha.security.ArrayBoundV2,debug.ExprInspection -analyzer-config eagerly-assume=false -verify %s
3+
4+
// Note that eagerly-assume=false is tested separately because the
5+
// WeakLoopAssumption suppression heuristic uses different code paths to
6+
// achieve the same result with and without eagerly-assume.
27

38
void clang_analyzer_eval(int);
49

@@ -194,3 +199,99 @@ char test_comparison_with_extent_symbol(struct incomplete *p) {
194199
return ((char *)p)[-1]; // no-warning
195200
}
196201

202+
// WeakLoopAssumption suppression
203+
///////////////////////////////////////////////////////////////////////
204+
205+
int GlobalArray[100];
206+
int loop_suppress_after_zero_iterations(unsigned len) {
207+
for (unsigned i = 0; i < len; i++)
208+
if (GlobalArray[i] > 0)
209+
return GlobalArray[i];
210+
// Previously this would have produced an overflow warning because splitting
211+
// the state on the loop condition introduced an execution path where the
212+
// analyzer thinks that len == 0.
213+
// There are very many situations where the programmer knows that an argument
214+
// is positive, but this is not indicated in the source code, so we must
215+
// avoid reporting errors (especially out of bounds errors) on these
216+
// branches, because otherwise we'd get prohibitively many false positives.
217+
return GlobalArray[len - 1]; // no-warning
218+
}
219+
220+
void loop_report_in_second_iteration(int len) {
221+
int buf[1] = {0};
222+
for (int i = 0; i < len; i++) {
223+
// When a programmer writes a loop, we may assume that they intended at
224+
// least two iterations.
225+
buf[i] = 1; // expected-warning{{Out of bound access to memory}}
226+
}
227+
}
228+
229+
void loop_suppress_in_third_iteration(int len) {
230+
int buf[2] = {0};
231+
for (int i = 0; i < len; i++) {
232+
// We should suppress array bounds errors on the third and later iterations
233+
// of loops, because sometimes programmers write a loop in sitiuations
234+
// where they know that there will be at most two iterations.
235+
buf[i] = 1; // no-warning
236+
}
237+
}
238+
239+
void loop_suppress_in_third_iteration_cast(int len) {
240+
int buf[2] = {0};
241+
for (int i = 0; (unsigned)(i < len); i++) {
242+
// Check that a (somewhat arbitrary) cast does not hinder the recognition
243+
// of the condition expression.
244+
buf[i] = 1; // no-warning
245+
}
246+
}
247+
248+
void loop_suppress_in_third_iteration_logical_and(int len, int flag) {
249+
int buf[2] = {0};
250+
for (int i = 0; i < len && flag; i++) {
251+
// FIXME: In this case the checker should suppress the warning the same way
252+
// as it's suppressed in loop_suppress_in_third_iteration, but the
253+
// suppression is not activated because the terminator statement associated
254+
// with the loop is just the expression 'flag', while 'i < len' is a
255+
// separate terminator statement that's associated with the
256+
// short-circuiting operator '&&'.
257+
// I have seen a real-world FP that looks like this, but it is much rarer
258+
// than the basic setup.
259+
buf[i] = 1; // expected-warning{{Out of bound access to memory}}
260+
}
261+
}
262+
263+
void loop_suppress_in_third_iteration_logical_and_2(int len, int flag) {
264+
int buf[2] = {0};
265+
for (int i = 0; flag && i < len; i++) {
266+
// If the two operands of '&&' are flipped, the suppression works.
267+
buf[i] = 1; // no-warning
268+
}
269+
}
270+
271+
int coinflip(void);
272+
int do_while_report_after_one_iteration(void) {
273+
int i = 0;
274+
do {
275+
i++;
276+
} while (coinflip());
277+
// Unlike `loop_suppress_after_zero_iterations`, running just one iteration
278+
// in a do-while is not a corner case that would produce too many false
279+
// positives, so don't suppress bounds errors in these situations.
280+
return GlobalArray[i-2]; // expected-warning{{Out of bound access to memory}}
281+
}
282+
283+
void do_while_report_in_second_iteration(int len) {
284+
int buf[1] = {0};
285+
int i = 0;
286+
do {
287+
buf[i] = 1; // expected-warning{{Out of bound access to memory}}
288+
} while (i++ < len);
289+
}
290+
291+
void do_while_suppress_in_third_iteration(int len) {
292+
int buf[2] = {0};
293+
int i = 0;
294+
do {
295+
buf[i] = 1; // no-warning
296+
} while (i++ < len);
297+
}

0 commit comments

Comments
 (0)