Improve the comments

NagyDonat · NagyDonat · commit 8ae4b67a2493 · 2024-09-30T17:11:32.000+02:00
This commit cleans up some typos that were reported by the reviewers and
tries to provide better explanations for some parts of the patch that
turned out to be confusing.
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
@@ -121,19 +121,28 @@ struct EvalCallOptions {
   EvalCallOptions() {}
 };
 
-/// Simple control flow statements like `if` only produce a single state split,
-/// so the fact that they are included in the source code implies that both
-/// branches are possible (at least under some conditions) and the analyzer can
-/// freely assume either of them. (This is not entirely true, because there may
-/// be unmarked logical correlations between `if` statements, but is a good
-/// enough heuristic and the analyzer strongly relies on it.)
-/// On the other hand, in a loop the state may be split repeatedly at each
-/// evaluation of the loop condition, and this can lead to following "weak"
-/// assumptions even though the code does not imply that they're valid and the
-/// programmer intended to cover them.
-/// This function is called to mark the `State` when the engine makes an
-/// assumption which is weak. Checkers may use this heuristical mark to discard
-/// result and reduce the amount of false positives.
+/// Simple control flow statements like `if` can only produce a single two-way
+/// state split, so when the analyzer cannot determine the value of the
+/// condition, it can assume either of the two options, because the fact that
+/// they are in the source code implies that the programmer thought that they
+/// are possible (at least under some conditions).
+/// (Note that this heuristic is not entirely correct when there are _several_
+/// `if` statements with unmarked logical connections between them, but it's
+/// still good enough and the analyzer heavily relies on it.)
+/// In contrast with this, a single loop statement can produce multiple state
+/// splits, and we cannot always single out safe assumptions where we can say
+/// that "the programmer included this loop in the source code, so they clearly
+/// thought that this execution path is possible".
+/// However, the analyzer wants to explore the code in and after the loop, so
+/// it makes assumptions about the loop condition (to get a concrete execution
+/// path) even when they are not justified.
+/// This function is called by the engine to mark the `State` when it makes an
+/// assumption which is "weak". Checkers may use this heuristic mark to
+/// discard the result and reduce the amount of false positives.
+/// TODO: Instead of just marking these branches for checker-specific handling,
+/// we could discard them completely. I suspect that this could eliminate some
+/// false positives without suppressing too many true positives, but I didn't
+/// have time to measure its effects.
 ProgramStateRef recordWeakLoopAssumption(ProgramStateRef State);
 
 /// Returns true if `recordWeakLoopAssumption()` was called on the execution
@@ -341,9 +350,9 @@ class ExprEngine {
                                ExplodedNode *Pred);
 
   /// ProcessBranch - Called by CoreEngine.  Used to generate successor
-  ///  nodes by processing the 'effects' of a branch condition.
+  /// nodes by processing the 'effects' of a branch condition.
   /// If the branch condition is a loop condition, IterationsFinishedInLoop is
-  /// the number of already finished iterations (0, 1, 2...); otherwise it's
+  /// the number of already finished iterations (0, 1, 2, ...); otherwise it's
   /// std::nullopt.
   void processBranch(const Stmt *Condition, NodeBuilderContext &BuilderCtx,
                      ExplodedNode *Pred, ExplodedNodeSet &Dst,
diff --git a/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp b/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
@@ -448,13 +448,12 @@ void CoreEngine::HandleBranch(const Stmt *Cond, const Stmt *Term,
       Counter.getNumVisited(LC->getStackFrame(), B->getBlockID());
   std::optional<unsigned> IterationsFinishedInLoop = std::nullopt;
   if (isa<ForStmt, WhileStmt, CXXForRangeStmt>(Term)) {
-    // FIXME: This code approximates the number of finished iteration based on
+    // FIXME: This code approximates the number of finished iterations based on
     // the block count, i.e. the number of evaluations of the terminator block
     // on the current execution path (which includes the current evaluation, so
-    // is always at least 1). This is probably acceptable for the
-    // checker-specific false positive suppression that currently uses this
-    // value, but it would be better to calcuate an accurate count of
-    // iterations.
+    // is always >= 1). This is probably acceptable for the checker-specific
+    // false positive suppression that currently uses this value, but it would
+    // be better to calculate an accurate count of iterations.
     assert(BlockCount >= 1);
     IterationsFinishedInLoop = BlockCount - 1;
   } else if (isa<DoStmt>(Term)) {
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -226,9 +226,8 @@ bool clang::ento::seenWeakLoopAssumption(ProgramStateRef State) {
 
 // This trait points to the last expression (logical operator) where an eager
 // assumption introduced a state split (i.e. both cases were feasible). This is
-// used by the WeakLoopAssumption heuristic to find situations where the an
-// eager assumption introduces a state split within the evaluation of a loop
-// condition.
+// used by the WeakLoopAssumption heuristic to find situations where an eager
+// assumption introduces a state split in the evaluation of a loop condition.
 REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerlyAssumeAssumptionAt, const Expr *)
 
 //===----------------------------------------------------------------------===//
@@ -2838,8 +2837,12 @@ void ExprEngine::processBranch(
     const Expr *EagerlyAssumeExpr =
         PrevState->get<LastEagerlyAssumeAssumptionAt>();
     const Expr *ConditionExpr = dyn_cast<Expr>(Condition);
-    if (ConditionExpr)
+    if (ConditionExpr) {
+      // Ignore casts to ensure equivalent behavior with and without
+      // eagerly-assume. This is a mostly theoretical question and I don't see
+      // a good reason for putting casts around a conditional expression.
       ConditionExpr = ConditionExpr->IgnoreParenCasts();
+    }
     bool DidEagerlyAssume = EagerlyAssumeExpr == ConditionExpr;
     bool BothFeasible = (DidEagerlyAssume || (StTrue && StFalse)) &&
                         builder.isFeasible(true) && builder.isFeasible(false);
@@ -2852,9 +2855,11 @@ void ExprEngine::processBranch(
           // When programmers write a loop, they imply that at least two
           // iterations are possible (otherwise they would just write an `if`),
           // but the third iteration is not implied: there are situations where
-          // the programmer knows that there won't be a third iteration (e.g.
-          // they iterate over a structure that has <= 2 elements) but this is
-          // not marked in the source code.
+          // the programmer knows that there won't be a third iteration, but
+          // this is not marked in the source code. (For example, the ffmpeg
+          // project has 2-element arrays which are accessed from loops where
+          // the number of steps is opaque and the analyzer cannot deduce that
+          // there are <= 2 iterations.)
           // Checkers may use this heuristic mark to discard results found on
           // branches that contain this "weak" assumption.
           StTrue = recordWeakLoopAssumption(StTrue);
diff --git a/clang/test/Analysis/out-of-bounds.c b/clang/test/Analysis/out-of-bounds.c
@@ -210,10 +210,10 @@ int loop_suppress_after_zero_iterations(unsigned len) {
   // Previously this would have produced an overflow warning because splitting
   // the state on the loop condition introduced an execution path where the
   // analyzer thinks that len == 0.
-  // There are very many situations where the programmer knows that an argument
-  // is positive, but this is not indicated in the source code, so we must
-  // avoid reporting errors (especially out of bounds errors) on these
-  // branches, because otherwise we'd get prohibitively many false positives.
+  // There are many situations where the programmer knows that an argument is
+  // positive, but this is not indicated in the source code, so we must avoid
+  // reporting errors (especially out of bounds errors) on these branches,
+  // because otherwise we'd get prohibitively many false positives.
   return GlobalArray[len - 1]; // no-warning
 }
 
@@ -231,7 +231,8 @@ void loop_suppress_in_third_iteration(int len) {
   for (int i = 0; i < len; i++) {
     // We should suppress array bounds errors on the third and later iterations
     // of loops, because sometimes programmers write a loop in sitiuations
-    // where they know that there will be at most two iterations.
+    // where they know that there will be at most two iterations, but the
+    // analyzer cannot deduce this property.
     buf[i] = 1; // no-warning
   }
 }
@@ -263,7 +264,10 @@ void loop_suppress_in_third_iteration_logical_and(int len, int flag) {
 void loop_suppress_in_third_iteration_logical_and_2(int len, int flag) {
   int buf[2] = {0};
   for (int i = 0; flag && i < len; i++) {
-    // If the two operands of '&&' are flipped, the suppression works.
+    // If the two operands of '&&' are flipped, the suppression works, because
+    // then 'flag' is the terminator statement associated with '&&' (which
+    // determines whether the short-circuiting happens or not) and 'i < len' is
+    // the terminator statement of the loop itself.
     buf[i] = 1; // no-warning
   }
 }
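
The heuristic that the comments above describe can be sketched as a small standalone function. This is only an illustration, not the code of the patch: `Branch`, `iterationsFinished` and `isWeakLoopAssumption` are hypothetical names, and the exact thresholds (continuing after two or more finished iterations, leaving after zero finished iterations) are assumptions inferred from the comments and the test names in this diff.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the two outcomes of a loop condition.
enum class Branch { Enter, Exit };

// Approximates the number of finished iterations from the block count, i.e.
// the number of evaluations of the loop condition on the current path
// (including the current one, so it is always >= 1).
inline unsigned iterationsFinished(unsigned BlockCount) {
  assert(BlockCount >= 1);
  return BlockCount - 1;
}

// Returns true if taking `Taken` is a "weak" assumption in this sketch.
// `IterationsFinished` is std::nullopt when the condition is not a loop
// condition. The thresholds below are assumptions, see the lead-in.
inline bool isWeakLoopAssumption(std::optional<unsigned> IterationsFinished,
                                 bool BothFeasible, Branch Taken) {
  // No state split, or not a loop: nothing was assumed without justification.
  if (!IterationsFinished || !BothFeasible)
    return false;
  // Entering a third (or later) iteration is not implied by the source code.
  if (Taken == Branch::Enter)
    return *IterationsFinished >= 2;
  // Skipping the loop entirely is not implied by the source code either.
  return *IterationsFinished == 0;
}
```

Under these assumptions, the first two iterations of a loop are treated like an ordinary `if`, while the zero-iteration path and the paths that reach a third iteration are the ones a checker could choose to discard.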