Fix array bound checker false negative #161723

t-rasmud · 2025-10-02T19:22:48Z

This PR removes the check for whether an array offset is tainted before reporting an OOB in the security.ArrayBound checker. The isTainted check was not working as intended and leading to false negatives in the case of conditional checks and loops.

EDIT: The updated patch introduces an Aggressive mode that treats all unknown offsets as tainted. Based on the discussions on this PR, the checker assumes unknown values to be untainted by default and isn't expected to report OOB accesses in those cases.

llvmbot · 2025-10-02T19:23:20Z

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Rashmi Mudduluru (t-rasmud)

Changes

This PR removes the check for whether an array offset is tainted before reporting an OOB in the security.ArrayBound checker. The isTainted check was not working as intended and leading to false negatives in the case of conditional checks and loops.

Full diff: https://github.com/llvm/llvm-project/pull/161723.diff

2 Files Affected:

(modified) clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp (+10-42)
(added) clang/test/Analysis/ArrayBound/cplusplus-tainted-index.cpp (+52)

diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp
index 6ad5acd4e76f2..05b55da4c0312 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp
@@ -478,17 +478,6 @@ static Messages getNonTaintMsgs(const ASTContext &ACtx,
           std::string(Buf)};
 }
 
-static Messages getTaintMsgs(const MemSpaceRegion *Space,
-                             const SubRegion *Region, const char *OffsetName,
-                             bool AlsoMentionUnderflow) {
-  std::string RegName = getRegionName(Space, Region);
-  return {formatv("Potential out of bound access to {0} with tainted {1}",
-                  RegName, OffsetName),
-          formatv("Access of {0} with a tainted {1} that may be {2}too large",
-                  RegName, OffsetName,
-                  AlsoMentionUnderflow ? "negative or " : "")};
-}
-
 const NoteTag *StateUpdateReporter::createNoteTag(CheckerContext &C) const {
   // Don't create a note tag if we didn't assume anything:
   if (!AssumedNonNegative && !AssumedUpperBound)
@@ -686,37 +675,16 @@ void ArrayBoundChecker::performCheck(const Expr *E, CheckerContext &C) const {
           C.addTransition(ExceedsUpperBound, SUR.createNoteTag(C));
           return;
         }
-
-        BadOffsetKind Problem = AlsoMentionUnderflow
-                                    ? BadOffsetKind::Indeterminate
-                                    : BadOffsetKind::Overflowing;
-        Messages Msgs =
-            getNonTaintMsgs(C.getASTContext(), Space, Reg, ByteOffset,
-                            *KnownSize, Location, Problem);
-        reportOOB(C, ExceedsUpperBound, Msgs, ByteOffset, KnownSize);
-        return;
-      }
-      // ...and it can be valid as well...
-      if (isTainted(State, ByteOffset)) {
-        // ...but it's tainted, so report an error.
-
-        // Diagnostic detail: saying "tainted offset" is always correct, but
-        // the common case is that 'idx' is tainted in 'arr[idx]' and then it's
-        // nicer to say "tainted index".
-        const char *OffsetName = "offset";
-        if (const auto *ASE = dyn_cast<ArraySubscriptExpr>(E))
-          if (isTainted(State, ASE->getIdx(), C.getLocationContext()))
-            OffsetName = "index";
-
-        Messages Msgs =
-            getTaintMsgs(Space, Reg, OffsetName, AlsoMentionUnderflow);
-        reportOOB(C, ExceedsUpperBound, Msgs, ByteOffset, KnownSize,
-                  /*IsTaintBug=*/true);
-        return;
       }
-      // ...and it isn't tainted, so the checker will (optimistically) assume
-      // that the offset is in bounds and mention this in the note tag.
-      SUR.recordUpperBoundAssumption(*KnownSize);
+
+      BadOffsetKind Problem = AlsoMentionUnderflow
+                                  ? BadOffsetKind::Indeterminate
+                                  : BadOffsetKind::Overflowing;
+      Messages Msgs =
+          getNonTaintMsgs(C.getASTContext(), Space, Reg, ByteOffset,
+                          *KnownSize, Location, Problem);
+      reportOOB(C, ExceedsUpperBound, Msgs, ByteOffset, KnownSize);
+      return;
     }
 
     // Actually update the state. The "if" only fails in the extremely unlikely
@@ -758,7 +726,7 @@ void ArrayBoundChecker::reportOOB(CheckerContext &C, ProgramStateRef ErrorState,
                                   std::optional<NonLoc> Extent,
                                   bool IsTaintBug /*=false*/) const {
 
-  ExplodedNode *ErrorNode = C.generateErrorNode(ErrorState);
+  ExplodedNode *ErrorNode = C.generateNonFatalErrorNode(ErrorState);
   if (!ErrorNode)
     return;
 
diff --git a/clang/test/Analysis/ArrayBound/cplusplus-tainted-index.cpp b/clang/test/Analysis/ArrayBound/cplusplus-tainted-index.cpp
new file mode 100644
index 0000000000000..0912f5057f7ed
--- /dev/null
+++ b/clang/test/Analysis/ArrayBound/cplusplus-tainted-index.cpp
@@ -0,0 +1,52 @@
+// RUN: %clang_analyze_cc1 -std=c++11 -Wno-array-bounds -analyzer-config unroll-loops=true -analyzer-checker=unix,core,security.ArrayBound -verify %s
+
+// Test the interactions of `security.ArrayBound` with C++ features.
+
+void test_tainted_index_local() {
+  int arr[10];
+  unsigned index = 10;
+  arr[index] = 7;
+  // expected-warning@-1{{Out of bound access to memory after the end of 'arr'}}
+}
+
+void test_tainted_index_local_range() {
+  int arr[10];
+  for (unsigned index = 0; index < 11; index++)
+    arr[index] = index;
+    // expected-warning@-1{{Out of bound access to memory after the end of 'arr'}}
+}
+
+void test_tainted_index1(unsigned index) {
+  int arr[10];
+  if (index < 12)
+    arr[index] = index;
+  // expected-warning@-1{{Out of bound access to memory after the end of 'arr'}}
+  if (index == 12)
+    arr[index] = index;
+  // expected-warning@-1{{Out of bound access to memory after the end of 'arr'}}
+}
+
+void test_tainted_index2(unsigned index) {
+  int arr[10];
+  if (index < 12)
+    arr[index] = index;
+  // expected-warning@-1{{Out of bound access to memory after the end of 'arr'}}
+}
+
+unsigned strlen(const char *s);
+void strcpy(char *dst, char *src);
+void test_ex(int argc, char *argv[]) {
+  char proc_name[16];
+  unsigned offset = strlen(argv[0]);
+  if (offset == 16) {
+    strcpy(proc_name, argv[0]);
+    proc_name[offset] = 'x';
+    // expected-warning@-1{{Out of bound access to memory after the end of 'proc_name'}}
+  }
+  if (offset <= 16) {
+    strcpy(proc_name, argv[0]);
+    proc_name[offset] = argv[0][16];
+    // expected-warning@-1{{Out of bound access to memory after the end of the region}}
+    // expected-warning@-2{{Out of bound access to memory after the end of 'proc_name'}}
+  }
+}

github-actions · 2025-10-02T19:26:23Z

✅ With the latest revision this PR passed the C/C++ code formatter.

t-rasmud · 2025-10-02T20:49:39Z

@llvm-ci test

haoNoQ

Ooo you folks are interested in doing something with this checker too? Nice!!

At a glance it looks like you're trying to turn this checker into an enforcement tool that forces the user to check bounds before every array access. Which is valid but doesn't quite fit with the usual theme of minimizing the noise which is prevalent in our path-sensitive analysis land. So the existing logic actually looks fine to me for what it's supposed to do. These negatives are intentional, they're as true as it gets - for the local, default, highly subjective definition of true/false positives/negatives. I'll explain more in an inline comment.

So you can absolutely do what I think you want to do. But you may need to coexist with the existing logic, like add a checker option to switch it off, or something of that nature.

haoNoQ · 2025-10-02T22:32:16Z

clang/test/Analysis/ArrayBound/assumption-reporting.c

+  // expected-warning@-1 {{Out of bound access to memory around 'TenElements'}}
+  // expected-note@-2 {{Access of 'TenElements' at a negative or overflowing index, while it holds only 10 'int' elements}}


I think this was working as intended. This checker isn't trying to force the user to check the bounds before every array access. Normally it's only telling the user that the OOB access definitely happens. Otherwise, even if the value isn't sufficiently constrained, this doesn't mean that it can actually take all the values it's constrained to. There may be implicit constraints that we aren't aware of.

And then there's the tainted case where we confirm that the value isn't merely unknown, it's actually read from an untrusted source. Like, when we've observed a literal read operation from user input or from network. In this case we're certain that there can be no implicit constraints. The constraints we've recorded so far are all we have. The value can truly be any value within those constraints. Then that's definitely a true positive.

So in this case we can't tell whether this was a false negative or a true negative. The checker is designed to stay silent here.

So if you're looking to catch this case, the right solution is to probably acknowledge the taint source. Eg., you may be interested in an "aggressive" mode that treats all function parameters as taint sources. Or, indeed, treat all unknown values as if they're tainted, which effectively means turning off the taint logic entirely like you did in your patch. But we gotta ask the user to explicitly opt in into such an aggressive mode because most of the in-the-wild users don't usually like this sort of stuff. For them the checker is working as intended.

NagyDonat

Thanks for investing work into this checker, but unfortunatel – as @haoNoQ also explained –, the suggested changes are incompatible with the principle that the analyzer must assume the best about things that it doesn't understand (e.g. if it cannot prove that an index cannot be in bounds, then it must assume that the index is in bounds).

This principle is important because the analyzer only sees and understands small fragments of the code: the simulated execution starts from arbitrarily picked functions (because it cannot reach everything from main()) and it is common that the analyzer doesn't see what happens inside function calls (because the definition is unavailable or the recursion/work limits are exceeded).

With the changes that you propose, the analyzer would flood the user with heaps of false positive warnings, because even if the programmer does perfect bounds checking, there would be many situations where the analyzer doesn't see these bounds checks and (with your aggressive logic) reports an error.

For example in code like

bool index_in_bounds(int, size_t); // defined in different TU
int foo(int idx) {
  if (index_in_bounds(idx, ARRAY_SIZE)) {
    return array[idx];
  }
  return -1;
}

the analyzer with your patch would produce a false positive because (by default) it cannot look at the definition of index_in_bounds and determine that it does proper bounds checking.

The selection of the entrypoint can also be problematic, for example in a situation like

double get_value(int idx) {
  if (idx < 0 || idx > NUM_VALUES)
     return 0.0
  return get_value_impl(idx);
}

//... in a different file

double get_value_impl(int idx) {
  return array[idx];
}

the analyzer can select get_value_impl as an entrypoint -- and with your patch it would be reported as out-of-bounds access even if get_value_impl is only called from get_value (and other functions that do proper bounds checking).

The taint handling is a special case, because the fact that the analyzer sees the taint implies that it was able to follow that value from a taint source (e.g. user input, that can produce arbitrary values) to the array indexing -- and didn't see proper bounds checking along the way. Despite this, the taint checkers are an optin feature of the analyzer because their noisiness may be too much for some users.

I can understand that some users would like to have more aggressive warnings about unchecked array access, and perhaps your changes could be included as an off-by-default option that can be enabled by those users. However, even in this case you should try out your code on several real-world projects (e.g. stable open source code) because I fear that it may produce so many false positives that it would be useless in practice.

…offsets

t-rasmud · 2025-10-07T20:23:52Z

@llvm-ci test

t-rasmud · 2025-10-07T23:00:01Z

Putting a hold on this PR for now. Thanks for your feedback @haoNoQ and @NagyDonat. I agree that the aggressive mode needs aggressive testing with real codebases and we don't have sufficient time at the moment. We plan to revisit this at a later time.

ziqingluo-90 · 2025-10-08T20:09:36Z

clang/test/Analysis/ArrayBound/cplusplus-tainted-index.cpp

+  int arr[10];
+  if (index < 12)
+    arr[index] = index;
+  // expected-warning@-1{{Potential out of bound access to 'arr' with tainted offset}}


It'd be great to add some coverage of good cases. E.g., adding a case for if (0 <= index && index < 10) here.

t-rasmud · 2025-10-21T20:56:14Z

Summary: The bounds checker needs an aggressive mode. This needs to be paired with ways of expressing invariants that allow the programmer to show that a value is in bounds. This could be understanding span, understanding bounds annotations from Firebloom/Evo, or C++ contracts. We don’t plan to do any work on this soon. Closing the PR.

llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Oct 2, 2025

t-rasmud requested a review from NagyDonat October 2, 2025 19:23

t-rasmud requested review from jkorous-apple and ziqingluo-90 October 2, 2025 19:23

t-rasmud marked this pull request as draft October 2, 2025 19:32

t-rasmud added 4 commits October 2, 2025 13:51

Fix false negatives when OOB occurs inside a conditional check

dcae601

test case

cc260f9

unroll loops option to catch OOB in test case

5fe8a24

Fix tests after removing taint assumptions

8163c8e

t-rasmud force-pushed the fix-array-bound-checker-fn branch from d992be5 to 8163c8e Compare October 2, 2025 20:53

t-rasmud marked this pull request as ready for review October 2, 2025 21:34

haoNoQ reviewed Oct 2, 2025

View reviewed changes

NagyDonat requested changes Oct 3, 2025

View reviewed changes

ziqingluo-90 requested a review from steakhal October 3, 2025 18:50

t-rasmud added 3 commits October 7, 2025 12:50

Revert test case changes

acf21ba

Introduce analyzer config option for aggressive reporting of tainted …

cc28200

…offsets

Remove extraneous whitespace

6e261e1

Fix code formatting

daee330

t-rasmud marked this pull request as draft October 7, 2025 23:00

ziqingluo-90 reviewed Oct 8, 2025

View reviewed changes

t-rasmud closed this Oct 21, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix array bound checker false negative #161723

Fix array bound checker false negative #161723

t-rasmud commented Oct 2, 2025 •

edited

Loading

Uh oh!

llvmbot commented Oct 2, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Oct 2, 2025 •

edited

Loading

Uh oh!

t-rasmud commented Oct 2, 2025

Uh oh!

haoNoQ left a comment

Uh oh!

haoNoQ Oct 2, 2025

Uh oh!

NagyDonat left a comment •

edited

Loading

Uh oh!

t-rasmud commented Oct 7, 2025

Uh oh!

t-rasmud commented Oct 7, 2025

Uh oh!

ziqingluo-90 Oct 8, 2025

Uh oh!

t-rasmud commented Oct 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

		// expected-warning@-1 {{Out of bound access to memory around 'TenElements'}}
		// expected-note@-2 {{Access of 'TenElements' at a negative or overflowing index, while it holds only 10 'int' elements}}

Fix array bound checker false negative #161723

Fix array bound checker false negative #161723

Conversation

t-rasmud commented Oct 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Oct 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Oct 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

t-rasmud commented Oct 2, 2025

Uh oh!

haoNoQ left a comment

Choose a reason for hiding this comment

Uh oh!

haoNoQ Oct 2, 2025

Choose a reason for hiding this comment

Uh oh!

NagyDonat left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

t-rasmud commented Oct 7, 2025

Uh oh!

t-rasmud commented Oct 7, 2025

Uh oh!

ziqingluo-90 Oct 8, 2025

Choose a reason for hiding this comment

Uh oh!

t-rasmud commented Oct 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

t-rasmud commented Oct 2, 2025 •

edited

Loading

llvmbot commented Oct 2, 2025 •

edited

Loading

github-actions bot commented Oct 2, 2025 •

edited

Loading

NagyDonat left a comment •

edited

Loading