Skip to content

Conversation

NagyDonat
Copy link
Contributor

@NagyDonat NagyDonat commented Sep 15, 2025

The underflow reports of checker security.ArrayBound previously displayed the (negative) byte offset of the accessed location; but those numbers were a bit hard to decipher (especially when the elements were large structs), so this commit replaces the byte offset with an index value (if the offset is an integer multiple of the size of the accessed element).

This change only affects the content of the messages; the checker will find the same issues before and after this commit.

The underflow reports of checker security.ArrayBound already displayed
the (negative) byte offset of the accessed location; but those numbers
were sometimes a bit hard to decipher, so I'm extending the message to
also display this offset as a multiple of the size of the accessed
element.

This logic is currently inactive when the byte offset is not an integer
multiple of the size of the accessed element -- primarily because it
would be a bit cumbersome to report the division and the remainder.

This change only affects the messages; the checker will report the same
issues before and after this commit.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Sep 15, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 15, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Donát Nagy (NagyDonat)

Changes

The underflow reports of checker security.ArrayBound already displayed the (negative) byte offset of the accessed location; but those numbers were sometimes a bit hard to decipher, so I'm extending the message to also display this offset as a multiple of the size of the accessed element.

This logic is currently inactive when the byte offset is not an integer multiple of the size of the accessed element -- primarily because it would be a bit cumbersome to report the division and the remainder.

This change only affects the messages; the checker will report the same issues before and after this commit.


Full diff: https://github.com/llvm/llvm-project/pull/158639.diff

2 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp (+31-17)
  • (modified) clang/test/Analysis/ArrayBound/verbose-tests.c (+33-3)
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp
index d35031b5c22df..c3c9eec3ad2fd 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp
@@ -389,15 +389,26 @@ static std::optional<int64_t> getConcreteValue(std::optional<NonLoc> SV) {
 }
 
 static Messages getPrecedesMsgs(const MemSpaceRegion *Space,
-                                const SubRegion *Region, NonLoc Offset) {
-  std::string RegName = getRegionName(Space, Region), OffsetStr = "";
+                                const SubRegion *Region, NonLoc Offset,
+                                QualType ElemType, int64_t ElemSize) {
+  std::string RegName = getRegionName(Space, Region);
 
-  if (auto ConcreteOffset = getConcreteValue(Offset))
+  std::string OffsetStr = "", ElemInfoStr = "";
+  if (std::optional<int64_t> ConcreteOffset = getConcreteValue(Offset)) {
     OffsetStr = formatv(" {0}", ConcreteOffset);
+    if (*ConcreteOffset % ElemSize == 0) {
+      int64_t Count = *ConcreteOffset / ElemSize;
+      if (Count != -1)
+        ElemInfoStr =
+            formatv(" = {0} * sizeof({1})", Count, ElemType.getAsString());
+      else
+        ElemInfoStr = formatv(" = -sizeof({0})", ElemType.getAsString());
+    }
+  }
 
-  return {
-      formatv("Out of bound access to memory preceding {0}", RegName),
-      formatv("Access of {0} at negative byte offset{1}", RegName, OffsetStr)};
+  return {formatv("Out of bound access to memory preceding {0}", RegName),
+          formatv("Access of {0} at negative byte offset{1}{2}", RegName,
+                  OffsetStr, ElemInfoStr)};
 }
 
 /// Try to divide `Val1` and `Val2` (in place) by `Divisor` and return true if
@@ -419,20 +430,15 @@ static bool tryDividePair(std::optional<int64_t> &Val1,
   return true;
 }
 
-static Messages getExceedsMsgs(ASTContext &ACtx, const MemSpaceRegion *Space,
+static Messages getExceedsMsgs(const MemSpaceRegion *Space,
                                const SubRegion *Region, NonLoc Offset,
-                               NonLoc Extent, SVal Location,
-                               bool AlsoMentionUnderflow) {
+                               NonLoc Extent, bool AlsoMentionUnderflow,
+                               QualType ElemType, int64_t ElemSize) {
   std::string RegName = getRegionName(Space, Region);
-  const auto *EReg = Location.getAsRegion()->getAs<ElementRegion>();
-  assert(EReg && "this checker only handles element access");
-  QualType ElemType = EReg->getElementType();
 
   std::optional<int64_t> OffsetN = getConcreteValue(Offset);
   std::optional<int64_t> ExtentN = getConcreteValue(Extent);
 
-  int64_t ElemSize = ACtx.getTypeSizeInChars(ElemType).getQuantity();
-
   bool UseByteOffsets = !tryDividePair(OffsetN, ExtentN, ElemSize);
   const char *OffsetOrIndex = UseByteOffsets ? "byte offset" : "index";
 
@@ -585,6 +591,13 @@ void ArrayBoundChecker::performCheck(const Expr *E, CheckerContext &C) const {
   if (!RawOffset)
     return;
 
+  const auto *EReg = Location.getAsRegion()->getAs<ElementRegion>();
+  assert(EReg && "this checker only handles element access");
+  QualType ElemType = EReg->getElementType();
+
+  int64_t ElemSize =
+      C.getASTContext().getTypeSizeInChars(ElemType).getQuantity();
+
   auto [Reg, ByteOffset] = *RawOffset;
 
   // The state updates will be reported as a single note tag, which will be
@@ -635,7 +648,8 @@ void ArrayBoundChecker::performCheck(const Expr *E, CheckerContext &C) const {
       } else {
         if (!WithinLowerBound) {
           // ...and it cannot be valid (>= 0), so report an error.
-          Messages Msgs = getPrecedesMsgs(Space, Reg, ByteOffset);
+          Messages Msgs =
+              getPrecedesMsgs(Space, Reg, ByteOffset, ElemType, ElemSize);
           reportOOB(C, PrecedesLowerBound, Msgs, ByteOffset, std::nullopt);
           return;
         }
@@ -678,8 +692,8 @@ void ArrayBoundChecker::performCheck(const Expr *E, CheckerContext &C) const {
         }
 
         Messages Msgs =
-            getExceedsMsgs(C.getASTContext(), Space, Reg, ByteOffset,
-                           *KnownSize, Location, AlsoMentionUnderflow);
+            getExceedsMsgs(Space, Reg, ByteOffset, *KnownSize,
+                           AlsoMentionUnderflow, ElemType, ElemSize);
         reportOOB(C, ExceedsUpperBound, Msgs, ByteOffset, KnownSize);
         return;
       }
diff --git a/clang/test/Analysis/ArrayBound/verbose-tests.c b/clang/test/Analysis/ArrayBound/verbose-tests.c
index 84d238ed1a2a4..9b6e33dce8a60 100644
--- a/clang/test/Analysis/ArrayBound/verbose-tests.c
+++ b/clang/test/Analysis/ArrayBound/verbose-tests.c
@@ -11,7 +11,7 @@ int TenElements[10];
 void arrayUnderflow(void) {
   TenElements[-3] = 5;
   // expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
-  // expected-note@-2 {{Access of 'TenElements' at negative byte offset -12}}
+  // expected-note@-2 {{Access of 'TenElements' at negative byte offset -12 = -3 * sizeof(int)}}
 }
 
 int underflowWithDeref(void) {
@@ -19,9 +19,39 @@ int underflowWithDeref(void) {
   --p;
   return *p;
   // expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
-  // expected-note@-2 {{Access of 'TenElements' at negative byte offset -4}}
+  // expected-note@-2 {{Access of 'TenElements' at negative byte offset -4 = -sizeof(int)}}
 }
 
+char underflowReportedAsChar(void) {
+  // The "= -... * sizeof(type)" part uses the type of the accessed element
+  // (here 'char'), not the type that appears in the declaration of the
+  // original array (which would be 'int').
+  return ((char *)TenElements)[-1];
+  // expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
+  // expected-note@-2 {{Access of 'TenElements' at negative byte offset -1 = -sizeof(char)}}
+}
+
+struct TwoInts {
+  int a, b;
+};
+
+struct TwoInts underflowReportedAsStruct(void) {
+  // Another case where the accessed type is used for reporting the offset.
+  return *(struct TwoInts*)(TenElements - 4);
+  // expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
+  // expected-note@-2 {{Access of 'TenElements' at negative byte offset -16 = -2 * sizeof(struct TwoInts)}}
+}
+
+struct TwoInts underflowOnlyByteOffset(void) {
+  // In this case the negative byte offset is not a multiple of the size of the
+  // accessed element, so the part "= -... * sizeof(type)" is omitted at the
+  // end of the message.
+  return *(struct TwoInts*)(TenElements - 3);
+  // expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
+  // expected-note-re@-2 {{Access of 'TenElements' at negative byte offset -12{{$}}}}
+}
+
+
 int rng(void);
 int getIndex(void) {
   switch (rng()) {
@@ -40,7 +70,7 @@ void gh86959(void) {
   while (rng())
     TenElements[getIndex()] = 10;
   // expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
-  // expected-note@-2 {{Access of 'TenElements' at negative byte offset -688}}
+  // expected-note@-2 {{Access of 'TenElements' at negative byte offset -688 = -172 * sizeof(int)}}
 }
 
 int scanf(const char *restrict fmt, ...);

TenElements[getIndex()] = 10;
// expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
// expected-note@-2 {{Access of 'TenElements' at negative byte offset -688}}
// expected-note@-2 {{Access of 'TenElements' at negative byte offset -688 = -172 * sizeof(int)}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its surprising to read this, as -172 * sizeof should result in a huge unsigned value due to wrapping.
Maybe a -172nd 'int' element would read better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll rewrite it somehow.

By the way I really hate that C has "unsigned" numbers (i.e. raw binary values without sign handling logic) which are frequently used as if they were "non-negative" numbers (i.e. proper numbers with normal arithmetic that happen to be non-negative). In particular, these "promote to unsigned" rules are the most problematic footguns in this area – it is completely nonsense that e.g. -1 > FFFFFFFFULL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up switching to a different approach that generalizes the function generating the overflow message and uses it to generate the underflow message as well. This ensures that the two kinds of messages are formatted in a consistent style.

This unifies the code of `getPrecedesMsg` and `getExceedsMsg` and avoids
the `-count * sizeof(type)` multiplications that would evaluate to
nonsense due to C type conversion footguns.
Copy link

github-actions bot commented Sep 16, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

}

static Messages getExceedsMsgs(const MemSpaceRegion *Space,
static Messages getNonTaintMsgs(ASTContext &ACtx, const MemSpaceRegion *Space,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make ACtx const?

Copy link
Contributor Author

@NagyDonat NagyDonat Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done in 5b1646f.

By the way this non-const reference was already present in the codebase (since my older message improvement commit added it IIRC last year); here it only appears in the diff because the first approach for this PR deleted it and then the second approach reverted that change.

@NagyDonat NagyDonat merged commit 5cbaf55 into llvm:main Sep 17, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants