-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[analyzer] Show element count in ArrayBound underflow reports #158639
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[analyzer] Show element count in ArrayBound underflow reports #158639
Conversation
The underflow reports of checker security.ArrayBound already displayed the (negative) byte offset of the accessed location; but those numbers were sometimes a bit hard to decipher, so I'm extending the message to also display this offset as a multiple of the size of the accessed element. This logic is currently inactive when the byte offset is not an integer multiple of the size of the accessed element -- primarily because it would be a bit cumbersome to report the division and the remainder. This change only affects the messages; the checker will report the same issues before and after this commit.
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-static-analyzer-1 Author: Donát Nagy (NagyDonat) ChangesThe underflow reports of checker security.ArrayBound already displayed the (negative) byte offset of the accessed location; but those numbers were sometimes a bit hard to decipher, so I'm extending the message to also display this offset as a multiple of the size of the accessed element. This logic is currently inactive when the byte offset is not an integer multiple of the size of the accessed element -- primarily because it would be a bit cumbersome to report the division and the remainder. This change only affects the messages; the checker will report the same issues before and after this commit. Full diff: https://github.com/llvm/llvm-project/pull/158639.diff 2 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp
index d35031b5c22df..c3c9eec3ad2fd 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp
@@ -389,15 +389,26 @@ static std::optional<int64_t> getConcreteValue(std::optional<NonLoc> SV) {
}
static Messages getPrecedesMsgs(const MemSpaceRegion *Space,
- const SubRegion *Region, NonLoc Offset) {
- std::string RegName = getRegionName(Space, Region), OffsetStr = "";
+ const SubRegion *Region, NonLoc Offset,
+ QualType ElemType, int64_t ElemSize) {
+ std::string RegName = getRegionName(Space, Region);
- if (auto ConcreteOffset = getConcreteValue(Offset))
+ std::string OffsetStr = "", ElemInfoStr = "";
+ if (std::optional<int64_t> ConcreteOffset = getConcreteValue(Offset)) {
OffsetStr = formatv(" {0}", ConcreteOffset);
+ if (*ConcreteOffset % ElemSize == 0) {
+ int64_t Count = *ConcreteOffset / ElemSize;
+ if (Count != -1)
+ ElemInfoStr =
+ formatv(" = {0} * sizeof({1})", Count, ElemType.getAsString());
+ else
+ ElemInfoStr = formatv(" = -sizeof({0})", ElemType.getAsString());
+ }
+ }
- return {
- formatv("Out of bound access to memory preceding {0}", RegName),
- formatv("Access of {0} at negative byte offset{1}", RegName, OffsetStr)};
+ return {formatv("Out of bound access to memory preceding {0}", RegName),
+ formatv("Access of {0} at negative byte offset{1}{2}", RegName,
+ OffsetStr, ElemInfoStr)};
}
/// Try to divide `Val1` and `Val2` (in place) by `Divisor` and return true if
@@ -419,20 +430,15 @@ static bool tryDividePair(std::optional<int64_t> &Val1,
return true;
}
-static Messages getExceedsMsgs(ASTContext &ACtx, const MemSpaceRegion *Space,
+static Messages getExceedsMsgs(const MemSpaceRegion *Space,
const SubRegion *Region, NonLoc Offset,
- NonLoc Extent, SVal Location,
- bool AlsoMentionUnderflow) {
+ NonLoc Extent, bool AlsoMentionUnderflow,
+ QualType ElemType, int64_t ElemSize) {
std::string RegName = getRegionName(Space, Region);
- const auto *EReg = Location.getAsRegion()->getAs<ElementRegion>();
- assert(EReg && "this checker only handles element access");
- QualType ElemType = EReg->getElementType();
std::optional<int64_t> OffsetN = getConcreteValue(Offset);
std::optional<int64_t> ExtentN = getConcreteValue(Extent);
- int64_t ElemSize = ACtx.getTypeSizeInChars(ElemType).getQuantity();
-
bool UseByteOffsets = !tryDividePair(OffsetN, ExtentN, ElemSize);
const char *OffsetOrIndex = UseByteOffsets ? "byte offset" : "index";
@@ -585,6 +591,13 @@ void ArrayBoundChecker::performCheck(const Expr *E, CheckerContext &C) const {
if (!RawOffset)
return;
+ const auto *EReg = Location.getAsRegion()->getAs<ElementRegion>();
+ assert(EReg && "this checker only handles element access");
+ QualType ElemType = EReg->getElementType();
+
+ int64_t ElemSize =
+ C.getASTContext().getTypeSizeInChars(ElemType).getQuantity();
+
auto [Reg, ByteOffset] = *RawOffset;
// The state updates will be reported as a single note tag, which will be
@@ -635,7 +648,8 @@ void ArrayBoundChecker::performCheck(const Expr *E, CheckerContext &C) const {
} else {
if (!WithinLowerBound) {
// ...and it cannot be valid (>= 0), so report an error.
- Messages Msgs = getPrecedesMsgs(Space, Reg, ByteOffset);
+ Messages Msgs =
+ getPrecedesMsgs(Space, Reg, ByteOffset, ElemType, ElemSize);
reportOOB(C, PrecedesLowerBound, Msgs, ByteOffset, std::nullopt);
return;
}
@@ -678,8 +692,8 @@ void ArrayBoundChecker::performCheck(const Expr *E, CheckerContext &C) const {
}
Messages Msgs =
- getExceedsMsgs(C.getASTContext(), Space, Reg, ByteOffset,
- *KnownSize, Location, AlsoMentionUnderflow);
+ getExceedsMsgs(Space, Reg, ByteOffset, *KnownSize,
+ AlsoMentionUnderflow, ElemType, ElemSize);
reportOOB(C, ExceedsUpperBound, Msgs, ByteOffset, KnownSize);
return;
}
diff --git a/clang/test/Analysis/ArrayBound/verbose-tests.c b/clang/test/Analysis/ArrayBound/verbose-tests.c
index 84d238ed1a2a4..9b6e33dce8a60 100644
--- a/clang/test/Analysis/ArrayBound/verbose-tests.c
+++ b/clang/test/Analysis/ArrayBound/verbose-tests.c
@@ -11,7 +11,7 @@ int TenElements[10];
void arrayUnderflow(void) {
TenElements[-3] = 5;
// expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
- // expected-note@-2 {{Access of 'TenElements' at negative byte offset -12}}
+ // expected-note@-2 {{Access of 'TenElements' at negative byte offset -12 = -3 * sizeof(int)}}
}
int underflowWithDeref(void) {
@@ -19,9 +19,39 @@ int underflowWithDeref(void) {
--p;
return *p;
// expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
- // expected-note@-2 {{Access of 'TenElements' at negative byte offset -4}}
+ // expected-note@-2 {{Access of 'TenElements' at negative byte offset -4 = -sizeof(int)}}
}
+char underflowReportedAsChar(void) {
+ // The "= -... * sizeof(type)" part uses the type of the accessed element
+ // (here 'char'), not the type that appears in the declaration of the
+ // original array (which would be 'int').
+ return ((char *)TenElements)[-1];
+ // expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
+ // expected-note@-2 {{Access of 'TenElements' at negative byte offset -1 = -sizeof(char)}}
+}
+
+struct TwoInts {
+ int a, b;
+};
+
+struct TwoInts underflowReportedAsStruct(void) {
+ // Another case where the accessed type is used for reporting the offset.
+ return *(struct TwoInts*)(TenElements - 4);
+ // expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
+ // expected-note@-2 {{Access of 'TenElements' at negative byte offset -16 = -2 * sizeof(struct TwoInts)}}
+}
+
+struct TwoInts underflowOnlyByteOffset(void) {
+ // In this case the negative byte offset is not a multiple of the size of the
+ // accessed element, so the part "= -... * sizeof(type)" is omitted at the
+ // end of the message.
+ return *(struct TwoInts*)(TenElements - 3);
+ // expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
+ // expected-note-re@-2 {{Access of 'TenElements' at negative byte offset -12{{$}}}}
+}
+
+
int rng(void);
int getIndex(void) {
switch (rng()) {
@@ -40,7 +70,7 @@ void gh86959(void) {
while (rng())
TenElements[getIndex()] = 10;
// expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
- // expected-note@-2 {{Access of 'TenElements' at negative byte offset -688}}
+ // expected-note@-2 {{Access of 'TenElements' at negative byte offset -688 = -172 * sizeof(int)}}
}
int scanf(const char *restrict fmt, ...);
|
TenElements[getIndex()] = 10; | ||
// expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}} | ||
// expected-note@-2 {{Access of 'TenElements' at negative byte offset -688}} | ||
// expected-note@-2 {{Access of 'TenElements' at negative byte offset -688 = -172 * sizeof(int)}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its surprising to read this, as -172 * sizeof
should result in a huge unsigned value due to wrapping.
Maybe a -172nd 'int' element
would read better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll rewrite it somehow.
By the way I really hate that C has "unsigned
" numbers (i.e. raw binary values without sign handling logic) which are frequently used as if they were "non-negative" numbers (i.e. proper numbers with normal arithmetic that happen to be non-negative). In particular, these "promote to unsigned" rules are the most problematic footguns in this area – it is completely nonsense that e.g. -1 > FFFFFFFFULL
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up switching to a different approach that generalizes the function generating the overflow message and uses it to generate the underflow message as well. This ensures that the two kinds of messages are formatted in a consistent style.
This unifies the code of `getPrecedesMsg` and `getExceedsMsg` and avoids the `-count * sizeof(type)` multiplications that would evaluate to nonsense due to C type conversion footguns.
✅ With the latest revision this PR passed the C/C++ code formatter. |
} | ||
|
||
static Messages getExceedsMsgs(const MemSpaceRegion *Space, | ||
static Messages getNonTaintMsgs(ASTContext &ACtx, const MemSpaceRegion *Space, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make ACtx const?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, done in 5b1646f.
By the way this non-const
reference was already present in the codebase (since my older message improvement commit added it IIRC last year); here it only appears in the diff because the first approach for this PR deleted it and then the second approach reverted that change.
The underflow reports of checker security.ArrayBound previously displayed the (negative) byte offset of the accessed location; but those numbers were a bit hard to decipher (especially when the elements were large structs), so this commit replaces the byte offset with an index value (if the offset is an integer multiple of the size of the accessed element).
This change only affects the content of the messages; the checker will find the same issues before and after this commit.