-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[clang][analyzer] Extend lifetime of dynamic extent information #163562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Symbols used for dynamic extent information of memory regions are now kept as live as long as the memory region exists.
|
@llvm/pr-subscribers-clang Author: Balázs Kéri (balazske) ChangesSymbols used for dynamic extent information of memory regions are now kept as live as long as the memory region exists. Full diff: https://github.com/llvm/llvm-project/pull/163562.diff 4 Files Affected:
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h
index 1a9bef06b15a4..440603fb4d8c7 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h
@@ -58,6 +58,8 @@ SVal getDynamicExtentWithOffset(ProgramStateRef State, SVal BufV);
DefinedOrUnknownSVal getDynamicElementCountWithOffset(ProgramStateRef State,
SVal BufV, QualType Ty);
+void markAllDynamicExtentLive(ProgramStateRef State, SymbolReaper &SymReaper);
+
} // namespace ento
} // namespace clang
diff --git a/clang/lib/StaticAnalyzer/Core/DynamicExtent.cpp b/clang/lib/StaticAnalyzer/Core/DynamicExtent.cpp
index 34078dbce0b68..e436b186a2148 100644
--- a/clang/lib/StaticAnalyzer/Core/DynamicExtent.cpp
+++ b/clang/lib/StaticAnalyzer/Core/DynamicExtent.cpp
@@ -128,5 +128,11 @@ ProgramStateRef setDynamicExtent(ProgramStateRef State, const MemRegion *MR,
return State->set<DynamicExtentMap>(MR->StripCasts(), Size);
}
+void markAllDynamicExtentLive(ProgramStateRef State, SymbolReaper &SymReaper) {
+ for (const auto &I : State->get<DynamicExtentMap>())
+ if (SymbolRef Sym = I.second.getAsSymbol())
+ SymReaper.markLive(Sym);
+}
+
} // namespace ento
} // namespace clang
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index 785cdfa15bf04..d9ddc12c54985 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -1064,6 +1064,8 @@ void ExprEngine::removeDead(ExplodedNode *Pred, ExplodedNodeSet &Out,
SymReaper.markLive(MR);
}
+ markAllDynamicExtentLive(CleanedState, SymReaper);
+
getCheckerManager().runCheckersForLiveSymbols(CleanedState, SymReaper);
// Create a state in which dead bindings are removed from the environment
diff --git a/clang/test/Analysis/ArrayBound/verbose-tests.c b/clang/test/Analysis/ArrayBound/verbose-tests.c
index e3416886d13e5..9ee290ab6b5b8 100644
--- a/clang/test/Analysis/ArrayBound/verbose-tests.c
+++ b/clang/test/Analysis/ArrayBound/verbose-tests.c
@@ -381,30 +381,12 @@ int *symbolicExtent(int arg) {
return 0;
int *mem = (int*)malloc(arg);
- // TODO: without the following reference to 'arg', the analyzer would discard
- // the range information about (the symbolic value of) 'arg'. This is
- // incorrect because while the variable itself is inaccessible, it becomes
- // the symbolic extent of 'mem', so we still want to reason about its
- // potential values.
- (void)arg;
-
mem[8] = -2;
// expected-warning@-1 {{Out of bound access to memory after the end of the heap area}}
// expected-note@-2 {{Access of 'int' element in the heap area at index 8}}
return mem;
}
-int *symbolicExtentDiscardedRangeInfo(int arg) {
- // This is a copy of the case 'symbolicExtent' without the '(void)arg' hack.
- // TODO: if the analyzer can detect the out-of-bounds access within this
- // testcase, then remove this and the `(void)arg` hack from `symbolicExtent`.
- if (arg >= 5)
- return 0;
- int *mem = (int*)malloc(arg);
- mem[8] = -2;
- return mem;
-}
-
void symbolicIndex(int arg) {
// expected-note@+2 {{Assuming 'arg' is >= 12}}
// expected-note@+1 {{Taking true branch}}
|
|
@llvm/pr-subscribers-clang-static-analyzer-1 Author: Balázs Kéri (balazske) ChangesSymbols used for dynamic extent information of memory regions are now kept as live as long as the memory region exists. Full diff: https://github.com/llvm/llvm-project/pull/163562.diff 4 Files Affected:
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h
index 1a9bef06b15a4..440603fb4d8c7 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h
@@ -58,6 +58,8 @@ SVal getDynamicExtentWithOffset(ProgramStateRef State, SVal BufV);
DefinedOrUnknownSVal getDynamicElementCountWithOffset(ProgramStateRef State,
SVal BufV, QualType Ty);
+void markAllDynamicExtentLive(ProgramStateRef State, SymbolReaper &SymReaper);
+
} // namespace ento
} // namespace clang
diff --git a/clang/lib/StaticAnalyzer/Core/DynamicExtent.cpp b/clang/lib/StaticAnalyzer/Core/DynamicExtent.cpp
index 34078dbce0b68..e436b186a2148 100644
--- a/clang/lib/StaticAnalyzer/Core/DynamicExtent.cpp
+++ b/clang/lib/StaticAnalyzer/Core/DynamicExtent.cpp
@@ -128,5 +128,11 @@ ProgramStateRef setDynamicExtent(ProgramStateRef State, const MemRegion *MR,
return State->set<DynamicExtentMap>(MR->StripCasts(), Size);
}
+void markAllDynamicExtentLive(ProgramStateRef State, SymbolReaper &SymReaper) {
+ for (const auto &I : State->get<DynamicExtentMap>())
+ if (SymbolRef Sym = I.second.getAsSymbol())
+ SymReaper.markLive(Sym);
+}
+
} // namespace ento
} // namespace clang
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index 785cdfa15bf04..d9ddc12c54985 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -1064,6 +1064,8 @@ void ExprEngine::removeDead(ExplodedNode *Pred, ExplodedNodeSet &Out,
SymReaper.markLive(MR);
}
+ markAllDynamicExtentLive(CleanedState, SymReaper);
+
getCheckerManager().runCheckersForLiveSymbols(CleanedState, SymReaper);
// Create a state in which dead bindings are removed from the environment
diff --git a/clang/test/Analysis/ArrayBound/verbose-tests.c b/clang/test/Analysis/ArrayBound/verbose-tests.c
index e3416886d13e5..9ee290ab6b5b8 100644
--- a/clang/test/Analysis/ArrayBound/verbose-tests.c
+++ b/clang/test/Analysis/ArrayBound/verbose-tests.c
@@ -381,30 +381,12 @@ int *symbolicExtent(int arg) {
return 0;
int *mem = (int*)malloc(arg);
- // TODO: without the following reference to 'arg', the analyzer would discard
- // the range information about (the symbolic value of) 'arg'. This is
- // incorrect because while the variable itself is inaccessible, it becomes
- // the symbolic extent of 'mem', so we still want to reason about its
- // potential values.
- (void)arg;
-
mem[8] = -2;
// expected-warning@-1 {{Out of bound access to memory after the end of the heap area}}
// expected-note@-2 {{Access of 'int' element in the heap area at index 8}}
return mem;
}
-int *symbolicExtentDiscardedRangeInfo(int arg) {
- // This is a copy of the case 'symbolicExtent' without the '(void)arg' hack.
- // TODO: if the analyzer can detect the out-of-bounds access within this
- // testcase, then remove this and the `(void)arg` hack from `symbolicExtent`.
- if (arg >= 5)
- return 0;
- int *mem = (int*)malloc(arg);
- mem[8] = -2;
- return mem;
-}
-
void symbolicIndex(int arg) {
// expected-note@+2 {{Assuming 'arg' is >= 12}}
// expected-note@+1 {{Taking true branch}}
|
NagyDonat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this issue, I'm very happy to see this patch!
What happens when you have a symbolic region (e.g. allocated by malloc) with a dynamic extent, then both the region and its extent symbol become inaccessible at the same time? Will you prolong the lifetime of the extent symbol? Is it eventually "garbage-collected"?
My first instinct is that instead of markAllDynamicExtentLive you would need a loop that marks the dynamic extent live only if the region itself is live. (Or non-symbolic -- by the way, can non-symbolic regions have a symbolic dynamic extent?)
|
This patch would effectively leak all the extent symbols. There is also quite some history to the liveness subject around symbol extents on Discourse. I wish you could write up what do we currently have and options to fix this for extents (and possibly for any other symbols for that matter because extents are not alone in this issue). |
|
With the current version the extent symbol should become "dead" (not live any more) if the memory region becomes dead. The liveness of the extent symbol does not have effect on the liveness of the memory region. If the leak of the memory region can be detected, the leak of the extent can be detected too (maybe at a later point than before the patch). Normally the extent symbol is not something that can leak so this is anyway a rare case. Purpose of this patch is only to have the information about extent of the memory region available. If the original symbol would be garbage-collected but we keep it live like in this patch, it is not expected that the applied constraints change later. It should work like creating a new symbol for the extent and applying the constraints of the original symbol to it (and not making the original live). |
NagyDonat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, I'm satisfied with the logic introduced by this commit.
For extent symbols, this "if the region is alive, keep its extent alive" seems to be the logically correct behavior: this preserves the extent while it's relevant and ensures that the extent is garbage-collected after that point (assuming that the region itself is garbage-collected properly).
I wish you could write up what do we currently have and options to fix this for extents (and possibly for any other symbols for that matter because extents are not alone in this issue).
@steakhal What do you mean by this?
The old discussion at https://discourse.llvm.org/t/keeping-the-extent-symbol-alive/76676/3 seems to be a fairly complete write-up of the current situation (which hadn't changed since then) and the options for fixing the lifetime issues of extent symbols.
I think that lifetime issues of other symbols are mostly irrelevant for this PR: we shouldn't postpone this concrete and clear solution to wait for vague hypothetical improvements in partially related areas.
Also, I strongly suspect that even a highly advanced symbol lifetime system will need the concrete logic for extent symbols (which is added here), so this commit will be a good step toward the complete solution (even if we don't see the complete solution yet).
| return State->set<DynamicExtentMap>(MR->StripCasts(), Size); | ||
| } | ||
|
|
||
| void markAllDynamicExtentLive(ProgramStateRef State, SymbolReaper &SymReaper) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps rename this to transferLivenessToDynamicExtent or something similar now that we no longer mark all the dynamic extent values as live?
|
@steakhal Gentle ping – what do you think about the state of this PR? May we merge it? |
steakhal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I wish this was done in a checker declared in the DynamicExtent file, but so be it.
| // TODO: without the following reference to 'arg', the analyzer would discard | ||
| // the range information about (the symbolic value of) 'arg'. This is | ||
| // incorrect because while the variable itself is inaccessible, it becomes | ||
| // the symbolic extent of 'mem', so we still want to reason about its | ||
| // potential values. | ||
| (void)arg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there other similar void casts remaining int the test set that could be cleaned up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not find more in the "ArrayBound" tests. Other tests may contain similar casts but this is more difficult to find.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This is why I was curious to see how many of the casts in other test could be dropped after this change. This is not too simple, but with search and replacing with some regex might make it doable if you apply the replacement per file and add the change to the staging area if all the clang tests still pass. (And log what files didn't work out of the box so you could check them manually)
I'm pretty sure Cursor or even some simplified Bash script could automate this for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be done of course separately from this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the Analysis tests but did not find similar casts (for purpose of keeping constraints of dynamic extent), just one place (that is changed in the last commit).
NagyDonat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the now-irrelevant comment that I marked; after that LGTM.
0041f31 to
d384997
Compare
…#163562) Symbols used for dynamic extent information of memory regions are now kept as live as long as the memory region exists.
Symbols used for dynamic extent information of memory regions are now kept as live as long as the memory region exists.