-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[StaticAnalyzer] Fix state update in VisitObjCForCollectionStmt #124477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-static-analyzer-1 Author: Ziqing Luo (ziqingluo-90) ChangesIn (rdar://143280254) Full diff: https://github.com/llvm/llvm-project/pull/124477.diff 1 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngineObjC.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngineObjC.cpp
index f075df3ab5e4d6..9aa5cee71c1374 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngineObjC.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngineObjC.cpp
@@ -124,24 +124,26 @@ void ExprEngine::VisitObjCForCollectionStmt(const ObjCForCollectionStmt *S,
bool isContainerNull = state->isNull(collectionV).isConstrainedTrue();
- ExplodedNodeSet dstLocation;
- evalLocation(dstLocation, S, elem, Pred, state, elementV, false);
+ ExplodedNodeSet NewPreds; // evalLocation may alter `Pred`
+ evalLocation(NewPreds, S, elem, Pred, state, elementV, false);
- ExplodedNodeSet Tmp;
- StmtNodeBuilder Bldr(Pred, Tmp, *currBldrCtx);
+ for (ExplodedNode *Pred : NewPreds) {
+ ExplodedNodeSet PredSingleton{Pred}, Tmp;
+ StmtNodeBuilder Bldr(Pred, Tmp, *currBldrCtx);
- if (!isContainerNull)
- populateObjCForDestinationSet(dstLocation, svalBuilder, S, elem, elementV,
- SymMgr, currBldrCtx, Bldr,
- /*hasElements=*/true);
+ if (!isContainerNull)
+ populateObjCForDestinationSet(PredSingleton, svalBuilder, S, elem,
+ elementV, SymMgr, currBldrCtx, Bldr,
+ /*hasElements=*/true);
- populateObjCForDestinationSet(dstLocation, svalBuilder, S, elem, elementV,
- SymMgr, currBldrCtx, Bldr,
- /*hasElements=*/false);
+ populateObjCForDestinationSet(PredSingleton, svalBuilder, S, elem, elementV,
+ SymMgr, currBldrCtx, Bldr,
+ /*hasElements=*/false);
- // Finally, run any custom checkers.
- // FIXME: Eventually all pre- and post-checks should live in VisitStmt.
- getCheckerManager().runCheckersForPostStmt(Dst, Tmp, S, *this);
+ // Finally, run any custom checkers.
+ // FIXME: Eventually all pre- and post-checks should live in VisitStmt.
+ getCheckerManager().runCheckersForPostStmt(Dst, Tmp, S, *this);
+ }
}
void ExprEngine::VisitObjCMessage(const ObjCMessageExpr *ME,
|
| bool isContainerNull = state->isNull(collectionV).isConstrainedTrue(); | ||
|
|
||
| ExplodedNodeSet dstLocation; | ||
| evalLocation(dstLocation, S, elem, Pred, state, elementV, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To elaborate the issue, suppose evalLocation alters the Pred state and populates the new states in dstLocation. (This name dstLocation is a bit confusing IMO, let's call those new states NewPreds instead.) However, the StmtNodeBuilder below still takes Pred as the source state (old line 131) while the next transitions computed in populateObjCForDestinationSet are computed from NewPreds (old line 134 & 138).
The confusion here is that there suppose to be transitions:
Pred --evalLocation--> NewPreds --populateObjCForDestinationSet--> Dest
but the StmtNodeBuilder used for the second transition above assumes the source state is Pred instead of NewPreds. This is what's wrong with this function VisitObjCForCollectionStmt.
Then how does the bug lead to a crash?
The way StmtNodeBuilder works is that it maintains a Frontier set---the set of states to be explored next. After initialization (old line 131), Frontier = {Pred}. In populateObjCForDestinationSet, StmtNodeBuilder is used to generate Dest and remove source state from the set. Note that NewPreds are passed to populateObjCForDestinationSet as source states. So StmtNodeBuilder attempts to remove NewPreds from the set, which does not change the set. Finally, Frontier = {Pred, Dest}.
Starting from this ill-formed Frontier set, the engine executes hasMoreIterations on them. The assertion fails on the state Pred because it has never been set the ObjCForHasMoreIterations map. Dest has the map set. This is what populateObjCForDestinationSet does. (If assertions are disabled, there is a segmentation fault.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This name
dstLocationis a bit confusing IMO, let's call those new statesNewPredsinstead.)
I would suggest keeping dstLocation (or perhaps changing it to DstLocation to align with the global coding style instead of the local tradition), because it specifies that these are the nodes produced by the "Location" step which is is more informative than NewPreds.
In general, these exploded graph node manipulating functions have a nice pattern for naming the node sets:
void ExprEngine::doSomething(ExplodedNodeSet &Src, ExplodedNodeSet &Dst, ...) {
ExplodedNodeSet DstFoo;
performFoo(Src, DstFoo, ...);
//...
ExplodedNodeSet DstBar;
performBar(DstFoo, DstBar, ...);
//...
performBaz(DstBar, Dst, ...);
}Functions like this take nodes from the node set Src (or instead start from a single node, which is usually called Pred) and place the resulting nodes in the outparameter Dst. If an operation is implemented as the combination of several lower-level steps (Foo, Bar and Baz in the example), then it's natural to introduce e.g. DstFoo as the set which will act as the Dst parameter of performFoo (and later as the Src parameter of performBar).
If there are multiple intermediate node sets, then it's important to give them descriptive names (and not just Pred -> NewPred -> NewNewPred etc.) and as the intermediate set first appears as a destination, it's easier to name it as the "destination of" a certain step. In this function NewPreds could be acceptable (as it is the only intermediate node set), but I would still prefer using the more informative name.
By the way, thanks for the detailed explanation of the issue! Your conclusions and significant changes seem to be correct, so my review is limited to bikeshedding ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. Now the name dstLocation makes sense to me.
|
Unfortunately, I cannot forge a minimal example to reproduce this bug due to the aforementioned fact that A minimal example would be just execute an objective-c for loop like this: Then if I would greatly appreciate if anyone can help me create a minimal reproducer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to add a test or is it hard to reproduce?
Edit: OK, I read the comment about the minimal example after I added this comment.
If we cannot find a way to trigger this with the existing checks, you could try to come up with a unit test like this: https://github.com/llvm/llvm-project/blob/main/clang/unittests/StaticAnalyzer/ExprEngineVisitTest.cpp |
steakhal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change affects how an AST node is modeled. Consequently, it's a core change.
As such a change, we should have a test.
Overall, I really liked how thoroughly you described what caused the issue, and that helped to justify the fix. On that front, it looks great.
Thank you for your contribution.
|
|
||
| ExplodedNodeSet Tmp; | ||
| StmtNodeBuilder Bldr(Pred, Tmp, *currBldrCtx); | ||
| for (ExplodedNode *Pred : NewPreds) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish we wouldn't shadow Pred here.
Maybe if we are iterating a collection called NewPreds, a variable NewPred would be a better, non-shadowing name.
|
CC: @dtarditi |
|
I wanted to push to your branch but for some reason I could not. commit ab9670b613be2bdd802342f031bd5e3d20680925
Author: Balazs Benics <[email protected]>
Date: 2025.01.29 13:02:16
Add a unittest demonstrating that we no longer crash
diff --git a/clang/unittests/StaticAnalyzer/CMakeLists.txt b/clang/unittests/StaticAnalyzer/CMakeLists.txt
index f5da86e54560..7b574bdf7cbc 100644
--- a/clang/unittests/StaticAnalyzer/CMakeLists.txt
+++ b/clang/unittests/StaticAnalyzer/CMakeLists.txt
@@ -15,6 +15,7 @@ add_clang_unittest(StaticAnalysisTests
IsCLibraryFunctionTest.cpp
MemRegionDescriptiveNameTest.cpp
NoStateChangeFuncVisitorTest.cpp
+ ObjCTest.cpp
ParamRegionTest.cpp
RangeSetTest.cpp
RegisterCustomCheckersTest.cpp
diff --git a/clang/unittests/StaticAnalyzer/ObjCTest.cpp b/clang/unittests/StaticAnalyzer/ObjCTest.cpp
new file mode 100644
index 000000000000..16cab5336813
--- /dev/null
+++ b/clang/unittests/StaticAnalyzer/ObjCTest.cpp
@@ -0,0 +1,62 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "CheckerRegistration.h"
+#include "clang/StaticAnalyzer/Core/Checker.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"
+#include "clang/StaticAnalyzer/Frontend/AnalysisConsumer.h"
+#include "clang/StaticAnalyzer/Frontend/CheckerRegistry.h"
+#include "gtest/gtest.h"
+
+using namespace clang;
+using namespace ento;
+
+// Some dummy trait that we can mutate back and forth to force a new State.
+REGISTER_TRAIT_WITH_PROGRAMSTATE(Flag, bool)
+
+namespace {
+class FlipFlagOnCheckLocation : public Checker<check::Location> {
+public:
+ // We make sure we alter the State every time we model a checkLocation event.
+ void checkLocation(SVal l, bool isLoad, const Stmt *S,
+ CheckerContext &C) const {
+ ProgramStateRef State = C.getState();
+ State = State->set<Flag>(!State->get<Flag>());
+ C.addTransition(State);
+ }
+};
+
+void addFlagFlipperChecker(AnalysisASTConsumer &AnalysisConsumer,
+ AnalyzerOptions &AnOpts) {
+ AnOpts.CheckersAndPackages = {{"test.FlipFlagOnCheckLocation", true}};
+ AnalysisConsumer.AddCheckerRegistrationFn([](CheckerRegistry &Registry) {
+ Registry.addChecker<FlipFlagOnCheckLocation>("test.FlipFlagOnCheckLocation",
+ "Description", "");
+ });
+}
+
+TEST(ObjCTest, CheckLocationEventsShouldMaterializeInObjCForCollectionStmts) {
+ // Previously, the `ExprEngine::hasMoreIteration` may fired an assertion
+ // because we forgot to handle correctly the resulting nodes of the
+ // check::Location callback for the ObjCForCollectionStmts.
+ // This caused inconsistencies in the graph and triggering the assertion.
+ std::string Diags;
+ EXPECT_TRUE(runCheckerOnCodeWithArgs<addFlagFlipperChecker>(
+ R"(
+ @class NSArray, NSDictionary, NSString;
+ extern void NSLog(NSString *format, ...) __attribute__((format(__NSString__, 1, 2)));
+ void entrypoint(NSArray *bowl) {
+ for (NSString *fruit in bowl) { // no-crash
+ NSLog(@"Fruit: %@", fruit);
+ }
+ })",
+ {"-x", "objective-c"}, Diags));
+}
+
+} // namespaceFeel free to tweak it, especially the comment as you know the most of this context. |
NagyDonat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for debugging and fixing this issue! Overall I'm satisfied with the change, but I added a few inline comments about secondary issues.
| ExplodedNodeSet PredSingleton{Pred}, Tmp; | ||
| StmtNodeBuilder Bldr(Pred, Tmp, *currBldrCtx); | ||
|
|
||
| if (!isContainerNull) | ||
| populateObjCForDestinationSet(dstLocation, svalBuilder, S, elem, elementV, | ||
| SymMgr, currBldrCtx, Bldr, | ||
| /*hasElements=*/true); | ||
| if (!isContainerNull) | ||
| populateObjCForDestinationSet(PredSingleton, svalBuilder, S, elem, | ||
| elementV, SymMgr, currBldrCtx, Bldr, | ||
| /*hasElements=*/true); | ||
|
|
||
| populateObjCForDestinationSet(dstLocation, svalBuilder, S, elem, elementV, | ||
| SymMgr, currBldrCtx, Bldr, | ||
| /*hasElements=*/false); | ||
| populateObjCForDestinationSet(PredSingleton, svalBuilder, S, elem, elementV, | ||
| SymMgr, currBldrCtx, Bldr, | ||
| /*hasElements=*/false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of introducing PredSingleton, please simplify the definition of populateObjCForDestinationSet. Currently that function iterates over the ExplodedNodes within its first argument, but it is not called anywhere else, so you can simplify it by eliminating the loop and changing its first parameter to a plain ExplodedNode * (and then you can directly pass Pred to it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this in a separate NFC PR? I'd like to have this PR merged ASAP because it is blocking our CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would make sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this in a separate NFC PR?
Of course, that's completely fine. Thanks in advance for implementing it!
| bool isContainerNull = state->isNull(collectionV).isConstrainedTrue(); | ||
|
|
||
| ExplodedNodeSet dstLocation; | ||
| evalLocation(dstLocation, S, elem, Pred, state, elementV, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This name
dstLocationis a bit confusing IMO, let's call those new statesNewPredsinstead.)
I would suggest keeping dstLocation (or perhaps changing it to DstLocation to align with the global coding style instead of the local tradition), because it specifies that these are the nodes produced by the "Location" step which is is more informative than NewPreds.
In general, these exploded graph node manipulating functions have a nice pattern for naming the node sets:
void ExprEngine::doSomething(ExplodedNodeSet &Src, ExplodedNodeSet &Dst, ...) {
ExplodedNodeSet DstFoo;
performFoo(Src, DstFoo, ...);
//...
ExplodedNodeSet DstBar;
performBar(DstFoo, DstBar, ...);
//...
performBaz(DstBar, Dst, ...);
}Functions like this take nodes from the node set Src (or instead start from a single node, which is usually called Pred) and place the resulting nodes in the outparameter Dst. If an operation is implemented as the combination of several lower-level steps (Foo, Bar and Baz in the example), then it's natural to introduce e.g. DstFoo as the set which will act as the Dst parameter of performFoo (and later as the Src parameter of performBar).
If there are multiple intermediate node sets, then it's important to give them descriptive names (and not just Pred -> NewPred -> NewNewPred etc.) and as the intermediate set first appears as a destination, it's easier to name it as the "destination of" a certain step. In this function NewPreds could be acceptable (as it is the only intermediate node set), but I would still prefer using the more informative name.
By the way, thanks for the detailed explanation of the issue! Your conclusions and significant changes seem to be correct, so my review is limited to bikeshedding ;)
…Stmt The bug in VisitObjCForCollectionStmt seems have been there for a long time and can be very rarely triggered. The adding test is reduced from a crash observed downstream that reproduces the bug. (llvm#124477) (rdar://143280254)
In `VisitObjCForCollectionStmt`, the function does `evalLocation` for the current element at the original source state `Pred`. The evaluation may result in a new state, say `PredNew`. I.e., there is a transition: `Pred -> PredNew`, though it is a very rare case that `Pred` is NOT identical to `PredNew`. (This explains why the bug exists for many years but no one noticed until recently a crash observed downstream.) Later, the original code does NOT use `PredNew` as the new source state in `StmtNodeBuilder` for next transitions. In cases `Pred != PredNew`, the program ill behaves. (llvm#124477) (rdar://143280254)
afdf144 to
941ea48
Compare
I wish I could see your reply earlier! It took me a while to reduce the downstream reproducer and come up a minimal test. This way of testing looks great. Maybe I should add this test anyway. Thank you steakhal! |
I think I'd still prefer the unittest over a lit test because its way less fragile. |
The bug (see llvm#124477) in VisitObjCForCollectionStmt seems have been there for a long time and can be very rarely triggered. Now adding tests that reproduce it. The integrated test is reduced from a crash observed downstream that reproduces the bug. The unit test is contributed by Balazs Benics (@steakhal) for helping reproducing the bug. Much thanks to their help. Commit this test on behalf of them. (rdar://143280254)
In `VisitObjCForCollectionStmt`, the function does `evalLocation` for the current element at the original source state `Pred`. The evaluation may result in a new state, say `PredNew`. I.e., there is a transition: `Pred -> PredNew`, though it is a very rare case that `Pred` is NOT identical to `PredNew`. (This explains why the bug exists for many years but no one noticed until recently a crash observed downstream.) Later, the original code does NOT use `PredNew` as the new source state in `StmtNodeBuilder` for next transitions. In cases `Pred != PredNew`, the program ill behaves. (llvm#124477) (rdar://143280254)
941ea48 to
2f89672
Compare
The bug (see llvm#124477) in VisitObjCForCollectionStmt seems have been there for a long time and can be very rarely triggered. Now adding tests that reproduce it. The integrated test is reduced from a crash observed downstream that reproduces the bug. The unit test is a more straightforward way of showing how was the bug triggered. (rdar://143280254)
In `VisitObjCForCollectionStmt`, the function does `evalLocation` for the current element at the original source state `Pred`. The evaluation may result in a new state, say `PredNew`. I.e., there is a transition: `Pred -> PredNew`, though it is a very rare case that `Pred` is NOT identical to `PredNew`. (This explains why the bug exists for many years but no one noticed until recently a crash observed downstream.) Later, the original code does NOT use `PredNew` as the new source state in `StmtNodeBuilder` for next transitions. In cases `Pred != PredNew`, the program ill behaves. (llvm#124477) (rdar://143280254)
2f89672 to
730082e
Compare
|
@ziqingluo-90 I noticed you do a lot of force pushes. Remember, at llvm, a PR can only be merged using squash-merge anyways, so your history would vanish regardless. |
In
VisitObjCForCollectionStmt, the function doesevalLocationfor the current element at the original source statePred. The evaluation may result in a new state, sayPredNew. I.e., there is a transition:Pred -> PredNew, though it is a very rare case thatPredis NOT identical toPredNew. (This explains why the bug exists for many years but no one noticed until recently a crash observed downstream.) Later, the original code does NOT usePredNewas the new source state inStmtNodeBuilderfor next transitions. In casesPred != PredNew, the program ill behaves.(rdar://143280254)