-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[analyzer] Avoid use of CallEvent
s with obsolete state
#160707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The method `ExprEngine::evalCall` handles multiple state transitions and activates various checker callbacks that take a `CallEvent` parameter (among other parameters). Unfortunately some of these callbacks (EvalCall and pointer escape) were called with a `CallEvent` instance whose attached state was obsolete. This commit fixes this inconsistency by attaching the right state to the `CallEvent`s before their state becomes relevant. I found these inconsistencies as I was trying to understand this part of the source code, so I don't know about any concrete bugs that are caused by them -- but they are definitely fishy. I think it would be nice to handle the "has right state" / "has undefined state" distinction statically within the type system (to prevent the reoccurrence of similar inconsistencies), but I opted for just fixing the current issues as a first step.
@llvm/pr-subscribers-clang-static-analyzer-1 Author: Donát Nagy (NagyDonat) ChangesThe method I found these inconsistencies as I was trying to understand this part of the source code, so I don't know about any concrete bugs that are caused by them -- but they are definitely fishy. I think it would be nice to handle the "has right state" / "has undefined state" distinction statically within the type system (to prevent the reoccurrence of similar inconsistencies), but I opted for just fixing the current issues as a first step. Full diff: https://github.com/llvm/llvm-project/pull/160707.diff 2 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Core/CheckerManager.cpp b/clang/lib/StaticAnalyzer/Core/CheckerManager.cpp
index 44c6f9f52cca6..6d80518b010dd 100644
--- a/clang/lib/StaticAnalyzer/Core/CheckerManager.cpp
+++ b/clang/lib/StaticAnalyzer/Core/CheckerManager.cpp
@@ -731,33 +731,36 @@ void CheckerManager::runCheckersForEvalCall(ExplodedNodeSet &Dst,
ExplodedNodeSet checkDst;
NodeBuilder B(Pred, checkDst, Eng.getBuilderContext());
+ ProgramStateRef State = Pred->getState();
+ CallEventRef<> UpdatedCall = Call.cloneWithState(State);
+
// Check if any of the EvalCall callbacks can evaluate the call.
for (const auto &EvalCallChecker : EvalCallCheckers) {
// TODO: Support the situation when the call doesn't correspond
// to any Expr.
ProgramPoint L = ProgramPoint::getProgramPoint(
- Call.getOriginExpr(), ProgramPoint::PostStmtKind,
+ UpdatedCall->getOriginExpr(), ProgramPoint::PostStmtKind,
Pred->getLocationContext(), EvalCallChecker.Checker);
bool evaluated = false;
{ // CheckerContext generates transitions(populates checkDest) on
// destruction, so introduce the scope to make sure it gets properly
// populated.
CheckerContext C(B, Eng, Pred, L);
- evaluated = EvalCallChecker(Call, C);
+ evaluated = EvalCallChecker(*UpdatedCall, C);
}
#ifndef NDEBUG
if (evaluated && evaluatorChecker) {
- const auto toString = [](const CallEvent &Call) -> std::string {
+ const auto toString = [](CallEventRef<> Call) -> std::string {
std::string Buf;
llvm::raw_string_ostream OS(Buf);
- Call.dump(OS);
+ Call->dump(OS);
return Buf;
};
std::string AssertionMessage = llvm::formatv(
"The '{0}' call has been already evaluated by the {1} checker, "
"while the {2} checker also tried to evaluate the same call. At "
"most one checker supposed to evaluate a call.",
- toString(Call), evaluatorChecker,
+ toString(UpdatedCall), evaluatorChecker,
EvalCallChecker.Checker->getDebugTag());
llvm_unreachable(AssertionMessage.c_str());
}
@@ -774,7 +777,7 @@ void CheckerManager::runCheckersForEvalCall(ExplodedNodeSet &Dst,
// If none of the checkers evaluated the call, ask ExprEngine to handle it.
if (!evaluatorChecker) {
NodeBuilder B(Pred, Dst, Eng.getBuilderContext());
- Eng.defaultEvalCall(B, Pred, Call, CallOpts);
+ Eng.defaultEvalCall(B, Pred, *UpdatedCall, CallOpts);
}
}
}
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
index 0c491b8c4ca90..9360b423f1c08 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
@@ -628,6 +628,8 @@ void ExprEngine::VisitCallExpr(const CallExpr *CE, ExplodedNode *Pred,
ProgramStateRef ExprEngine::finishArgumentConstruction(ProgramStateRef State,
const CallEvent &Call) {
+ // WARNING: The state attached to 'Call' may be obsolete, do not call any
+ // methods that rely on it!
const Expr *E = Call.getOriginExpr();
// FIXME: Constructors to placement arguments of operator new
// are not supported yet.
@@ -653,6 +655,8 @@ ProgramStateRef ExprEngine::finishArgumentConstruction(ProgramStateRef State,
void ExprEngine::finishArgumentConstruction(ExplodedNodeSet &Dst,
ExplodedNode *Pred,
const CallEvent &Call) {
+ // WARNING: The state attached to 'Call' may be obsolete, do not call any
+ // methods that rely on it!
ProgramStateRef State = Pred->getState();
ProgramStateRef CleanedState = finishArgumentConstruction(State, Call);
if (CleanedState == State) {
@@ -670,35 +674,40 @@ void ExprEngine::finishArgumentConstruction(ExplodedNodeSet &Dst,
}
void ExprEngine::evalCall(ExplodedNodeSet &Dst, ExplodedNode *Pred,
- const CallEvent &Call) {
- // WARNING: At this time, the state attached to 'Call' may be older than the
- // state in 'Pred'. This is a minor optimization since CheckerManager will
- // use an updated CallEvent instance when calling checkers, but if 'Call' is
- // ever used directly in this function all callers should be updated to pass
- // the most recent state. (It is probably not worth doing the work here since
- // for some callers this will not be necessary.)
+ const CallEvent &CallTemplate) {
+ // WARNING: As this function performs transitions between several different
+ // states (perhaps in a branching structure) we must be careful to avoid
+ // referencing obsolete or irrelevant states. In particular, 'CallEvent'
+ // instances have an attached state (because this is is convenient within the
+ // checker callbacks) and it is our responsibility to keep these up-to-date.
+ // In fact, the parameter 'CallTemplate' is a "template" because its attached
+ // state may be older than the state of 'Pred' (which will be further
+ // transformed by the transitions within this method).
+ // (Note that 'runCheckersFor*Call' and 'finishArgumentConstruction' are
+ // prepared to take this template and and attach the proper state before
+ // forwarding it to the checkers.)
// Run any pre-call checks using the generic call interface.
ExplodedNodeSet dstPreVisit;
- getCheckerManager().runCheckersForPreCall(dstPreVisit, Pred,
- Call, *this);
+ getCheckerManager().runCheckersForPreCall(dstPreVisit, Pred, CallTemplate,
+ *this);
// Actually evaluate the function call. We try each of the checkers
// to see if the can evaluate the function call, and get a callback at
// defaultEvalCall if all of them fail.
ExplodedNodeSet dstCallEvaluated;
- getCheckerManager().runCheckersForEvalCall(dstCallEvaluated, dstPreVisit,
- Call, *this, EvalCallOptions());
+ getCheckerManager().runCheckersForEvalCall(
+ dstCallEvaluated, dstPreVisit, CallTemplate, *this, EvalCallOptions());
// If there were other constructors called for object-type arguments
// of this call, clean them up.
ExplodedNodeSet dstArgumentCleanup;
for (ExplodedNode *I : dstCallEvaluated)
- finishArgumentConstruction(dstArgumentCleanup, I, Call);
+ finishArgumentConstruction(dstArgumentCleanup, I, CallTemplate);
ExplodedNodeSet dstPostCall;
getCheckerManager().runCheckersForPostCall(dstPostCall, dstArgumentCleanup,
- Call, *this);
+ CallTemplate, *this);
// Escaping symbols conjured during invalidating the regions above.
// Note that, for inlined calls the nodes were put back into the worklist,
@@ -708,12 +717,13 @@ void ExprEngine::evalCall(ExplodedNodeSet &Dst, ExplodedNode *Pred,
// Run pointerEscape callback with the newly conjured symbols.
SmallVector<std::pair<SVal, SVal>, 8> Escaped;
for (ExplodedNode *I : dstPostCall) {
- NodeBuilder B(I, Dst, *currBldrCtx);
ProgramStateRef State = I->getState();
+ CallEventRef<> Call = CallTemplate.cloneWithState(State);
+ NodeBuilder B(I, Dst, *currBldrCtx);
Escaped.clear();
{
unsigned Arg = -1;
- for (const ParmVarDecl *PVD : Call.parameters()) {
+ for (const ParmVarDecl *PVD : Call->parameters()) {
++Arg;
QualType ParamTy = PVD->getType();
if (ParamTy.isNull() ||
@@ -722,13 +732,13 @@ void ExprEngine::evalCall(ExplodedNodeSet &Dst, ExplodedNode *Pred,
QualType Pointee = ParamTy->getPointeeType();
if (Pointee.isConstQualified() || Pointee->isVoidType())
continue;
- if (const MemRegion *MR = Call.getArgSVal(Arg).getAsRegion())
+ if (const MemRegion *MR = Call->getArgSVal(Arg).getAsRegion())
Escaped.emplace_back(loc::MemRegionVal(MR), State->getSVal(MR, Pointee));
}
}
State = processPointerEscapedOnBind(State, Escaped, I->getLocationContext(),
- PSK_EscapeOutParameters, &Call);
+ PSK_EscapeOutParameters, &*Call);
if (State == I->getState())
Dst.insert(I);
@@ -1212,48 +1222,47 @@ static bool isTrivialObjectAssignment(const CallEvent &Call) {
}
void ExprEngine::defaultEvalCall(NodeBuilder &Bldr, ExplodedNode *Pred,
- const CallEvent &CallTemplate,
+ const CallEvent &Call,
const EvalCallOptions &CallOpts) {
// Make sure we have the most recent state attached to the call.
ProgramStateRef State = Pred->getState();
- CallEventRef<> Call = CallTemplate.cloneWithState(State);
// Special-case trivial assignment operators.
- if (isTrivialObjectAssignment(*Call)) {
- performTrivialCopy(Bldr, Pred, *Call);
+ if (isTrivialObjectAssignment(Call)) {
+ performTrivialCopy(Bldr, Pred, Call);
return;
}
// Try to inline the call.
// The origin expression here is just used as a kind of checksum;
// this should still be safe even for CallEvents that don't come from exprs.
- const Expr *E = Call->getOriginExpr();
+ const Expr *E = Call.getOriginExpr();
ProgramStateRef InlinedFailedState = getInlineFailedState(State, E);
if (InlinedFailedState) {
// If we already tried once and failed, make sure we don't retry later.
State = InlinedFailedState;
} else {
- RuntimeDefinition RD = Call->getRuntimeDefinition();
- Call->setForeign(RD.isForeign());
+ RuntimeDefinition RD = Call.getRuntimeDefinition();
+ Call.setForeign(RD.isForeign());
const Decl *D = RD.getDecl();
- if (shouldInlineCall(*Call, D, Pred, CallOpts)) {
+ if (shouldInlineCall(Call, D, Pred, CallOpts)) {
if (RD.mayHaveOtherDefinitions()) {
AnalyzerOptions &Options = getAnalysisManager().options;
// Explore with and without inlining the call.
if (Options.getIPAMode() == IPAK_DynamicDispatchBifurcate) {
- BifurcateCall(RD.getDispatchRegion(), *Call, D, Bldr, Pred);
+ BifurcateCall(RD.getDispatchRegion(), Call, D, Bldr, Pred);
return;
}
// Don't inline if we're not in any dynamic dispatch mode.
if (Options.getIPAMode() != IPAK_DynamicDispatch) {
- conservativeEvalCall(*Call, Bldr, Pred, State);
+ conservativeEvalCall(Call, Bldr, Pred, State);
return;
}
}
- ctuBifurcate(*Call, D, Bldr, Pred, State);
+ ctuBifurcate(Call, D, Bldr, Pred, State);
return;
}
}
@@ -1261,10 +1270,10 @@ void ExprEngine::defaultEvalCall(NodeBuilder &Bldr, ExplodedNode *Pred,
// If we can't inline it, clean up the state traits used only if the function
// is inlined.
State = removeStateTraitsUsedForArrayEvaluation(
- State, dyn_cast_or_null<CXXConstructExpr>(E), Call->getLocationContext());
+ State, dyn_cast_or_null<CXXConstructExpr>(E), Call.getLocationContext());
// Also handle the return value and invalidate the regions.
- conservativeEvalCall(*Call, Bldr, Pred, State);
+ conservativeEvalCall(Call, Bldr, Pred, State);
}
void ExprEngine::BifurcateCall(const MemRegion *BifurReg,
|
@llvm/pr-subscribers-clang Author: Donát Nagy (NagyDonat) ChangesThe method I found these inconsistencies as I was trying to understand this part of the source code, so I don't know about any concrete bugs that are caused by them -- but they are definitely fishy. I think it would be nice to handle the "has right state" / "has undefined state" distinction statically within the type system (to prevent the reoccurrence of similar inconsistencies), but I opted for just fixing the current issues as a first step. Full diff: https://github.com/llvm/llvm-project/pull/160707.diff 2 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Core/CheckerManager.cpp b/clang/lib/StaticAnalyzer/Core/CheckerManager.cpp
index 44c6f9f52cca6..6d80518b010dd 100644
--- a/clang/lib/StaticAnalyzer/Core/CheckerManager.cpp
+++ b/clang/lib/StaticAnalyzer/Core/CheckerManager.cpp
@@ -731,33 +731,36 @@ void CheckerManager::runCheckersForEvalCall(ExplodedNodeSet &Dst,
ExplodedNodeSet checkDst;
NodeBuilder B(Pred, checkDst, Eng.getBuilderContext());
+ ProgramStateRef State = Pred->getState();
+ CallEventRef<> UpdatedCall = Call.cloneWithState(State);
+
// Check if any of the EvalCall callbacks can evaluate the call.
for (const auto &EvalCallChecker : EvalCallCheckers) {
// TODO: Support the situation when the call doesn't correspond
// to any Expr.
ProgramPoint L = ProgramPoint::getProgramPoint(
- Call.getOriginExpr(), ProgramPoint::PostStmtKind,
+ UpdatedCall->getOriginExpr(), ProgramPoint::PostStmtKind,
Pred->getLocationContext(), EvalCallChecker.Checker);
bool evaluated = false;
{ // CheckerContext generates transitions(populates checkDest) on
// destruction, so introduce the scope to make sure it gets properly
// populated.
CheckerContext C(B, Eng, Pred, L);
- evaluated = EvalCallChecker(Call, C);
+ evaluated = EvalCallChecker(*UpdatedCall, C);
}
#ifndef NDEBUG
if (evaluated && evaluatorChecker) {
- const auto toString = [](const CallEvent &Call) -> std::string {
+ const auto toString = [](CallEventRef<> Call) -> std::string {
std::string Buf;
llvm::raw_string_ostream OS(Buf);
- Call.dump(OS);
+ Call->dump(OS);
return Buf;
};
std::string AssertionMessage = llvm::formatv(
"The '{0}' call has been already evaluated by the {1} checker, "
"while the {2} checker also tried to evaluate the same call. At "
"most one checker supposed to evaluate a call.",
- toString(Call), evaluatorChecker,
+ toString(UpdatedCall), evaluatorChecker,
EvalCallChecker.Checker->getDebugTag());
llvm_unreachable(AssertionMessage.c_str());
}
@@ -774,7 +777,7 @@ void CheckerManager::runCheckersForEvalCall(ExplodedNodeSet &Dst,
// If none of the checkers evaluated the call, ask ExprEngine to handle it.
if (!evaluatorChecker) {
NodeBuilder B(Pred, Dst, Eng.getBuilderContext());
- Eng.defaultEvalCall(B, Pred, Call, CallOpts);
+ Eng.defaultEvalCall(B, Pred, *UpdatedCall, CallOpts);
}
}
}
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
index 0c491b8c4ca90..9360b423f1c08 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
@@ -628,6 +628,8 @@ void ExprEngine::VisitCallExpr(const CallExpr *CE, ExplodedNode *Pred,
ProgramStateRef ExprEngine::finishArgumentConstruction(ProgramStateRef State,
const CallEvent &Call) {
+ // WARNING: The state attached to 'Call' may be obsolete, do not call any
+ // methods that rely on it!
const Expr *E = Call.getOriginExpr();
// FIXME: Constructors to placement arguments of operator new
// are not supported yet.
@@ -653,6 +655,8 @@ ProgramStateRef ExprEngine::finishArgumentConstruction(ProgramStateRef State,
void ExprEngine::finishArgumentConstruction(ExplodedNodeSet &Dst,
ExplodedNode *Pred,
const CallEvent &Call) {
+ // WARNING: The state attached to 'Call' may be obsolete, do not call any
+ // methods that rely on it!
ProgramStateRef State = Pred->getState();
ProgramStateRef CleanedState = finishArgumentConstruction(State, Call);
if (CleanedState == State) {
@@ -670,35 +674,40 @@ void ExprEngine::finishArgumentConstruction(ExplodedNodeSet &Dst,
}
void ExprEngine::evalCall(ExplodedNodeSet &Dst, ExplodedNode *Pred,
- const CallEvent &Call) {
- // WARNING: At this time, the state attached to 'Call' may be older than the
- // state in 'Pred'. This is a minor optimization since CheckerManager will
- // use an updated CallEvent instance when calling checkers, but if 'Call' is
- // ever used directly in this function all callers should be updated to pass
- // the most recent state. (It is probably not worth doing the work here since
- // for some callers this will not be necessary.)
+ const CallEvent &CallTemplate) {
+ // WARNING: As this function performs transitions between several different
+ // states (perhaps in a branching structure) we must be careful to avoid
+ // referencing obsolete or irrelevant states. In particular, 'CallEvent'
+ // instances have an attached state (because this is is convenient within the
+ // checker callbacks) and it is our responsibility to keep these up-to-date.
+ // In fact, the parameter 'CallTemplate' is a "template" because its attached
+ // state may be older than the state of 'Pred' (which will be further
+ // transformed by the transitions within this method).
+ // (Note that 'runCheckersFor*Call' and 'finishArgumentConstruction' are
+ // prepared to take this template and and attach the proper state before
+ // forwarding it to the checkers.)
// Run any pre-call checks using the generic call interface.
ExplodedNodeSet dstPreVisit;
- getCheckerManager().runCheckersForPreCall(dstPreVisit, Pred,
- Call, *this);
+ getCheckerManager().runCheckersForPreCall(dstPreVisit, Pred, CallTemplate,
+ *this);
// Actually evaluate the function call. We try each of the checkers
// to see if the can evaluate the function call, and get a callback at
// defaultEvalCall if all of them fail.
ExplodedNodeSet dstCallEvaluated;
- getCheckerManager().runCheckersForEvalCall(dstCallEvaluated, dstPreVisit,
- Call, *this, EvalCallOptions());
+ getCheckerManager().runCheckersForEvalCall(
+ dstCallEvaluated, dstPreVisit, CallTemplate, *this, EvalCallOptions());
// If there were other constructors called for object-type arguments
// of this call, clean them up.
ExplodedNodeSet dstArgumentCleanup;
for (ExplodedNode *I : dstCallEvaluated)
- finishArgumentConstruction(dstArgumentCleanup, I, Call);
+ finishArgumentConstruction(dstArgumentCleanup, I, CallTemplate);
ExplodedNodeSet dstPostCall;
getCheckerManager().runCheckersForPostCall(dstPostCall, dstArgumentCleanup,
- Call, *this);
+ CallTemplate, *this);
// Escaping symbols conjured during invalidating the regions above.
// Note that, for inlined calls the nodes were put back into the worklist,
@@ -708,12 +717,13 @@ void ExprEngine::evalCall(ExplodedNodeSet &Dst, ExplodedNode *Pred,
// Run pointerEscape callback with the newly conjured symbols.
SmallVector<std::pair<SVal, SVal>, 8> Escaped;
for (ExplodedNode *I : dstPostCall) {
- NodeBuilder B(I, Dst, *currBldrCtx);
ProgramStateRef State = I->getState();
+ CallEventRef<> Call = CallTemplate.cloneWithState(State);
+ NodeBuilder B(I, Dst, *currBldrCtx);
Escaped.clear();
{
unsigned Arg = -1;
- for (const ParmVarDecl *PVD : Call.parameters()) {
+ for (const ParmVarDecl *PVD : Call->parameters()) {
++Arg;
QualType ParamTy = PVD->getType();
if (ParamTy.isNull() ||
@@ -722,13 +732,13 @@ void ExprEngine::evalCall(ExplodedNodeSet &Dst, ExplodedNode *Pred,
QualType Pointee = ParamTy->getPointeeType();
if (Pointee.isConstQualified() || Pointee->isVoidType())
continue;
- if (const MemRegion *MR = Call.getArgSVal(Arg).getAsRegion())
+ if (const MemRegion *MR = Call->getArgSVal(Arg).getAsRegion())
Escaped.emplace_back(loc::MemRegionVal(MR), State->getSVal(MR, Pointee));
}
}
State = processPointerEscapedOnBind(State, Escaped, I->getLocationContext(),
- PSK_EscapeOutParameters, &Call);
+ PSK_EscapeOutParameters, &*Call);
if (State == I->getState())
Dst.insert(I);
@@ -1212,48 +1222,47 @@ static bool isTrivialObjectAssignment(const CallEvent &Call) {
}
void ExprEngine::defaultEvalCall(NodeBuilder &Bldr, ExplodedNode *Pred,
- const CallEvent &CallTemplate,
+ const CallEvent &Call,
const EvalCallOptions &CallOpts) {
// Make sure we have the most recent state attached to the call.
ProgramStateRef State = Pred->getState();
- CallEventRef<> Call = CallTemplate.cloneWithState(State);
// Special-case trivial assignment operators.
- if (isTrivialObjectAssignment(*Call)) {
- performTrivialCopy(Bldr, Pred, *Call);
+ if (isTrivialObjectAssignment(Call)) {
+ performTrivialCopy(Bldr, Pred, Call);
return;
}
// Try to inline the call.
// The origin expression here is just used as a kind of checksum;
// this should still be safe even for CallEvents that don't come from exprs.
- const Expr *E = Call->getOriginExpr();
+ const Expr *E = Call.getOriginExpr();
ProgramStateRef InlinedFailedState = getInlineFailedState(State, E);
if (InlinedFailedState) {
// If we already tried once and failed, make sure we don't retry later.
State = InlinedFailedState;
} else {
- RuntimeDefinition RD = Call->getRuntimeDefinition();
- Call->setForeign(RD.isForeign());
+ RuntimeDefinition RD = Call.getRuntimeDefinition();
+ Call.setForeign(RD.isForeign());
const Decl *D = RD.getDecl();
- if (shouldInlineCall(*Call, D, Pred, CallOpts)) {
+ if (shouldInlineCall(Call, D, Pred, CallOpts)) {
if (RD.mayHaveOtherDefinitions()) {
AnalyzerOptions &Options = getAnalysisManager().options;
// Explore with and without inlining the call.
if (Options.getIPAMode() == IPAK_DynamicDispatchBifurcate) {
- BifurcateCall(RD.getDispatchRegion(), *Call, D, Bldr, Pred);
+ BifurcateCall(RD.getDispatchRegion(), Call, D, Bldr, Pred);
return;
}
// Don't inline if we're not in any dynamic dispatch mode.
if (Options.getIPAMode() != IPAK_DynamicDispatch) {
- conservativeEvalCall(*Call, Bldr, Pred, State);
+ conservativeEvalCall(Call, Bldr, Pred, State);
return;
}
}
- ctuBifurcate(*Call, D, Bldr, Pred, State);
+ ctuBifurcate(Call, D, Bldr, Pred, State);
return;
}
}
@@ -1261,10 +1270,10 @@ void ExprEngine::defaultEvalCall(NodeBuilder &Bldr, ExplodedNode *Pred,
// If we can't inline it, clean up the state traits used only if the function
// is inlined.
State = removeStateTraitsUsedForArrayEvaluation(
- State, dyn_cast_or_null<CXXConstructExpr>(E), Call->getLocationContext());
+ State, dyn_cast_or_null<CXXConstructExpr>(E), Call.getLocationContext());
// Also handle the return value and invalidate the regions.
- conservativeEvalCall(*Call, Bldr, Pred, State);
+ conservativeEvalCall(Call, Bldr, Pred, State);
}
void ExprEngine::BifurcateCall(const MemRegion *BifurReg,
|
const EvalCallOptions &CallOpts) { | ||
// Make sure we have the most recent state attached to the call. | ||
ProgramStateRef State = Pred->getState(); | ||
CallEventRef<> Call = CallTemplate.cloneWithState(State); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm removing this cloneWithState
call because this method is only called by CheckerManager::runCheckersForEvalCall
and I had to add a cloneWithState
call within that method.
As I'm digging around the codebase, I see that it's a common (anti)pattern that some function takes both a This seems to be a high caliber footgun which can very easily lead to mistakes where the developer accidentally uses the wrong state (e.g. by a seemingly innocent call to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that CallEvents have this trap. I've seen it. I don't think it's too widespread. I think we usually catch the use of stale states of CallEvents during review. It also helps that the State is part of our vocabulary and its passed liberally as a parameter, which sets a good example.
This is a critical piece of code.
Many of the changed lines look aesthetic, aka. refactor.
You mention some bug if I recall my sloppy reading, but I could not spot behavioral changes or at least it didn't stand out to me.
If this is NFC, let's adjust the PR title. If has behavioral change, let's have a test exposing that and split off the refactor parts.
const auto toString = [](CallEventRef<> Call) -> std::string { | ||
std::string Buf; | ||
llvm::raw_string_ostream OS(Buf); | ||
Call.dump(OS); | ||
Call->dump(OS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured this could works. Why did you change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dump probably doesn't change after the clone.
But generally, in the checker loop, the updated call event indicates that the call was evaluated by multiple checkers. Which should be impossible in practice but it looks like we've decided to act gracefully when assertions are turned off.
That said, we aren't doing a great job at that, given that we're using llvm_unreachable()
that translates to pure undefined behavior (aka __builtin_unreachable()
) when assertions are turned off. (See also LLVM_UNREACHABLE_OPTIMIZE. These UBs are sometimes incredibly fun to debug.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you change this?
It is true that this dump (which IIUC just prints the function name) doesn't rely on the State
within the CallEvent
object, so it would work with the non-updated instance -- but as we needed to create the UpdatedCall
(because we need to pass that to the checkers), I thought that it would be better to consistently use it.
I don't exactly recall why I changed the type of this lambda, I think the compiler complained about my first attempt. I can try to avoid these changes (CallEventRef<>
is a somewhat esoteric smart pointer type, but it should be possible to get a const CallEvent &
from it).
<offtopic>
By the way I feel an urge to get rid of this lambda, because it is much more verbose than e.g.
std::string FunctionName;
Call.dump(llvm:raw_string_ostream(FunctionName));
(I don't see a reason why we would need to declare a variable for the stream instead of just using a temporary object -- but I'd test this before applying it.) However, I acknowledge that this is subjective bikeshedding, and I won't remove the lambda if you prefer to keep it.
</offtopic>
But generally, in the checker loop, the updated call event indicates that the call was evaluated by multiple checkers. Which should be impossible in practice but it looks like we've decided to act gracefully when assertions are turned off.
What do you mean by this? My understanding of this situation is that:
- When assertions are turned off (
#ifdef NDEBUG
block a few lines below the displayed area) webreak
and leave the loop eagerly when one checker evaluated the call. - When assertions are turned on, we always iterate over all the checker callback, and do this formatted error printout if a second checker callback evaluated the call.
That said, we aren't doing a great job at that, given that we're using
llvm_unreachable()
that translates to pure undefined behavior (aka__builtin_unreachable()
) when assertions are turned off.
Good to know in general, but this particular llvm_unreachable()
call is within an #ifndef NDEBUG
block so I would guess that it is completely skipped when assertions are turned off. (Is this correct? What is the relationship between assertions and NDEBUG
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, it's the other way round. We're doing a great job and everything's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit 5b8416f I'm reverting the NFC tweaks that affect this part of the code.
// WARNING: As this function performs transitions between several different | ||
// states (perhaps in a branching structure) we must be careful to avoid | ||
// referencing obsolete or irrelevant states. In particular, 'CallEvent' | ||
// instances have an attached state (because this is is convenient within the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 1649033
// state may be older than the state of 'Pred' (which will be further | ||
// transformed by the transitions within this method). | ||
// (Note that 'runCheckersFor*Call' and 'finishArgumentConstruction' are | ||
// prepared to take this template and and attach the proper state before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and and
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 1649033
Noted. It is true that good enough code review can prevent any bugs, but I still think that we should prefer solutions that automatically prevent the errors.
If the
I had to change lots of lines because I needed to switch between use of
This commit fixes three bugs that are present in the current code: (1) The (2) The use of (3) A bit later (still in the "run PointerEscape for the newly conjured symbols" step) the I think it is definitely a bug if a
I will create a test with a new debug checker which modifies the state in its I don't think that it is possible to meaningfully simplify this commit by splitting off "refactor parts". There are only a few non-essential changes (comment changes, renamed variables) and without them the "core" change would be more difficult to understand. |
For the record, I got bitten by this today and wasted about 5 minutes of my time.
I think the best way to go about this is a unittest with a custom bug report visitor that prints a special trait along the path. Like a logger. And the visitor would print these values alongside the ProgramPoint it is attached to. |
@haoNoQ Do you recall why is a State bundled with a CallEvent? Sometimes that gets outdated as we manipulate the State. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, this is some ancient history even by my standards 😅 But, yes, this has bummed me out a few times too.
I think the whole point of the CallEvent
was to provide user-friendly access to path-sensitive aspects of the call such as CallEvent.getArgSVal()
and CallEvent.getReturnValue()
. But yeah, the way these checker callbacks accept the state twice, is absolutely weird. I'm pretty sure there even used to be a place where it wasn't true that CheckerContext.getState() == CallEvent.getState()
. (IIRC I fixed one but there could be more places like that. I didn't have the guts to add that as an assertion at the invocation sites of these callbacks.)
Now, when it comes to the way the state mutates during the callback (such as making sure getReturnValue()
produces the value you've just bound to the call expression), I don't think CallEvent
can reasonably handle it, even with a layer of indirection that keeps the state up to date. Like, state splits are a thing. If your evalCall()
splits the path in two and binds two different return values, which one do you return?
From this perspective, yeah, it may be slightly useful to keep the original pre-call state around. Say, if you're modeling a function void foo(int *arg)
that effectively does something like if (*arg == 0) *arg = 1;
, and your checker callback is sufficiently complex, then at some point you may need to ask "Uhh can you please remind me what the original value was?" - and then CallEvent
comes in helpful. But I don't think this minor benefit outweighs the confusion caused by calling that very specific program state "the state". I think it's much easier to handle that by deliberately remembering the original state in a local variable, and then pass that state around. It's probably not a popular situation in the first place.
(You're probably aware of a significantly more annoying version of the problem: the problem of reading the original value of *arg
from inside checkPostCall
. This comes up in the taint checker a lot. But that's much more expensive to provide uniformly.)
So ultimately you'll need to be careful either way. But there's definitely a few ways we could eliminate the confusion.
First, yeah, like you said, we could have a static or dynamical typestate that indicates whether getReturnValue()
makes sense. (It only ever makes sense in checkPostCall
.)
Then, it might be a good idea to simply make CallEvent.getState()
private. It wasn't our goal to provide convenient access to the entire state. We just wanted a nice getArgSVal()
and such. Maybe let's provide only the actually-useful, non-confusing methods? Or we could rename the method to something more descriptive, like getStateBeforeCall()
and getStateAfterCall()
- and make those available/unavailable depending on the typestate of CallEvent
. If we're so worried about the *arg
situation we may also provide a getSValBeforeCall(const MemRegion *)
.
Alternatively, it may be a good idea to implement these methods in CheckerContext
itself, and then stop showing CallEvent
to the user. So instead of CallEvent.getArgSVal()
you'd need to type CheckerContext.getCallArgSVal()
. Unfortunately this disconnects the user from the entire polymorphism of the CallEvent
class. There's actually a lot of methods that would need to be reimplemented (like AnyCXXConstructorCall.getCXXThisVal()
or `CXXAllocatorCall.getArraySizeVal()) and most of them wouldn't make sense most of the time.
Finally, if you're particularly interested in getReturnValue()
updating in sync, it may be possible to do that if you somehow indicate your intent to never split the state in your callback. And that's valuable on its own because of all the accidental state splits that you can introduce by forgetting to chain your addTransition()
s or generateNonFatalErrorNode()
s. (The latter is particularly brutal because it really doesn't sound like it has anything to do with state splitting. Like, I want to emit two non-fatal errors, it's only natural that I invoke the generation of two non-fatal error nodes. With fatal errors it's actually a perfectly valid thing to do. But if later you decide that the errors aren't all that fatal, all of a sudden your entire checker breaks down. And good luck noticing that during testing.) So if there was a way to indicate the callback's intention explicitly, eg. CheckerContext.setOutgoingNodeLimit(1)
(maybe even set it by default to 1
; most checkers need either that or 2
) then we'd avoid all the subtle bugs of that nature. Then, naturally, we'd also be able to have a CheckerContext.setReturnValue()
method that's available only in evalCall()
and only when the outgoing node limit is 1. And then you could look up that value as much as you want!
(Though, of course, you'd still never want to set the return value more than once.)
const auto toString = [](CallEventRef<> Call) -> std::string { | ||
std::string Buf; | ||
llvm::raw_string_ostream OS(Buf); | ||
Call.dump(OS); | ||
Call->dump(OS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dump probably doesn't change after the clone.
But generally, in the checker loop, the updated call event indicates that the call was evaluated by multiple checkers. Which should be impossible in practice but it looks like we've decided to act gracefully when assertions are turned off.
That said, we aren't doing a great job at that, given that we're using llvm_unreachable()
that translates to pure undefined behavior (aka __builtin_unreachable()
) when assertions are turned off. (See also LLVM_UNREACHABLE_OPTIMIZE. These UBs are sometimes incredibly fun to debug.)
I think the assertion I propose is the easiest way to test this patch. I.e., the fact that the newly added assertion used to crash without the rest of the patch but doesn't crash anymore, is a sufficient defense against regressions in and of itself. And that's very TDD: you first add a new contract to the system, then you change the system to fulfill the contract. Like, even if currently the patch doesn't change the observed behavior. But if it does, that'd also be an easy way to identify some real-world code that misbehaves this way, and then you can easily (But this could get tedious if more sources of broken states are identified this way and you're not in a position to fix them all at once.) |
FWIW |
Thanks for all the valuable feedback!
@steakhal In an ideal world I agree with this, but I fear that both suggestions have difficulties:
Based on this right now I feel that the best approach would be gradually improving the situation by smaller changes.
I don't understand how could a bug reporter visitor help in this situation. The bug (which is fixed by this PR) is that checker callbacks receive
I didn't even think about this usecase of the "archived"
I support this idea -- writing a dynamic check would be trivial, and even a static check is not too difficult.
Making EDIT: The jump-to-references functionality of my editor was insufficient, there are at least five or six references to Unfortunately this is just "sweeping the problems under the rug" and not a complete solution -- even if we cannot directly access the obsolete old state object, the analyzer may still run into logically wrong behavior if methods like
WDYT about a change that keeps the
@haoNoQ Are you speaking about the assertion to validate that
I hope I answered everything where I have a strong opinion -- let me know if I missed something where my opinion would be relevant. |
Yeah my point is that these specific results aren't allowed to become obsolete. The argument value is looked up from the Environment. A value of an expression in Environment shall not change for as long as the expression remains live. (Dang. It's almost like this should be an assertion.) The call expression, together with its arguments, remains live until after I hope that the same is true for all these actually-somewhat-irreplaceable methods provided by
Hmm yes you're right. I was like, So, like, do you know whether this assertion would fail without your patch? If you stuff it into the old code and then analyze a bunch of code, would it crash a lot? |
Maybe it makes sense to only include the Environment with the |
Thanks for clarifying this! I already suspected that something similar could be true and justify the use of old (but not too old)
I examined all code that relies on
I don't have a strong opinion, but my wild guess is that without my patch this assertion would cause a significant amount of crashes (but not extremely many) in the EvalCall and PointerEscape callbacks (if the assert was added so that applies to these callbacks).
I think it is OK to keep the whole state as a private data member as long as it is marked with
I really like this description 😂 🌶️ |
Yes I completely agree 😅 The dream of "hey just give me, like, some value" hasn't quite materialized. It's way more complicated than that.
This is indeed a bit silly but legal. They could probably also do a
Yes, the dynamic type can change over time. It usually acts as a constraint, i.e. "what we know about the runtime type behind the pointer". Our knowledge can improve over time but it can't evolve in contradictory ways. (At least not until the program performs a placement-new or something of that nature.) So this is one of the payoffs for tracking dynamic types. It helps us resolve virtual calls as part of Given that it'd be invoked exactly once and very carefully, the question of multiple out-of-sync states hopefully doesn't come up. Maybe it should be moved out of
This one's a bit worse because ObjC But the rough idea is probably still the same: it should be a one-off thing that lives its own life somewhere in the engine, outside of our little utility class.
Same story as The slightly annoying part with this one is that it also needs to be invoked by some of the
Yeah |
I'll do so if I have time.
Thanks for the information!
I don't have a strong opinion about this, but your suggestions sound vaguely right.
Eww...
I agree that it's reasonable to have this as a virtual method of the
I agree that it would be probably useful to have a "do exactly the same thing as conservative evaluation" function. Based on this discussion, I'm planning the following improvements:
I'm not planning to work on the other topics (runtime definition, Objective-C trickery) because they seem to be more complex and I'm less familiar with those areas. I feel that picking the (above mentioned) low hanging |
I realized that there is an unfortunately incompatibility within my plans:
When
Based on this I would like to merge this PR without adding tests:
@steakhal @haoNoQ What do you think about this? (If you don't accept this PR without tests and can't suggest a simple way for testing it, I'll abandon it, because I already feel that I spent too much time on it.) |
@steakhal With 1649033 and 5b8416f I think I handled all your requests -- with the exception of adding tests, which turned out to be problematic (see my previous comment). The change abccb02 is just drive-by cleanup -- I noticed a typo (missing space) in a comment and I felt that it would be nice to fix it. I'm not strongly attached to it -- I can keep the typo if you would prefer minimizing the footprint of this change. |
By making Also we're not even trying to prevent the What makes sense to test, what we really care about, is that the output of the remaining public accessors that look things up from the state stays in sync. And that's a contract that has been followed from the start, your patch didn't improve it, so it's not exactly in the spirit of TDD. But these tests are at least useful for documenting the contract, even if it's currently impossible to write code that would violate that contract. They'd be able to catch an unexpected future situation in which something really goes horribly wrong, like if the argument values in the Environment actively mutate during the lifetime of the Maybe you could also add a few assertions around the call sites of the Or you could, like, make a SFINAE static assertion to confirm that |
No, this commit is trying to prevent the I am proposing a hybrid approach:
These two commits are mostly redundant with each other (either of them would fix 90% of the problems alone), but I see minor reasons for applying both of them:
@haoNoQ If you suggest that
then that would be equivalent to abandoning this PR and and only merging commit 2. (which would make I can also accept this approach if you would prefer. |
Ohh. Right right right. Misread. My bad. You probably shouldn't write a lot of code that you're about to delete anyway. So in my opinion it doesn't make sense to make a unit test or a debug checker just for this patch. Maybe just assert that the states are the same before you pass it to the checker in evalCall? It's very slightly unobvious there with the whole loop thing and that's where the problem was anyway. Or just leave it as-is. If you won't be able to do the follow-up commit then you can add a clever follow-up test instead. |
Actually maybe just keep the strict bare minimum of information in the call event? Like a If you don't want the data to go out of sync, just normalize your database. Our codebase has plenty of room for magical one-of-a-kind solutions but this one is arguably mundane. The |
I think I will leave it as-is because the initialization of ProgramStateRef State = Pred->getState();
CallEventRef<> UpdatedCall = Call.cloneWithState(State);
// Check if any of the EvalCall callbacks can evaluate the call.
for (const auto &EvalCallChecker : EvalCallCheckers) {
// TODO: Support the situation when the call doesn't correspond
// to any Expr.
ProgramPoint L = ProgramPoint::getProgramPoint(
UpdatedCall->getOriginExpr(), ProgramPoint::PostStmtKind,
Pred->getLocationContext(), EvalCallChecker.Checker);
bool evaluated = false;
{ // CheckerContext generates transitions (populates checkDest) on
// destruction, so introduce the scope to make sure it gets properly
// populated.
CheckerContext C(B, Eng, Pred, L);
evaluated = EvalCallChecker(*UpdatedCall, C);
}
I'm pretty sure that I will do the follow-up commit -- it is probably simpler than designing a testcase.
I would really like to have I strongly support this "keep the bare minimum in the call event" idea as a long-term development plan, but I cannot promise to implement it in the foreseeable future, so I would still like to merge this PR as a temporary solution. (If call events do end up being minimized, then it will be possible to discard the logic tweaked in this PR along with all the other code that updates the states attached to Also note that this development plan is compatible with my other plans (making |
The method
ExprEngine::evalCall
handles multiple state transitions and activates various checker callbacks that take aCallEvent
parameter (among other parameters). Unfortunately some of these callbacks (EvalCall and pointer escape) were called with aCallEvent
instance whose attached state was obsolete. This commit fixes this inconsistency by attaching the right state to theCallEvent
s before their state becomes relevant.I found these inconsistencies as I was trying to understand this part of the source code, so I don't know about any concrete bugs that are caused by them -- but they are definitely fishy.
I think it would be nice to handle the "has right state" / "has undefined state" distinction statically within the type system (to prevent the reoccurrence of similar inconsistencies), but I opted for just fixing the current issues as a first step.