-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[LifetimeSafety] Add missing origins stats for lifetime analysis #166568
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
|
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-temporal-safety Author: None (DEBADRIBASAK) ChangesThis PR adds the implementation for printing missing origin stats for lifetime analysis. Purpose: This capability is added to track the expression types with missing origin. While retrieving the origins from origin manager, some expressions show missing origins. Currently these are created on the fly using getOrCreate function. For analysing the coverage of the check, it will be necessary to see what kind of expressions have a missing origin. It prints the counts in this form: Approach:
Example output: For the file llvm-project/llvm/lib/Demangle/Demangle.cpp: Full diff: https://github.com/llvm/llvm-project/pull/166568.diff 6 Files Affected:
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
index 91ffbb169f947..eb532bc8be3a7 100644
--- a/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
@@ -23,7 +23,11 @@
#include "clang/Analysis/Analyses/LifetimeSafety/Facts.h"
#include "clang/Analysis/Analyses/LifetimeSafety/LiveOrigins.h"
#include "clang/Analysis/Analyses/LifetimeSafety/LoanPropagation.h"
+#include "clang/Analysis/Analyses/LifetimeSafety/Origins.h"
#include "clang/Analysis/AnalysisDeclContext.h"
+#include "llvm/ADT/StringMap.h"
+#include "llvm/Support/raw_ostream.h"
+#include <string>
namespace clang::lifetimes {
@@ -44,10 +48,6 @@ class LifetimeSafetyReporter {
Confidence Confidence) {}
};
-/// The main entry point for the analysis.
-void runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
- LifetimeSafetyReporter *Reporter);
-
namespace internal {
/// An object to hold the factories for immutable collections, ensuring
/// that all created states share the same underlying memory management.
@@ -60,6 +60,7 @@ struct LifetimeFactory {
/// Running the lifetime safety analysis and querying its results. It
/// encapsulates the various dataflow analyses.
class LifetimeSafetyAnalysis {
+
public:
LifetimeSafetyAnalysis(AnalysisDeclContext &AC,
LifetimeSafetyReporter *Reporter);
@@ -82,6 +83,12 @@ class LifetimeSafetyAnalysis {
std::unique_ptr<LoanPropagationAnalysis> LoanPropagation;
};
} // namespace internal
+
+/// The main entry point for the analysis.
+std::unique_ptr<internal::LifetimeSafetyAnalysis>
+runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
+ LifetimeSafetyReporter *Reporter);
+
} // namespace clang::lifetimes
#endif // LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIMESAFETY_H
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
index ba138b078b379..26686a63e9204 100644
--- a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
@@ -16,7 +16,10 @@
#include "clang/AST/Decl.h"
#include "clang/AST/Expr.h"
+#include "clang/AST/TypeBase.h"
#include "clang/Analysis/Analyses/LifetimeSafety/Utils.h"
+#include "llvm/ADT/StringMap.h"
+#include "llvm/Support/raw_ostream.h"
namespace clang::lifetimes::internal {
@@ -76,6 +79,8 @@ class OriginManager {
void dump(OriginID OID, llvm::raw_ostream &OS) const;
+ const llvm::StringMap<unsigned> getMissingOrigins() const;
+
private:
OriginID getNextOriginID() { return NextOriginID++; }
@@ -85,6 +90,7 @@ class OriginManager {
llvm::SmallVector<Origin> AllOrigins;
llvm::DenseMap<const clang::ValueDecl *, OriginID> DeclToOriginID;
llvm::DenseMap<const clang::Expr *, OriginID> ExprToOriginID;
+ llvm::StringMap<unsigned> ExprTypeToMissingOriginCount;
};
} // namespace clang::lifetimes::internal
diff --git a/clang/include/clang/Sema/AnalysisBasedWarnings.h b/clang/include/clang/Sema/AnalysisBasedWarnings.h
index 4103c3f006a8f..604039ef61cb7 100644
--- a/clang/include/clang/Sema/AnalysisBasedWarnings.h
+++ b/clang/include/clang/Sema/AnalysisBasedWarnings.h
@@ -14,7 +14,10 @@
#define LLVM_CLANG_SEMA_ANALYSISBASEDWARNINGS_H
#include "clang/AST/Decl.h"
+#include "clang/Analysis/Analyses/LifetimeSafety/Facts.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/StringMap.h"
#include <memory>
namespace clang {
@@ -95,6 +98,9 @@ class AnalysisBasedWarnings {
/// a single function.
unsigned MaxUninitAnalysisBlockVisitsPerFunction;
+ /// Map from expressions missing origin in OriginManager to their counts.
+ llvm::StringMap<unsigned> MissingOriginCount;
+
/// @}
public:
@@ -116,6 +122,9 @@ class AnalysisBasedWarnings {
Policy &getPolicyOverrides() { return PolicyOverrides; }
void PrintStats() const;
+
+ void FindMissingOrigins(AnalysisDeclContext &AC,
+ clang::lifetimes::internal::FactManager &FactMgr);
};
} // namespace sema
diff --git a/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp b/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp
index 00c7ed90503e7..d183ce976f946 100644
--- a/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp
+++ b/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp
@@ -23,9 +23,11 @@
#include "clang/Analysis/AnalysisDeclContext.h"
#include "clang/Analysis/CFG.h"
#include "llvm/ADT/FoldingSet.h"
+#include "llvm/ADT/StringMap.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/TimeProfiler.h"
+#include "llvm/Support/raw_ostream.h"
#include <memory>
namespace clang::lifetimes {
@@ -69,9 +71,12 @@ void LifetimeSafetyAnalysis::run() {
}
} // namespace internal
-void runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
- LifetimeSafetyReporter *Reporter) {
- internal::LifetimeSafetyAnalysis Analysis(AC, Reporter);
- Analysis.run();
+std::unique_ptr<internal::LifetimeSafetyAnalysis>
+runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
+ LifetimeSafetyReporter *Reporter) {
+ std::unique_ptr<internal::LifetimeSafetyAnalysis> Analysis =
+ std::make_unique<internal::LifetimeSafetyAnalysis>(AC, Reporter);
+ Analysis->run();
+ return Analysis;
}
} // namespace clang::lifetimes
diff --git a/clang/lib/Analysis/LifetimeSafety/Origins.cpp b/clang/lib/Analysis/LifetimeSafety/Origins.cpp
index ea51a75324e06..453abf48261c2 100644
--- a/clang/lib/Analysis/LifetimeSafety/Origins.cpp
+++ b/clang/lib/Analysis/LifetimeSafety/Origins.cpp
@@ -7,6 +7,9 @@
//===----------------------------------------------------------------------===//
#include "clang/Analysis/Analyses/LifetimeSafety/Origins.h"
+#include "clang/AST/Expr.h"
+#include "clang/AST/TypeBase.h"
+#include "llvm/ADT/StringMap.h"
namespace clang::lifetimes::internal {
@@ -22,6 +25,10 @@ void OriginManager::dump(OriginID OID, llvm::raw_ostream &OS) const {
OS << ")";
}
+const llvm::StringMap<unsigned> OriginManager::getMissingOrigins() const {
+ return ExprTypeToMissingOriginCount;
+}
+
Origin &OriginManager::addOrigin(OriginID ID, const clang::ValueDecl &D) {
AllOrigins.emplace_back(ID, &D);
return AllOrigins.back();
@@ -37,6 +44,17 @@ OriginID OriginManager::get(const Expr &E) {
auto It = ExprToOriginID.find(&E);
if (It != ExprToOriginID.end())
return It->second;
+
+ // if the expression has no specific origin, increment the missing origin
+ // counter.
+ std::string ExprStr(E.getStmtClassName());
+ ExprStr = ExprStr + "<" + E.getType().getAsString() + ">";
+ auto CountIt = ExprTypeToMissingOriginCount.find(ExprStr);
+ if (CountIt == ExprTypeToMissingOriginCount.end()) {
+ ExprTypeToMissingOriginCount[ExprStr] = 1;
+ } else {
+ CountIt->second++;
+ }
// If the expression itself has no specific origin, and it's a reference
// to a declaration, its origin is that of the declaration it refers to.
// For pointer types, where we don't pre-emptively create an origin for the
diff --git a/clang/lib/Sema/AnalysisBasedWarnings.cpp b/clang/lib/Sema/AnalysisBasedWarnings.cpp
index 140b709dbb651..77d2013ff3a93 100644
--- a/clang/lib/Sema/AnalysisBasedWarnings.cpp
+++ b/clang/lib/Sema/AnalysisBasedWarnings.cpp
@@ -29,7 +29,9 @@
#include "clang/Analysis/Analyses/CFGReachabilityAnalysis.h"
#include "clang/Analysis/Analyses/CalledOnceCheck.h"
#include "clang/Analysis/Analyses/Consumed.h"
+#include "clang/Analysis/Analyses/LifetimeSafety/Facts.h"
#include "clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h"
+#include "clang/Analysis/Analyses/LifetimeSafety/Origins.h"
#include "clang/Analysis/Analyses/ReachableCode.h"
#include "clang/Analysis/Analyses/ThreadSafety.h"
#include "clang/Analysis/Analyses/UninitializedValues.h"
@@ -52,7 +54,9 @@
#include "llvm/ADT/STLFunctionalExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Casting.h"
#include "llvm/Support/Debug.h"
+#include "llvm/Support/raw_ostream.h"
#include <algorithm>
#include <deque>
#include <iterator>
@@ -3065,7 +3069,11 @@ void clang::sema::AnalysisBasedWarnings::IssueWarnings(
if (EnableLifetimeSafetyAnalysis && S.getLangOpts().CPlusPlus) {
if (AC.getCFG()) {
lifetimes::LifetimeSafetyReporterImpl LifetimeSafetyReporter(S);
- lifetimes::runLifetimeSafetyAnalysis(AC, &LifetimeSafetyReporter);
+ std::unique_ptr<clang::lifetimes::internal::LifetimeSafetyAnalysis>
+ Analysis =
+ lifetimes::runLifetimeSafetyAnalysis(AC, &LifetimeSafetyReporter);
+ if (S.CollectStats)
+ FindMissingOrigins(AC, Analysis->getFactManager());
}
}
// Check for violations of "called once" parameter properties.
@@ -3131,9 +3139,27 @@ void clang::sema::AnalysisBasedWarnings::IssueWarnings(
}
}
+void clang::sema::AnalysisBasedWarnings::FindMissingOrigins(
+ AnalysisDeclContext &AC, lifetimes::internal::FactManager &FactMgr) {
+ if (AC.getCFG()) {
+ for (const auto &[expr, count] :
+ FactMgr.getOriginMgr().getMissingOrigins()) {
+ MissingOriginCount[expr] += count;
+ }
+ }
+}
+
void clang::sema::AnalysisBasedWarnings::PrintStats() const {
+ llvm::errs() << "\n*** LifetimeSafety Missing Origin Stats "
+ "(expression_type : count) :\n\n";
+ unsigned totalMissingOrigins = 0;
+ for (const auto &[expr, count] : MissingOriginCount) {
+ llvm::errs() << expr << " : " << count << '\n';
+ totalMissingOrigins += count;
+ }
+ llvm::errs() << "Total missing origins: " << totalMissingOrigins << "\n";
+ llvm::errs() << "\n****************************************\n";
llvm::errs() << "\n*** Analysis Based Warnings Stats:\n";
-
unsigned NumCFGsBuilt = NumFunctionsAnalyzed - NumFunctionsWithBadCFGs;
unsigned AvgCFGBlocksPerFunction =
!NumCFGsBuilt ? 0 : NumCFGBlocks/NumCFGsBuilt;
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this.
One major comment is that this implementation exposes many internal:: details of lifetime safety to AnalysisBasedWarnings.h.
Layering-wise I propose the following changes:
- Define
struct LifetimeSafetyStats { llvm::StringMap<unsigned> MissingOriginCount; }inLifetimeSafety.h - Have an instance of
LifetimeSafetyStatsas a field ofAnalysisBasedWarningsinstead ofMissingOriginCount. - Change
runLifetimeSafetyAnalysisto accept this stats instance which it can modify if we have collection turned on:
void runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
LifetimeSafetyReporter *Reporter, LifetimeSafetyStats& Stats, bool CollectStats)- Also add a
void clang::lifetimes::PrintStats(const LifetimeSafetyStats& Stats)in LifetimeSafety.h/.cpp inclang::lifetimesnamespace which prints the stats tolvm::errs(). - Call
clang::lifetimes::PrintStats(Stats)fromAnalysisBasedWarnings::PrintStats().
…and change the signature of runLifetimeSafetyAnalysis to not return the analysis object
usx95
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome. This looks much better now.
One remaining concern I have is that we might be missing to record some missingOrigins here. Suggested a more black box collection style using RecursiveASTVisitor. Also it is important to not do this when -print-stats is disabled.
clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
Outdated
Show resolved
Hide resolved
clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
Outdated
Show resolved
Hide resolved
clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
Outdated
Show resolved
Hide resolved
clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
Outdated
Show resolved
Hide resolved
clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
Outdated
Show resolved
Hide resolved
clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
Outdated
Show resolved
Hide resolved
… report format. instead of using ExprClassName<QualType> it will be shown in the following way: ``` *** LifetimeSafety Missing Origin Stats (expression_type : count) : UnaryOperator (Type: const value_type) : 1 BinaryOperator (Type: char *) : 3 CXXOperatorCallExpr (Type: const value_type) : 2 CXXMemberCallExpr (Type: basic_string_view<char>) : 1 DeclRefExpr (Type: char *) : 9 CXXMemberCallExpr (Type: const_iterator) : 1 CXXOperatorCallExpr (Type: class std::basic_string<char>) : 1 DeclRefExpr (Type: std::string_view) : 27 CXXOperatorCallExpr (Type: basic_string<char>) : 3 Total missing origins: 48 ```
Xazax-hun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if collecting in buckets indexed by StmtClass<QualType> is the right thing to do.
If the type and the ast node kind are expected to be correlated, i.e., we expect to have holes where a combination of the type and the kind is the root cause something is not covered than this is the right way to slice the data.
On the other hand, if we expect these to be more independent, i.e., some types are not covered and some kinds of ast nodes are not covered and there is no correlation between them, I think it might be better to collect across these two independent axises.
So print one statistics that maps from statement class (ids, no need for strings there) to counts and print another one that maps from types (again no need for string maps) to counts.
clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeStats.h
Outdated
Show resolved
Hide resolved
clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeStats.h
Outdated
Show resolved
Hide resolved
clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeStats.h
Outdated
Show resolved
Hide resolved
Xazax-hun: We had added string maps because it is easier to interpret the types/class names as strings. Will not changing it to ids make it difficult to interpret the report? |
Absolutely. We would still print strings, but those strings can be reconstructed from the types or the statement classes when we do the printing. No need to do it eagerly. |
I agree that these are, in principle, two different parts of the problem. I am supportive of having 2 separate metrics one indexed by |
clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeStats.h
Outdated
Show resolved
Hide resolved
clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeStats.h
Outdated
Show resolved
Hide resolved
clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeStats.h
Outdated
Show resolved
Hide resolved
clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeStats.h
Outdated
Show resolved
Hide resolved
clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
Outdated
Show resolved
Hide resolved
| #include "clang/Analysis/Analyses/LifetimeSafety/Facts.h" | ||
| #include "clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h" | ||
| #include "clang/Analysis/Analyses/LifetimeSafety/LifetimeStats.h" | ||
| #include "clang/Analysis/AnalysisDeclContext.h" | ||
| #include "clang/Sema/ScopeInfo.h" | ||
| #include "llvm/ADT/DenseMap.h" | ||
| #include "llvm/ADT/MapVector.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert these changes and only add #include "clang/Analysis/Analyses/LifetimeSafety/LifetimeStats.h"
Looks like that's the only include related to this change.
…nd putting them in separate maps
|
@Xazax-hun @usx95 I do not think it is feasible to add the
I used two string maps for now: |
This PR adds the implementation for printing missing origin stats for lifetime analysis.
Purpose:
This capability is added to track the expression types with missing origin. While retrieving the origins from origin manager, some expressions show missing origins. Currently these are created on the fly using getOrCreate function. For analysing the coverage of the check, it will be necessary to see what kind of expressions have a missing origin. It prints the counts in this form:
StmtClassName<Type> : countApproach:
Example output:
For the file llvm-project/llvm/lib/Demangle/Demangle.cpp: