Skip to content

Conversation

@DEBADRIBASAK
Copy link

This PR adds the implementation for printing missing origin stats for lifetime analysis.

Purpose:

This capability is added to track the expression types with missing origin. While retrieving the origins from origin manager, some expressions show missing origins. Currently these are created on the fly using getOrCreate function. For analysing the coverage of the check, it will be necessary to see what kind of expressions have a missing origin. It prints the counts in this form: StmtClassName<Type> : count

Approach:

  1. The signature of the runLifetimeAnalysis function is changed to return the LifetimeAnalysis object which will be used to get the origin manager which can be used for finding the count of missing origins.
  2. The count of missing origins is kept in origin manager while the CFG is visited as part of the analysis.

Example output:

For the file llvm-project/llvm/lib/Demangle/Demangle.cpp:

*** LifetimeSafety Missing Origin Stats (expression_type : count) :

DeclRefExpr<std::string_view> : 52
StringLiteral<const char[3]> : 2
DeclRefExpr<char *> : 15
CallExpr<char *> : 4
StringLiteral<const char[2]> : 1
Total missing origins: 74

****************************************

@github-actions
Copy link

github-actions bot commented Nov 5, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:analysis clang:temporal-safety Issue/FR relating to the lifetime analysis in Clang (-Wdangling, -Wreturn-local-addr) labels Nov 5, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 5, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-temporal-safety

Author: None (DEBADRIBASAK)

Changes

This PR adds the implementation for printing missing origin stats for lifetime analysis.

Purpose:

This capability is added to track the expression types with missing origin. While retrieving the origins from origin manager, some expressions show missing origins. Currently these are created on the fly using getOrCreate function. For analysing the coverage of the check, it will be necessary to see what kind of expressions have a missing origin. It prints the counts in this form: StmtClassName&lt;Type&gt; : count

Approach:

  1. The signature of the runLifetimeAnalysis function is changed to return the LifetimeAnalysis object which will be used to get the origin manager which can be used for finding the count of missing origins.
  2. The count of missing origins is kept in origin manager while the CFG is visited as part of the analysis.

Example output:

For the file llvm-project/llvm/lib/Demangle/Demangle.cpp:

*** LifetimeSafety Missing Origin Stats (expression_type : count) :

DeclRefExpr&lt;std::string_view&gt; : 52
StringLiteral&lt;const char[3]&gt; : 2
DeclRefExpr&lt;char *&gt; : 15
CallExpr&lt;char *&gt; : 4
StringLiteral&lt;const char[2]&gt; : 1
Total missing origins: 74

****************************************

Full diff: https://github.com/llvm/llvm-project/pull/166568.diff

6 Files Affected:

  • (modified) clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h (+11-4)
  • (modified) clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h (+6)
  • (modified) clang/include/clang/Sema/AnalysisBasedWarnings.h (+9)
  • (modified) clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp (+9-4)
  • (modified) clang/lib/Analysis/LifetimeSafety/Origins.cpp (+18)
  • (modified) clang/lib/Sema/AnalysisBasedWarnings.cpp (+28-2)
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
index 91ffbb169f947..eb532bc8be3a7 100644
--- a/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
@@ -23,7 +23,11 @@
 #include "clang/Analysis/Analyses/LifetimeSafety/Facts.h"
 #include "clang/Analysis/Analyses/LifetimeSafety/LiveOrigins.h"
 #include "clang/Analysis/Analyses/LifetimeSafety/LoanPropagation.h"
+#include "clang/Analysis/Analyses/LifetimeSafety/Origins.h"
 #include "clang/Analysis/AnalysisDeclContext.h"
+#include "llvm/ADT/StringMap.h"
+#include "llvm/Support/raw_ostream.h"
+#include <string>
 
 namespace clang::lifetimes {
 
@@ -44,10 +48,6 @@ class LifetimeSafetyReporter {
                                   Confidence Confidence) {}
 };
 
-/// The main entry point for the analysis.
-void runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
-                               LifetimeSafetyReporter *Reporter);
-
 namespace internal {
 /// An object to hold the factories for immutable collections, ensuring
 /// that all created states share the same underlying memory management.
@@ -60,6 +60,7 @@ struct LifetimeFactory {
 /// Running the lifetime safety analysis and querying its results. It
 /// encapsulates the various dataflow analyses.
 class LifetimeSafetyAnalysis {
+
 public:
   LifetimeSafetyAnalysis(AnalysisDeclContext &AC,
                          LifetimeSafetyReporter *Reporter);
@@ -82,6 +83,12 @@ class LifetimeSafetyAnalysis {
   std::unique_ptr<LoanPropagationAnalysis> LoanPropagation;
 };
 } // namespace internal
+
+/// The main entry point for the analysis.
+std::unique_ptr<internal::LifetimeSafetyAnalysis>
+runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
+                          LifetimeSafetyReporter *Reporter);
+
 } // namespace clang::lifetimes
 
 #endif // LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIMESAFETY_H
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
index ba138b078b379..26686a63e9204 100644
--- a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
@@ -16,7 +16,10 @@
 
 #include "clang/AST/Decl.h"
 #include "clang/AST/Expr.h"
+#include "clang/AST/TypeBase.h"
 #include "clang/Analysis/Analyses/LifetimeSafety/Utils.h"
+#include "llvm/ADT/StringMap.h"
+#include "llvm/Support/raw_ostream.h"
 
 namespace clang::lifetimes::internal {
 
@@ -76,6 +79,8 @@ class OriginManager {
 
   void dump(OriginID OID, llvm::raw_ostream &OS) const;
 
+  const llvm::StringMap<unsigned> getMissingOrigins() const;
+
 private:
   OriginID getNextOriginID() { return NextOriginID++; }
 
@@ -85,6 +90,7 @@ class OriginManager {
   llvm::SmallVector<Origin> AllOrigins;
   llvm::DenseMap<const clang::ValueDecl *, OriginID> DeclToOriginID;
   llvm::DenseMap<const clang::Expr *, OriginID> ExprToOriginID;
+  llvm::StringMap<unsigned> ExprTypeToMissingOriginCount;
 };
 } // namespace clang::lifetimes::internal
 
diff --git a/clang/include/clang/Sema/AnalysisBasedWarnings.h b/clang/include/clang/Sema/AnalysisBasedWarnings.h
index 4103c3f006a8f..604039ef61cb7 100644
--- a/clang/include/clang/Sema/AnalysisBasedWarnings.h
+++ b/clang/include/clang/Sema/AnalysisBasedWarnings.h
@@ -14,7 +14,10 @@
 #define LLVM_CLANG_SEMA_ANALYSISBASEDWARNINGS_H
 
 #include "clang/AST/Decl.h"
+#include "clang/Analysis/Analyses/LifetimeSafety/Facts.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
 #include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/StringMap.h"
 #include <memory>
 
 namespace clang {
@@ -95,6 +98,9 @@ class AnalysisBasedWarnings {
   /// a single function.
   unsigned MaxUninitAnalysisBlockVisitsPerFunction;
 
+  /// Map from expressions missing origin in OriginManager to their counts.
+  llvm::StringMap<unsigned> MissingOriginCount;
+
   /// @}
 
 public:
@@ -116,6 +122,9 @@ class AnalysisBasedWarnings {
   Policy &getPolicyOverrides() { return PolicyOverrides; }
 
   void PrintStats() const;
+
+  void FindMissingOrigins(AnalysisDeclContext &AC,
+                          clang::lifetimes::internal::FactManager &FactMgr);
 };
 
 } // namespace sema
diff --git a/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp b/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp
index 00c7ed90503e7..d183ce976f946 100644
--- a/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp
+++ b/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp
@@ -23,9 +23,11 @@
 #include "clang/Analysis/AnalysisDeclContext.h"
 #include "clang/Analysis/CFG.h"
 #include "llvm/ADT/FoldingSet.h"
+#include "llvm/ADT/StringMap.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/TimeProfiler.h"
+#include "llvm/Support/raw_ostream.h"
 #include <memory>
 
 namespace clang::lifetimes {
@@ -69,9 +71,12 @@ void LifetimeSafetyAnalysis::run() {
 }
 } // namespace internal
 
-void runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
-                               LifetimeSafetyReporter *Reporter) {
-  internal::LifetimeSafetyAnalysis Analysis(AC, Reporter);
-  Analysis.run();
+std::unique_ptr<internal::LifetimeSafetyAnalysis>
+runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
+                          LifetimeSafetyReporter *Reporter) {
+  std::unique_ptr<internal::LifetimeSafetyAnalysis> Analysis =
+      std::make_unique<internal::LifetimeSafetyAnalysis>(AC, Reporter);
+  Analysis->run();
+  return Analysis;
 }
 } // namespace clang::lifetimes
diff --git a/clang/lib/Analysis/LifetimeSafety/Origins.cpp b/clang/lib/Analysis/LifetimeSafety/Origins.cpp
index ea51a75324e06..453abf48261c2 100644
--- a/clang/lib/Analysis/LifetimeSafety/Origins.cpp
+++ b/clang/lib/Analysis/LifetimeSafety/Origins.cpp
@@ -7,6 +7,9 @@
 //===----------------------------------------------------------------------===//
 
 #include "clang/Analysis/Analyses/LifetimeSafety/Origins.h"
+#include "clang/AST/Expr.h"
+#include "clang/AST/TypeBase.h"
+#include "llvm/ADT/StringMap.h"
 
 namespace clang::lifetimes::internal {
 
@@ -22,6 +25,10 @@ void OriginManager::dump(OriginID OID, llvm::raw_ostream &OS) const {
   OS << ")";
 }
 
+const llvm::StringMap<unsigned> OriginManager::getMissingOrigins() const {
+  return ExprTypeToMissingOriginCount;
+}
+
 Origin &OriginManager::addOrigin(OriginID ID, const clang::ValueDecl &D) {
   AllOrigins.emplace_back(ID, &D);
   return AllOrigins.back();
@@ -37,6 +44,17 @@ OriginID OriginManager::get(const Expr &E) {
   auto It = ExprToOriginID.find(&E);
   if (It != ExprToOriginID.end())
     return It->second;
+
+  // if the expression has no specific origin, increment the missing origin
+  // counter.
+  std::string ExprStr(E.getStmtClassName());
+  ExprStr = ExprStr + "<" + E.getType().getAsString() + ">";
+  auto CountIt = ExprTypeToMissingOriginCount.find(ExprStr);
+  if (CountIt == ExprTypeToMissingOriginCount.end()) {
+    ExprTypeToMissingOriginCount[ExprStr] = 1;
+  } else {
+    CountIt->second++;
+  }
   // If the expression itself has no specific origin, and it's a reference
   // to a declaration, its origin is that of the declaration it refers to.
   // For pointer types, where we don't pre-emptively create an origin for the
diff --git a/clang/lib/Sema/AnalysisBasedWarnings.cpp b/clang/lib/Sema/AnalysisBasedWarnings.cpp
index 140b709dbb651..77d2013ff3a93 100644
--- a/clang/lib/Sema/AnalysisBasedWarnings.cpp
+++ b/clang/lib/Sema/AnalysisBasedWarnings.cpp
@@ -29,7 +29,9 @@
 #include "clang/Analysis/Analyses/CFGReachabilityAnalysis.h"
 #include "clang/Analysis/Analyses/CalledOnceCheck.h"
 #include "clang/Analysis/Analyses/Consumed.h"
+#include "clang/Analysis/Analyses/LifetimeSafety/Facts.h"
 #include "clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h"
+#include "clang/Analysis/Analyses/LifetimeSafety/Origins.h"
 #include "clang/Analysis/Analyses/ReachableCode.h"
 #include "clang/Analysis/Analyses/ThreadSafety.h"
 #include "clang/Analysis/Analyses/UninitializedValues.h"
@@ -52,7 +54,9 @@
 #include "llvm/ADT/STLFunctionalExtras.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Casting.h"
 #include "llvm/Support/Debug.h"
+#include "llvm/Support/raw_ostream.h"
 #include <algorithm>
 #include <deque>
 #include <iterator>
@@ -3065,7 +3069,11 @@ void clang::sema::AnalysisBasedWarnings::IssueWarnings(
   if (EnableLifetimeSafetyAnalysis && S.getLangOpts().CPlusPlus) {
     if (AC.getCFG()) {
       lifetimes::LifetimeSafetyReporterImpl LifetimeSafetyReporter(S);
-      lifetimes::runLifetimeSafetyAnalysis(AC, &LifetimeSafetyReporter);
+      std::unique_ptr<clang::lifetimes::internal::LifetimeSafetyAnalysis>
+          Analysis =
+              lifetimes::runLifetimeSafetyAnalysis(AC, &LifetimeSafetyReporter);
+      if (S.CollectStats)
+        FindMissingOrigins(AC, Analysis->getFactManager());
     }
   }
   // Check for violations of "called once" parameter properties.
@@ -3131,9 +3139,27 @@ void clang::sema::AnalysisBasedWarnings::IssueWarnings(
   }
 }
 
+void clang::sema::AnalysisBasedWarnings::FindMissingOrigins(
+    AnalysisDeclContext &AC, lifetimes::internal::FactManager &FactMgr) {
+  if (AC.getCFG()) {
+    for (const auto &[expr, count] :
+         FactMgr.getOriginMgr().getMissingOrigins()) {
+      MissingOriginCount[expr] += count;
+    }
+  }
+}
+
 void clang::sema::AnalysisBasedWarnings::PrintStats() const {
+  llvm::errs() << "\n*** LifetimeSafety Missing Origin Stats "
+                  "(expression_type : count) :\n\n";
+  unsigned totalMissingOrigins = 0;
+  for (const auto &[expr, count] : MissingOriginCount) {
+    llvm::errs() << expr << " : " << count << '\n';
+    totalMissingOrigins += count;
+  }
+  llvm::errs() << "Total missing origins: " << totalMissingOrigins << "\n";
+  llvm::errs() << "\n****************************************\n";
   llvm::errs() << "\n*** Analysis Based Warnings Stats:\n";
-
   unsigned NumCFGsBuilt = NumFunctionsAnalyzed - NumFunctionsWithBadCFGs;
   unsigned AvgCFGBlocksPerFunction =
       !NumCFGsBuilt ? 0 : NumCFGBlocks/NumCFGsBuilt;

Copy link
Contributor

@usx95 usx95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this.

One major comment is that this implementation exposes many internal:: details of lifetime safety to AnalysisBasedWarnings.h.

Layering-wise I propose the following changes:

  • Define struct LifetimeSafetyStats { llvm::StringMap<unsigned> MissingOriginCount; } in LifetimeSafety.h
  • Have an instance of LifetimeSafetyStats as a field of AnalysisBasedWarnings instead of MissingOriginCount.
  • Change runLifetimeSafetyAnalysis to accept this stats instance which it can modify if we have collection turned on:
void runLifetimeSafetyAnalysis(AnalysisDeclContext &AC, 
                               LifetimeSafetyReporter *Reporter, LifetimeSafetyStats& Stats, bool CollectStats)
  • Also add a void clang::lifetimes::PrintStats(const LifetimeSafetyStats& Stats) in LifetimeSafety.h/.cpp in clang::lifetimes namespace which prints the stats to lvm::errs().
  • Call clang::lifetimes::PrintStats(Stats) from AnalysisBasedWarnings::PrintStats().

@usx95 usx95 changed the title Adding implementation for printings missing origin stats for lifetime analysis [LifetimeSafety] Add missing origins stats for lifetime analysis Nov 11, 2025
…and change the signature of runLifetimeSafetyAnalysis to not return the analysis object
@DEBADRIBASAK DEBADRIBASAK requested a review from usx95 November 18, 2025 14:46
Copy link
Contributor

@usx95 usx95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. This looks much better now.

One remaining concern I have is that we might be missing to record some missingOrigins here. Suggested a more black box collection style using RecursiveASTVisitor. Also it is important to not do this when -print-stats​ is disabled.

@DEBADRIBASAK DEBADRIBASAK requested a review from usx95 November 26, 2025 07:24
DEBADRIBASAK and others added 3 commits November 27, 2025 11:45
… report format.

instead of using ExprClassName<QualType> it will be shown in the following way:

```
*** LifetimeSafety Missing Origin Stats (expression_type : count) :

UnaryOperator (Type: const value_type) : 1
BinaryOperator (Type: char *) : 3
CXXOperatorCallExpr (Type: const value_type) : 2
CXXMemberCallExpr (Type: basic_string_view<char>) : 1
DeclRefExpr (Type: char *) : 9
CXXMemberCallExpr (Type: const_iterator) : 1
CXXOperatorCallExpr (Type: class std::basic_string<char>) : 1
DeclRefExpr (Type: std::string_view) : 27
CXXOperatorCallExpr (Type: basic_string<char>) : 3
Total missing origins: 48
```
@DEBADRIBASAK DEBADRIBASAK requested a review from usx95 November 27, 2025 09:23
Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if collecting in buckets indexed by StmtClass<QualType> is the right thing to do.

If the type and the ast node kind are expected to be correlated, i.e., we expect to have holes where a combination of the type and the kind is the root cause something is not covered than this is the right way to slice the data.

On the other hand, if we expect these to be more independent, i.e., some types are not covered and some kinds of ast nodes are not covered and there is no correlation between them, I think it might be better to collect across these two independent axises.

So print one statistics that maps from statement class (ids, no need for strings there) to counts and print another one that maps from types (again no need for string maps) to counts.

@DEBADRIBASAK
Copy link
Author

So print one statistics that maps from statement class (ids, no need for strings there) to counts and print another one that maps from types (again no need for string maps) to counts.

Xazax-hun: We had added string maps because it is easier to interpret the types/class names as strings. Will not changing it to ids make it difficult to interpret the report?

@Xazax-hun
Copy link
Collaborator

Will not changing it to ids make it difficult to interpret the report?

Absolutely. We would still print strings, but those strings can be reconstructed from the types or the statement classes when we do the printing. No need to do it eagerly.

Copy link
Contributor

usx95 commented Nov 27, 2025

On the other hand, if we expect these to be more independent, i.e., some types are not covered and some kinds of ast nodes are not covered and there is no correlation between them, I think it might be better to collect across these two independent axises.

I agree that these are, in principle, two different parts of the problem. I am supportive of having 2 separate metrics one indexed by StmtClass and other by QualType.

Comment on lines 17 to 19
#include "clang/Analysis/Analyses/LifetimeSafety/Facts.h"
#include "clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h"
#include "clang/Analysis/Analyses/LifetimeSafety/LifetimeStats.h"
#include "clang/Analysis/AnalysisDeclContext.h"
#include "clang/Sema/ScopeInfo.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/MapVector.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert these changes and only add #include "clang/Analysis/Analyses/LifetimeSafety/LifetimeStats.h"

Looks like that's the only include related to this change.

@DEBADRIBASAK
Copy link
Author

@Xazax-hun @usx95 I do not think it is feasible to add the StmtClassName and QualType as keys without converting them to strings.

  1. I tried using a DenseMap with QualType as the keys, but that is not allowed since QualType does not have tombstone.
  2. I was planning to use the StmtClass as an index in an integer array of size Stmt::lastStmtConstant+1, and use StmtClassInfo array of StmtClassNameTable objects for getting their names while printing. But StmtClassInfo is a static array inside Stmt.cpp file and we cannot use it in other files.

I used two string maps for now:

*** LifetimeSafety Missing Origin per QualType: (QualType : count) :

basic_string<char> : 3
basic_string_view<char> : 1
class std::basic_string<char> : 1
char * : 12
const_iterator : 1
const value_type : 3
std::string_view : 27


*** LifetimeSafety Missing Origin per StmtClassName: (StmtClassName : count) :

CXXOperatorCallExpr : 6
BinaryOperator : 3
CXXMemberCallExpr : 2
UnaryOperator : 1
DeclRefExpr : 36

Total missing origins: 48

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:analysis clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:temporal-safety Issue/FR relating to the lifetime analysis in Clang (-Wdangling, -Wreturn-local-addr) clang Clang issues not falling into any other category

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

4 participants