Skip to content

Conversation

@usx95
Copy link
Contributor

@usx95 usx95 commented Nov 17, 2025

This PR implements a multi-level origin model for the lifetime safety analysis, replacing the previous single-origin-per-variable approach with an OriginList that captures multiple levels of indirection.

KeyChanges

  1. OriginList Structure: Each origin now represents a single level of indirection, organized into Lists:
    • int* p → length 2 (pointer variable + pointee)
    • int** pp → length 3 (variable + first pointee + second pointee)
    • std::string_view& → length 2 (reference + view object)
  2. Reference Type Understanding: The analysis now properly distinguishes:
    • Pointer types vs references-to-pointer types (string_view vs string_view&)
    • References as aliases: int a; int& b = a; int& c = b; correctly recognizes both b and c alias a
    • References don't add storage and they reuse the underlying declaration's origins.
  3. Type-Driven Origin Creation: Origins are created based on type structure via buildListForType, ensuring origins match the type's indirection levels.
  4. Multi-Level Flow Propagation: The flow function propagates origins through all depths of the lists with a critical assertion:
    assert(Dst->getDepth() == Src->getDepth() && "Lists must have the same length"); This ensures type safety in origin propagation during expression handling. (Rant with relief: This assert​ was quite hard to get right but it helped make the right changes).
  5. LifetimeboundSemantics: For reference return types, lifetimebound​ now propagates only the outermost origin, not inner pointee origins.

We are also deleting many tests in lifetime-safety-dataflow.cpp related to control flow which are better tested in unit tests and the other lit test.

Fixes: #169758
Fixes: #162834
Fixes: #169940

Copy link
Contributor Author

usx95 commented Nov 17, 2025

@github-actions
Copy link

github-actions bot commented Nov 17, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@usx95 usx95 changed the title lifetime-safety-multi-origin [LifetimeSafety] Implement multi-level origin tracking for pointer indirection Nov 17, 2025
@usx95 usx95 changed the title [LifetimeSafety] Implement multi-level origin tracking for pointer indirection [LifetimeSafety] Implement multi-level origins Nov 18, 2025
@usx95 usx95 force-pushed the users/usx95/11-17-lifetime-safety-multi-origin branch from 9b403b6 to 43233fe Compare November 18, 2025 11:58
@github-actions
Copy link

github-actions bot commented Nov 18, 2025

🐧 Linux x64 Test Results

  • 112091 tests passed
  • 4568 tests skipped

✅ The build succeeded and all tests passed.

@usx95 usx95 force-pushed the users/usx95/11-17-lifetime-safety-multi-origin branch 4 times, most recently from 5b9efab to f914426 Compare November 20, 2025 18:10
@usx95 usx95 force-pushed the users/usx95/11-17-lifetime-safety-multi-origin branch 8 times, most recently from 82640a9 to 0ab5258 Compare November 27, 2025 09:12
@usx95 usx95 force-pushed the users/usx95/11-17-lifetime-safety-multi-origin branch from 0ab5258 to f622384 Compare November 27, 2025 16:44
@usx95 usx95 marked this pull request as ready for review November 27, 2025 17:31
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:analysis clang:temporal-safety Issue/FR relating to the lifetime analysis in Clang (-Wdangling, -Wreturn-local-addr) labels Nov 27, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 27, 2025

@llvm/pr-subscribers-clang-temporal-safety

@llvm/pr-subscribers-clang-analysis

Author: Utkarsh Saxena (usx95)

Changes

This PR implements a multi-level origin model for the lifetime safety analysis, replacing the previous single-origin-per-variable approach with an OriginTree structure that captures multiple levels of indirection.

KeyChanges

  1. OriginTree Structure: Each origin now represents a single level of indirection, organized into trees:
    • int* p → depth 2 (pointer variable + pointee)
    • int** pp → depth 3 (variable + first pointee + second pointee)
    • std::string_view& → depth 2 (reference + view object)
  2. Reference Type Understanding: The analysis now properly distinguishes:
    • Pointer types vs references-to-pointer types (string_view vs string_view&)
    • References as aliases: int a; int& b = a; int& c = b; correctly recognizes both b and c alias a
    • References don't add storage and they reuse the underlying declaration's origins.
  3. Type-Driven Origin Creation: Origins are created based on type structure via buildTreeForType, ensuring origins match the type's indirection levels.
  4. Multi-Level Flow Propagation: The flow function propagates origins through all tree levels with a critical assertion:
    This ensures type safety in origin propagation during expression handling. (Rant with relief: This assert​ was quite hard to get right but it helped make the right changes).
  5. Lifetimebound Semantics: For reference return types, lifetimebound now propagates only the outermost origin, not inner pointee origins.

We are also deleting many tests in lifetime-safety-dataflow.cpp related to control flow which are better tested in unit tests and the other lit test.


Patch is 98.11 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/168344.diff

14 Files Affected:

  • (modified) clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h (+7-6)
  • (modified) clang/include/clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h (+7-10)
  • (modified) clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h (+2-2)
  • (modified) clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h (+104-24)
  • (modified) clang/lib/Analysis/LifetimeSafety/Facts.cpp (+11-4)
  • (modified) clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp (+231-90)
  • (modified) clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp (+27-7)
  • (modified) clang/lib/Analysis/LifetimeSafety/LiveOrigins.cpp (+13-7)
  • (modified) clang/lib/Analysis/LifetimeSafety/LoanPropagation.cpp (+2-1)
  • (modified) clang/lib/Analysis/LifetimeSafety/Origins.cpp (+113-58)
  • (modified) clang/lib/Sema/AnalysisBasedWarnings.cpp (+3)
  • (modified) clang/test/Sema/warn-lifetime-safety-dataflow.cpp (+154-372)
  • (modified) clang/test/Sema/warn-lifetime-safety.cpp (+384-30)
  • (modified) clang/unittests/Analysis/LifetimeSafetyTest.cpp (+55-30)
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h
index b5f7f8746186a..908d2a5b8cc76 100644
--- a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h
@@ -155,7 +155,8 @@ class OriginEscapesFact : public Fact {
 
 class UseFact : public Fact {
   const Expr *UseExpr;
-  OriginID OID;
+  // The origins of the expression being used.
+  llvm::SmallVector<OriginID, 1> OIDs;
   // True if this use is a write operation (e.g., left-hand side of assignment).
   // Write operations are exempted from use-after-free checks.
   bool IsWritten = false;
@@ -163,10 +164,10 @@ class UseFact : public Fact {
 public:
   static bool classof(const Fact *F) { return F->getKind() == Kind::Use; }
 
-  UseFact(const Expr *UseExpr, OriginManager &OM)
-      : Fact(Kind::Use), UseExpr(UseExpr), OID(OM.get(*UseExpr)) {}
+  UseFact(const Expr *UseExpr, llvm::ArrayRef<OriginID> OIDs)
+      : Fact(Kind::Use), UseExpr(UseExpr), OIDs(OIDs.begin(), OIDs.end()) {}
 
-  OriginID getUsedOrigin() const { return OID; }
+  llvm::ArrayRef<OriginID> getUsedOrigins() const { return OIDs; }
   const Expr *getUseExpr() const { return UseExpr; }
   void markAsWritten() { IsWritten = true; }
   bool isWritten() const { return IsWritten; }
@@ -194,8 +195,8 @@ class TestPointFact : public Fact {
 
 class FactManager {
 public:
-  void init(const CFG &Cfg) {
-    assert(BlockToFacts.empty() && "FactManager already initialized");
+  FactManager(const AnalysisDeclContext &AC, const CFG &Cfg)
+      : OriginMgr(AC.getASTContext()) {
     BlockToFacts.resize(Cfg.getNumBlockIDs());
   }
 
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h
index 878cb90b685f9..939f421505463 100644
--- a/clang/include/clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h
@@ -50,6 +50,11 @@ class FactsGenerator : public ConstStmtVisitor<FactsGenerator> {
   void VisitMaterializeTemporaryExpr(const MaterializeTemporaryExpr *MTE);
 
 private:
+  OriginTree *getTree(const ValueDecl &D);
+  OriginTree *getTree(const Expr &E);
+
+  void flow(OriginTree *Dst, OriginTree *Src, bool Kill);
+
   void handleLifetimeEnds(const CFGLifetimeEnds &LifetimeEnds);
 
   void handleGSLPointerConstruction(const CXXConstructExpr *CCE);
@@ -64,26 +69,18 @@ class FactsGenerator : public ConstStmtVisitor<FactsGenerator> {
 
   template <typename Destination, typename Source>
   void flowOrigin(const Destination &D, const Source &S) {
-    OriginID DestOID = FactMgr.getOriginMgr().getOrCreate(D);
-    OriginID SrcOID = FactMgr.getOriginMgr().get(S);
-    CurrentBlockFacts.push_back(FactMgr.createFact<OriginFlowFact>(
-        DestOID, SrcOID, /*KillDest=*/false));
+    flow(getTree(D), getTree(S), /*Kill=*/false);
   }
 
   template <typename Destination, typename Source>
   void killAndFlowOrigin(const Destination &D, const Source &S) {
-    OriginID DestOID = FactMgr.getOriginMgr().getOrCreate(D);
-    OriginID SrcOID = FactMgr.getOriginMgr().get(S);
-    CurrentBlockFacts.push_back(
-        FactMgr.createFact<OriginFlowFact>(DestOID, SrcOID, /*KillDest=*/true));
+    flow(getTree(D), getTree(S), /*Kill=*/true);
   }
 
   /// Checks if the expression is a `void("__lifetime_test_point_...")` cast.
   /// If so, creates a `TestPointFact` and returns true.
   bool handleTestPoint(const CXXFunctionalCastExpr *FCE);
 
-  void handleAssignment(const Expr *LHSExpr, const Expr *RHSExpr);
-
   // A DeclRefExpr will be treated as a use of the referenced decl. It will be
   // checked for use-after-free unless it is later marked as being written to
   // (e.g. on the left-hand side of an assignment).
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
index b34a7f18b5809..a8d6e2551aab5 100644
--- a/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
@@ -76,13 +76,13 @@ class LifetimeSafetyAnalysis {
     return *LoanPropagation;
   }
   LiveOriginsAnalysis &getLiveOrigins() const { return *LiveOrigins; }
-  FactManager &getFactManager() { return FactMgr; }
+  FactManager &getFactManager() { return *FactMgr; }
 
 private:
   AnalysisDeclContext &AC;
   LifetimeSafetyReporter *Reporter;
   LifetimeFactory Factory;
-  FactManager FactMgr;
+  std::unique_ptr<FactManager> FactMgr;
   std::unique_ptr<LiveOriginsAnalysis> LiveOrigins;
   std::unique_ptr<LoanPropagationAnalysis> LoanPropagation;
 };
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
index 56b9010f41fa2..c595a9fdf6233 100644
--- a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
@@ -16,6 +16,7 @@
 
 #include "clang/AST/Decl.h"
 #include "clang/AST/Expr.h"
+#include "clang/AST/TypeBase.h"
 #include "clang/Analysis/Analyses/LifetimeSafety/Utils.h"
 
 namespace clang::lifetimes::internal {
@@ -28,12 +29,10 @@ inline llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, OriginID ID) {
 
 /// An Origin is a symbolic identifier that represents the set of possible
 /// loans a pointer-like object could hold at any given time.
-/// TODO: Enhance the origin model to handle complex types, pointer
-/// indirection and reborrowing. The plan is to move from a single origin per
-/// variable/expression to a "list of origins" governed by the Type.
-/// For example, the type 'int**' would have two origins.
-/// See discussion:
-/// https://github.com/llvm/llvm-project/pull/142313/commits/0cd187b01e61b200d92ca0b640789c1586075142#r2137644238
+///
+/// Each Origin corresponds to a single level of indirection. For complex types
+/// with multiple levels of indirection (e.g., `int**`), multiple Origins are
+/// organized into an OriginTree structure (see below).
 struct Origin {
   OriginID ID;
   /// A pointer to the AST node that this origin represents. This union
@@ -41,8 +40,19 @@ struct Origin {
   /// and origins from expressions.
   llvm::PointerUnion<const clang::ValueDecl *, const clang::Expr *> Ptr;
 
-  Origin(OriginID ID, const clang::ValueDecl *D) : ID(ID), Ptr(D) {}
-  Origin(OriginID ID, const clang::Expr *E) : ID(ID), Ptr(E) {}
+  /// The type at this indirection level.
+  ///
+  /// For `int** pp`:
+  ///   Root origin: QT = `int**` (what pp points to)
+  ///   Pointee origin: QT = `int*` (what *pp points to)
+  ///
+  /// Null for synthetic lvalue origins (e.g., outer origin of DeclRefExpr).
+  QualType QT;
+
+  Origin(OriginID ID, const clang::ValueDecl *D, QualType QT)
+      : ID(ID), Ptr(D), QT(QT) {}
+  Origin(OriginID ID, const clang::Expr *E, QualType QT)
+      : ID(ID), Ptr(E), QT(QT) {}
 
   const clang::ValueDecl *getDecl() const {
     return Ptr.dyn_cast<const clang::ValueDecl *>();
@@ -52,28 +62,81 @@ struct Origin {
   }
 };
 
+/// A tree of origins representing levels of indirection for pointer-like types.
+///
+/// Each node in the tree contains an OriginID representing a level of
+/// indirection. The tree structure captures the multi-level nature of
+/// pointer and reference types in the lifetime analysis.
+///
+/// Examples:
+///   - For `int& x`, the tree has depth 2:
+///     * Root: origin for the reference storage itself (the lvalue `x`)
+///     * Pointee: origin for what `x` refers to
+///
+///   - For `int* p`, the tree has depth 2:
+///     * Root: origin for the pointer variable `p`
+///     * Pointee: origin for what `p` points to
+///
+///   - For `View v` (where View is gsl::Pointer), the tree has depth 2:
+///     * Root: origin for the view object itself
+///     * Pointee: origin for what the view refers to
+///
+///   - For `int** pp`, the tree has depth 3:
+///     * Root: origin for `pp` itself
+///     * Pointee: origin for `*pp` (what `pp` points to)
+///     * Pointee->Pointee: origin for `**pp` (what `*pp` points to)
+///
+/// The tree structure enables the analysis to track how loans flow through
+/// different levels of indirection when assignments and dereferences occur.
+struct OriginTree {
+  OriginID OID;
+  OriginTree *Pointee = nullptr;
+
+  OriginTree(OriginID OID) : OID(OID) {}
+
+  size_t getDepth() const {
+    size_t Depth = 1;
+    const OriginTree *T = this;
+    while (T->Pointee) {
+      T = T->Pointee;
+      Depth++;
+    }
+    return Depth;
+  }
+};
+
+bool hasOrigins(QualType QT);
+bool hasOrigins(const Expr *E);
+bool doesDeclHaveStorage(const ValueDecl *D);
+
 /// Manages the creation, storage, and retrieval of origins for pointer-like
 /// variables and expressions.
 class OriginManager {
 public:
-  OriginManager() = default;
-
-  Origin &addOrigin(OriginID ID, const clang::ValueDecl &D);
-  Origin &addOrigin(OriginID ID, const clang::Expr &E);
-
-  // TODO: Mark this method as const once we remove the call to getOrCreate.
-  OriginID get(const Expr &E);
-
-  OriginID get(const ValueDecl &D);
-
-  OriginID getOrCreate(const Expr &E);
+  explicit OriginManager(ASTContext &AST) : AST(AST) {}
+
+  /// Gets or creates the OriginTree for a given ValueDecl.
+  ///
+  /// Creates a tree structure mirroring the levels of indirection in the
+  /// declaration's type (e.g., `int** p` creates depth 2).
+  ///
+  /// \returns The OriginTree, or nullptr if the type is not pointer-like.
+  OriginTree *getOrCreateTree(const ValueDecl *D);
+
+  /// Gets or creates the OriginTree for a given Expr.
+  ///
+  /// Creates a tree based on the expression's type and value category:
+  /// - Lvalues get an implicit reference level (modeling addressability)
+  /// - Rvalues of non-pointer type return nullptr (no trackable origin)
+  /// - DeclRefExpr may reuse the underlying declaration's tree
+  ///
+  /// \returns The OriginTree, or nullptr for non-pointer rvalues.
+  OriginTree *getOrCreateTree(const Expr *E, size_t Depth = 0);
 
   const Origin &getOrigin(OriginID ID) const;
 
   llvm::ArrayRef<Origin> getOrigins() const { return AllOrigins; }
 
-  OriginID getOrCreate(const ValueDecl &D);
-
   unsigned getNumOrigins() const { return NextOriginID.Value; }
 
   void dump(OriginID OID, llvm::raw_ostream &OS) const;
@@ -81,12 +144,29 @@ class OriginManager {
 private:
   OriginID getNextOriginID() { return NextOriginID++; }
 
+  OriginTree *createNode(const ValueDecl *D, QualType QT) {
+    OriginID NewID = getNextOriginID();
+    AllOrigins.emplace_back(NewID, D, QT);
+    return new (TreeAllocator.Allocate<OriginTree>()) OriginTree(NewID);
+  }
+
+  OriginTree *createNode(const Expr *E, QualType QT) {
+    OriginID NewID = getNextOriginID();
+    AllOrigins.emplace_back(NewID, E, QT);
+    return new (TreeAllocator.Allocate<OriginTree>()) OriginTree(NewID);
+  }
+
+  template <typename T>
+  OriginTree *buildTreeForType(QualType QT, const T *Node);
+
+  ASTContext &AST;
   OriginID NextOriginID{0};
-  /// TODO(opt): Profile and evaluate the usefullness of small buffer
+  /// TODO(opt): Profile and evaluate the usefulness of small buffer
   /// optimisation.
   llvm::SmallVector<Origin> AllOrigins;
-  llvm::DenseMap<const clang::ValueDecl *, OriginID> DeclToOriginID;
-  llvm::DenseMap<const clang::Expr *, OriginID> ExprToOriginID;
+  llvm::BumpPtrAllocator TreeAllocator;
+  llvm::DenseMap<const clang::ValueDecl *, OriginTree *> DeclToTree;
+  llvm::DenseMap<const clang::Expr *, OriginTree *> ExprToTree;
 };
 } // namespace clang::lifetimes::internal
 
diff --git a/clang/lib/Analysis/LifetimeSafety/Facts.cpp b/clang/lib/Analysis/LifetimeSafety/Facts.cpp
index 0ae7111c489e8..d83f88409dcfa 100644
--- a/clang/lib/Analysis/LifetimeSafety/Facts.cpp
+++ b/clang/lib/Analysis/LifetimeSafety/Facts.cpp
@@ -35,12 +35,14 @@ void ExpireFact::dump(llvm::raw_ostream &OS, const LoanManager &LM,
 
 void OriginFlowFact::dump(llvm::raw_ostream &OS, const LoanManager &,
                           const OriginManager &OM) const {
-  OS << "OriginFlow (Dest: ";
+  OS << "OriginFlow: \n";
+  OS << "\tDest: ";
   OM.dump(getDestOriginID(), OS);
-  OS << ", Src: ";
+  OS << "\n";
+  OS << "\tSrc:  ";
   OM.dump(getSrcOriginID(), OS);
   OS << (getKillDest() ? "" : ", Merge");
-  OS << ")\n";
+  OS << "\n";
 }
 
 void OriginEscapesFact::dump(llvm::raw_ostream &OS, const LoanManager &,
@@ -53,7 +55,12 @@ void OriginEscapesFact::dump(llvm::raw_ostream &OS, const LoanManager &,
 void UseFact::dump(llvm::raw_ostream &OS, const LoanManager &,
                    const OriginManager &OM) const {
   OS << "Use (";
-  OM.dump(getUsedOrigin(), OS);
+  size_t NumUsedOrigins = getUsedOrigins().size();
+  for (size_t I = 0; I < NumUsedOrigins; ++I) {
+    OM.dump(getUsedOrigins()[I], OS);
+    if (I < NumUsedOrigins - 1)
+      OS << ", ";
+  }
   OS << ", " << (isWritten() ? "Write" : "Read") << ")\n";
 }
 
diff --git a/clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp b/clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
index 00870c3fd4086..3ac5005dc7897 100644
--- a/clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+++ b/clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
@@ -6,25 +6,54 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include <cassert>
+#include <string>
+
+#include "clang/AST/OperationKinds.h"
 #include "clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h"
 #include "clang/Analysis/Analyses/LifetimeSafety/LifetimeAnnotations.h"
 #include "clang/Analysis/Analyses/PostOrderCFGView.h"
 #include "llvm/Support/Casting.h"
+#include "llvm/Support/Signals.h"
 #include "llvm/Support/TimeProfiler.h"
 
 namespace clang::lifetimes::internal {
 using llvm::isa_and_present;
 
-static bool isPointerType(QualType QT) {
-  return QT->isPointerOrReferenceType() || isGslPointerType(QT);
+OriginTree *FactsGenerator::getTree(const ValueDecl &D) {
+  return FactMgr.getOriginMgr().getOrCreateTree(&D);
 }
-// Check if a type has an origin.
-static bool hasOrigin(const Expr *E) {
-  return E->isGLValue() || isPointerType(E->getType());
+OriginTree *FactsGenerator::getTree(const Expr &E) {
+  return FactMgr.getOriginMgr().getOrCreateTree(&E);
 }
 
-static bool hasOrigin(const VarDecl *VD) {
-  return isPointerType(VD->getType());
+/// Propagates origin information from Src to Dst through all levels of
+/// indirection, creating OriginFlowFacts at each level.
+///
+/// This function enforces a critical type-safety invariant: both trees must
+/// have the same shape (same depth/structure). This invariant ensures that
+/// origins flow only between compatible types during expression evaluation.
+///
+/// Examples:
+///   - `int* p = &x;` flows origins from `&x` (depth 1) to `p` (depth 1)
+///   - `int** pp = &p;` flows origins from `&p` (depth 2) to `pp` (depth 2)
+///     * Level 1: pp <- p's address
+///     * Level 2: (*pp) <- what p points to (i.e., &x)
+///   - `View v = obj;` flows origins from `obj` (depth 1) to `v` (depth 1)
+void FactsGenerator::flow(OriginTree *Dst, OriginTree *Src, bool Kill) {
+  if (!Dst)
+    return;
+  assert(Src &&
+         "Dst is non-null but Src is null. Trees must have the same shape");
+  assert(Dst->getDepth() == Src->getDepth() &&
+         "Trees must have the same shape");
+
+  while (Dst && Src) {
+    CurrentBlockFacts.push_back(
+        FactMgr.createFact<OriginFlowFact>(Dst->OID, Src->OID, Kill));
+    Dst = Dst->Pointee;
+    Src = Src->Pointee;
+  }
 }
 
 /// Creates a loan for the storage path of a given declaration reference.
@@ -64,29 +93,43 @@ void FactsGenerator::run() {
 void FactsGenerator::VisitDeclStmt(const DeclStmt *DS) {
   for (const Decl *D : DS->decls())
     if (const auto *VD = dyn_cast<VarDecl>(D))
-      if (hasOrigin(VD))
-        if (const Expr *InitExpr = VD->getInit())
-          killAndFlowOrigin(*VD, *InitExpr);
+      if (const Expr *InitExpr = VD->getInit()) {
+        OriginTree *VDTree = getTree(*VD);
+        if (!VDTree)
+          continue;
+        OriginTree *InitTree = getTree(*InitExpr);
+        assert(InitTree && "VarDecl had origins but InitExpr did not");
+        // Special handling for rvalue references initialized with xvalues.
+        // For declarations like `Ranges&& r = std::move(ranges);`, the rvalue
+        // reference should directly refer to the object being moved from,
+        // rather than creating a new indirection level. We skip the outer
+        // reference level and flow the pointee origins directly.
+        if (VD->getType()->isRValueReferenceType() && InitExpr->isXValue()) {
+          flow(VDTree->Pointee, InitTree->Pointee, /*Kill=*/true);
+          continue;
+        }
+        flow(VDTree, InitTree, /*Kill=*/true);
+      }
 }
 
 void FactsGenerator::VisitDeclRefExpr(const DeclRefExpr *DRE) {
+  // Skip function references and PR values.
+  if (DRE->getFoundDecl()->isFunctionOrFunctionTemplate() || !DRE->isGLValue())
+    return;
   handleUse(DRE);
-  // For non-pointer/non-view types, a reference to the variable's storage
-  // is a borrow. We create a loan for it.
-  // For pointer/view types, we stick to the existing model for now and do
-  // not create an extra origin for the l-value expression itself.
-
-  // TODO: A single origin for a `DeclRefExpr` for a pointer or view type is
-  // not sufficient to model the different levels of indirection. The current
-  // single-origin model cannot distinguish between a loan to the variable's
-  // storage and a loan to what it points to. A multi-origin model would be
-  // required for this.
-  if (!isPointerType(DRE->getType())) {
-    if (const Loan *L = createLoan(FactMgr, DRE)) {
-      OriginID ExprOID = FactMgr.getOriginMgr().getOrCreate(*DRE);
-      CurrentBlockFacts.push_back(
-          FactMgr.createFact<IssueFact>(L->ID, ExprOID));
-    }
+  // For pointer/view types, handleUse tracks all levels of indirection through
+  // the OriginTree structure.
+  //
+  // For non-pointer/non-reference types (e.g., `int x`), taking the address
+  // creates a borrow of the variable's storage. We issue a loan for this case.
+  if (doesDeclHaveStorage(DRE->getDecl())) {
+    const Loan *L = createLoan(FactMgr, DRE);
+    assert(L);
+    OriginTree *Tree = getTree(*DRE);
+    assert(Tree &&
+           "gl-value DRE of non-pointer type should have an origin tree");
+    CurrentBlockFacts.push_back(
+        FactMgr.createFact<IssueFact>(L->ID, Tree->OID));
   }
 }
 
@@ -100,12 +143,14 @@ void FactsGenerator::VisitCXXConstructExpr(const CXXConstructExpr *CCE) {
 void FactsGenerator::VisitCXXMemberCallExpr(const CXXMemberCallExpr *MCE) {
   // Specifically for conversion operators,
   // like `std::string_view p = std::string{};`
-  if (isGslPointerType(MCE->getType()) &&
-      isa_and_present<CXXConversionDecl>(MCE->getCalleeDecl())) {
+  if (isa_and_present<CXXConversionDecl>(MCE->getCalleeDecl()) &&
+      isGslPointerType(MCE->getType()) &&
+      isGslOwnerType(MCE->getImplicitObjectArgument()->getType())) {
     // The argument is the implicit object itself.
     handleFunctionCall(MCE, MCE->getMethodDecl(),
                        {MCE->getImplicitObjectArgument()},
                        /*IsGslConstruction=*/true);
+    return;
   }
   if (const CXXMethodDecl *Method = MCE->getMethodDecl()) {
     // Construct the argument list, with the implicit 'this' object as the
@@ -127,24 +172,46 @@ void FactsGenerator::VisitCXXNullPtrLiteralExpr(
     const CXXNullPtrLiteralExpr *N) {
   /// TODO: Handle nullptr expr as a special 'null' loan. Uninitialized
   /// pointers can use the same type of loan.
-  FactMgr.getOriginMgr().getOrCreate(*N);
+  getTree(*N);
 }
 
 void FactsGenerator::VisitImplicitCastExpr(const ImplicitCastExpr *ICE) {
-  if (!hasOrigin(ICE))
+  OriginTree *Dest = getTree(*ICE);
+  if (!Dest)
+    return;
+  OriginTree *SrcTree = getTree(*ICE->getSubExpr());
+
+  if (ICE->getCastKind() == CK_LValueToRValue) {
+    // TODO: Decide what to do for x-values here.
+    if (!ICE->getSubExpr()->isLValue())
+      return;
+
+    assert(SrcTree && "LValue being cast to RValue has no origin tree");
+    // The result of an LValue-to-RValue cast on a reference-to-pointer like
+    // has the inner origin....
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Nov 27, 2025

@llvm/pr-subscribers-clang

Author: Utkarsh Saxena (usx95)

Changes

This PR implements a multi-level origin model for the lifetime safety analysis, replacing the previous single-origin-per-variable approach with an OriginTree structure that captures multiple levels of indirection.

KeyChanges

  1. OriginTree Structure: Each origin now represents a single level of indirection, organized into trees:
    • int* p → depth 2 (pointer variable + pointee)
    • int** pp → depth 3 (variable + first pointee + second pointee)
    • std::string_view&amp; → depth 2 (reference + view object)
  2. Reference Type Understanding: The analysis now properly distinguishes:
    • Pointer types vs references-to-pointer types (string_view vs string_view&amp;)
    • References as aliases: int a; int&amp; b = a; int&amp; c = b; correctly recognizes both b and c alias a
    • References don't add storage and they reuse the underlying declaration's origins.
  3. Type-Driven Origin Creation: Origins are created based on type structure via buildTreeForType, ensuring origins match the type's indirection levels.
  4. Multi-Level Flow Propagation: The flow function propagates origins through all tree levels with a critical assertion:
    This ensures type safety in origin propagation during expression handling. (Rant with relief: This assert​ was quite hard to get right but it helped make the right changes).
  5. Lifetimebound Semantics: For reference return types, lifetimebound now propagates only the outermost origin, not inner pointee origins.

We are also deleting many tests in lifetime-safety-dataflow.cpp related to control flow which are better tested in unit tests and the other lit test.


Patch is 98.11 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/168344.diff

14 Files Affected:

  • (modified) clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h (+7-6)
  • (modified) clang/include/clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h (+7-10)
  • (modified) clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h (+2-2)
  • (modified) clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h (+104-24)
  • (modified) clang/lib/Analysis/LifetimeSafety/Facts.cpp (+11-4)
  • (modified) clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp (+231-90)
  • (modified) clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp (+27-7)
  • (modified) clang/lib/Analysis/LifetimeSafety/LiveOrigins.cpp (+13-7)
  • (modified) clang/lib/Analysis/LifetimeSafety/LoanPropagation.cpp (+2-1)
  • (modified) clang/lib/Analysis/LifetimeSafety/Origins.cpp (+113-58)
  • (modified) clang/lib/Sema/AnalysisBasedWarnings.cpp (+3)
  • (modified) clang/test/Sema/warn-lifetime-safety-dataflow.cpp (+154-372)
  • (modified) clang/test/Sema/warn-lifetime-safety.cpp (+384-30)
  • (modified) clang/unittests/Analysis/LifetimeSafetyTest.cpp (+55-30)
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h
index b5f7f8746186a..908d2a5b8cc76 100644
--- a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h
@@ -155,7 +155,8 @@ class OriginEscapesFact : public Fact {
 
 class UseFact : public Fact {
   const Expr *UseExpr;
-  OriginID OID;
+  // The origins of the expression being used.
+  llvm::SmallVector<OriginID, 1> OIDs;
   // True if this use is a write operation (e.g., left-hand side of assignment).
   // Write operations are exempted from use-after-free checks.
   bool IsWritten = false;
@@ -163,10 +164,10 @@ class UseFact : public Fact {
 public:
   static bool classof(const Fact *F) { return F->getKind() == Kind::Use; }
 
-  UseFact(const Expr *UseExpr, OriginManager &OM)
-      : Fact(Kind::Use), UseExpr(UseExpr), OID(OM.get(*UseExpr)) {}
+  UseFact(const Expr *UseExpr, llvm::ArrayRef<OriginID> OIDs)
+      : Fact(Kind::Use), UseExpr(UseExpr), OIDs(OIDs.begin(), OIDs.end()) {}
 
-  OriginID getUsedOrigin() const { return OID; }
+  llvm::ArrayRef<OriginID> getUsedOrigins() const { return OIDs; }
   const Expr *getUseExpr() const { return UseExpr; }
   void markAsWritten() { IsWritten = true; }
   bool isWritten() const { return IsWritten; }
@@ -194,8 +195,8 @@ class TestPointFact : public Fact {
 
 class FactManager {
 public:
-  void init(const CFG &Cfg) {
-    assert(BlockToFacts.empty() && "FactManager already initialized");
+  FactManager(const AnalysisDeclContext &AC, const CFG &Cfg)
+      : OriginMgr(AC.getASTContext()) {
     BlockToFacts.resize(Cfg.getNumBlockIDs());
   }
 
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h
index 878cb90b685f9..939f421505463 100644
--- a/clang/include/clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h
@@ -50,6 +50,11 @@ class FactsGenerator : public ConstStmtVisitor<FactsGenerator> {
   void VisitMaterializeTemporaryExpr(const MaterializeTemporaryExpr *MTE);
 
 private:
+  OriginTree *getTree(const ValueDecl &D);
+  OriginTree *getTree(const Expr &E);
+
+  void flow(OriginTree *Dst, OriginTree *Src, bool Kill);
+
   void handleLifetimeEnds(const CFGLifetimeEnds &LifetimeEnds);
 
   void handleGSLPointerConstruction(const CXXConstructExpr *CCE);
@@ -64,26 +69,18 @@ class FactsGenerator : public ConstStmtVisitor<FactsGenerator> {
 
   template <typename Destination, typename Source>
   void flowOrigin(const Destination &D, const Source &S) {
-    OriginID DestOID = FactMgr.getOriginMgr().getOrCreate(D);
-    OriginID SrcOID = FactMgr.getOriginMgr().get(S);
-    CurrentBlockFacts.push_back(FactMgr.createFact<OriginFlowFact>(
-        DestOID, SrcOID, /*KillDest=*/false));
+    flow(getTree(D), getTree(S), /*Kill=*/false);
   }
 
   template <typename Destination, typename Source>
   void killAndFlowOrigin(const Destination &D, const Source &S) {
-    OriginID DestOID = FactMgr.getOriginMgr().getOrCreate(D);
-    OriginID SrcOID = FactMgr.getOriginMgr().get(S);
-    CurrentBlockFacts.push_back(
-        FactMgr.createFact<OriginFlowFact>(DestOID, SrcOID, /*KillDest=*/true));
+    flow(getTree(D), getTree(S), /*Kill=*/true);
   }
 
   /// Checks if the expression is a `void("__lifetime_test_point_...")` cast.
   /// If so, creates a `TestPointFact` and returns true.
   bool handleTestPoint(const CXXFunctionalCastExpr *FCE);
 
-  void handleAssignment(const Expr *LHSExpr, const Expr *RHSExpr);
-
   // A DeclRefExpr will be treated as a use of the referenced decl. It will be
   // checked for use-after-free unless it is later marked as being written to
   // (e.g. on the left-hand side of an assignment).
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
index b34a7f18b5809..a8d6e2551aab5 100644
--- a/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
@@ -76,13 +76,13 @@ class LifetimeSafetyAnalysis {
     return *LoanPropagation;
   }
   LiveOriginsAnalysis &getLiveOrigins() const { return *LiveOrigins; }
-  FactManager &getFactManager() { return FactMgr; }
+  FactManager &getFactManager() { return *FactMgr; }
 
 private:
   AnalysisDeclContext &AC;
   LifetimeSafetyReporter *Reporter;
   LifetimeFactory Factory;
-  FactManager FactMgr;
+  std::unique_ptr<FactManager> FactMgr;
   std::unique_ptr<LiveOriginsAnalysis> LiveOrigins;
   std::unique_ptr<LoanPropagationAnalysis> LoanPropagation;
 };
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
index 56b9010f41fa2..c595a9fdf6233 100644
--- a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
@@ -16,6 +16,7 @@
 
 #include "clang/AST/Decl.h"
 #include "clang/AST/Expr.h"
+#include "clang/AST/TypeBase.h"
 #include "clang/Analysis/Analyses/LifetimeSafety/Utils.h"
 
 namespace clang::lifetimes::internal {
@@ -28,12 +29,10 @@ inline llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, OriginID ID) {
 
 /// An Origin is a symbolic identifier that represents the set of possible
 /// loans a pointer-like object could hold at any given time.
-/// TODO: Enhance the origin model to handle complex types, pointer
-/// indirection and reborrowing. The plan is to move from a single origin per
-/// variable/expression to a "list of origins" governed by the Type.
-/// For example, the type 'int**' would have two origins.
-/// See discussion:
-/// https://github.com/llvm/llvm-project/pull/142313/commits/0cd187b01e61b200d92ca0b640789c1586075142#r2137644238
+///
+/// Each Origin corresponds to a single level of indirection. For complex types
+/// with multiple levels of indirection (e.g., `int**`), multiple Origins are
+/// organized into an OriginTree structure (see below).
 struct Origin {
   OriginID ID;
   /// A pointer to the AST node that this origin represents. This union
@@ -41,8 +40,19 @@ struct Origin {
   /// and origins from expressions.
   llvm::PointerUnion<const clang::ValueDecl *, const clang::Expr *> Ptr;
 
-  Origin(OriginID ID, const clang::ValueDecl *D) : ID(ID), Ptr(D) {}
-  Origin(OriginID ID, const clang::Expr *E) : ID(ID), Ptr(E) {}
+  /// The type at this indirection level.
+  ///
+  /// For `int** pp`:
+  ///   Root origin: QT = `int**` (what pp points to)
+  ///   Pointee origin: QT = `int*` (what *pp points to)
+  ///
+  /// Null for synthetic lvalue origins (e.g., outer origin of DeclRefExpr).
+  QualType QT;
+
+  Origin(OriginID ID, const clang::ValueDecl *D, QualType QT)
+      : ID(ID), Ptr(D), QT(QT) {}
+  Origin(OriginID ID, const clang::Expr *E, QualType QT)
+      : ID(ID), Ptr(E), QT(QT) {}
 
   const clang::ValueDecl *getDecl() const {
     return Ptr.dyn_cast<const clang::ValueDecl *>();
@@ -52,28 +62,81 @@ struct Origin {
   }
 };
 
+/// A tree of origins representing levels of indirection for pointer-like types.
+///
+/// Each node in the tree contains an OriginID representing a level of
+/// indirection. The tree structure captures the multi-level nature of
+/// pointer and reference types in the lifetime analysis.
+///
+/// Examples:
+///   - For `int& x`, the tree has depth 2:
+///     * Root: origin for the reference storage itself (the lvalue `x`)
+///     * Pointee: origin for what `x` refers to
+///
+///   - For `int* p`, the tree has depth 2:
+///     * Root: origin for the pointer variable `p`
+///     * Pointee: origin for what `p` points to
+///
+///   - For `View v` (where View is gsl::Pointer), the tree has depth 2:
+///     * Root: origin for the view object itself
+///     * Pointee: origin for what the view refers to
+///
+///   - For `int** pp`, the tree has depth 3:
+///     * Root: origin for `pp` itself
+///     * Pointee: origin for `*pp` (what `pp` points to)
+///     * Pointee->Pointee: origin for `**pp` (what `*pp` points to)
+///
+/// The tree structure enables the analysis to track how loans flow through
+/// different levels of indirection when assignments and dereferences occur.
+struct OriginTree {
+  OriginID OID;
+  OriginTree *Pointee = nullptr;
+
+  OriginTree(OriginID OID) : OID(OID) {}
+
+  size_t getDepth() const {
+    size_t Depth = 1;
+    const OriginTree *T = this;
+    while (T->Pointee) {
+      T = T->Pointee;
+      Depth++;
+    }
+    return Depth;
+  }
+};
+
+bool hasOrigins(QualType QT);
+bool hasOrigins(const Expr *E);
+bool doesDeclHaveStorage(const ValueDecl *D);
+
 /// Manages the creation, storage, and retrieval of origins for pointer-like
 /// variables and expressions.
 class OriginManager {
 public:
-  OriginManager() = default;
-
-  Origin &addOrigin(OriginID ID, const clang::ValueDecl &D);
-  Origin &addOrigin(OriginID ID, const clang::Expr &E);
-
-  // TODO: Mark this method as const once we remove the call to getOrCreate.
-  OriginID get(const Expr &E);
-
-  OriginID get(const ValueDecl &D);
-
-  OriginID getOrCreate(const Expr &E);
+  explicit OriginManager(ASTContext &AST) : AST(AST) {}
+
+  /// Gets or creates the OriginTree for a given ValueDecl.
+  ///
+  /// Creates a tree structure mirroring the levels of indirection in the
+  /// declaration's type (e.g., `int** p` creates depth 2).
+  ///
+  /// \returns The OriginTree, or nullptr if the type is not pointer-like.
+  OriginTree *getOrCreateTree(const ValueDecl *D);
+
+  /// Gets or creates the OriginTree for a given Expr.
+  ///
+  /// Creates a tree based on the expression's type and value category:
+  /// - Lvalues get an implicit reference level (modeling addressability)
+  /// - Rvalues of non-pointer type return nullptr (no trackable origin)
+  /// - DeclRefExpr may reuse the underlying declaration's tree
+  ///
+  /// \returns The OriginTree, or nullptr for non-pointer rvalues.
+  OriginTree *getOrCreateTree(const Expr *E, size_t Depth = 0);
 
   const Origin &getOrigin(OriginID ID) const;
 
   llvm::ArrayRef<Origin> getOrigins() const { return AllOrigins; }
 
-  OriginID getOrCreate(const ValueDecl &D);
-
   unsigned getNumOrigins() const { return NextOriginID.Value; }
 
   void dump(OriginID OID, llvm::raw_ostream &OS) const;
@@ -81,12 +144,29 @@ class OriginManager {
 private:
   OriginID getNextOriginID() { return NextOriginID++; }
 
+  OriginTree *createNode(const ValueDecl *D, QualType QT) {
+    OriginID NewID = getNextOriginID();
+    AllOrigins.emplace_back(NewID, D, QT);
+    return new (TreeAllocator.Allocate<OriginTree>()) OriginTree(NewID);
+  }
+
+  OriginTree *createNode(const Expr *E, QualType QT) {
+    OriginID NewID = getNextOriginID();
+    AllOrigins.emplace_back(NewID, E, QT);
+    return new (TreeAllocator.Allocate<OriginTree>()) OriginTree(NewID);
+  }
+
+  template <typename T>
+  OriginTree *buildTreeForType(QualType QT, const T *Node);
+
+  ASTContext &AST;
   OriginID NextOriginID{0};
-  /// TODO(opt): Profile and evaluate the usefullness of small buffer
+  /// TODO(opt): Profile and evaluate the usefulness of small buffer
   /// optimisation.
   llvm::SmallVector<Origin> AllOrigins;
-  llvm::DenseMap<const clang::ValueDecl *, OriginID> DeclToOriginID;
-  llvm::DenseMap<const clang::Expr *, OriginID> ExprToOriginID;
+  llvm::BumpPtrAllocator TreeAllocator;
+  llvm::DenseMap<const clang::ValueDecl *, OriginTree *> DeclToTree;
+  llvm::DenseMap<const clang::Expr *, OriginTree *> ExprToTree;
 };
 } // namespace clang::lifetimes::internal
 
diff --git a/clang/lib/Analysis/LifetimeSafety/Facts.cpp b/clang/lib/Analysis/LifetimeSafety/Facts.cpp
index 0ae7111c489e8..d83f88409dcfa 100644
--- a/clang/lib/Analysis/LifetimeSafety/Facts.cpp
+++ b/clang/lib/Analysis/LifetimeSafety/Facts.cpp
@@ -35,12 +35,14 @@ void ExpireFact::dump(llvm::raw_ostream &OS, const LoanManager &LM,
 
 void OriginFlowFact::dump(llvm::raw_ostream &OS, const LoanManager &,
                           const OriginManager &OM) const {
-  OS << "OriginFlow (Dest: ";
+  OS << "OriginFlow: \n";
+  OS << "\tDest: ";
   OM.dump(getDestOriginID(), OS);
-  OS << ", Src: ";
+  OS << "\n";
+  OS << "\tSrc:  ";
   OM.dump(getSrcOriginID(), OS);
   OS << (getKillDest() ? "" : ", Merge");
-  OS << ")\n";
+  OS << "\n";
 }
 
 void OriginEscapesFact::dump(llvm::raw_ostream &OS, const LoanManager &,
@@ -53,7 +55,12 @@ void OriginEscapesFact::dump(llvm::raw_ostream &OS, const LoanManager &,
 void UseFact::dump(llvm::raw_ostream &OS, const LoanManager &,
                    const OriginManager &OM) const {
   OS << "Use (";
-  OM.dump(getUsedOrigin(), OS);
+  size_t NumUsedOrigins = getUsedOrigins().size();
+  for (size_t I = 0; I < NumUsedOrigins; ++I) {
+    OM.dump(getUsedOrigins()[I], OS);
+    if (I < NumUsedOrigins - 1)
+      OS << ", ";
+  }
   OS << ", " << (isWritten() ? "Write" : "Read") << ")\n";
 }
 
diff --git a/clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp b/clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
index 00870c3fd4086..3ac5005dc7897 100644
--- a/clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+++ b/clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
@@ -6,25 +6,54 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include <cassert>
+#include <string>
+
+#include "clang/AST/OperationKinds.h"
 #include "clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h"
 #include "clang/Analysis/Analyses/LifetimeSafety/LifetimeAnnotations.h"
 #include "clang/Analysis/Analyses/PostOrderCFGView.h"
 #include "llvm/Support/Casting.h"
+#include "llvm/Support/Signals.h"
 #include "llvm/Support/TimeProfiler.h"
 
 namespace clang::lifetimes::internal {
 using llvm::isa_and_present;
 
-static bool isPointerType(QualType QT) {
-  return QT->isPointerOrReferenceType() || isGslPointerType(QT);
+OriginTree *FactsGenerator::getTree(const ValueDecl &D) {
+  return FactMgr.getOriginMgr().getOrCreateTree(&D);
 }
-// Check if a type has an origin.
-static bool hasOrigin(const Expr *E) {
-  return E->isGLValue() || isPointerType(E->getType());
+OriginTree *FactsGenerator::getTree(const Expr &E) {
+  return FactMgr.getOriginMgr().getOrCreateTree(&E);
 }
 
-static bool hasOrigin(const VarDecl *VD) {
-  return isPointerType(VD->getType());
+/// Propagates origin information from Src to Dst through all levels of
+/// indirection, creating OriginFlowFacts at each level.
+///
+/// This function enforces a critical type-safety invariant: both trees must
+/// have the same shape (same depth/structure). This invariant ensures that
+/// origins flow only between compatible types during expression evaluation.
+///
+/// Examples:
+///   - `int* p = &x;` flows origins from `&x` (depth 1) to `p` (depth 1)
+///   - `int** pp = &p;` flows origins from `&p` (depth 2) to `pp` (depth 2)
+///     * Level 1: pp <- p's address
+///     * Level 2: (*pp) <- what p points to (i.e., &x)
+///   - `View v = obj;` flows origins from `obj` (depth 1) to `v` (depth 1)
+void FactsGenerator::flow(OriginTree *Dst, OriginTree *Src, bool Kill) {
+  if (!Dst)
+    return;
+  assert(Src &&
+         "Dst is non-null but Src is null. Trees must have the same shape");
+  assert(Dst->getDepth() == Src->getDepth() &&
+         "Trees must have the same shape");
+
+  while (Dst && Src) {
+    CurrentBlockFacts.push_back(
+        FactMgr.createFact<OriginFlowFact>(Dst->OID, Src->OID, Kill));
+    Dst = Dst->Pointee;
+    Src = Src->Pointee;
+  }
 }
 
 /// Creates a loan for the storage path of a given declaration reference.
@@ -64,29 +93,43 @@ void FactsGenerator::run() {
 void FactsGenerator::VisitDeclStmt(const DeclStmt *DS) {
   for (const Decl *D : DS->decls())
     if (const auto *VD = dyn_cast<VarDecl>(D))
-      if (hasOrigin(VD))
-        if (const Expr *InitExpr = VD->getInit())
-          killAndFlowOrigin(*VD, *InitExpr);
+      if (const Expr *InitExpr = VD->getInit()) {
+        OriginTree *VDTree = getTree(*VD);
+        if (!VDTree)
+          continue;
+        OriginTree *InitTree = getTree(*InitExpr);
+        assert(InitTree && "VarDecl had origins but InitExpr did not");
+        // Special handling for rvalue references initialized with xvalues.
+        // For declarations like `Ranges&& r = std::move(ranges);`, the rvalue
+        // reference should directly refer to the object being moved from,
+        // rather than creating a new indirection level. We skip the outer
+        // reference level and flow the pointee origins directly.
+        if (VD->getType()->isRValueReferenceType() && InitExpr->isXValue()) {
+          flow(VDTree->Pointee, InitTree->Pointee, /*Kill=*/true);
+          continue;
+        }
+        flow(VDTree, InitTree, /*Kill=*/true);
+      }
 }
 
 void FactsGenerator::VisitDeclRefExpr(const DeclRefExpr *DRE) {
+  // Skip function references and PR values.
+  if (DRE->getFoundDecl()->isFunctionOrFunctionTemplate() || !DRE->isGLValue())
+    return;
   handleUse(DRE);
-  // For non-pointer/non-view types, a reference to the variable's storage
-  // is a borrow. We create a loan for it.
-  // For pointer/view types, we stick to the existing model for now and do
-  // not create an extra origin for the l-value expression itself.
-
-  // TODO: A single origin for a `DeclRefExpr` for a pointer or view type is
-  // not sufficient to model the different levels of indirection. The current
-  // single-origin model cannot distinguish between a loan to the variable's
-  // storage and a loan to what it points to. A multi-origin model would be
-  // required for this.
-  if (!isPointerType(DRE->getType())) {
-    if (const Loan *L = createLoan(FactMgr, DRE)) {
-      OriginID ExprOID = FactMgr.getOriginMgr().getOrCreate(*DRE);
-      CurrentBlockFacts.push_back(
-          FactMgr.createFact<IssueFact>(L->ID, ExprOID));
-    }
+  // For pointer/view types, handleUse tracks all levels of indirection through
+  // the OriginTree structure.
+  //
+  // For non-pointer/non-reference types (e.g., `int x`), taking the address
+  // creates a borrow of the variable's storage. We issue a loan for this case.
+  if (doesDeclHaveStorage(DRE->getDecl())) {
+    const Loan *L = createLoan(FactMgr, DRE);
+    assert(L);
+    OriginTree *Tree = getTree(*DRE);
+    assert(Tree &&
+           "gl-value DRE of non-pointer type should have an origin tree");
+    CurrentBlockFacts.push_back(
+        FactMgr.createFact<IssueFact>(L->ID, Tree->OID));
   }
 }
 
@@ -100,12 +143,14 @@ void FactsGenerator::VisitCXXConstructExpr(const CXXConstructExpr *CCE) {
 void FactsGenerator::VisitCXXMemberCallExpr(const CXXMemberCallExpr *MCE) {
   // Specifically for conversion operators,
   // like `std::string_view p = std::string{};`
-  if (isGslPointerType(MCE->getType()) &&
-      isa_and_present<CXXConversionDecl>(MCE->getCalleeDecl())) {
+  if (isa_and_present<CXXConversionDecl>(MCE->getCalleeDecl()) &&
+      isGslPointerType(MCE->getType()) &&
+      isGslOwnerType(MCE->getImplicitObjectArgument()->getType())) {
     // The argument is the implicit object itself.
     handleFunctionCall(MCE, MCE->getMethodDecl(),
                        {MCE->getImplicitObjectArgument()},
                        /*IsGslConstruction=*/true);
+    return;
   }
   if (const CXXMethodDecl *Method = MCE->getMethodDecl()) {
     // Construct the argument list, with the implicit 'this' object as the
@@ -127,24 +172,46 @@ void FactsGenerator::VisitCXXNullPtrLiteralExpr(
     const CXXNullPtrLiteralExpr *N) {
   /// TODO: Handle nullptr expr as a special 'null' loan. Uninitialized
   /// pointers can use the same type of loan.
-  FactMgr.getOriginMgr().getOrCreate(*N);
+  getTree(*N);
 }
 
 void FactsGenerator::VisitImplicitCastExpr(const ImplicitCastExpr *ICE) {
-  if (!hasOrigin(ICE))
+  OriginTree *Dest = getTree(*ICE);
+  if (!Dest)
+    return;
+  OriginTree *SrcTree = getTree(*ICE->getSubExpr());
+
+  if (ICE->getCastKind() == CK_LValueToRValue) {
+    // TODO: Decide what to do for x-values here.
+    if (!ICE->getSubExpr()->isLValue())
+      return;
+
+    assert(SrcTree && "LValue being cast to RValue has no origin tree");
+    // The result of an LValue-to-RValue cast on a reference-to-pointer like
+    // has the inner origin....
[truncated]

@usx95 usx95 self-assigned this Nov 27, 2025
@usx95 usx95 force-pushed the users/usx95/11-17-lifetime-safety-multi-origin branch from f622384 to 3e3ccdc Compare November 28, 2025 05:27
@usx95 usx95 requested review from Xazax-hun and ymand November 28, 2025 05:28
Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did a very rough first pass, will need to take a second look. I wonder if it is a good idea to call it an OriginTree when it is actually a list. I understand that it might become a tree later once/if we have named lifetimes, but I wonder if it is a bit confusing to call it a tree in the interim period.


void FactsGenerator::VisitDeclRefExpr(const DeclRefExpr *DRE) {
// Skip function references and PR values.
if (DRE->getFoundDecl()->isFunctionOrFunctionTemplate() || !DRE->isGLValue())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation of skipping functions? Is there some handling we are missing or are these always safe?

Can you give me an example where a DRE is a PRValue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation of skipping functions?

Primary reason is that function references/glvalues are uninteresting to the analysis and create extraneous facts.

Can you give me an example where a DRE is a PRValue?

EnumConstants. Switched to explicitly skipping those to be clearer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Is it only EnumConstants, or would other kinds of constants (like constexpr int i = 5) be PRValues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nopes. https://godbolt.org/z/ccxjcdnz4 Still lvalue. ¯\_(ツ)_/¯

@Xazax-hun
Copy link
Collaborator

Could you also update the PR description with the Tree -> List change?

///
/// Examples:
/// - For `int& x`, the list has size 2:
/// * Root: origin for the reference storage itself (the lvalue `x`)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I wonder if Head is more idiomatic for lists than Root. I do not have strong feelings though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to Outer and Inner to match the updated field names.

const Expr *LHSExpr = OCE->getArg(0);
const Expr *RHSExpr = OCE->getArg(1);

if (const auto *DRE_LHS =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we deduplicate this and the built-in assignment handling?

// Example:
// View(View v);
// View(const View &v);
if (Arg->isGLValue())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern of peeling the outer origin for GLValues come up pretty often. I wonder if we should have a small helper for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. This indeed looks like a pattern of forcing an LValueToRValue conversion.

@ymand
Copy link
Collaborator

ymand commented Dec 5, 2025

  1. Multi-Level Flow Propagation: The flow function propagates origins through all depths of the lists with a critical assertion:
    assert(Dst->getDepth() == Src->getDepth() && "Lists must have the same length"); This ensures type safety in origin propagation during expression handling. (Rant with relief: This assert​ was quite hard to get right but it helped make the right changes).

rant acknowledged. :) Sounds like struggling to get an induction hypothesis correct :)

Copy link
Collaborator

@ymand ymand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review. Looks good so far!

/// Manages the creation, storage, and retrieval of origins for pointer-like
/// variables and expressions.
class OriginManager {
/// A list of origins representing levels of indirection for pointer-like types.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you track through templates, like std::optional<int*>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not immediately but I had some plans for such special cases in the future. The pre-existing analysis calls this Container of pointers using a heuristic where an Owner class is instantiated with a pointer type.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, I would recommend you consider whether this new design could effectively support template cases. That is, you don't have support them with this change, but you should be relatively sure it will work for them in the future. My intuition is that a vector is still the right structure, because that's exactly what we use for Nullability. The vector represents the pre-order traversal of the type tree, with each element representing one pointer type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah I see what you were referring to.

Yes, my idea is to move from an OriginList to OriginTree to support multiple inner pointers like std::pair<int*, int*> . This could be represented as a pre-order array but my initial preference is to have a Tree representation for better understanding.

@usx95 usx95 force-pushed the users/usx95/11-17-lifetime-safety-multi-origin branch from 96bad25 to db307d4 Compare December 8, 2025 14:54
@usx95 usx95 requested review from Xazax-hun and ymand December 8, 2025 14:56
Copy link
Collaborator

@ymand ymand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all but FactsGenerator.cpp

(void)**pp;
}

int** test_ternary_double_ptr(bool cond) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on how the current model handles updates. What do you expect to happen in the following cases?

int* test_...(bool cond) {
  int a = 1, b = 2, c = 3;
  int* pa = &a;
  int* pb = &b;
  int** result = cond ? &pa : &pb;
  *result = &c; // We've just added a loan to `c` to one of `pa` or `pb`.
  return pa;  // Is that reflected here? I assume not becuase assignment does not track deref?
}

int* test_...(bool cond) {
  int a = 1, b = 2, c = 3;
  int* pa = &a;
  int* pb = &b;
  int** result = cond ? &pa : &pb;
  pa = &c; // We've updated pa ...
  return *result;  // ... is that reflected here?
}

int* test_...(bool cond) {
  int a = 1, b = 2, c = 3;
  int* pa = &a;
  int* pb = &b;
  int** result = cond ? &pa : &pb;
  pa = &c; // We've updated pa ...
  return pb;  // Does the analysis confuse `pb` with `pa` and report that a reference to `c` might escape?
}

Copy link
Contributor Author

@usx95 usx95 Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the above works at this point.

The plan is to handle this using flow-sensitive alias tracking (similar to Polonius in Rust). This should be handled once we start creating aliases among origins due to cycles in subtype-relations.

Alias Creation via Multilevel Origins

int* pa = &a;
// Opa <- loan to 'a'

int** result = &pa;
// [Or1, Or2] <- [Opa_outer, Opa]
// Or1 <- loan to 'pa' (the storage of pa)
// Or2 <- Opa (what pa points to) 

The missing piece is creating a bidirectional alias relationship: Or2 and Opa should be treated as aliases/subtype of each other, forming an SCC in the origin-flow graph. This should for all inner origins at all levels. For example, here we should have // Opa <- Or2flow fact but not for Or1 and Opa_outer.

// Example 1: Update through dereference
*result = &c;  // Updates Or2, which should also update its alias Opa
return pa;     // Would correctly report loan to 'c' escapes
// Example 2: Update to aliased variable
pa = &c;        // Updates Opa, which should also update its alias Or2
return *result; // Would correctly report loan to 'c' escapes

This aliasing relationship is flow-sensitive - it's established when result = &pa executes and broken when result is reassigned to point elsewhere.

(Thinking about the case 3 to not make Opa and Opb alias of each other. I think it involves something on the lines of dropping flow facts which involve dead origins)


if (const auto *DRE_LHS =
dyn_cast<DeclRefExpr>(LHSExpr->IgnoreParenImpCasts())) {
OriginList *LHSList = getOriginsList(*DRE_LHS);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you already filtered by the type to ensure that this is a relevant type? I was expecting something like isGslPointerType or mayber hasOrigins. Otherwise, how do we know the semantics of an arbitrary overloaded operator=?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I missed this. Added a precheck of hasOrigins.

const int* d = &c;
return d; // expected-note 2 {{returned here}}
}
// FIXME: Dereference of a pointer does not track the reference.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a fundamental limitation or just planned for a followup PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done in a followup PR.

(void)*p; // expected-note 4 {{later used here}}
}

namespace GH162834 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this name? Is this a reference to known issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is #162834

@usx95 usx95 force-pushed the users/usx95/11-17-lifetime-safety-multi-origin branch from db307d4 to 5134452 Compare December 11, 2025 15:30
@usx95 usx95 requested a review from ymand December 11, 2025 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:analysis clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:temporal-safety Issue/FR relating to the lifetime analysis in Clang (-Wdangling, -Wreturn-local-addr) clang Clang issues not falling into any other category

Projects

Status: No status

5 participants