Skip to content

Conversation

@marco-antognini-sonarsource
Copy link
Contributor

Reverts #115917 and its follow up #116840.
Fixes #153782 and introduces regression tests.
Reopens #114270.

…#116840)"

This reverts commit 9b2ec87.

Resolved conflicts in:
	clang/lib/StaticAnalyzer/Core/RegionStore.cpp
	clang/test/Analysis/ctor-trivial-copy.cpp
	clang/test/Analysis/explain-svals.cpp
…s (2/4) (llvm#115917)"

This reverts commit 4610e5c.

Resolved conflicts in:
	clang/lib/StaticAnalyzer/Core/RegionStore.cpp
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Oct 14, 2025
@llvmbot
Copy link
Member

llvmbot commented Oct 14, 2025

@llvm/pr-subscribers-clang-static-analyzer-1

@llvm/pr-subscribers-clang

Author: Marco Borgeaud (marco-antognini-sonarsource)

Changes

Reverts #115917 and its follow up #116840.
Fixes #153782 and introduces regression tests.
Reopens #114270.


Full diff: https://github.com/llvm/llvm-project/pull/163461.diff

10 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Core/RegionStore.cpp (-48)
  • (modified) clang/test/Analysis/NewDelete-checker-test.cpp (+76-4)
  • (modified) clang/test/Analysis/ctor-trivial-copy.cpp (+47-15)
  • (modified) clang/test/Analysis/explain-svals.cpp (+1-1)
  • (modified) clang/test/Analysis/iterator-modeling.cpp (-2)
  • (modified) clang/test/Analysis/stl-algorithm-modeling-aggressive-std-find-modeling.cpp (-10)
  • (modified) clang/test/Analysis/stl-algorithm-modeling.cpp (-10)
  • (modified) clang/test/Analysis/store-dump-orders.cpp (+1-1)
  • (modified) clang/test/Analysis/taint-generic.cpp (+5-1)
  • (modified) clang/test/Analysis/template-param-objects.cpp (+1-1)
diff --git a/clang/lib/StaticAnalyzer/Core/RegionStore.cpp b/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
index af0ef52334bd7..5221a3248e46e 100644
--- a/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
+++ b/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
@@ -714,11 +714,6 @@ class RegionStoreManager : public StoreManager {
     return getBinding(getRegionBindings(S), L, T);
   }
 
-  std::optional<SVal> getUniqueDefaultBinding(RegionBindingsConstRef B,
-                                              const TypedValueRegion *R) const;
-  std::optional<SVal>
-  getUniqueDefaultBinding(nonloc::LazyCompoundVal LCV) const;
-
   std::optional<SVal> getDefaultBinding(Store S, const MemRegion *R) override {
     RegionBindingsRef B = getRegionBindings(S);
     // Default bindings are always applied over a base region so look up the
@@ -2465,11 +2460,6 @@ SVal RegionStoreManager::getBindingForStruct(RegionBindingsConstRef B,
   // behavior doesn't depend on the struct layout.
   // This way even an empty struct can carry taint, no matter if creduce drops
   // the last field member or not.
-
-  // Try to avoid creating a LCV if it would anyways just refer to a single
-  // default binding.
-  if (std::optional<SVal> Val = getUniqueDefaultBinding(B, R))
-    return *Val;
   return createLazyBinding(B, R);
 }
 
@@ -2757,50 +2747,12 @@ RegionStoreManager::bindVector(LimitedRegionBindingsConstRef B,
   return NewB;
 }
 
-std::optional<SVal>
-RegionStoreManager::getUniqueDefaultBinding(RegionBindingsConstRef B,
-                                            const TypedValueRegion *R) const {
-  if (R != R->getBaseRegion())
-    return std::nullopt;
-
-  const auto *Cluster = B.lookup(R);
-  if (!Cluster || !llvm::hasSingleElement(*Cluster))
-    return std::nullopt;
-
-  const auto [Key, Value] = *Cluster->begin();
-  return Key.isDirect() ? std::optional<SVal>{} : Value;
-}
-
-std::optional<SVal>
-RegionStoreManager::getUniqueDefaultBinding(nonloc::LazyCompoundVal LCV) const {
-  auto B = getRegionBindings(LCV.getStore());
-  return getUniqueDefaultBinding(B, LCV.getRegion());
-}
-
 std::optional<LimitedRegionBindingsRef> RegionStoreManager::tryBindSmallStruct(
     LimitedRegionBindingsConstRef B, const TypedValueRegion *R,
     const RecordDecl *RD, nonloc::LazyCompoundVal LCV) {
   if (B.hasExhaustedBindingLimit())
     return B.withValuesEscaped(LCV);
 
-  // If we try to copy a Conjured value representing the value of the whole
-  // struct, don't try to element-wise copy each field.
-  // That would unnecessarily bind Derived symbols slicing off the subregion for
-  // the field from the whole Conjured symbol.
-  //
-  //   struct Window { int width; int height; };
-  //   Window getWindow(); <-- opaque fn.
-  //   Window w = getWindow(); <-- conjures a new Window.
-  //   Window w2 = w; <-- trivial copy "w", calling "tryBindSmallStruct"
-  //
-  // We should not end up with a new Store for "w2" like this:
-  //   Direct [ 0..31]: Derived{Conj{}, w.width}
-  //   Direct [32..63]: Derived{Conj{}, w.height}
-  // Instead, we should just bind that Conjured value instead.
-  if (std::optional<SVal> Val = getUniqueDefaultBinding(LCV)) {
-    return B.addBinding(BindingKey::Make(R, BindingKey::Default), Val.value());
-  }
-
   FieldVector Fields;
 
   if (const CXXRecordDecl *Class = dyn_cast<CXXRecordDecl>(RD))
diff --git a/clang/test/Analysis/NewDelete-checker-test.cpp b/clang/test/Analysis/NewDelete-checker-test.cpp
index c417b9c2ac97e..2738ecc56ba7a 100644
--- a/clang/test/Analysis/NewDelete-checker-test.cpp
+++ b/clang/test/Analysis/NewDelete-checker-test.cpp
@@ -3,13 +3,13 @@
 // RUN:   -analyzer-checker=core \
 // RUN:   -analyzer-checker=cplusplus.NewDelete
 //
-// RUN: %clang_analyze_cc1 -DLEAKS -std=c++11 -fblocks %s \
+// RUN: %clang_analyze_cc1 -std=c++11 -fblocks %s \
 // RUN:   -verify=expected,newdelete,leak \
 // RUN:   -analyzer-checker=core \
 // RUN:   -analyzer-checker=cplusplus.NewDelete \
 // RUN:   -analyzer-checker=cplusplus.NewDeleteLeaks
 //
-// RUN: %clang_analyze_cc1 -std=c++11 -fblocks -verify %s \
+// RUN: %clang_analyze_cc1 -std=c++11 -fblocks %s \
 // RUN:   -verify=expected,leak \
 // RUN:   -analyzer-checker=core \
 // RUN:   -analyzer-checker=cplusplus.NewDeleteLeaks
@@ -19,13 +19,13 @@
 // RUN:   -analyzer-checker=core \
 // RUN:   -analyzer-checker=cplusplus.NewDelete
 //
-// RUN: %clang_analyze_cc1 -DLEAKS -std=c++17 -fblocks %s \
+// RUN: %clang_analyze_cc1 -std=c++17 -fblocks %s \
 // RUN:   -verify=expected,newdelete,leak \
 // RUN:   -analyzer-checker=core \
 // RUN:   -analyzer-checker=cplusplus.NewDelete \
 // RUN:   -analyzer-checker=cplusplus.NewDeleteLeaks
 //
-// RUN: %clang_analyze_cc1 -std=c++17 -fblocks -verify %s \
+// RUN: %clang_analyze_cc1 -std=c++17 -fblocks %s \
 // RUN:   -verify=expected,leak,inspection \
 // RUN:   -analyzer-checker=core \
 // RUN:   -analyzer-checker=cplusplus.NewDeleteLeaks \
@@ -503,3 +503,75 @@ namespace optional_union {
     custom_union_t a;
   } // leak-warning{{Potential leak of memory pointed to by 'a.present.q'}}
 }
+
+namespace gh153782 {
+
+// Ensure we do not regress on the following use case.
+
+namespace mutually_exclusive_test_case_1 {
+struct StorageWrapper {
+  // Imagine those two call a reset() function (among other things)
+  ~StorageWrapper() { delete parts; }
+  StorageWrapper(StorageWrapper const&) = default;
+
+  // Mind that there is no assignment here -- this is the bug we would like to find.
+  void operator=(StorageWrapper&&) { delete parts; } // newdelete-warning{{Attempt to release already released memory}}
+
+  // Not provided, typically would do `parts = new long`.
+  StorageWrapper();
+
+  long* parts;
+};
+
+void test_non_trivial_struct_assignment() {
+  StorageWrapper* object = new StorageWrapper[]{StorageWrapper()};
+  object[0] = StorageWrapper(); // This assignment leads to the double-free.
+}
+} // mutually_exclusive_test_case_1
+
+namespace mutually_exclusive_test_case_2 {
+struct StorageWrapper {
+  // Imagine those two call a reset() function (among other things)
+  ~StorageWrapper() { delete parts; }
+  StorageWrapper(StorageWrapper const&) = default;
+
+  // Mind that there is no assignment here -- this is the bug we would like to find.
+  void operator=(StorageWrapper&&) { delete parts; }
+
+  // Not provided, typically would do `parts = new long`.
+  StorageWrapper();
+
+  long* parts;
+};
+
+void test_non_trivial_struct_assignment() {
+  StorageWrapper* object = new StorageWrapper[]{StorageWrapper()};
+  // object[0] = StorageWrapper(); // Remove the source of double free to make the potential leak appear.
+} // leak-warning{{Potential leak of memory pointed to by 'object'}}
+} // mutually_exclusive_test_case_2
+
+namespace mutually_exclusive_test_case_3 {
+struct StorageWrapper {
+  // Imagine those two call a reset() function (among other things)
+  ~StorageWrapper() { delete parts; }
+  StorageWrapper(StorageWrapper const&) = default;
+
+  // Mind that there is no assignment here -- this is the bug we would like to find.
+  void operator=(StorageWrapper&&) { delete parts; } // newdelete-warning{{Attempt to release already released memory}}
+
+  // Not provided, typically would do `parts = new long`.
+  StorageWrapper();
+
+  long* parts;
+};
+
+struct TestDoubleFreeWithInitializerList {
+  StorageWrapper* Object;
+  TestDoubleFreeWithInitializerList()
+  : Object(new StorageWrapper[]{StorageWrapper()}) {
+    Object[0] = StorageWrapper(); // This assignment leads to the double-free.
+  }
+};
+} // mutually_exclusive_test_case_3
+
+} // namespace gh153782
diff --git a/clang/test/Analysis/ctor-trivial-copy.cpp b/clang/test/Analysis/ctor-trivial-copy.cpp
index 940ff9ba3ed9c..44990fc631d6d 100644
--- a/clang/test/Analysis/ctor-trivial-copy.cpp
+++ b/clang/test/Analysis/ctor-trivial-copy.cpp
@@ -5,8 +5,6 @@
 void clang_analyzer_printState();
 template <typename T> void clang_analyzer_dump_lref(T& param);
 template <typename T> void clang_analyzer_dump_val(T param);
-template <typename T> void clang_analyzer_denote(T param, const char *name);
-template <typename T> void clang_analyzer_express(T param);
 template <typename T> T conjure();
 template <typename... Ts> void nop(const Ts &... args) {}
 
@@ -42,10 +40,16 @@ void test_assign_return() {
 namespace trivial_struct_copy {
 
 void _01_empty_structs() {
-  clang_analyzer_dump_val(conjure<empty>()); // expected-warning {{conj_$}}
+  clang_analyzer_dump_val(conjure<empty>()); // expected-warning {{lazyCompoundVal}}
   empty Empty = conjure<empty>();
   empty Empty2 = Empty;
   empty Empty3 = Empty2;
+  // All of these should refer to the exact same LCV, because all of
+  // these trivial copies refer to the original conjured value.
+  // There were Unknown before:
+  clang_analyzer_dump_val(Empty);  // expected-warning {{lazyCompoundVal}}
+  clang_analyzer_dump_val(Empty2); // expected-warning {{lazyCompoundVal}}
+  clang_analyzer_dump_val(Empty3); // expected-warning {{lazyCompoundVal}}
 
   // We only have binding for the original Empty object, because copying empty
   // objects is a no-op in the performTrivialCopy. This is fine, because empty
@@ -67,20 +71,18 @@ void _01_empty_structs() {
 }
 
 void _02_structs_with_members() {
-  clang_analyzer_dump_val(conjure<aggr>()); // expected-warning {{conj_$}}
+  clang_analyzer_dump_val(conjure<aggr>()); // expected-warning {{lazyCompoundVal}}
   aggr Aggr = conjure<aggr>();
   aggr Aggr2 = Aggr;
   aggr Aggr3 = Aggr2;
-  // All of these should refer to the exact same symbol, because all of
+  // All of these should refer to the exact same LCV, because all of
   // these trivial copies refer to the original conjured value.
-  clang_analyzer_denote(Aggr, "$Aggr");
-  clang_analyzer_express(Aggr);  // expected-warning {{$Aggr}}
-  clang_analyzer_express(Aggr2); // expected-warning {{$Aggr}}
-  clang_analyzer_express(Aggr3); // expected-warning {{$Aggr}}
-
-  // We should have the same Conjured symbol for "Aggr", "Aggr2" and "Aggr3".
-  // We used to have Derived symbols for the individual fields that were
-  // copied as part of copying the whole struct.
+  clang_analyzer_dump_val(Aggr);  // expected-warning {{lazyCompoundVal}}
+  clang_analyzer_dump_val(Aggr2); // expected-warning {{lazyCompoundVal}}
+  clang_analyzer_dump_val(Aggr3); // expected-warning {{lazyCompoundVal}}
+
+  // We have fields in the struct we copy, thus we also have the entries for the copies
+  // (and for all of their fields).
   clang_analyzer_printState();
   // CHECK:       "store": { "pointer": "0x{{[0-9a-f]+}}", "items": [
   // CHECK-NEXT:    { "cluster": "GlobalInternalSpaceRegion", "pointer": "0x{{[0-9a-f]+}}", "items": [
@@ -93,10 +95,12 @@ void _02_structs_with_members() {
   // CHECK-NEXT:      { "kind": "Default", "offset": 0, "value": "[[AGGR_CONJ:conj_\$[0-9]+{int, LC[0-9]+, S[0-9]+, #[0-9]+}]]" }
   // CHECK-NEXT:    ]},
   // CHECK-NEXT:    { "cluster": "Aggr2", "pointer": "0x{{[0-9a-f]+}}", "items": [
-  // CHECK-NEXT:      { "kind": "Default", "offset": 0, "value": "[[AGGR_CONJ]]" }
+  // CHECK-NEXT:      { "kind": "Direct", "offset": 0, "value": "derived_${{[0-9]+}}{[[AGGR_CONJ]],Aggr.x}" },
+  // CHECK-NEXT:      { "kind": "Direct", "offset": 32, "value": "derived_${{[0-9]+}}{[[AGGR_CONJ]],Aggr.y}" }
   // CHECK-NEXT:    ]},
   // CHECK-NEXT:    { "cluster": "Aggr3", "pointer": "0x{{[0-9a-f]+}}", "items": [
-  // CHECK-NEXT:      { "kind": "Default", "offset": 0, "value": "[[AGGR_CONJ]]" }
+  // CHECK-NEXT:      { "kind": "Direct", "offset": 0, "value": "derived_${{[0-9]+}}{[[AGGR_CONJ]],Aggr.x}" },
+  // CHECK-NEXT:      { "kind": "Direct", "offset": 32, "value": "derived_${{[0-9]+}}{[[AGGR_CONJ]],Aggr.y}" }
   // CHECK-NEXT:    ]}
   // CHECK-NEXT:  ]},
 
@@ -113,3 +117,31 @@ void entrypoint() {
 }
 
 } // namespace trivial_struct_copy
+
+namespace gh153782 {
+
+// Ensure we do not regress on the following use cases.
+// The assumption made on a field in `setPtr` should apply to the returned copy in `func`.
+struct Status { int error; };
+Status getError();
+
+Status setPtr(int **outptr, int* ptr) {
+  Status e = getError();
+  if (e.error != 0) return e; // When assuming the error field is non-zero,
+  *outptr = ptr;              // this is not executed
+  return e;
+}
+
+int func() {
+  int *ptr = nullptr;
+  int x = 42;
+  if (setPtr(&ptr, &x).error == 0) {
+    // The assumption made in get() SHOULD match the assumption about
+    // the returned value, hence the engine SHOULD NOT assume ptr is null.
+    clang_analyzer_dump_val(ptr); // expected-warning {{&x}}
+    return *ptr;
+  }
+  return 0;
+}
+
+} // namespace gh153782
diff --git a/clang/test/Analysis/explain-svals.cpp b/clang/test/Analysis/explain-svals.cpp
index dfc650223c9e7..9474aa7c7dbb1 100644
--- a/clang/test/Analysis/explain-svals.cpp
+++ b/clang/test/Analysis/explain-svals.cpp
@@ -99,7 +99,7 @@ class C {
 } // end of anonymous namespace
 
 void test_6() {
-  clang_analyzer_explain(conjure_S()); // expected-warning-re{{{{^symbol of type 'int' conjured at CFG element 'conjure_S\(\) \(CXXRecordTypedCall, \+0\)'$}}}}
+  clang_analyzer_explain(conjure_S()); // expected-warning-re{{{{^lazily frozen compound value of 1st parameter of function 'clang_analyzer_explain\(\)'$}}}}
   clang_analyzer_explain(conjure_S().z); // expected-warning-re{{{{^value derived from \(symbol of type 'int' conjured at CFG element 'conjure_S\(\) \(CXXRecordTypedCall, \)'\) for field 'z' of temporary object constructed at statement 'conjure_S\(\)'$}}}}
 }
 
diff --git a/clang/test/Analysis/iterator-modeling.cpp b/clang/test/Analysis/iterator-modeling.cpp
index 78882da4431fd..f1538839d06c8 100644
--- a/clang/test/Analysis/iterator-modeling.cpp
+++ b/clang/test/Analysis/iterator-modeling.cpp
@@ -2035,7 +2035,6 @@ void print_state(std::vector<int> &V) {
   // CHECK:      "checker_messages": [
   // CHECK:   { "checker": "alpha.cplusplus.IteratorModeling", "messages": [
   // CHECK-NEXT:     "Iterator Positions :",
-  // CHECK-NEXT:     "conj_$[[#]]{int, LC[[#]], S[[#]], #[[#]]} : Valid ; Container == SymRegion{reg_$[[#]]<std::vector<int> & V>} ; Offset == conj_$[[#]]{long, LC[[#]], S[[#]], #[[#]]}",
   // CHECK-NEXT:     "i0 : Valid ; Container == SymRegion{reg_$[[#]]<std::vector<int> & V>} ; Offset == conj_$[[#]]{long, LC[[#]], S[[#]], #[[#]]}"
   // CHECK-NEXT:   ]}
 
@@ -2046,7 +2045,6 @@ void print_state(std::vector<int> &V) {
   // CHECK:      "checker_messages": [
   // CHECK:   { "checker": "alpha.cplusplus.IteratorModeling", "messages": [
   // CHECK-NEXT:     "Iterator Positions :",
-  // CHECK-NEXT:     "conj_$[[#]]{int, LC[[#]], S[[#]], #[[#]]} : Valid ; Container == SymRegion{reg_$[[#]]<std::vector<int> & V>} ; Offset == conj_$[[#]]{long, LC[[#]], S[[#]], #[[#]]}",
   // CHECK-NEXT:     "i1 : Valid ; Container == SymRegion{reg_$[[#]]<std::vector<int> & V>} ; Offset == conj_$[[#]]{long, LC[[#]], S[[#]], #[[#]]}"
   // CHECK-NEXT:   ]}
 
diff --git a/clang/test/Analysis/stl-algorithm-modeling-aggressive-std-find-modeling.cpp b/clang/test/Analysis/stl-algorithm-modeling-aggressive-std-find-modeling.cpp
index 191af95cd2b9c..98301cf7274fc 100644
--- a/clang/test/Analysis/stl-algorithm-modeling-aggressive-std-find-modeling.cpp
+++ b/clang/test/Analysis/stl-algorithm-modeling-aggressive-std-find-modeling.cpp
@@ -4,16 +4,6 @@
 // RUN:  -analyzer-config alpha.cplusplus.STLAlgorithmModeling:AggressiveStdFindModeling=true\
 // RUN:  -verify
 
-// STLAlgorithmModeling and DebugIteratorModeling are probably bugged because
-// these tests started failing after we just directly copy the symbol
-// representing the value of a variable instead of creating a LazyCompoundVal
-// of that single conjured value.
-// In theory, it shouldn't matter if we eagerly copy the value that we would
-// "load" from the LCV once requested or just directly binding the backing symbol.
-// Yet, these tests fail, so there is likely messed up how/what the checker
-// metadata is associated with.
-// XFAIL: *
-
 #include "Inputs/system-header-simulator-cxx.h"
 
 void clang_analyzer_eval(bool);
diff --git a/clang/test/Analysis/stl-algorithm-modeling.cpp b/clang/test/Analysis/stl-algorithm-modeling.cpp
index f7029c79b0942..5549c24a8c220 100644
--- a/clang/test/Analysis/stl-algorithm-modeling.cpp
+++ b/clang/test/Analysis/stl-algorithm-modeling.cpp
@@ -3,16 +3,6 @@
 // RUN:  -analyzer-config aggressive-binary-operation-simplification=true\
 // RUN:  -verify
 
-// STLAlgorithmModeling and DebugIteratorModeling are probably bugged because
-// these tests started failing after we just directly copy the symbol
-// representing the value of a variable instead of creating a LazyCompoundVal
-// of that single conjured value.
-// In theory, it shouldn't matter if we eagerly copy the value that we would
-// "load" from the LCV once requested or just directly binding the backing symbol.
-// Yet, these tests fail, so there is likely messed up how/what the checker
-// metadata is associated with.
-// XFAIL: *
-
 #include "Inputs/system-header-simulator-cxx.h"
 
 void clang_analyzer_eval(bool);
diff --git a/clang/test/Analysis/store-dump-orders.cpp b/clang/test/Analysis/store-dump-orders.cpp
index dbe93f1c5183a..d99f581f00fe1 100644
--- a/clang/test/Analysis/store-dump-orders.cpp
+++ b/clang/test/Analysis/store-dump-orders.cpp
@@ -41,7 +41,7 @@ void test_output(int n) {
   // CHECK-NEXT:      { "kind": "Default", "offset": 0, "value": "conj_$
   // CHECK-NEXT:    ]},
   // CHECK-NEXT:    { "cluster": "objfirst", "pointer": "0x{{[0-9a-f]+}}", "items": [
-  // CHECK-NEXT:      { "kind": "Default", "offset": 0, "value": "conj_$
+  // CHECK-NEXT:      { "kind": "Default", "offset": 0, "value": "lazyCompoundVal
   // CHECK-NEXT:      { "kind": "Direct", "offset": 320, "value": "1 S32b" },
   // CHECK-NEXT:      { "kind": "Direct", "offset": 352, "value": "2 S32b" },
   // CHECK-NEXT:      { "kind": "Direct", "offset": 384, "value": "3 S32b" }
diff --git a/clang/test/Analysis/taint-generic.cpp b/clang/test/Analysis/taint-generic.cpp
index fc7c37300d3fc..4b8d9ab68ff84 100644
--- a/clang/test/Analysis/taint-generic.cpp
+++ b/clang/test/Analysis/taint-generic.cpp
@@ -158,7 +158,11 @@ void top() {
   clang_analyzer_isTainted(E); // expected-warning {{NO}}
 
   Aggr A = mySource1<Aggr>();
-  clang_analyzer_isTainted(A);      // expected-warning {{YES}}
+  // FIXME Ideally, both A and A.data should be tainted. However, the
+  //       implementation used by e5ac9145ba29 ([analyzer][taint] Recognize
+  //       tainted LazyCompoundVals (4/4) (#115919), 2024-11-15) led to FPs and
+  //       FNs in various scenarios and had to be reverted to fix #153782.
+  clang_analyzer_isTainted(A);      // expected-warning {{NO}}
   clang_analyzer_isTainted(A.data); // expected-warning {{YES}}
 }
 } // namespace gh114270
diff --git a/clang/test/Analysis/template-param-objects.cpp b/clang/test/Analysis/template-param-objects.cpp
index b065f8756d4d8..dde95fa62cb65 100644
--- a/clang/test/Analysis/template-param-objects.cpp
+++ b/clang/test/Analysis/template-param-objects.cpp
@@ -11,7 +11,7 @@ bool operator ==(Box lhs, Box rhs) {
   return lhs.value == rhs.value;
 }
 template <Box V> void dumps() {
-  clang_analyzer_dump(V);        // expected-warning {{Unknown}}
+  clang_analyzer_dump(V);        // expected-warning {{lazyCompoundVal}}
   clang_analyzer_dump(&V);       // expected-warning {{Unknown}}
   clang_analyzer_dump(V.value);  // expected-warning {{Unknown}} FIXME: It should be '6 S32b'.
   clang_analyzer_dump(&V.value); // expected-warning {{Unknown}}

@mantognini mantognini requested a review from steakhal October 14, 2025 21:54
@steakhal
Copy link
Contributor

The revert itself looks correct. I hate the fact that we need to revert these.

The problem essentially boils down to the following:
We have some object, and we refer to a field of it.
Then we copy the object and refer to the field of the copy; and reading from that field should result in the same value as if the read happened on the original field.
This assumption broke with the patches that now gets reverted.

For example:

namespace gh153782 {
struct Status { int error; };
Status getError();

Status setPtr(int **outptr, int* ptr) {
  Status e = getError();
  if (e.error != 0) return e; // When assuming the error field is non-zero,
  *outptr = ptr;              // this is not executed
  return e;
}

int func() {
  int *ptr = nullptr;
  int x = 42;
  if (setPtr(&ptr, &x).error == 0) {
    clang_analyzer_dump_val(ptr); // expected-warning {{&x}}
    return *ptr;
  }
  return 0;
}

} // namespace gh153782

In this example, withing setPtr we constrain e.error, then we return e and copy it into the temporary at the callsite.
So the setPtr(&ptr, &x).error should return the same symbol that was previously constrained.

Why is it different then?
The value of e was default binding of conjured symbol from getError(), so e.error was a slice of that one, represented by SymbolDerived{conj, e.error}.
The return value of setPtr() is a temporary, thus represented by CXXTempObjectRegion. That is the region the call return should copy into in the return e; statement. Previously, this was represented by a LazyCompoundVal{e}. With the LCV optimizations, this was simplified into conj directly.

Consequently, when later the error field was accessed on the temporary, then SymbolDerived{conj, temp_object{}.error} was created Notice that the MemRegion is different!, so the Profile() of these will be different, so we have different derived symbols...

In theory, both e.error and temp_object{}.error should boil down into the same begin offset and extent (in other words bit sequence) of the same conjured symbol. So these should mean the same thing!

Changing SymbolDerived by replacing the MemRegion with an offset-extent pair, would be challenging.

How did this work with LCVs before?
It seems like in clang-19, the LCV{e} was desugared into SymbolDerived{conj, e.error}. I suppose it was possible to do because LCVs were handled specially, so when loading from the temp_object it didn't create a SymbolDerived{conj, temp_object.error}, but SymbolDerived{conj, e.error} instead because it knew that the LCV it was traversing was for e.
After the LCV optimization, we no longer have the origin region e, so this custom handling of LCVs didn't kick in, so we end up with a brand new SymbolDerived. I have not checked this theory, so treat it with a pinch of salt.

I'm pretty sure we could fix this, but I do not have the time right now. And to be honest, I don't think I'd invest here even in the future. So for now, let's just do the revert :|


Some extra notes about the other regression tests on how they are similar to the case before:

namespace mutually_exclusive_test_case_3 {
struct StorageWrapper {
  // Imagine those two call a reset() function (among other things)
  ~StorageWrapper() { delete parts; }
  StorageWrapper(StorageWrapper const&) = default;

  // Mind that there is no assignment here -- this is the bug we would like to find.
  void operator=(StorageWrapper&&) { delete parts; } // newdelete-warning{{Attempt to release already released memory}}

  // Not provided, typically would do `parts = new long`.
  StorageWrapper();

  long* parts;
};

struct TestDoubleFreeWithInitializerList {
  StorageWrapper* Object;
  TestDoubleFreeWithInitializerList()
  : Object(new StorageWrapper[]{StorageWrapper()}) {
    Object[0] = StorageWrapper(); // This assignment leads to the double-free.
  }
};
} // mutually_exclusive_test_case_3

} // namespace gh153782

In the new StorageWrapper[]{StorageWrapper()} we create a temporary that we copy into the heap allocated array, then the temporary dies and its dtor deletes its parts pointer. Consequently, the copy on the heap will have a dangling pointer.
Then Object[0] = StorageWrapper(); is just a fancy way of spelling delete Object[0].parts, which would be the second time that the same parts pointer is deleted, leading to the double free.
The point is, that we end up with different symbols for the parts subobject at the two delete expressions:
derived_$8 {conj_$7,temp_object{struct StorageWrapper, S2121}.parts} and
derived_$12{conj_$7,Element{HeapSymRegion{conj_$3},0 S64b,struct StorageWrapper}.parts}
The idea is the same.

@steakhal steakhal requested review from NagyDonat and haoNoQ October 15, 2025 13:09
@steakhal
Copy link
Contributor

@haoNoQ I'd be interested in your opinion. I tried to sum up my investigation, but feel free to ask questions.

@marco-antognini-sonarsource
Copy link
Contributor Author

gentle ping :-)

@haoNoQ / @NagyDonat happy to hear your feedback if you have any. Otherwise, I'd like to move forward with this PR if that's alright.

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added two minor remarks about comments that were a bit unclear for me; the commit LGTM if you handle those. I like the detailed tests and I feel that they convincingly demonstrate that this revert is necessary, but I admit that I don't have a deep understanding of the situation.

My vague feeling is that the now-reverted commits were reasonable and it's surprising and unfortunate that they caused problems.

In general I think that these aspects of our memory model are burdened by lots of technical debt (presumably because they were developed through gradual evolution). I would say that in an ideal world the behavior of the analyzer should only depend on the actual data (what it knows about the memory contents) and not our way of representing it (directly copied fields vs LazyCompoundVal).

I hope that eventually I will be able to spend some time to get a deep understanding of our memory model and symbol handling (including the RegionStore++ prototype) and perhaps then I'll be able to suggest or develop some improvements in this area.

Until then, this revert is the correct move.

Edit: also, sorry for not reviewing this earlier!

~StorageWrapper() { delete parts; }
StorageWrapper(StorageWrapper const&) = default;

// Mind that there is no assignment here -- this is the bug we would like to find.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a moment I was surprised to see this comment directly above the definition of a (move) assignment operator -- perhaps consider writing "copy assignment" to clarify.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise it was not obvious at all -- I've rephrase it.

The intend was to say, there is no parts = other.parts; after the delete.

Let me know if this is clearer now after 904f919


namespace mutually_exclusive_test_case_1 {
struct StorageWrapper {
// Imagine those two call a reset() function (among other things)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "those two" refer to in this comment? (Perhaps it's my fault but I have no idea.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've clarified this comment (and its copies) to refer explicitly to the destructor and copy constructor.

@haoNoQ
Copy link
Collaborator

haoNoQ commented Oct 27, 2025

Ahhh good catch.

Yeah right now SymbolDerived treats the conjured symbol inside it as the timestamp of the devastating event that caused the loss of information about the program state. So the region needs to do all the heavy lifting of figuring out which part of the lost information is being pulled. In this sense SymbolRegionValue is effectively a SymbolDerived that corresponds to the devastating event of starting the analysis from an arbitrary function in the middle of the program. So right now it doesn't make sense to "transplant" the conjured symbol because the recepient region wasn't actually devastated by that event. And the connection between the event and the affected region is currently completely implicit. (See also the funny symbol dumps in https://godbolt.org/z/jovxGYo7z)

I completely agree that all the heavy lifting needs to be performed by the conjured symbol instead, and the derived symbol should just be an offset into it. Then the transplant would make a lot more sense.

It's slightly different with global variables and heap regions where the conjured symbol covers the entire memory space. Eg. the variable itself acts as the offset with respect to the entire space of global variables. So in this case the derived symbol needs to carry at least some region information. But in that case you probably don't want to transplant the conjured symbol across memory spaces in the first place.

That said, I'm not sure where I stand on the overall idea of transplanting symbols to save on indirections. Are indirections that annoying/expensive to unwrap? It's not like lazy compound values are expensive to create. It's very annoying in terms of checker API but this could probably solved by simply providing convenient API? I need to read more about your previous patches to catch up on that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to both of you. Now, I'll admit I'm not equipped to fully appreciate your comments and won't be able to help much in fixing the technical debt and get a better fix... Just managing expectations here. 🙃


namespace mutually_exclusive_test_case_1 {
struct StorageWrapper {
// Imagine those two call a reset() function (among other things)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've clarified this comment (and its copies) to refer explicitly to the destructor and copy constructor.

~StorageWrapper() { delete parts; }
StorageWrapper(StorageWrapper const&) = default;

// Mind that there is no assignment here -- this is the bug we would like to find.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise it was not obvious at all -- I've rephrase it.

The intend was to say, there is no parts = other.parts; after the delete.

Let me know if this is clearer now after 904f919

Copy link
Contributor

@steakhal steakhal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge this.

@mantognini mantognini merged commit cc3ad20 into llvm:main Nov 4, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:static analyzer clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[analyzer] False Positives due to conflicting assumptions in callee and caller about returned value's field

6 participants