Skip to content

Conversation

@boomanaiden154
Copy link
Contributor

@boomanaiden154 boomanaiden154 commented Nov 4, 2025

This helps to clean up any dead stores that come up at the end of the destructor. The motivating example was a refactoring in libc++'s basic_string implementation in 8dae17b that added a zeroing store into the destructor, causing a large performance regression on an internal workload. We also saw a ~0.2% performance increase on an internal server workload when enabling this.

I also tested this against all of the non-flaky tests in our large C++ codebase and found a minimal number of issues that all happened to be in user code.

This helps to clean up any dead stores that come up at the end of the
destructor. The motivating example was a refactoring in libc++'s
basic_string implementation in 8dae17b
that added a zeroing store into the destructor, causing a large
performance regression on an internal workload.
@boomanaiden154 boomanaiden154 force-pushed the clang-destructor-dead-on-return branch from 7fb6339 to 7a3dec4 Compare December 1, 2025 16:11
@boomanaiden154 boomanaiden154 marked this pull request as ready for review December 1, 2025 17:57
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang labels Dec 1, 2025
@llvmbot
Copy link
Member

llvmbot commented Dec 1, 2025

@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Aiden Grossman (boomanaiden154)

Changes

This helps to clean up any dead stores that come up at the end of the destructor. The motivating example was a refactoring in libc++'s basic_string implementation in 8dae17b that added a zeroing store into the destructor, causing a large performance regression on an internal workload. We also saw a ~0.2% performance increase on an internal server workload when enabling this.

I also tested this against all of the non-flaky tests in our large C++ codebase and found a minimal number of issues that all happened to be in user code.


Patch is 5.29 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/166276.diff

109 Files Affected:

  • (modified) clang/lib/CodeGen/CGCall.cpp (+11-1)
  • (modified) clang/test/CodeGen/paren-list-agg-init.cpp (+4-4)
  • (modified) clang/test/CodeGen/temporary-lifetime.cpp (+6-6)
  • (modified) clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/amdgcn-func-arg.cpp (+3-3)
  • (modified) clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp (+4-4)
  • (modified) clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp (+3-3)
  • (modified) clang/test/CodeGenCXX/for-range.cpp (+12-12)
  • (modified) clang/test/CodeGenCXX/gh62818.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/nrvo.cpp (+116-116)
  • (modified) clang/test/CodeGenCXX/pr13396.cpp (+2-2)
  • (modified) clang/test/CodeGenCXX/ptrauth-apple-kext-indirect-virtual-dtor-call.cpp (+3-3)
  • (modified) clang/test/CodeGenObjCXX/objc-struct-cxx-abi.mm (+3-3)
  • (modified) clang/test/CodeGenObjCXX/ptrauth-struct-cxx-abi.mm (+1-1)
  • (modified) clang/test/DebugInfo/CXX/bpf-structors.cpp (+1-1)
  • (modified) clang/test/DebugInfo/CXX/trivial_abi.cpp (+1-1)
  • (modified) clang/test/OpenMP/amdgcn_target_global_constructor.cpp (+3-3)
  • (modified) clang/test/OpenMP/distribute_firstprivate_codegen.cpp (+141-141)
  • (modified) clang/test/OpenMP/distribute_lastprivate_codegen.cpp (+145-145)
  • (modified) clang/test/OpenMP/distribute_parallel_for_firstprivate_codegen.cpp (+188-188)
  • (modified) clang/test/OpenMP/distribute_parallel_for_lastprivate_codegen.cpp (+211-211)
  • (modified) clang/test/OpenMP/distribute_parallel_for_num_threads_codegen.cpp (+24-24)
  • (modified) clang/test/OpenMP/distribute_parallel_for_private_codegen.cpp (+88-88)
  • (modified) clang/test/OpenMP/distribute_parallel_for_simd_firstprivate_codegen.cpp (+112-112)
  • (modified) clang/test/OpenMP/distribute_parallel_for_simd_lastprivate_codegen.cpp (+144-144)
  • (modified) clang/test/OpenMP/distribute_parallel_for_simd_num_threads_codegen.cpp (+510-510)
  • (modified) clang/test/OpenMP/distribute_parallel_for_simd_private_codegen.cpp (+132-132)
  • (modified) clang/test/OpenMP/distribute_private_codegen.cpp (+64-64)
  • (modified) clang/test/OpenMP/distribute_simd_firstprivate_codegen.cpp (+88-88)
  • (modified) clang/test/OpenMP/distribute_simd_lastprivate_codegen.cpp (+116-116)
  • (modified) clang/test/OpenMP/distribute_simd_private_codegen.cpp (+108-108)
  • (modified) clang/test/OpenMP/for_firstprivate_codegen.cpp (+40-40)
  • (modified) clang/test/OpenMP/for_lastprivate_codegen.cpp (+202-202)
  • (modified) clang/test/OpenMP/for_linear_codegen.cpp (+102-102)
  • (modified) clang/test/OpenMP/for_private_codegen.cpp (+33-33)
  • (modified) clang/test/OpenMP/for_reduction_codegen.cpp (+647-311)
  • (modified) clang/test/OpenMP/master_taskloop_firstprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/master_taskloop_in_reduction_codegen.cpp (+58-58)
  • (modified) clang/test/OpenMP/master_taskloop_lastprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/master_taskloop_private_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/master_taskloop_simd_firstprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/master_taskloop_simd_in_reduction_codegen.cpp (+71-71)
  • (modified) clang/test/OpenMP/master_taskloop_simd_lastprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/master_taskloop_simd_private_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/parallel_copyin_codegen.cpp (+70-70)
  • (modified) clang/test/OpenMP/parallel_firstprivate_codegen.cpp (+74-74)
  • (modified) clang/test/OpenMP/parallel_for_linear_codegen.cpp (+13-13)
  • (modified) clang/test/OpenMP/parallel_master_codegen.cpp (+8-8)
  • (modified) clang/test/OpenMP/parallel_master_taskloop_firstprivate_codegen.cpp (+200-200)
  • (modified) clang/test/OpenMP/parallel_master_taskloop_lastprivate_codegen.cpp (+231-231)
  • (modified) clang/test/OpenMP/parallel_master_taskloop_simd_firstprivate_codegen.cpp (+213-213)
  • (modified) clang/test/OpenMP/parallel_master_taskloop_simd_lastprivate_codegen.cpp (+306-306)
  • (modified) clang/test/OpenMP/parallel_private_codegen.cpp (+49-49)
  • (modified) clang/test/OpenMP/parallel_reduction_codegen.cpp (+113-113)
  • (modified) clang/test/OpenMP/scope_codegen.cpp (+358-358)
  • (modified) clang/test/OpenMP/sections_firstprivate_codegen.cpp (+41-41)
  • (modified) clang/test/OpenMP/sections_lastprivate_codegen.cpp (+83-83)
  • (modified) clang/test/OpenMP/sections_private_codegen.cpp (+30-30)
  • (modified) clang/test/OpenMP/sections_reduction_codegen.cpp (+43-43)
  • (modified) clang/test/OpenMP/simd_private_taskloop_codegen.cpp (+112-112)
  • (modified) clang/test/OpenMP/single_codegen.cpp (+455-455)
  • (modified) clang/test/OpenMP/single_firstprivate_codegen.cpp (+41-41)
  • (modified) clang/test/OpenMP/single_private_codegen.cpp (+30-30)
  • (modified) clang/test/OpenMP/target_has_device_addr_codegen.cpp (+65-65)
  • (modified) clang/test/OpenMP/target_in_reduction_codegen.cpp (+6-6)
  • (modified) clang/test/OpenMP/target_parallel_generic_loop_codegen-1.cpp (+84-84)
  • (modified) clang/test/OpenMP/target_teams_distribute_firstprivate_codegen.cpp (+68-68)
  • (modified) clang/test/OpenMP/target_teams_distribute_lastprivate_codegen.cpp (+118-118)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_firstprivate_codegen.cpp (+186-186)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_lastprivate_codegen.cpp (+172-172)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_private_codegen.cpp (+150-150)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_firstprivate_codegen.cpp (+216-216)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_lastprivate_codegen.cpp (+144-144)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_private_codegen.cpp (+194-194)
  • (modified) clang/test/OpenMP/target_teams_distribute_private_codegen.cpp (+58-58)
  • (modified) clang/test/OpenMP/target_teams_distribute_simd_firstprivate_codegen.cpp (+98-98)
  • (modified) clang/test/OpenMP/target_teams_distribute_simd_lastprivate_codegen.cpp (+116-116)
  • (modified) clang/test/OpenMP/target_teams_distribute_simd_private_codegen.cpp (+106-106)
  • (modified) clang/test/OpenMP/target_teams_generic_loop_private_codegen.cpp (+102-102)
  • (modified) clang/test/OpenMP/task_codegen.cpp (+3057-1972)
  • (modified) clang/test/OpenMP/task_in_reduction_codegen.cpp (+57-57)
  • (modified) clang/test/OpenMP/taskloop_firstprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/taskloop_in_reduction_codegen.cpp (+58-58)
  • (modified) clang/test/OpenMP/taskloop_lastprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/taskloop_private_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/taskloop_simd_firstprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/taskloop_simd_in_reduction_codegen.cpp (+71-71)
  • (modified) clang/test/OpenMP/taskloop_simd_lastprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/taskloop_simd_private_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/teams_distribute_firstprivate_codegen.cpp (+68-68)
  • (modified) clang/test/OpenMP/teams_distribute_lastprivate_codegen.cpp (+139-139)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_firstprivate_codegen.cpp (+100-100)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_lastprivate_codegen.cpp (+205-205)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_num_threads_codegen.cpp (+14-14)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_private_codegen.cpp (+82-82)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_firstprivate_codegen.cpp (+130-130)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_lastprivate_codegen.cpp (+144-144)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_num_threads_codegen.cpp (+252-252)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_private_codegen.cpp (+130-130)
  • (modified) clang/test/OpenMP/teams_distribute_private_codegen.cpp (+58-58)
  • (modified) clang/test/OpenMP/teams_distribute_simd_firstprivate_codegen.cpp (+98-98)
  • (modified) clang/test/OpenMP/teams_distribute_simd_lastprivate_codegen.cpp (+116-116)
  • (modified) clang/test/OpenMP/teams_distribute_simd_private_codegen.cpp (+106-106)
  • (modified) clang/test/OpenMP/teams_firstprivate_codegen.cpp (+74-74)
  • (modified) clang/test/OpenMP/teams_generic_loop_private_codegen.cpp (+58-58)
  • (modified) clang/test/OpenMP/teams_private_codegen.cpp (+80-80)
  • (modified) clang/test/OpenMP/threadprivate_codegen.cpp (+392-392)
  • (modified) clang/test/utils/update_cc_test_checks/Inputs/basic-cplusplus.cpp.expected (+13-13)
  • (modified) clang/test/utils/update_cc_test_checks/Inputs/explicit-template-instantiation.cpp.expected (+3-3)
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index efacb3cc04c01..ee6e13fd1c1a5 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -2767,7 +2767,8 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,
   }
 
   // Apply `nonnull`, `dereferenceable(N)` and `align N` to the `this` argument,
-  // unless this is a thunk function.
+  // unless this is a thunk function. Add dead_on_return to the `this` argument
+  // in base class destructors to aid in DSE.
   // FIXME: fix this properly, https://reviews.llvm.org/D100388
   if (FI.isInstanceMethod() && !IRFunctionArgs.hasInallocaArg() &&
       !FI.arg_begin()->type->isVoidPointerType() && !IsThunk) {
@@ -2800,6 +2801,15 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,
             .getAsAlign();
     Attrs.addAlignmentAttr(Alignment);
 
+    if (isa_and_nonnull<CXXDestructorDecl>(
+            CalleeInfo.getCalleeDecl().getDecl())) {
+      auto *ClassDecl = dyn_cast<CXXRecordDecl>(
+          CalleeInfo.getCalleeDecl().getDecl()->getDeclContext());
+      if (ClassDecl->getNumBases() == 0 && ClassDecl->getNumVBases() == 0) {
+        Attrs.addAttribute(llvm::Attribute::DeadOnReturn);
+      }
+    }
+
     ArgAttrs[IRArgs.first] = llvm::AttributeSet::get(getLLVMContext(), Attrs);
   }
 
diff --git a/clang/test/CodeGen/paren-list-agg-init.cpp b/clang/test/CodeGen/paren-list-agg-init.cpp
index e30777ecc07d6..561bf2b5eb9c4 100644
--- a/clang/test/CodeGen/paren-list-agg-init.cpp
+++ b/clang/test/CodeGen/paren-list-agg-init.cpp
@@ -394,9 +394,9 @@ namespace gh61145 {
   // a.k.a. Vec::Vec(Vec&&)
   // CHECK-NEXT: call void @_ZN7gh611453VecC1EOS0_(ptr noundef nonnull align 1 dereferenceable(1) [[AGG_TMP_ENSURED]], ptr noundef nonnull align 1 dereferenceable(1) [[V]])
   // a.k.a. S1::~S1()
-  // CHECK-NEXT: call void @_ZN7gh611452S1D1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[AGG_TMP_ENSURED]])
+  // CHECK-NEXT: call void @_ZN7gh611452S1D1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[AGG_TMP_ENSURED]])
   // a.k.a.Vec::~Vec()
-  // CHECK-NEXT: call void @_ZN7gh611453VecD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[V]])
+  // CHECK-NEXT: call void @_ZN7gh611453VecD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[V]])
   // CHECK-NEXT: ret void
   template <int I>
   void make1() {
@@ -416,9 +416,9 @@ namespace gh61145 {
   // CHECK-NEXT: [[C:%.*c.*]] = getelementptr inbounds nuw [[STRUCT_S2]], ptr [[AGG_TMP_ENSURED]], i32 0, i32
   // CHECK-NEXT: store i8 0, ptr [[C]], align 1
   // a.k.a. S2::~S2()
-  // CHECK-NEXT: call void @_ZN7gh611452S2D1Ev(ptr noundef nonnull align 1 dereferenceable(2) [[AGG_TMP_ENSURED]])
+  // CHECK-NEXT: call void @_ZN7gh611452S2D1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(2) [[AGG_TMP_ENSURED]])
   // a.k.a. Vec::~Vec()
-  // CHECK-NEXT: call void @_ZN7gh611453VecD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[V]])
+  // CHECK-NEXT: call void @_ZN7gh611453VecD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[V]])
   // CHECK-NEXT: ret void
   template <int I>
   void make2() {
diff --git a/clang/test/CodeGen/temporary-lifetime.cpp b/clang/test/CodeGen/temporary-lifetime.cpp
index 04087292b2c70..44d1235f15c86 100644
--- a/clang/test/CodeGen/temporary-lifetime.cpp
+++ b/clang/test/CodeGen/temporary-lifetime.cpp
@@ -24,12 +24,12 @@ void Test1() {
   // CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR:.+]])
   // CHECK-DTOR: call void @_ZN1AC1Ev(ptr nonnull {{[^,]*}} %[[VAR:[^ ]+]])
   // CHECK-DTOR: call void @_Z3FooIRK1AEvOT_
-  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[VAR]])
+  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[VAR]])
   // CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR]])
   // CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR:.+]])
   // CHECK-DTOR: call void @_ZN1AC1Ev(ptr nonnull {{[^,]*}} %[[VAR:[^ ]+]])
   // CHECK-DTOR: call void @_Z3FooIRK1AEvOT_
-  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[VAR]])
+  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[VAR]])
   // CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR]])
   // CHECK-DTOR: }
 
@@ -61,9 +61,9 @@ void Test2() {
   // CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR2:.+]])
   // CHECK-DTOR: call void @_ZN1AC1Ev(ptr nonnull {{[^,]*}} %[[VAR2:[^ ]+]])
   // CHECK-DTOR: call void @_Z3FooIRK1AEvOT_
-  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[VAR2]])
+  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[VAR2]])
   // CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR2]])
-  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[VAR1]])
+  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[VAR1]])
   // CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR1]])
   // CHECK-DTOR: }
 
@@ -155,12 +155,12 @@ void Test7() {
   // CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR:.+]])
   // CHECK-DTOR: call void @_Z3BazI1AET_v({{.*}} %[[SLOT:[^ ]+]])
   // CHECK-DTOR: call void @_Z3FooI1AEvOT_({{.*}} %[[SLOT]])
-  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[SLOT]])
+  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[SLOT]])
   // CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR]])
   // CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR:.+]])
   // CHECK-DTOR: call void @_Z3BazI1AET_v({{.*}} %[[SLOT:[^ ]+]])
   // CHECK-DTOR: call void @_Z3FooI1AEvOT_({{.*}} %[[SLOT]])
-  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[SLOT]])
+  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[SLOT]])
   // CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR]])
   // CHECK-DTOR: }
   Foo(Baz<A>());
diff --git a/clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp b/clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp
index 3c2a624bd4f95..e05f8133321c7 100644
--- a/clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp
+++ b/clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp
@@ -75,7 +75,7 @@ int x;
 // CHECK-NEXT:    [[A:%.*]] = alloca [[CLASS_A:%.*]], align 4, addrspace(5)
 // CHECK-NEXT:    [[A_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A]] to ptr
 // CHECK-NEXT:    call void @_ZN1AC1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
-// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
+// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
 // CHECK-NEXT:    ret void
 //
 void func3() {
diff --git a/clang/test/CodeGenCXX/amdgcn-func-arg.cpp b/clang/test/CodeGenCXX/amdgcn-func-arg.cpp
index a5f83dc91b038..bc20c33ec4d0f 100644
--- a/clang/test/CodeGenCXX/amdgcn-func-arg.cpp
+++ b/clang/test/CodeGenCXX/amdgcn-func-arg.cpp
@@ -43,9 +43,9 @@ void func_with_indirect_arg(A a) {
 // CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[AGG_TMP_ASCAST]], ptr align 4 [[A_ASCAST]], i64 4, i1 false)
 // CHECK-NEXT:    [[AGG_TMP_ASCAST_ASCAST:%.*]] = addrspacecast ptr [[AGG_TMP_ASCAST]] to ptr addrspace(5)
 // CHECK-NEXT:    call void @_Z22func_with_indirect_arg1A(ptr addrspace(5) noundef [[AGG_TMP_ASCAST_ASCAST]])
-// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[AGG_TMP_ASCAST]])
+// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) [[AGG_TMP_ASCAST]])
 // CHECK-NEXT:    call void @_Z17func_with_ref_argR1A(ptr noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
-// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
+// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
 // CHECK-NEXT:    ret void
 //
 void test_indirect_arg_auto() {
@@ -61,7 +61,7 @@ void test_indirect_arg_auto() {
 // CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[AGG_TMP_ASCAST]], ptr align 4 addrspacecast (ptr addrspace(1) @g_a to ptr), i64 4, i1 false)
 // CHECK-NEXT:    [[AGG_TMP_ASCAST_ASCAST:%.*]] = addrspacecast ptr [[AGG_TMP_ASCAST]] to ptr addrspace(5)
 // CHECK-NEXT:    call void @_Z22func_with_indirect_arg1A(ptr addrspace(5) noundef [[AGG_TMP_ASCAST_ASCAST]])
-// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[AGG_TMP_ASCAST]])
+// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) [[AGG_TMP_ASCAST]])
 // CHECK-NEXT:    call void @_Z17func_with_ref_argR1A(ptr noundef nonnull align 4 dereferenceable(4) addrspacecast (ptr addrspace(1) @g_a to ptr))
 // CHECK-NEXT:    ret void
 //
diff --git a/clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp b/clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp
index 4eafa720e0cb4..a764ba31539eb 100644
--- a/clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp
+++ b/clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp
@@ -217,7 +217,7 @@ void ArrayInit() {
   // CHECK:       [[ARRAY_DESTROY_BODY2]]:
   // CHECK-NEXT:    %arraydestroy.elementPast = phi ptr [ %1, %cleanup ], [ %arraydestroy.element, %[[ARRAY_DESTROY_BODY2]] ]
   // CHECK-NEXT:    %arraydestroy.element = getelementptr inbounds %struct.Printy, ptr %arraydestroy.elementPast, i64 -1
-  // CHECK-NEXT:    call void @_ZN6PrintyD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
+  // CHECK-NEXT:    call void @_ZN6PrintyD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
   // CHECK-NEXT:    %arraydestroy.done = icmp eq ptr %arraydestroy.element, %arr
   // CHECK-NEXT:    br i1 %arraydestroy.done, label %[[ARRAY_DESTROY_DONE2]], label %[[ARRAY_DESTROY_BODY2]]
 
@@ -265,7 +265,7 @@ void ArraySubobjects() {
     // CHECK:       [[ARRAY_DESTROY_BODY]]:
     // CHECK-NEXT:    %arraydestroy.elementPast = phi ptr [ %0, %if.then ], [ %arraydestroy.element, %[[ARRAY_DESTROY_BODY]] ]
     // CHECK-NEXT:    %arraydestroy.element = getelementptr inbounds %struct.Printy, ptr %arraydestroy.elementPast, i64 -1
-    // CHECK-NEXT:    call void @_ZN6PrintyD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
+    // CHECK-NEXT:    call void @_ZN6PrintyD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
     // CHECK-NEXT:    %arraydestroy.done = icmp eq ptr %arraydestroy.element, %arr2
     // CHECK-NEXT:    br i1 %arraydestroy.done, label %[[ARRAY_DESTROY_DONE]], label %[[ARRAY_DESTROY_BODY]]
 
@@ -277,7 +277,7 @@ void ArraySubobjects() {
     // CHECK:       [[ARRAY_DESTROY_BODY2]]:
     // CHECK-NEXT:    %arraydestroy.elementPast4 = phi ptr [ %1, %[[ARRAY_DESTROY_DONE]] ], [ %arraydestroy.element5, %[[ARRAY_DESTROY_BODY2]] ]
     // CHECK-NEXT:    %arraydestroy.element5 = getelementptr inbounds %struct.Printy, ptr %arraydestroy.elementPast4, i64 -1
-    // CHECK-NEXT:    call void @_ZN6PrintyD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %arraydestroy.element5)
+    // CHECK-NEXT:    call void @_ZN6PrintyD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %arraydestroy.element5)
     // CHECK-NEXT:    %arraydestroy.done6 = icmp eq ptr %arraydestroy.element5, [[ARRAY_BEGIN]]
     // CHECK-NEXT:    br i1 %arraydestroy.done6, label %[[ARRAY_DESTROY_DONE2:.+]], label %[[ARRAY_DESTROY_BODY2]]
 
@@ -384,7 +384,7 @@ void NewArrayInit() {
   // CHECK:       arraydestroy.body:
   // CHECK-NEXT:    %arraydestroy.elementPast = phi ptr [ %{{.*}}, %if.then ], [ %arraydestroy.element, %arraydestroy.body ]
   // CHECK-NEXT:    %arraydestroy.element = getelementptr inbounds %struct.Printy, ptr %arraydestroy.elementPast, i64 -1
-  // CHECK-NEXT:    call void @_ZN6PrintyD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
+  // CHECK-NEXT:    call void @_ZN6PrintyD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
   // CHECK-NEXT:    %arraydestroy.done = icmp eq ptr %arraydestroy.element, %0
   // CHECK-NEXT:    br i1 %arraydestroy.done, label %arraydestroy.done{{.*}}, label %arraydestroy.body
 
diff --git a/clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp b/clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp
index 24b1a4dd42977..af29120feb5bb 100644
--- a/clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp
+++ b/clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp
@@ -41,11 +41,11 @@ void glob_delete_A(A *a) { ::delete a; }
 // CHECK: icmp eq ptr %[[a]], null
 // CHECK: br i1
 
-// CHECK-ITANIUM: call void @_ZN1AD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %[[a]])
+// CHECK-ITANIUM: call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %[[a]])
 // CHECK-ITANIUM-NEXT: call void @_ZdlPvm(ptr noundef %[[a]], i64 noundef 8)
-// CHECK-MSABI64: call void @"??1A@@QEAA@XZ"(ptr noundef nonnull align 8 dereferenceable(8) %[[a]])
+// CHECK-MSABI64: call void @"??1A@@QEAA@XZ"(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %[[a]])
 // CHECK-MSABI64-NEXT: call void @"??3@YAXPEAX_K@Z"(ptr noundef %[[a]], i64 noundef 8)
-// CHECK-MSABI32: call x86_thiscallcc void @"??1A@@QAE@XZ"(ptr noundef nonnull align 4 dereferenceable(4) %[[a]])
+// CHECK-MSABI32: call x86_thiscallcc void @"??1A@@QAE@XZ"(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) %[[a]])
 // CHECK-MSABI32-NEXT: call void @"??3@YAXPAXI@Z"(ptr noundef %[[a]], i32 noundef 4)
 
 struct B {
diff --git a/clang/test/CodeGenCXX/for-range.cpp b/clang/test/CodeGenCXX/for-range.cpp
index 088a34647c374..b9706855f658c 100644
--- a/clang/test/CodeGenCXX/for-range.cpp
+++ b/clang/test/CodeGenCXX/for-range.cpp
@@ -53,7 +53,7 @@ extern B array[5];
 // CHECK:       for.body:
 // CHECK-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
 // CHECK-NEXT:    call void @_ZN1BC1ERKS_(ptr noundef nonnull align 1 dereferenceable(1) [[B]], ptr noundef nonnull align 1 dereferenceable(1) [[TMP2]])
-// CHECK-NEXT:    call void @_ZN1BD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3:[0-9]+]]
+// CHECK-NEXT:    call void @_ZN1BD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3:[0-9]+]]
 // CHECK-NEXT:    br label [[FOR_INC:%.*]]
 // CHECK:       for.inc:
 // CHECK-NEXT:    [[TMP3:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
@@ -61,7 +61,7 @@ extern B array[5];
 // CHECK-NEXT:    store ptr [[INCDEC_PTR]], ptr [[__BEGIN1]], align 8
 // CHECK-NEXT:    br label [[FOR_COND]]
 // CHECK:       for.end:
-// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
+// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void for_array() {
@@ -81,10 +81,10 @@ void for_array() {
 // CHECK-NEXT:    call void @_ZN1AC1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]])
 // CHECK-NEXT:    call void @_ZN1CC1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]])
 // CHECK-NEXT:    store ptr [[REF_TMP]], ptr [[__RANGE1]], align 8
-// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[__RANGE1]], align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[__RANGE1]], align 8, !nonnull [[META2:![0-9]+]]
 // CHECK-NEXT:    [[CALL:%.*]] = call noundef ptr @_Z5beginR1C(ptr noundef nonnull align 1 dereferenceable(1) [[TMP0]])
 // CHECK-NEXT:    store ptr [[CALL]], ptr [[__BEGIN1]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[__RANGE1]], align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[__RANGE1]], align 8, !nonnull [[META2]]
 // CHECK-NEXT:    [[CALL1:%.*]] = call noundef ptr @_Z3endR1C(ptr noundef nonnull align 1 dereferenceable(1) [[TMP1]])
 // CHECK-NEXT:    store ptr [[CALL1]], ptr [[__END1]], align 8
 // CHECK-NEXT:    br label [[FOR_COND:%.*]]
@@ -94,12 +94,12 @@ void for_array() {
 // CHECK-NEXT:    [[CMP:%.*]] = icmp ne ptr [[TMP2]], [[TMP3]]
 // CHECK-NEXT:    br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_COND_CLEANUP:%.*]]
 // CHECK:       for.cond.cleanup:
-// CHECK-NEXT:    call void @_ZN1CD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]]) #[[ATTR3]]
+// CHECK-NEXT:    call void @_ZN1CD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[REF_TMP]]) #[[ATTR3]]
 // CHECK-NEXT:    br label [[FOR_END:%.*]]
 // CHECK:       for.body:
 // CHECK-NEXT:    [[TMP4:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
 // CHECK-NEXT:    call void @_ZN1BC1ERKS_(ptr noundef nonnull align 1 dereferenceable(1) [[B]], ptr noundef nonnull align 1 dereferenceable(1) [[TMP4]])
-// CHECK-NEXT:    call void @_ZN1BD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3]]
+// CHECK-NEXT:    call void @_ZN1BD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3]]
 // CHECK-NEXT:    br label [[FOR_INC:%.*]]
 // CHECK:       for.inc:
 // CHECK-NEXT:    [[TMP5:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
@@ -107,7 +107,7 @@ void for_array() {
 // CHECK-NEXT:    store ptr [[INCDEC_PTR]], ptr [[__BEGIN1]], align 8
 // CHECK-NEXT:    br label [[FOR_COND]]
 // CHECK:       for.end:
-// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
+// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void for_range() {
@@ -127,10 +127,10 @@ void for_range() {
 // CHECK-NEXT:    call void @_ZN1AC1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]])
 // CHECK-NEXT:    call void @_ZN1DC1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]])
 // CHECK-NEXT:    store ptr [[REF_TMP]], ptr [[__RANGE1]], align 8
-// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[__RANGE1]], align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[__RANGE1]], align 8, !nonnull [[META2]]
 // CHECK-NEXT:    [[CALL:%.*]] = call noundef ptr @_ZN1D5beginEv(ptr noundef nonnull align 1 dereferenceable(1) [[TMP0]])
 // CHECK-NEXT:    store ptr [[CALL]], ptr [[__BEGIN1]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[__RANGE1]], align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[__RANGE1]], align 8, !nonnull [[META2]]
 // CHECK-NEXT:    [[CALL1:%.*]] = call noundef ptr @_ZN1D3endEv(ptr noundef nonnull align 1 dereferenceable(1) [[TMP1]])
 // CHECK-NEXT:    store ptr [[CALL1]], ptr [[__END1]], align 8
 // CHECK-NEXT:    br label [[FOR_COND:%.*]]
@@ -140,12 +140,12 @@ void for_range() {
 // CHECK-NEXT:    [[CMP:%.*]] = icmp ne ptr [[TMP2]], [[TMP3]]
 // CHECK-NEXT:    br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_COND_CLEANUP:%.*]]
 // CHECK:       for.cond.cleanup:
-// CHECK-NEXT:    call void @_ZN1DD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]]) #[[ATTR3]]
+// CHECK-NEXT:    call void @_ZN1DD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[REF_TMP]]) #[[ATTR3]]
 // CHECK-NEXT:    br label [[FOR_END:%.*]]
 // CHECK:       for.body:
 // CHECK-NEXT:    [[TMP4:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
 // CHECK-NEXT:    call void @_ZN1BC1ERKS_(ptr noundef nonnull align 1 dereferenceable(1) [[B]], ptr noundef nonnull align 1 dereferenceable(1) [[TMP4]])
-// CHECK-NEXT:    call void @_ZN1BD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3]]
+// CHECK-NEXT:    call void @_ZN1BD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3]]
 // CHECK-NEXT:    br label [[FOR_INC:%.*]]
 // CHECK:       for.inc:
 // CHECK-NEXT:    [[TMP5:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
@@ -153,7 +153,7 @@ void for_range() {
 // CHECK-NEXT:    store ptr [[INCDEC_PTR]], ptr [[__BEGIN1]], align 8
 // CHECK-NEXT:    br label [[FOR_COND]]
 // CHECK:       for.end:
-// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
+// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void for_member_range() {
diff --git a/clang/test/CodeGenCXX/gh62818.cpp b/clang/test/CodeGenCXX/gh62818.cpp
index ec91b40fca077..f903679cd6b68 100644
--- a/clang/test/CodeGenCXX/gh62818.cpp
+++ b/clang/test/CodeGenCXX/gh62818.c...
[truncated]

@boomanaiden154
Copy link
Contributor Author

The actual change is in the first commit (7a3dec4). I've separated the test changes out into 6ec0952 to hopefully make review a bit easier.

CC @philnik777 Who came up with the idea and requested this.

@rnk
Copy link
Collaborator

rnk commented Dec 1, 2025

This optimization exploits the fact that it's undefined behavior to read from an object after its been destroyed. Given the overall shift in how the industry feels about compilers exploiting undefined behavior, I want to push to add an flag to control this. Think of the people who use -fno-delete-null-pointer-checks. The kinds of people who use that are going to want to disable this kind of optimization. This optimization should absolutely be on-by-default, we'd just have a way to opt out, mentioned in release notes, etc etc.

I'd also like to better understand why base classes matter for this annotation. Until very recently, basic_string used a bunch of compressed pair empty bases instead of [[no_unique_address]], so adding a base class might create a surprising performance regression with the change as written.

Copy link
Collaborator

@rnk rnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, we should do it, this is a valuable optimization. (Commenting twice to push out the inline comments).

.getAsAlign();
Attrs.addAlignmentAttr(Alignment);

if (isa_and_nonnull<CXXDestructorDecl>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code golf: You can CSE the CalleeInfo.getCalleeDecl().getDecl() if you use dyn_cast_or_nonnull.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Ended up using dyn_cast_if_present.

CalleeInfo.getCalleeDecl().getDecl())) {
auto *ClassDecl = dyn_cast<CXXRecordDecl>(
CalleeInfo.getCalleeDecl().getDecl()->getDeclContext());
if (ClassDecl->getNumBases() == 0 && ClassDecl->getNumVBases() == 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have to limit this to only destructors of classes with no bases? Whatever the reason (caution, incremental change, etc), comments here would be appreciated.

Copy link
Contributor

@rjmccall rjmccall Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Virtual base subobjects aren't dead after a base subobject destructor call, but yeah, I can't think of a reason to limit this because of non-virtual bases alone. And even virtual bases are dead after a complete-object destructor call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, just to be conservative while we get implementation experience. I've added a TODO for myself to remove this after we do more testing.

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the potential undefined behavior here checked by msan? Do we need to disable adding this attribute if msan is enabled?

@boomanaiden154
Copy link
Contributor Author

Is the potential undefined behavior here checked by msan? Do we need to disable adding this attribute if msan is enabled?

Yes, use-after-destroy is checked by msan, but I don't believe we need to explicitly disable anything here to handle msan. Based on my understand, the msan instrumentation will add a call to __sanitizer_dtor_callback_fields at the end of the destructor, passing in the this pointer along with the object size. __sanitizer_dtor_callback_fields then modifies the shadow memory, which is a separate object that is not marked dead by this change.

CC @thurstond Who is more familiar with the internals of msan and would be able to confirm.

@boomanaiden154
Copy link
Contributor Author

boomanaiden154 commented Dec 2, 2025

I'd also like to better understand why base classes matter for this annotation. Until very recently, basic_string used a bunch of compressed pair empty bases instead of [[no_unique_address]], so adding a base class might create a surprising performance regression with the change as written.

I was mostly concerned about cases like https://godbolt.org/z/fq4fWfqKv, thinking that we might eliminate stores that would otherwise propagate into the base destructors. I didn't realize that the invocation of the more-base destructor just happens as a call at the end of the derived class destructor, so is a non-issue.

Besides that, I wouldn't mind being conservative initially and making sure that everything ships internally as smoothly as the initial testing that I've done before making this optimization more broad. It doesn't seem like there would be large fallout, but it doesn't seem like there's a downside to doing things incrementally.

@rjmccall
Copy link
Contributor

rjmccall commented Dec 2, 2025

You're correct to be conservative about virtual bases, because they will generally still be live out of the base subobject destructor. I think it's worth doing this for non-virtual bases just to avoid needing an excessive number of controlling flags, since otherwise I assume people might want a staging flag that disables this just for classes with bases.

@llvmbot llvmbot added clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Dec 2, 2025
@nikic
Copy link
Contributor

nikic commented Dec 2, 2025

GCC has a related -fno-lifetime-dse option, possibly we should be integrating with that flag instead of creating a new one? (Notably, LLVM sets that flag because we have some hacks in the User implementation that rely on accessing the object in overloaded operator delete, see #24952).

@thurstond
Copy link
Contributor

Is the potential undefined behavior here checked by msan? Do we need to disable adding this attribute if msan is enabled?

Yes, use-after-destroy is checked by msan, but I don't believe we need to explicitly disable anything here to handle msan. Based on my understand, the msan instrumentation will add a call to __sanitizer_dtor_callback_fields at the end of the destructor, passing in the this pointer along with the object size.

This is my understanding as well.

__sanitizer_dtor_callback_fields then modifies the shadow memory, which is a separate object that is not marked dead by this change.

Yep, shadow memory is opaque to the clang backend - the shadow is just a big blob of memory (~16TB on x86-64), not a bunch of objects.

@boomanaiden154
Copy link
Contributor Author

You're correct to be conservative about virtual bases, because they will generally still be live out of the base subobject destructor. I think it's worth doing this for non-virtual bases just to avoid needing an excessive number of controlling flags, since otherwise I assume people might want a staging flag that disables this just for classes with bases.

Thanks for confirming. I've constructed https://godbolt.org/z/nKPfhPxsx which I believe demonstrates the problem there (we end up destroying foobar before foo or bar in the whole object destructor, so marking stores to this inside of foobar would change the behavior).

Given you said this is in general, it sounds like there are cases where virtual bases are not live out of the base subobject destructor?

@boomanaiden154
Copy link
Contributor Author

boomanaiden154 commented Dec 2, 2025

GCC has a related -fno-lifetime-dse option, possibly we should be integrating with that flag instead of creating a new one? (Notably, LLVM sets that flag because we have some hacks in the User implementation that rely on accessing the object in overloaded operator delete, see #24952).

Reusing the existing flag name sounds reasonable enough to me, but in #40040 it seems like people had opinions on the flag naming, with @zygoloid suggesting -fno-strict-lifetimes for the opt-out. It looks like a dummy implementation of the flag was proposed in https://reviews.llvm.org/D150930, but never landed. I know in the past we have tried to go for gcc compatibility, so I'm wondering what opinions here are. I think -fstrict-lifetimes makes a lot more sense to users, but it might be good to support the gcc naming. I think it should be pretty simple to support both and just alias them.

Regarding #24952, I guess I did not end up hitting that one because of the intentional conservativeness around base classes (I also didn't do a bootstrapping build with LTO).

@boomanaiden154
Copy link
Contributor Author

I think it's worth doing this for non-virtual bases just to avoid needing an excessive number of controlling flags, since otherwise I assume people might want a staging flag that disables this just for classes with bases.

I've left this as a TODO, mainly so I can make sure that our internal release process doesn't catch many more issues than how I tested this. Then I plan on doing another round of testing without the check on non-virtual bases and then removing the TODO. We don't need more selective flags than this and I don't see why other users couldn't just be pointed at an opt-out flag. Is this strategy okay or would you like to see the non-virtual base check removed before landing?

@rjmccall
Copy link
Contributor

rjmccall commented Dec 2, 2025

Given you said this is in general, it sounds like there are cases where virtual bases are not live out of the base subobject destructor?

Just sloppy wording, sorry. Virtual bases are always live out of the base subobject destructor.

@rjmccall
Copy link
Contributor

rjmccall commented Dec 2, 2025

I think it's worth doing this for non-virtual bases just to avoid needing an excessive number of controlling flags, since otherwise I assume people might want a staging flag that disables this just for classes with bases.

I've left this as a TODO, mainly so I can make sure that our internal release process doesn't catch many more issues than how I tested this. Then I plan on doing another round of testing without the check on non-virtual bases and then removing the TODO. We don't need more selective flags than this and I don't see why other users couldn't just be pointed at an opt-out flag. Is this strategy okay or would you like to see the non-virtual base check removed before landing?

I know there's been feedback before from Google folks about having a flag whenever we expand an optimization so that they can more easily check whether the expansion specifically caused a particular regression.

@boomanaiden154
Copy link
Contributor Author

Just sloppy wording, sorry. Virtual bases are always live out of the base subobject destructor.

Ack.

I know there's been feedback before from Google folks about having a flag whenever we expand an optimization so that they can more easily check whether the expansion specifically caused a particular regression.

I'm the person at Google currently driving this. I think we should be fine with a single flag and would probably push back against the introduction of a second flag if we agree to postpone the enablement of adding the attribute for classes with a non-zero number of bases.

@boomanaiden154
Copy link
Contributor Author

boomanaiden154 commented Dec 3, 2025

I've updated this patch to use -flifetime-dse/-fno-lifetime-dse as the flag to match gcc. I still think -fstrict-lifetimes makes a lot more sense as a user-exposed flag.

Adding @MaskRay and @zygoloid in case they have opinions on the flag naming/driver changes.

@zygoloid
Copy link
Collaborator

zygoloid commented Dec 3, 2025

Is it correct to mark destructors' this pointers as dead_on_return? What about this case:

struct X {
  int n;
  ~X() {
    this[n].n = 0;
  }
};
void f() {
  X xs[] = {42, -1};
}

I think that's valid -- you can use a pointer to an array element to reach other elements of the array, even if the pointer is the this pointer -- but if the this[n].n = 0 store is treated as dead, then destroying xs[1] won't set xs[0].n to 0, so destroying xs[0] will store outside the array.

I think we'd need to add a size to the dead_on_return attribute to indicate how many bytes are dead; we can't assume all memory reachable through this is dead.

@antoniofrighetto
Copy link
Contributor

I think that's valid -- you can use a pointer to an array element to reach other elements of the array, even if the pointer is the this pointer -- but if the this[n].n = 0 store is treated as dead, then destroying xs[1] won't set xs[0].n to 0, so destroying xs[0] will store outside the array.

I think we'd need to add a size to the dead_on_return attribute to indicate how many bytes are dead; we can't assume all memory reachable through this is dead.

Agree we would be introducing a OOB write on the second destructor invocation if we were to DSE this[n].n = 0. I wonder if we could support base subobject destructors as well, and what size would pass to dead_on_return in this case (I suppose it'd be sizeof(Base)).

@boomanaiden154
Copy link
Contributor Author

Yeah, seems like a valid issue to me, although I'm not a language lawyer. I tried going through the C++ standard last night because it really seems like something that could be undefined behavior (using this to access memory outside of the current object), but I could not find anything (or just do not know where to look).

Assuming there's no disagreement, I'll probably start putting up a couple patches to make dead_on_return sized soonish.

@philnik777
Copy link
Contributor

I don't have much constructive to say here, since I'm not very familiar with the core language part of the standard, nor an expert in Clang or LLVM, but thanks for taking a stab at this @boomanaiden154! It'd be awesome to see this land.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:codegen IR generation bugs: mangling, exceptions, etc. clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants