-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[Clang] Mark this pointer in destructors dead_on_return #166276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[Clang] Mark this pointer in destructors dead_on_return #166276
Conversation
This helps to clean up any dead stores that come up at the end of the destructor. The motivating example was a refactoring in libc++'s basic_string implementation in 8dae17b that added a zeroing store into the destructor, causing a large performance regression on an internal workload.
7fb6339 to
7a3dec4
Compare
|
@llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-clang Author: Aiden Grossman (boomanaiden154) ChangesThis helps to clean up any dead stores that come up at the end of the destructor. The motivating example was a refactoring in libc++'s basic_string implementation in 8dae17b that added a zeroing store into the destructor, causing a large performance regression on an internal workload. We also saw a ~0.2% performance increase on an internal server workload when enabling this. I also tested this against all of the non-flaky tests in our large C++ codebase and found a minimal number of issues that all happened to be in user code. Patch is 5.29 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/166276.diff 109 Files Affected:
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index efacb3cc04c01..ee6e13fd1c1a5 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -2767,7 +2767,8 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,
}
// Apply `nonnull`, `dereferenceable(N)` and `align N` to the `this` argument,
- // unless this is a thunk function.
+ // unless this is a thunk function. Add dead_on_return to the `this` argument
+ // in base class destructors to aid in DSE.
// FIXME: fix this properly, https://reviews.llvm.org/D100388
if (FI.isInstanceMethod() && !IRFunctionArgs.hasInallocaArg() &&
!FI.arg_begin()->type->isVoidPointerType() && !IsThunk) {
@@ -2800,6 +2801,15 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,
.getAsAlign();
Attrs.addAlignmentAttr(Alignment);
+ if (isa_and_nonnull<CXXDestructorDecl>(
+ CalleeInfo.getCalleeDecl().getDecl())) {
+ auto *ClassDecl = dyn_cast<CXXRecordDecl>(
+ CalleeInfo.getCalleeDecl().getDecl()->getDeclContext());
+ if (ClassDecl->getNumBases() == 0 && ClassDecl->getNumVBases() == 0) {
+ Attrs.addAttribute(llvm::Attribute::DeadOnReturn);
+ }
+ }
+
ArgAttrs[IRArgs.first] = llvm::AttributeSet::get(getLLVMContext(), Attrs);
}
diff --git a/clang/test/CodeGen/paren-list-agg-init.cpp b/clang/test/CodeGen/paren-list-agg-init.cpp
index e30777ecc07d6..561bf2b5eb9c4 100644
--- a/clang/test/CodeGen/paren-list-agg-init.cpp
+++ b/clang/test/CodeGen/paren-list-agg-init.cpp
@@ -394,9 +394,9 @@ namespace gh61145 {
// a.k.a. Vec::Vec(Vec&&)
// CHECK-NEXT: call void @_ZN7gh611453VecC1EOS0_(ptr noundef nonnull align 1 dereferenceable(1) [[AGG_TMP_ENSURED]], ptr noundef nonnull align 1 dereferenceable(1) [[V]])
// a.k.a. S1::~S1()
- // CHECK-NEXT: call void @_ZN7gh611452S1D1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[AGG_TMP_ENSURED]])
+ // CHECK-NEXT: call void @_ZN7gh611452S1D1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[AGG_TMP_ENSURED]])
// a.k.a.Vec::~Vec()
- // CHECK-NEXT: call void @_ZN7gh611453VecD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[V]])
+ // CHECK-NEXT: call void @_ZN7gh611453VecD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[V]])
// CHECK-NEXT: ret void
template <int I>
void make1() {
@@ -416,9 +416,9 @@ namespace gh61145 {
// CHECK-NEXT: [[C:%.*c.*]] = getelementptr inbounds nuw [[STRUCT_S2]], ptr [[AGG_TMP_ENSURED]], i32 0, i32
// CHECK-NEXT: store i8 0, ptr [[C]], align 1
// a.k.a. S2::~S2()
- // CHECK-NEXT: call void @_ZN7gh611452S2D1Ev(ptr noundef nonnull align 1 dereferenceable(2) [[AGG_TMP_ENSURED]])
+ // CHECK-NEXT: call void @_ZN7gh611452S2D1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(2) [[AGG_TMP_ENSURED]])
// a.k.a. Vec::~Vec()
- // CHECK-NEXT: call void @_ZN7gh611453VecD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[V]])
+ // CHECK-NEXT: call void @_ZN7gh611453VecD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[V]])
// CHECK-NEXT: ret void
template <int I>
void make2() {
diff --git a/clang/test/CodeGen/temporary-lifetime.cpp b/clang/test/CodeGen/temporary-lifetime.cpp
index 04087292b2c70..44d1235f15c86 100644
--- a/clang/test/CodeGen/temporary-lifetime.cpp
+++ b/clang/test/CodeGen/temporary-lifetime.cpp
@@ -24,12 +24,12 @@ void Test1() {
// CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR:.+]])
// CHECK-DTOR: call void @_ZN1AC1Ev(ptr nonnull {{[^,]*}} %[[VAR:[^ ]+]])
// CHECK-DTOR: call void @_Z3FooIRK1AEvOT_
- // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[VAR]])
+ // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[VAR]])
// CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR]])
// CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR:.+]])
// CHECK-DTOR: call void @_ZN1AC1Ev(ptr nonnull {{[^,]*}} %[[VAR:[^ ]+]])
// CHECK-DTOR: call void @_Z3FooIRK1AEvOT_
- // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[VAR]])
+ // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[VAR]])
// CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR]])
// CHECK-DTOR: }
@@ -61,9 +61,9 @@ void Test2() {
// CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR2:.+]])
// CHECK-DTOR: call void @_ZN1AC1Ev(ptr nonnull {{[^,]*}} %[[VAR2:[^ ]+]])
// CHECK-DTOR: call void @_Z3FooIRK1AEvOT_
- // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[VAR2]])
+ // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[VAR2]])
// CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR2]])
- // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[VAR1]])
+ // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[VAR1]])
// CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR1]])
// CHECK-DTOR: }
@@ -155,12 +155,12 @@ void Test7() {
// CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR:.+]])
// CHECK-DTOR: call void @_Z3BazI1AET_v({{.*}} %[[SLOT:[^ ]+]])
// CHECK-DTOR: call void @_Z3FooI1AEvOT_({{.*}} %[[SLOT]])
- // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[SLOT]])
+ // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[SLOT]])
// CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR]])
// CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR:.+]])
// CHECK-DTOR: call void @_Z3BazI1AET_v({{.*}} %[[SLOT:[^ ]+]])
// CHECK-DTOR: call void @_Z3FooI1AEvOT_({{.*}} %[[SLOT]])
- // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[SLOT]])
+ // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[SLOT]])
// CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR]])
// CHECK-DTOR: }
Foo(Baz<A>());
diff --git a/clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp b/clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp
index 3c2a624bd4f95..e05f8133321c7 100644
--- a/clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp
+++ b/clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp
@@ -75,7 +75,7 @@ int x;
// CHECK-NEXT: [[A:%.*]] = alloca [[CLASS_A:%.*]], align 4, addrspace(5)
// CHECK-NEXT: [[A_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A]] to ptr
// CHECK-NEXT: call void @_ZN1AC1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
-// CHECK-NEXT: call void @_ZN1AD1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
+// CHECK-NEXT: call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
// CHECK-NEXT: ret void
//
void func3() {
diff --git a/clang/test/CodeGenCXX/amdgcn-func-arg.cpp b/clang/test/CodeGenCXX/amdgcn-func-arg.cpp
index a5f83dc91b038..bc20c33ec4d0f 100644
--- a/clang/test/CodeGenCXX/amdgcn-func-arg.cpp
+++ b/clang/test/CodeGenCXX/amdgcn-func-arg.cpp
@@ -43,9 +43,9 @@ void func_with_indirect_arg(A a) {
// CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[AGG_TMP_ASCAST]], ptr align 4 [[A_ASCAST]], i64 4, i1 false)
// CHECK-NEXT: [[AGG_TMP_ASCAST_ASCAST:%.*]] = addrspacecast ptr [[AGG_TMP_ASCAST]] to ptr addrspace(5)
// CHECK-NEXT: call void @_Z22func_with_indirect_arg1A(ptr addrspace(5) noundef [[AGG_TMP_ASCAST_ASCAST]])
-// CHECK-NEXT: call void @_ZN1AD1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[AGG_TMP_ASCAST]])
+// CHECK-NEXT: call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) [[AGG_TMP_ASCAST]])
// CHECK-NEXT: call void @_Z17func_with_ref_argR1A(ptr noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
-// CHECK-NEXT: call void @_ZN1AD1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
+// CHECK-NEXT: call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
// CHECK-NEXT: ret void
//
void test_indirect_arg_auto() {
@@ -61,7 +61,7 @@ void test_indirect_arg_auto() {
// CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[AGG_TMP_ASCAST]], ptr align 4 addrspacecast (ptr addrspace(1) @g_a to ptr), i64 4, i1 false)
// CHECK-NEXT: [[AGG_TMP_ASCAST_ASCAST:%.*]] = addrspacecast ptr [[AGG_TMP_ASCAST]] to ptr addrspace(5)
// CHECK-NEXT: call void @_Z22func_with_indirect_arg1A(ptr addrspace(5) noundef [[AGG_TMP_ASCAST_ASCAST]])
-// CHECK-NEXT: call void @_ZN1AD1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[AGG_TMP_ASCAST]])
+// CHECK-NEXT: call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) [[AGG_TMP_ASCAST]])
// CHECK-NEXT: call void @_Z17func_with_ref_argR1A(ptr noundef nonnull align 4 dereferenceable(4) addrspacecast (ptr addrspace(1) @g_a to ptr))
// CHECK-NEXT: ret void
//
diff --git a/clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp b/clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp
index 4eafa720e0cb4..a764ba31539eb 100644
--- a/clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp
+++ b/clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp
@@ -217,7 +217,7 @@ void ArrayInit() {
// CHECK: [[ARRAY_DESTROY_BODY2]]:
// CHECK-NEXT: %arraydestroy.elementPast = phi ptr [ %1, %cleanup ], [ %arraydestroy.element, %[[ARRAY_DESTROY_BODY2]] ]
// CHECK-NEXT: %arraydestroy.element = getelementptr inbounds %struct.Printy, ptr %arraydestroy.elementPast, i64 -1
- // CHECK-NEXT: call void @_ZN6PrintyD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
+ // CHECK-NEXT: call void @_ZN6PrintyD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
// CHECK-NEXT: %arraydestroy.done = icmp eq ptr %arraydestroy.element, %arr
// CHECK-NEXT: br i1 %arraydestroy.done, label %[[ARRAY_DESTROY_DONE2]], label %[[ARRAY_DESTROY_BODY2]]
@@ -265,7 +265,7 @@ void ArraySubobjects() {
// CHECK: [[ARRAY_DESTROY_BODY]]:
// CHECK-NEXT: %arraydestroy.elementPast = phi ptr [ %0, %if.then ], [ %arraydestroy.element, %[[ARRAY_DESTROY_BODY]] ]
// CHECK-NEXT: %arraydestroy.element = getelementptr inbounds %struct.Printy, ptr %arraydestroy.elementPast, i64 -1
- // CHECK-NEXT: call void @_ZN6PrintyD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
+ // CHECK-NEXT: call void @_ZN6PrintyD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
// CHECK-NEXT: %arraydestroy.done = icmp eq ptr %arraydestroy.element, %arr2
// CHECK-NEXT: br i1 %arraydestroy.done, label %[[ARRAY_DESTROY_DONE]], label %[[ARRAY_DESTROY_BODY]]
@@ -277,7 +277,7 @@ void ArraySubobjects() {
// CHECK: [[ARRAY_DESTROY_BODY2]]:
// CHECK-NEXT: %arraydestroy.elementPast4 = phi ptr [ %1, %[[ARRAY_DESTROY_DONE]] ], [ %arraydestroy.element5, %[[ARRAY_DESTROY_BODY2]] ]
// CHECK-NEXT: %arraydestroy.element5 = getelementptr inbounds %struct.Printy, ptr %arraydestroy.elementPast4, i64 -1
- // CHECK-NEXT: call void @_ZN6PrintyD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %arraydestroy.element5)
+ // CHECK-NEXT: call void @_ZN6PrintyD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %arraydestroy.element5)
// CHECK-NEXT: %arraydestroy.done6 = icmp eq ptr %arraydestroy.element5, [[ARRAY_BEGIN]]
// CHECK-NEXT: br i1 %arraydestroy.done6, label %[[ARRAY_DESTROY_DONE2:.+]], label %[[ARRAY_DESTROY_BODY2]]
@@ -384,7 +384,7 @@ void NewArrayInit() {
// CHECK: arraydestroy.body:
// CHECK-NEXT: %arraydestroy.elementPast = phi ptr [ %{{.*}}, %if.then ], [ %arraydestroy.element, %arraydestroy.body ]
// CHECK-NEXT: %arraydestroy.element = getelementptr inbounds %struct.Printy, ptr %arraydestroy.elementPast, i64 -1
- // CHECK-NEXT: call void @_ZN6PrintyD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
+ // CHECK-NEXT: call void @_ZN6PrintyD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
// CHECK-NEXT: %arraydestroy.done = icmp eq ptr %arraydestroy.element, %0
// CHECK-NEXT: br i1 %arraydestroy.done, label %arraydestroy.done{{.*}}, label %arraydestroy.body
diff --git a/clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp b/clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp
index 24b1a4dd42977..af29120feb5bb 100644
--- a/clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp
+++ b/clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp
@@ -41,11 +41,11 @@ void glob_delete_A(A *a) { ::delete a; }
// CHECK: icmp eq ptr %[[a]], null
// CHECK: br i1
-// CHECK-ITANIUM: call void @_ZN1AD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %[[a]])
+// CHECK-ITANIUM: call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %[[a]])
// CHECK-ITANIUM-NEXT: call void @_ZdlPvm(ptr noundef %[[a]], i64 noundef 8)
-// CHECK-MSABI64: call void @"??1A@@QEAA@XZ"(ptr noundef nonnull align 8 dereferenceable(8) %[[a]])
+// CHECK-MSABI64: call void @"??1A@@QEAA@XZ"(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %[[a]])
// CHECK-MSABI64-NEXT: call void @"??3@YAXPEAX_K@Z"(ptr noundef %[[a]], i64 noundef 8)
-// CHECK-MSABI32: call x86_thiscallcc void @"??1A@@QAE@XZ"(ptr noundef nonnull align 4 dereferenceable(4) %[[a]])
+// CHECK-MSABI32: call x86_thiscallcc void @"??1A@@QAE@XZ"(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) %[[a]])
// CHECK-MSABI32-NEXT: call void @"??3@YAXPAXI@Z"(ptr noundef %[[a]], i32 noundef 4)
struct B {
diff --git a/clang/test/CodeGenCXX/for-range.cpp b/clang/test/CodeGenCXX/for-range.cpp
index 088a34647c374..b9706855f658c 100644
--- a/clang/test/CodeGenCXX/for-range.cpp
+++ b/clang/test/CodeGenCXX/for-range.cpp
@@ -53,7 +53,7 @@ extern B array[5];
// CHECK: for.body:
// CHECK-NEXT: [[TMP2:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
// CHECK-NEXT: call void @_ZN1BC1ERKS_(ptr noundef nonnull align 1 dereferenceable(1) [[B]], ptr noundef nonnull align 1 dereferenceable(1) [[TMP2]])
-// CHECK-NEXT: call void @_ZN1BD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3:[0-9]+]]
+// CHECK-NEXT: call void @_ZN1BD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3:[0-9]+]]
// CHECK-NEXT: br label [[FOR_INC:%.*]]
// CHECK: for.inc:
// CHECK-NEXT: [[TMP3:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
@@ -61,7 +61,7 @@ extern B array[5];
// CHECK-NEXT: store ptr [[INCDEC_PTR]], ptr [[__BEGIN1]], align 8
// CHECK-NEXT: br label [[FOR_COND]]
// CHECK: for.end:
-// CHECK-NEXT: call void @_ZN1AD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
+// CHECK-NEXT: call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
// CHECK-NEXT: ret void
//
void for_array() {
@@ -81,10 +81,10 @@ void for_array() {
// CHECK-NEXT: call void @_ZN1AC1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]])
// CHECK-NEXT: call void @_ZN1CC1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]])
// CHECK-NEXT: store ptr [[REF_TMP]], ptr [[__RANGE1]], align 8
-// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[__RANGE1]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[__RANGE1]], align 8, !nonnull [[META2:![0-9]+]]
// CHECK-NEXT: [[CALL:%.*]] = call noundef ptr @_Z5beginR1C(ptr noundef nonnull align 1 dereferenceable(1) [[TMP0]])
// CHECK-NEXT: store ptr [[CALL]], ptr [[__BEGIN1]], align 8
-// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[__RANGE1]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[__RANGE1]], align 8, !nonnull [[META2]]
// CHECK-NEXT: [[CALL1:%.*]] = call noundef ptr @_Z3endR1C(ptr noundef nonnull align 1 dereferenceable(1) [[TMP1]])
// CHECK-NEXT: store ptr [[CALL1]], ptr [[__END1]], align 8
// CHECK-NEXT: br label [[FOR_COND:%.*]]
@@ -94,12 +94,12 @@ void for_array() {
// CHECK-NEXT: [[CMP:%.*]] = icmp ne ptr [[TMP2]], [[TMP3]]
// CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_COND_CLEANUP:%.*]]
// CHECK: for.cond.cleanup:
-// CHECK-NEXT: call void @_ZN1CD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]]) #[[ATTR3]]
+// CHECK-NEXT: call void @_ZN1CD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[REF_TMP]]) #[[ATTR3]]
// CHECK-NEXT: br label [[FOR_END:%.*]]
// CHECK: for.body:
// CHECK-NEXT: [[TMP4:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
// CHECK-NEXT: call void @_ZN1BC1ERKS_(ptr noundef nonnull align 1 dereferenceable(1) [[B]], ptr noundef nonnull align 1 dereferenceable(1) [[TMP4]])
-// CHECK-NEXT: call void @_ZN1BD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3]]
+// CHECK-NEXT: call void @_ZN1BD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3]]
// CHECK-NEXT: br label [[FOR_INC:%.*]]
// CHECK: for.inc:
// CHECK-NEXT: [[TMP5:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
@@ -107,7 +107,7 @@ void for_array() {
// CHECK-NEXT: store ptr [[INCDEC_PTR]], ptr [[__BEGIN1]], align 8
// CHECK-NEXT: br label [[FOR_COND]]
// CHECK: for.end:
-// CHECK-NEXT: call void @_ZN1AD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
+// CHECK-NEXT: call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
// CHECK-NEXT: ret void
//
void for_range() {
@@ -127,10 +127,10 @@ void for_range() {
// CHECK-NEXT: call void @_ZN1AC1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]])
// CHECK-NEXT: call void @_ZN1DC1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]])
// CHECK-NEXT: store ptr [[REF_TMP]], ptr [[__RANGE1]], align 8
-// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[__RANGE1]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[__RANGE1]], align 8, !nonnull [[META2]]
// CHECK-NEXT: [[CALL:%.*]] = call noundef ptr @_ZN1D5beginEv(ptr noundef nonnull align 1 dereferenceable(1) [[TMP0]])
// CHECK-NEXT: store ptr [[CALL]], ptr [[__BEGIN1]], align 8
-// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[__RANGE1]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[__RANGE1]], align 8, !nonnull [[META2]]
// CHECK-NEXT: [[CALL1:%.*]] = call noundef ptr @_ZN1D3endEv(ptr noundef nonnull align 1 dereferenceable(1) [[TMP1]])
// CHECK-NEXT: store ptr [[CALL1]], ptr [[__END1]], align 8
// CHECK-NEXT: br label [[FOR_COND:%.*]]
@@ -140,12 +140,12 @@ void for_range() {
// CHECK-NEXT: [[CMP:%.*]] = icmp ne ptr [[TMP2]], [[TMP3]]
// CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_COND_CLEANUP:%.*]]
// CHECK: for.cond.cleanup:
-// CHECK-NEXT: call void @_ZN1DD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]]) #[[ATTR3]]
+// CHECK-NEXT: call void @_ZN1DD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[REF_TMP]]) #[[ATTR3]]
// CHECK-NEXT: br label [[FOR_END:%.*]]
// CHECK: for.body:
// CHECK-NEXT: [[TMP4:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
// CHECK-NEXT: call void @_ZN1BC1ERKS_(ptr noundef nonnull align 1 dereferenceable(1) [[B]], ptr noundef nonnull align 1 dereferenceable(1) [[TMP4]])
-// CHECK-NEXT: call void @_ZN1BD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3]]
+// CHECK-NEXT: call void @_ZN1BD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3]]
// CHECK-NEXT: br label [[FOR_INC:%.*]]
// CHECK: for.inc:
// CHECK-NEXT: [[TMP5:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
@@ -153,7 +153,7 @@ void for_range() {
// CHECK-NEXT: store ptr [[INCDEC_PTR]], ptr [[__BEGIN1]], align 8
// CHECK-NEXT: br label [[FOR_COND]]
// CHECK: for.end:
-// CHECK-NEXT: call void @_ZN1AD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
+// CHECK-NEXT: call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
// CHECK-NEXT: ret void
//
void for_member_range() {
diff --git a/clang/test/CodeGenCXX/gh62818.cpp b/clang/test/CodeGenCXX/gh62818.cpp
index ec91b40fca077..f903679cd6b68 100644
--- a/clang/test/CodeGenCXX/gh62818.cpp
+++ b/clang/test/CodeGenCXX/gh62818.c...
[truncated]
|
|
The actual change is in the first commit (7a3dec4). I've separated the test changes out into 6ec0952 to hopefully make review a bit easier. CC @philnik777 Who came up with the idea and requested this. |
|
This optimization exploits the fact that it's undefined behavior to read from an object after its been destroyed. Given the overall shift in how the industry feels about compilers exploiting undefined behavior, I want to push to add an flag to control this. Think of the people who use I'd also like to better understand why base classes matter for this annotation. Until very recently, |
rnk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, we should do it, this is a valuable optimization. (Commenting twice to push out the inline comments).
clang/lib/CodeGen/CGCall.cpp
Outdated
| .getAsAlign(); | ||
| Attrs.addAlignmentAttr(Alignment); | ||
|
|
||
| if (isa_and_nonnull<CXXDestructorDecl>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code golf: You can CSE the CalleeInfo.getCalleeDecl().getDecl() if you use dyn_cast_or_nonnull.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Ended up using dyn_cast_if_present.
clang/lib/CodeGen/CGCall.cpp
Outdated
| CalleeInfo.getCalleeDecl().getDecl())) { | ||
| auto *ClassDecl = dyn_cast<CXXRecordDecl>( | ||
| CalleeInfo.getCalleeDecl().getDecl()->getDeclContext()); | ||
| if (ClassDecl->getNumBases() == 0 && ClassDecl->getNumVBases() == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have to limit this to only destructors of classes with no bases? Whatever the reason (caution, incremental change, etc), comments here would be appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Virtual base subobjects aren't dead after a base subobject destructor call, but yeah, I can't think of a reason to limit this because of non-virtual bases alone. And even virtual bases are dead after a complete-object destructor call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, just to be conservative while we get implementation experience. I've added a TODO for myself to remove this after we do more testing.
efriedma-quic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the potential undefined behavior here checked by msan? Do we need to disable adding this attribute if msan is enabled?
Yes, use-after-destroy is checked by msan, but I don't believe we need to explicitly disable anything here to handle msan. Based on my understand, the msan instrumentation will add a call to CC @thurstond Who is more familiar with the internals of msan and would be able to confirm. |
I was mostly concerned about cases like https://godbolt.org/z/fq4fWfqKv, thinking that we might eliminate stores that would otherwise propagate into the base destructors. I didn't realize that the invocation of the more-base destructor just happens as a call at the end of the derived class destructor, so is a non-issue. Besides that, I wouldn't mind being conservative initially and making sure that everything ships internally as smoothly as the initial testing that I've done before making this optimization more broad. It doesn't seem like there would be large fallout, but it doesn't seem like there's a downside to doing things incrementally. |
|
You're correct to be conservative about virtual bases, because they will generally still be live out of the base subobject destructor. I think it's worth doing this for non-virtual bases just to avoid needing an excessive number of controlling flags, since otherwise I assume people might want a staging flag that disables this just for classes with bases. |
|
GCC has a related |
This is my understanding as well.
Yep, shadow memory is opaque to the clang backend - the shadow is just a big blob of memory (~16TB on x86-64), not a bunch of objects. |
Thanks for confirming. I've constructed https://godbolt.org/z/nKPfhPxsx which I believe demonstrates the problem there (we end up destroying Given you said this is in general, it sounds like there are cases where virtual bases are not live out of the base subobject destructor? |
Reusing the existing flag name sounds reasonable enough to me, but in #40040 it seems like people had opinions on the flag naming, with @zygoloid suggesting Regarding #24952, I guess I did not end up hitting that one because of the intentional conservativeness around base classes (I also didn't do a bootstrapping build with LTO). |
I've left this as a TODO, mainly so I can make sure that our internal release process doesn't catch many more issues than how I tested this. Then I plan on doing another round of testing without the check on non-virtual bases and then removing the TODO. We don't need more selective flags than this and I don't see why other users couldn't just be pointed at an opt-out flag. Is this strategy okay or would you like to see the non-virtual base check removed before landing? |
Just sloppy wording, sorry. Virtual bases are always live out of the base subobject destructor. |
I know there's been feedback before from Google folks about having a flag whenever we expand an optimization so that they can more easily check whether the expansion specifically caused a particular regression. |
Ack.
I'm the person at Google currently driving this. I think we should be fine with a single flag and would probably push back against the introduction of a second flag if we agree to postpone the enablement of adding the attribute for classes with a non-zero number of bases. |
|
Is it correct to mark destructors' struct X {
int n;
~X() {
this[n].n = 0;
}
};
void f() {
X xs[] = {42, -1};
}I think that's valid -- you can use a pointer to an array element to reach other elements of the array, even if the pointer is the I think we'd need to add a size to the |
Agree we would be introducing a OOB write on the second destructor invocation if we were to DSE |
|
Yeah, seems like a valid issue to me, although I'm not a language lawyer. I tried going through the C++ standard last night because it really seems like something that could be undefined behavior (using Assuming there's no disagreement, I'll probably start putting up a couple patches to make |
|
I don't have much constructive to say here, since I'm not very familiar with the core language part of the standard, nor an expert in Clang or LLVM, but thanks for taking a stab at this @boomanaiden154! It'd be awesome to see this land. |
This helps to clean up any dead stores that come up at the end of the destructor. The motivating example was a refactoring in libc++'s basic_string implementation in 8dae17b that added a zeroing store into the destructor, causing a large performance regression on an internal workload. We also saw a ~0.2% performance increase on an internal server workload when enabling this.
I also tested this against all of the non-flaky tests in our large C++ codebase and found a minimal number of issues that all happened to be in user code.