- 
                Notifications
    You must be signed in to change notification settings 
- Fork 15k
[Coroutines] Conditional elide coroutines based on hot/cold information #145831
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @llvm/pr-subscribers-coroutines Author: Chuanqi Xu (ChuanqiXu9) ChangesThe rationale is, it is not good to elide always. For example, Assume Task is marked with  But this may be a regression instead of an optimization if  This patch tries to mitigate the problem by introduce static hot/cold information. This can be optimized further more but at least this patch makes things better. Full diff: https://github.com/llvm/llvm-project/pull/145831.diff 3 Files Affected: 
 diff --git a/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp b/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
index 9115946d205a4..d4d0c0f0895bb 100644
--- a/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
@@ -24,6 +24,9 @@
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/PassManager.h"
+#include "llvm/Support/BranchProbability.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/FileSystem.h"
 #include "llvm/Transforms/Utils/CallGraphUpdater.h"
 #include "llvm/Transforms/Utils/Cloning.h"
 
@@ -33,6 +36,49 @@ using namespace llvm;
 
 #define DEBUG_TYPE "coro-annotation-elide"
 
+static cl::opt<float> CoroElideBranchRatio(
+    "coro-elide-branch-ratio", cl::init(0.55), cl::Hidden,
+    cl::desc("Minimum BranchProbability to consider a elide a coroutine."));
+extern cl::opt<unsigned> MinBlockCounterExecution;
+
+static cl::opt<bool>
+    PrintElidedCoroutine("print-elided-coroutine-stats", cl::init(false),
+                         cl::Hidden,
+                         cl::desc("Print stats for elided coroutine"));
+
+static cl::opt<std::string>
+    ElideStatOutput("coro-elide-stat-output", cl::init(""), cl::Hidden,
+                    cl::desc("Output file for -print-elided-coroutine-stats. "
+                             "Defaults to standard error output."));
+
+// The return value is used to indicate the owner of the resources. The users
+// should use the output parameter.
+static std::unique_ptr<llvm::raw_ostream>
+getCoroElidedStatsOStream(llvm::raw_ostream *&OS) {
+  if (!PrintElidedCoroutine) {
+    OS = &llvm::nulls();
+    return nullptr;
+  }
+
+  if (ElideStatOutput.empty()) {
+    OS = &llvm::errs();
+    return nullptr;
+  }
+
+  std::error_code EC;
+  auto ret = std::make_unique<llvm::raw_fd_ostream>(ElideStatOutput, EC,
+                                                    sys::fs::OF_Append);
+
+  if (EC) {
+    llvm::errs() << "llvm cannot open file: " << EC.message() << "\n";
+    OS = &llvm::nulls();
+    return nullptr;
+  }
+
+  OS = ret.get();
+  return ret;
+}
+
 static Instruction *getFirstNonAllocaInTheEntryBlock(Function *F) {
   for (Instruction &I : F->getEntryBlock())
     if (!isa<AllocaInst>(&I))
@@ -145,6 +191,37 @@ PreservedAnalyses CoroAnnotationElidePass::run(LazyCallGraph::SCC &C,
       bool IsCallerPresplitCoroutine = Caller->isPresplitCoroutine();
       bool HasAttr = CB->hasFnAttr(llvm::Attribute::CoroElideSafe);
       if (IsCallerPresplitCoroutine && HasAttr) {
+
+        llvm::raw_ostream *OS = nullptr;
+        auto _ = getCoroElidedStatsOStream(OS);
+        assert(OS && "At least we should able to get access to standard error");
+
+        auto &BFI = FAM.getResult<BlockFrequencyAnalysis>(*Caller);
+        if (BFI.getBlockFreq(CB->getParent()) <
+            BFI.getEntryFreq()) {
+          static BranchProbability MinBranchProbability(
+              static_cast<int>(CoroElideBranchRatio * MinBlockCounterExecution),
+              MinBlockCounterExecution);
+
+          auto Prob = BranchProbability::getBranchProbability(
+              BFI.getBlockFreq(CB->getParent()).getFrequency(),
+              BFI.getEntryFreq().getFrequency());
+
+          if (Prob < MinBranchProbability) {
+            *OS << "Not eliding " << *CB
+                << " with estimated probability: " << Prob << "\n";
+            continue;
+          }
+
+          *OS << "BB Prob: \t" << Prob << "\n";
+        } else {
+          *OS << "BB Freq: \t"
+              << BFI.getBlockFreq(CB->getParent()).getFrequency() << "\n";
+          *OS << "Entry Freq: \t" << BFI.getEntryFreq().getFrequency() << "\n";
+        }
+
+        *OS << "eliding " << *CB << "\n";
+
         auto *CallerN = CG.lookup(*Caller);
         auto *CallerC = CallerN ? CG.lookupSCC(*CallerN) : nullptr;
         // If CallerC is nullptr, it means LazyCallGraph hasn't visited Caller
diff --git a/llvm/lib/Transforms/IPO/PartialInlining.cpp b/llvm/lib/Transforms/IPO/PartialInlining.cpp
index 2583249e65484..1a00d173d3ae0 100644
--- a/llvm/lib/Transforms/IPO/PartialInlining.cpp
+++ b/llvm/lib/Transforms/IPO/PartialInlining.cpp
@@ -109,7 +109,7 @@ static cl::opt<float> MinRegionSizeRatio(
              "outline candidate and original function"));
 // Used to tune the minimum number of execution counts needed in the predecessor
 // block to the cold edge. ie. confidence interval.
-static cl::opt<unsigned>
+cl::opt<unsigned>
     MinBlockCounterExecution("min-block-execution", cl::init(100), cl::Hidden,
                              cl::desc("Minimum block executions to consider "
                                       "its BranchProbabilityInfo valid"));
diff --git a/llvm/test/Transforms/Coroutines/coro-conditional-elide.ll b/llvm/test/Transforms/Coroutines/coro-conditional-elide.ll
new file mode 100644
index 0000000000000..04c5bf0494278
--- /dev/null
+++ b/llvm/test/Transforms/Coroutines/coro-conditional-elide.ll
@@ -0,0 +1,79 @@
+; Testing elide performed its job for calls to coroutines marked safe.
+; RUN: opt < %s -S -passes='cgscc(coro-annotation-elide)' -coro-elide-branch-ratio=0.55 | FileCheck %s
+
+%struct.Task = type { ptr }
+
+declare void @print(i32) nounwind
+
+; resume part of the coroutine
+define fastcc void @callee.resume(ptr dereferenceable(1)) {
+  tail call void @print(i32 0)
+  ret void
+}
+
+; destroy part of the coroutine
+define fastcc void @callee.destroy(ptr) {
+  tail call void @print(i32 1)
+  ret void
+}
+
+; cleanup part of the coroutine
+define fastcc void @callee.cleanup(ptr) {
+  tail call void @print(i32 2)
+  ret void
+}
+
+@callee.resumers = internal constant [3 x ptr] [
+  ptr @callee.resume, ptr @callee.destroy, ptr @callee.cleanup]
+
+declare void @alloc(i1) nounwind
+
+; CHECK-LABEL: define ptr @callee
+define ptr @callee(i8 %arg) {
+entry:
+  %task = alloca %struct.Task, align 8
+  %id = call token @llvm.coro.id(i32 0, ptr null,
+                          ptr @callee,
+                          ptr @callee.resumers)
+  %alloc = call i1 @llvm.coro.alloc(token %id)
+  %hdl = call ptr @llvm.coro.begin(token %id, ptr null)
+  store ptr %hdl, ptr %task
+  ret ptr %task
+}
+
+; CHECK-LABEL: define ptr @callee.noalloc
+define ptr @callee.noalloc(i8 %arg, ptr dereferenceable(32) align(8) %frame) {
+ entry:
+  %task = alloca %struct.Task, align 8
+  %id = call token @llvm.coro.id(i32 0, ptr null,
+                          ptr @callee,
+                          ptr @callee.resumers)
+  %hdl = call ptr @llvm.coro.begin(token %id, ptr null)
+  store ptr %hdl, ptr %task
+  ret ptr %task
+}
+
+; CHECK-LABEL: define ptr @caller(i1 %cond)
+; Function Attrs: presplitcoroutine
+define ptr @caller(i1 %cond) #0 {
+entry:
+  br i1 %cond, label %call, label %ret
+
+call:
+  %task = call ptr @callee(i8 0) #1
+  br label %ret
+
+ret:
+  %retval = phi ptr [ %task, %call ], [ null, %entry ]
+  ret ptr %retval
+  ; CHECK-NOT: alloca
+}
+
+declare token @llvm.coro.id(i32, ptr, ptr, ptr)
+declare ptr @llvm.coro.begin(token, ptr)
+declare ptr @llvm.coro.frame()
+declare ptr @llvm.coro.subfn.addr(ptr, i8)
+declare i1 @llvm.coro.alloc(token)
+
+attributes #0 = { presplitcoroutine }
+attributes #1 = { coro_elide_safe }
 | 
| @llvm/pr-subscribers-llvm-transforms Author: Chuanqi Xu (ChuanqiXu9) ChangesThe rationale is, it is not good to elide always. For example, Assume Task is marked with  But this may be a regression instead of an optimization if  This patch tries to mitigate the problem by introduce static hot/cold information. This can be optimized further more but at least this patch makes things better. Full diff: https://github.com/llvm/llvm-project/pull/145831.diff 3 Files Affected: 
 diff --git a/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp b/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
index 9115946d205a4..d4d0c0f0895bb 100644
--- a/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
@@ -24,6 +24,9 @@
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/PassManager.h"
+#include "llvm/Support/BranchProbability.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/FileSystem.h"
 #include "llvm/Transforms/Utils/CallGraphUpdater.h"
 #include "llvm/Transforms/Utils/Cloning.h"
 
@@ -33,6 +36,49 @@ using namespace llvm;
 
 #define DEBUG_TYPE "coro-annotation-elide"
 
+static cl::opt<float> CoroElideBranchRatio(
+    "coro-elide-branch-ratio", cl::init(0.55), cl::Hidden,
+    cl::desc("Minimum BranchProbability to consider a elide a coroutine."));
+extern cl::opt<unsigned> MinBlockCounterExecution;
+
+static cl::opt<bool>
+    PrintElidedCoroutine("print-elided-coroutine-stats", cl::init(false),
+                         cl::Hidden,
+                         cl::desc("Print stats for elided coroutine"));
+
+static cl::opt<std::string>
+    ElideStatOutput("coro-elide-stat-output", cl::init(""), cl::Hidden,
+                    cl::desc("Output file for -print-elided-coroutine-stats. "
+                             "Defaults to standard error output."));
+
+// The return value is used to indicate the owner of the resources. The users
+// should use the output parameter.
+static std::unique_ptr<llvm::raw_ostream>
+getCoroElidedStatsOStream(llvm::raw_ostream *&OS) {
+  if (!PrintElidedCoroutine) {
+    OS = &llvm::nulls();
+    return nullptr;
+  }
+
+  if (ElideStatOutput.empty()) {
+    OS = &llvm::errs();
+    return nullptr;
+  }
+
+  std::error_code EC;
+  auto ret = std::make_unique<llvm::raw_fd_ostream>(ElideStatOutput, EC,
+                                                    sys::fs::OF_Append);
+
+  if (EC) {
+    llvm::errs() << "llvm cannot open file: " << EC.message() << "\n";
+    OS = &llvm::nulls();
+    return nullptr;
+  }
+
+  OS = ret.get();
+  return ret;
+}
+
 static Instruction *getFirstNonAllocaInTheEntryBlock(Function *F) {
   for (Instruction &I : F->getEntryBlock())
     if (!isa<AllocaInst>(&I))
@@ -145,6 +191,37 @@ PreservedAnalyses CoroAnnotationElidePass::run(LazyCallGraph::SCC &C,
       bool IsCallerPresplitCoroutine = Caller->isPresplitCoroutine();
       bool HasAttr = CB->hasFnAttr(llvm::Attribute::CoroElideSafe);
       if (IsCallerPresplitCoroutine && HasAttr) {
+
+        llvm::raw_ostream *OS = nullptr;
+        auto _ = getCoroElidedStatsOStream(OS);
+        assert(OS && "At least we should able to get access to standard error");
+
+        auto &BFI = FAM.getResult<BlockFrequencyAnalysis>(*Caller);
+        if (BFI.getBlockFreq(CB->getParent()) <
+            BFI.getEntryFreq()) {
+          static BranchProbability MinBranchProbability(
+              static_cast<int>(CoroElideBranchRatio * MinBlockCounterExecution),
+              MinBlockCounterExecution);
+
+          auto Prob = BranchProbability::getBranchProbability(
+              BFI.getBlockFreq(CB->getParent()).getFrequency(),
+              BFI.getEntryFreq().getFrequency());
+
+          if (Prob < MinBranchProbability) {
+            *OS << "Not eliding " << *CB
+                << " with estimated probability: " << Prob << "\n";
+            continue;
+          }
+
+          *OS << "BB Prob: \t" << Prob << "\n";
+        } else {
+          *OS << "BB Freq: \t"
+              << BFI.getBlockFreq(CB->getParent()).getFrequency() << "\n";
+          *OS << "Entry Freq: \t" << BFI.getEntryFreq().getFrequency() << "\n";
+        }
+
+        *OS << "eliding " << *CB << "\n";
+
         auto *CallerN = CG.lookup(*Caller);
         auto *CallerC = CallerN ? CG.lookupSCC(*CallerN) : nullptr;
         // If CallerC is nullptr, it means LazyCallGraph hasn't visited Caller
diff --git a/llvm/lib/Transforms/IPO/PartialInlining.cpp b/llvm/lib/Transforms/IPO/PartialInlining.cpp
index 2583249e65484..1a00d173d3ae0 100644
--- a/llvm/lib/Transforms/IPO/PartialInlining.cpp
+++ b/llvm/lib/Transforms/IPO/PartialInlining.cpp
@@ -109,7 +109,7 @@ static cl::opt<float> MinRegionSizeRatio(
              "outline candidate and original function"));
 // Used to tune the minimum number of execution counts needed in the predecessor
 // block to the cold edge. ie. confidence interval.
-static cl::opt<unsigned>
+cl::opt<unsigned>
     MinBlockCounterExecution("min-block-execution", cl::init(100), cl::Hidden,
                              cl::desc("Minimum block executions to consider "
                                       "its BranchProbabilityInfo valid"));
diff --git a/llvm/test/Transforms/Coroutines/coro-conditional-elide.ll b/llvm/test/Transforms/Coroutines/coro-conditional-elide.ll
new file mode 100644
index 0000000000000..04c5bf0494278
--- /dev/null
+++ b/llvm/test/Transforms/Coroutines/coro-conditional-elide.ll
@@ -0,0 +1,79 @@
+; Testing elide performed its job for calls to coroutines marked safe.
+; RUN: opt < %s -S -passes='cgscc(coro-annotation-elide)' -coro-elide-branch-ratio=0.55 | FileCheck %s
+
+%struct.Task = type { ptr }
+
+declare void @print(i32) nounwind
+
+; resume part of the coroutine
+define fastcc void @callee.resume(ptr dereferenceable(1)) {
+  tail call void @print(i32 0)
+  ret void
+}
+
+; destroy part of the coroutine
+define fastcc void @callee.destroy(ptr) {
+  tail call void @print(i32 1)
+  ret void
+}
+
+; cleanup part of the coroutine
+define fastcc void @callee.cleanup(ptr) {
+  tail call void @print(i32 2)
+  ret void
+}
+
+@callee.resumers = internal constant [3 x ptr] [
+  ptr @callee.resume, ptr @callee.destroy, ptr @callee.cleanup]
+
+declare void @alloc(i1) nounwind
+
+; CHECK-LABEL: define ptr @callee
+define ptr @callee(i8 %arg) {
+entry:
+  %task = alloca %struct.Task, align 8
+  %id = call token @llvm.coro.id(i32 0, ptr null,
+                          ptr @callee,
+                          ptr @callee.resumers)
+  %alloc = call i1 @llvm.coro.alloc(token %id)
+  %hdl = call ptr @llvm.coro.begin(token %id, ptr null)
+  store ptr %hdl, ptr %task
+  ret ptr %task
+}
+
+; CHECK-LABEL: define ptr @callee.noalloc
+define ptr @callee.noalloc(i8 %arg, ptr dereferenceable(32) align(8) %frame) {
+ entry:
+  %task = alloca %struct.Task, align 8
+  %id = call token @llvm.coro.id(i32 0, ptr null,
+                          ptr @callee,
+                          ptr @callee.resumers)
+  %hdl = call ptr @llvm.coro.begin(token %id, ptr null)
+  store ptr %hdl, ptr %task
+  ret ptr %task
+}
+
+; CHECK-LABEL: define ptr @caller(i1 %cond)
+; Function Attrs: presplitcoroutine
+define ptr @caller(i1 %cond) #0 {
+entry:
+  br i1 %cond, label %call, label %ret
+
+call:
+  %task = call ptr @callee(i8 0) #1
+  br label %ret
+
+ret:
+  %retval = phi ptr [ %task, %call ], [ null, %entry ]
+  ret ptr %retval
+  ; CHECK-NOT: alloca
+}
+
+declare token @llvm.coro.id(i32, ptr, ptr, ptr)
+declare ptr @llvm.coro.begin(token, ptr)
+declare ptr @llvm.coro.frame()
+declare ptr @llvm.coro.subfn.addr(ptr, i8)
+declare i1 @llvm.coro.alloc(token)
+
+attributes #0 = { presplitcoroutine }
+attributes #1 = { coro_elide_safe }
 | 
| You can test this locally with the following command:git-clang-format --diff HEAD~1 HEAD --extensions cpp -- llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp llvm/lib/Transforms/IPO/PartialInlining.cppView the diff from clang-format here.diff --git a/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp b/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
index d4d0c0f08..3e5343033 100644
--- a/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
@@ -197,8 +197,7 @@ PreservedAnalyses CoroAnnotationElidePass::run(LazyCallGraph::SCC &C,
         assert(OS && "At least we should able to get access to standard error");
 
         auto &BFI = FAM.getResult<BlockFrequencyAnalysis>(*Caller);
-        if (BFI.getBlockFreq(CB->getParent()) <
-            BFI.getEntryFreq()) {
+        if (BFI.getBlockFreq(CB->getParent()) < BFI.getEntryFreq()) {
           static BranchProbability MinBranchProbability(
               static_cast<int>(CoroElideBranchRatio * MinBlockCounterExecution),
               MinBlockCounterExecution);
 | 
| if (Prob < MinBranchProbability) { | ||
| *OS << "Not eliding " << *CB | ||
| << " with estimated probability: " << Prob << "\n"; | ||
| continue; | ||
| } | ||
|  | ||
| *OS << "BB Prob: \t" << Prob << "\n"; | ||
| } else { | ||
| *OS << "BB Freq: \t" | ||
| << BFI.getBlockFreq(CB->getParent()).getFrequency() << "\n"; | ||
| *OS << "Entry Freq: \t" << BFI.getEntryFreq().getFrequency() << "\n"; | ||
| } | ||
|  | ||
| *OS << "eliding " << *CB << "\n"; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we integrate this information on whether / why a coroutine was elided into the optimization remarks (already used further down in this file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds not bad. But I prefer to leave it to other PRs if wanted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your PR might actually become simpler if you integrate with optimization remarks right away. You might be able to remove the debug settings and the custom logic to determine the output stream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. But I am pretty busy right now, (the PR is extracted from downstream project directly), so I want to ask your opinion for landing this:
- Blocking the current PR until I had sometime to look at this.
- Land the PR as is, and
 a. Others can fix the problem if the want. e.g., if you have time, you can make this
 b. Add a FIXME then we can look back at this someday.
Generally, Option 1 is the general option in LLVM community. But given for coroutines, the size of the group of people is much smaller, I guess we can be more flexible? Option 2.a may be best if you'd like to contribute on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should definitely still get this patch in before the 21.0 branch cut
But I am pretty busy right now
Will this become better in the foreseeable future? Would you have time to revisit this before the branch cut?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ORE variable is just 10 lines away from the hunk of your change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this become better in the foreseeable future? Would you have time to revisit this before the branch cut?
Maybe, but I can't promise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ORE variable is just 10 lines away from the hunk of your change.
Yeah, but if anything surprising happens, it may take me 30+ minutes to, maybe, one hour. I just don't want to have a risk for me now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should definitely still get this patch in before the 21.0 branch cut
I think if you want that, may be you can take it after I land this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rebased this patch and integrated it with the optimization remarks, see #162276.
Would appreciate your reviews @ChuanqiXu9 and @yuxuanchen1997
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea sounds appealing. Would prefer the stats to be integrated with existing optimization remarks too.
…on (#162276) Unconditionally eliding all `[[clang::coro_await_elidable]]` coroutines is not good. For example, ``` Task bar(); Task foo(bool b) { if (b) [[unlikely]] { co_await bar(); } } ``` Assume Task is marked with `[[clang::coro_await_elidable]]`, now we will always elide the call to `bar()` into the frame of `foo()`. But this may be a regression instead of an optimization if `b` is always false. This patch tries to mitigate the problem by leveraging hot/cold information. This can be optimized further in the future but at least this patch makes things better. This patch was originally written by ChuanqiXu9 (#145831), but stalled during PR review because the diagnostics were not integrated with the existing optimization remarks. I rebased the original patch, integrated it with the optimization remarks, and did a couple of smaller cosmetic changes (e.g., made the test case expectations more targeted, etc.) Co-Authored-by: Chuanqi Xu <[email protected]>
…on (#162276) Unconditionally eliding all `[[clang::coro_await_elidable]]` coroutines is not good. For example, ``` Task bar(); Task foo(bool b) { if (b) [[unlikely]] { co_await bar(); } } ``` Assume Task is marked with `[[clang::coro_await_elidable]]`, now we will always elide the call to `bar()` into the frame of `foo()`. But this may be a regression instead of an optimization if `b` is always false. This patch tries to mitigate the problem by leveraging hot/cold information. This can be optimized further in the future but at least this patch makes things better. This patch was originally written by ChuanqiXu9 (#145831), but stalled during PR review because the diagnostics were not integrated with the existing optimization remarks. I rebased the original patch, integrated it with the optimization remarks, and did a couple of smaller cosmetic changes (e.g., made the test case expectations more targeted, etc.) Co-Authored-by: Chuanqi Xu <[email protected]>
…on (llvm#162276) Unconditionally eliding all `[[clang::coro_await_elidable]]` coroutines is not good. For example, ``` Task bar(); Task foo(bool b) { if (b) [[unlikely]] { co_await bar(); } } ``` Assume Task is marked with `[[clang::coro_await_elidable]]`, now we will always elide the call to `bar()` into the frame of `foo()`. But this may be a regression instead of an optimization if `b` is always false. This patch tries to mitigate the problem by leveraging hot/cold information. This can be optimized further in the future but at least this patch makes things better. This patch was originally written by ChuanqiXu9 (llvm#145831), but stalled during PR review because the diagnostics were not integrated with the existing optimization remarks. I rebased the original patch, integrated it with the optimization remarks, and did a couple of smaller cosmetic changes (e.g., made the test case expectations more targeted, etc.) Co-Authored-by: Chuanqi Xu <[email protected]>
| Closing, as the changes were applied as part of #162276 | 
…on (llvm#162276) Unconditionally eliding all `[[clang::coro_await_elidable]]` coroutines is not good. For example, ``` Task bar(); Task foo(bool b) { if (b) [[unlikely]] { co_await bar(); } } ``` Assume Task is marked with `[[clang::coro_await_elidable]]`, now we will always elide the call to `bar()` into the frame of `foo()`. But this may be a regression instead of an optimization if `b` is always false. This patch tries to mitigate the problem by leveraging hot/cold information. This can be optimized further in the future but at least this patch makes things better. This patch was originally written by ChuanqiXu9 (llvm#145831), but stalled during PR review because the diagnostics were not integrated with the existing optimization remarks. I rebased the original patch, integrated it with the optimization remarks, and did a couple of smaller cosmetic changes (e.g., made the test case expectations more targeted, etc.) Co-Authored-by: Chuanqi Xu <[email protected]>
The rationale is, always eliding is not good.
For example,
Assume Task is marked with
[[clang::coro_await_elidable]], now we will always elide the call tobar()into the frame offoo()).But this may be a regression instead of an optimization if
bis always false.This patch tries to mitigate the problem by introduce static hot/cold information. This can be optimized further more but at least this patch makes things better.