Skip to content

Conversation

@vogelsgesang
Copy link
Member

@vogelsgesang vogelsgesang commented Oct 7, 2025

Unconditionally eliding all [[clang::coro_await_elidable]] coroutines
is not good.

For example,

Task bar();
Task foo(bool b) {
     if (b) [[unlikely]] {
         co_await bar();
     }
}

Assume Task is marked with [[clang::coro_await_elidable]], now we will
always elide the call to bar() into the frame of foo(). But this may be
a regression instead of an optimization if b is always false.

This patch tries to mitigate the problem by leveraging hot/cold
information. This can be optimized further in the future but at least
this patch makes things better.

This patch was originally written by ChuanqiXu9 (#145831), but stalled
during PR review because the diagnostics were not integrated with
the existing optimization remarks. I rebased the original patch, integrated
it with the optimization remarks, and did a couple of smaller cosmetic
changes (e.g., made the test case expectations more targetted, etc.)

Co-Authored-by: Chuanqi Xu [email protected]

vogelsgesang and others added 3 commits October 7, 2025 12:00
Unconditionally eliding all `[[clang::coro_await_elidable]]` coroutines
is not good.

For example,

```
Task bar();
Task foo(bool b) {
     if (b) [[unlikely]] {
         co_await bar();
     }
}
```

Assume Task is marked with [[clang::coro_await_elidable]], now we will
always elide the call to bar() into the frame of foo(). But this may be
a regression instead of an optimization if b is always false.

This patch tries to mitigate the problem by leveraging hot/cold
information. This can be optimized further in the future but at least
this patch makes things better.

This patch was originally written by ChuanqiXu9, but stalled during PR
review because the diagnostics were not integrated with the existing
optimization remarks. I rebased the original patch, integrated it with
the optimization remarks, and did a couple of smaller cosmetic changes
(e.g., made the test case expectations more targetted, etc.)

Co-Authored-by: Chuanqi Xu <[email protected]>
@github-actions
Copy link

github-actions bot commented Oct 7, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@vogelsgesang vogelsgesang requested a review from NewSigma October 7, 2025 12:36
Copy link
Member

@yuxuanchen1997 yuxuanchen1997 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Copy link
Contributor

@NewSigma NewSigma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a minor nit

bool IsCallerPresplitCoroutine = Caller->isPresplitCoroutine();
bool HasAttr = CB->hasFnAttr(llvm::Attribute::CoroElideSafe);
if (IsCallerPresplitCoroutine && HasAttr) {
static BranchProbability MinBranchProbability(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need a static variable here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is better to make it a global variable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the static and kept it as a local variable.

I don't think we can make this a global variable, as we would run into initialization order issues between CoroElideBranchRatio, MinBlockCounterExecution (which are already global static variables today) and the new global MinBranchProbability

bool IsCallerPresplitCoroutine = Caller->isPresplitCoroutine();
bool HasAttr = CB->hasFnAttr(llvm::Attribute::CoroElideSafe);
if (IsCallerPresplitCoroutine && HasAttr) {
static BranchProbability MinBranchProbability(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is better to make it a global variable?

@llvmbot
Copy link
Member

llvmbot commented Oct 9, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Adrian Vogelsgesang (vogelsgesang)

Changes

Unconditionally eliding all [[clang::coro_await_elidable]] coroutines
is not good.

For example,

Task bar();
Task foo(bool b) {
     if (b) [[unlikely]] {
         co_await bar();
     }
}

Assume Task is marked with [[clang::coro_await_elidable]], now we will
always elide the call to bar() into the frame of foo(). But this may be
a regression instead of an optimization if b is always false.

This patch tries to mitigate the problem by leveraging hot/cold
information. This can be optimized further in the future but at least
this patch makes things better.

This patch was originally written by ChuanqiXu9 (#145831), but stalled
during PR review because the diagnostics were not integrated with
the existing optimization remarks. I rebased the original patch, integrated
it with the optimization remarks, and did a couple of smaller cosmetic
changes (e.g., made the test case expectations more targetted, etc.)

Co-Authored-by: Chuanqi Xu <[email protected]>


Full diff: https://github.com/llvm/llvm-project/pull/162276.diff

5 Files Affected:

  • (modified) llvm/include/llvm/IR/DiagnosticInfo.h (+2)
  • (modified) llvm/lib/IR/DiagnosticInfo.cpp (+7)
  • (modified) llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp (+33-1)
  • (modified) llvm/lib/Transforms/IPO/PartialInlining.cpp (+1-1)
  • (renamed) llvm/test/Transforms/Coroutines/coro-elide-safe.ll (+25-3)
diff --git a/llvm/include/llvm/IR/DiagnosticInfo.h b/llvm/include/llvm/IR/DiagnosticInfo.h
index 5f7225e878eb1..a426fb079ec04 100644
--- a/llvm/include/llvm/IR/DiagnosticInfo.h
+++ b/llvm/include/llvm/IR/DiagnosticInfo.h
@@ -20,6 +20,7 @@
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/IR/DebugLoc.h"
+#include "llvm/Support/BranchProbability.h"
 #include "llvm/Support/CBindingWrapping.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/ErrorHandling.h"
@@ -555,6 +556,7 @@ class LLVM_ABI DiagnosticInfoOptimizationBase
     Argument(StringRef Key, bool B) : Key(Key), Val(B ? "true" : "false") {}
     LLVM_ABI Argument(StringRef Key, DebugLoc dl);
     LLVM_ABI Argument(StringRef Key, InstructionCost C);
+    LLVM_ABI Argument(StringRef Key, BranchProbability P);
   };
 
   /// \p PassName is the name of the pass emitting this diagnostic. \p
diff --git a/llvm/lib/IR/DiagnosticInfo.cpp b/llvm/lib/IR/DiagnosticInfo.cpp
index 4f3762427e012..8e6d654f6afb3 100644
--- a/llvm/lib/IR/DiagnosticInfo.cpp
+++ b/llvm/lib/IR/DiagnosticInfo.cpp
@@ -273,6 +273,13 @@ DiagnosticInfoOptimizationBase::Argument::Argument(StringRef Key,
   C.print(OS);
 }
 
+DiagnosticInfoOptimizationBase::Argument::Argument(StringRef Key,
+                                                   BranchProbability P)
+    : Key(std::string(Key)) {
+  raw_string_ostream OS(Val);
+  P.print(OS);
+}
+
 DiagnosticInfoOptimizationBase::Argument::Argument(StringRef Key, DebugLoc Loc)
     : Key(std::string(Key)), Loc(Loc) {
   if (Loc) {
diff --git a/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp b/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
index 9115946d205a4..f166fef74941e 100644
--- a/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
@@ -24,6 +24,9 @@
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/PassManager.h"
+#include "llvm/Support/BranchProbability.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/FileSystem.h"
 #include "llvm/Transforms/Utils/CallGraphUpdater.h"
 #include "llvm/Transforms/Utils/Cloning.h"
 
@@ -33,6 +36,11 @@ using namespace llvm;
 
 #define DEBUG_TYPE "coro-annotation-elide"
 
+static cl::opt<float> CoroElideBranchRatio(
+    "coro-elide-branch-ratio", cl::init(0.55), cl::Hidden,
+    cl::desc("Minimum BranchProbability to consider a elide a coroutine."));
+extern cl::opt<unsigned> MinBlockCounterExecution;
+
 static Instruction *getFirstNonAllocaInTheEntryBlock(Function *F) {
   for (Instruction &I : F->getEntryBlock())
     if (!isa<AllocaInst>(&I))
@@ -145,6 +153,30 @@ PreservedAnalyses CoroAnnotationElidePass::run(LazyCallGraph::SCC &C,
       bool IsCallerPresplitCoroutine = Caller->isPresplitCoroutine();
       bool HasAttr = CB->hasFnAttr(llvm::Attribute::CoroElideSafe);
       if (IsCallerPresplitCoroutine && HasAttr) {
+        BranchProbability MinBranchProbability(
+            static_cast<int>(CoroElideBranchRatio * MinBlockCounterExecution),
+            MinBlockCounterExecution);
+
+        auto &BFI = FAM.getResult<BlockFrequencyAnalysis>(*Caller);
+
+        auto Prob = BranchProbability::getBranchProbability(
+            BFI.getBlockFreq(CB->getParent()).getFrequency(),
+            BFI.getEntryFreq().getFrequency());
+
+        if (Prob < MinBranchProbability) {
+          ORE.emit([&]() {
+            return OptimizationRemarkMissed(
+                       DEBUG_TYPE, "CoroAnnotationElideUnlikely", Caller)
+                   << "'" << ore::NV("callee", Callee->getName())
+                   << "' not elided in '"
+                   << ore::NV("caller", Caller->getName())
+                   << "' because of low probability: "
+                   << ore::NV("probability", Prob) << " (threshold: "
+                   << ore::NV("threshold", MinBranchProbability) << ")";
+          });
+          continue;
+        }
+
         auto *CallerN = CG.lookup(*Caller);
         auto *CallerC = CallerN ? CG.lookupSCC(*CallerN) : nullptr;
         // If CallerC is nullptr, it means LazyCallGraph hasn't visited Caller
@@ -156,7 +188,7 @@ PreservedAnalyses CoroAnnotationElidePass::run(LazyCallGraph::SCC &C,
           return OptimizationRemark(DEBUG_TYPE, "CoroAnnotationElide", Caller)
                  << "'" << ore::NV("callee", Callee->getName())
                  << "' elided in '" << ore::NV("caller", Caller->getName())
-                 << "'";
+                 << "' (probability: " << ore::NV("probability", Prob) << ")";
         });
 
         FAM.invalidate(*Caller, PreservedAnalyses::none());
diff --git a/llvm/lib/Transforms/IPO/PartialInlining.cpp b/llvm/lib/Transforms/IPO/PartialInlining.cpp
index 2583249e65484..1a00d173d3ae0 100644
--- a/llvm/lib/Transforms/IPO/PartialInlining.cpp
+++ b/llvm/lib/Transforms/IPO/PartialInlining.cpp
@@ -109,7 +109,7 @@ static cl::opt<float> MinRegionSizeRatio(
              "outline candidate and original function"));
 // Used to tune the minimum number of execution counts needed in the predecessor
 // block to the cold edge. ie. confidence interval.
-static cl::opt<unsigned>
+cl::opt<unsigned>
     MinBlockCounterExecution("min-block-execution", cl::init(100), cl::Hidden,
                              cl::desc("Minimum block executions to consider "
                                       "its BranchProbabilityInfo valid"));
diff --git a/llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll b/llvm/test/Transforms/Coroutines/coro-elide-safe.ll
similarity index 74%
rename from llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll
rename to llvm/test/Transforms/Coroutines/coro-elide-safe.ll
index 4eec7edad8b0f..722693d4caafd 100644
--- a/llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll
+++ b/llvm/test/Transforms/Coroutines/coro-elide-safe.ll
@@ -1,4 +1,8 @@
-; Testing elide performed its job for calls to coroutines marked safe.
+; Coroutine calls marked with `coro_elide_safe` should be elided.
+; Inside `caller`, we expect the `callee` coroutine to be elided.
+; Inside `caller_conditional`, `callee` is only called on an unlikely
+; path, hence we expect the `callee` coroutine NOT to be elided.
+;
 ; RUN: opt < %s -S -passes='cgscc(coro-annotation-elide)' | FileCheck %s
 
 %struct.Task = type { ptr }
@@ -57,7 +61,7 @@ define ptr @callee.noalloc(i8 %arg, ptr dereferenceable(32) align(8) %frame) {
 ; Function Attrs: presplitcoroutine
 define ptr @caller() #0 {
 entry:
-  %task = call ptr @callee(i8 0) #1
+  %task = call ptr @callee(i8 0) coro_elide_safe
   ret ptr %task
   ; CHECK: %[[TASK:.+]] = alloca %struct.Task, align 8
   ; CHECK-NEXT: %[[FRAME:.+]] = alloca [32 x i8], align 8
@@ -69,6 +73,25 @@ entry:
   ; CHECK-NEXT: ret ptr %[[TASK]]
 }
 
+; CHECK-LABEL: define ptr @caller_conditional(i1 %cond)
+; Function Attrs: presplitcoroutine
+define ptr @caller_conditional(i1 %cond) #0 {
+entry:
+  br i1 %cond, label %call, label %ret
+
+call:
+  ; CHECK-NOT: alloca
+  ; CHECK-NOT: @llvm.coro.id({{.*}}, ptr @callee, {{.*}})
+  ; CHECK: %task = call ptr @callee(i8 0)
+  ; CHECK-NEXT: br label %ret
+  %task = call ptr @callee(i8 0) coro_elide_safe
+  br label %ret
+
+ret:
+  %retval = phi ptr [ %task, %call ], [ null, %entry ]
+  ret ptr %retval
+}
+
 declare token @llvm.coro.id(i32, ptr, ptr, ptr)
 declare ptr @llvm.coro.begin(token, ptr)
 declare ptr @llvm.coro.frame()
@@ -76,4 +99,3 @@ declare ptr @llvm.coro.subfn.addr(ptr, i8)
 declare i1 @llvm.coro.alloc(token)
 
 attributes #0 = { presplitcoroutine }
-attributes #1 = { coro_elide_safe }

@vogelsgesang vogelsgesang enabled auto-merge (squash) October 9, 2025 15:30
@vogelsgesang vogelsgesang merged commit c4f3675 into llvm:main Oct 9, 2025
10 of 11 checks passed
@vogelsgesang vogelsgesang deleted the avogelsgesang-coro-conditional-elide branch October 9, 2025 15:32
svkeerthy pushed a commit that referenced this pull request Oct 9, 2025
…on (#162276)

Unconditionally eliding all `[[clang::coro_await_elidable]]` coroutines
is not good.

For example,

```
Task bar();
Task foo(bool b) {
     if (b) [[unlikely]] {
         co_await bar();
     }
}
```

Assume Task is marked with `[[clang::coro_await_elidable]]`, now we will
always elide the call to `bar()` into the frame of `foo()`. But this may
be a regression instead of an optimization if `b` is always false.

This patch tries to mitigate the problem by leveraging hot/cold
information. This can be optimized further in the future but at least
this patch makes things better.

This patch was originally written by ChuanqiXu9 (#145831), but stalled
during PR review because the diagnostics were not integrated with
the existing optimization remarks. I rebased the original patch,
integrated it with the optimization remarks, and did a couple of smaller
cosmetic changes (e.g., made the test case expectations more
targeted, etc.)

Co-Authored-by: Chuanqi Xu <[email protected]>
DharuniRAcharya pushed a commit to DharuniRAcharya/llvm-project that referenced this pull request Oct 13, 2025
…on (llvm#162276)

Unconditionally eliding all `[[clang::coro_await_elidable]]` coroutines
is not good.

For example,

```
Task bar();
Task foo(bool b) {
     if (b) [[unlikely]] {
         co_await bar();
     }
}
```

Assume Task is marked with `[[clang::coro_await_elidable]]`, now we will
always elide the call to `bar()` into the frame of `foo()`. But this may
be a regression instead of an optimization if `b` is always false.

This patch tries to mitigate the problem by leveraging hot/cold
information. This can be optimized further in the future but at least
this patch makes things better.

This patch was originally written by ChuanqiXu9 (llvm#145831), but stalled
during PR review because the diagnostics were not integrated with
the existing optimization remarks. I rebased the original patch,
integrated it with the optimization remarks, and did a couple of smaller
cosmetic changes (e.g., made the test case expectations more
targeted, etc.)

Co-Authored-by: Chuanqi Xu <[email protected]>
akadutta pushed a commit to akadutta/llvm-project that referenced this pull request Oct 14, 2025
…on (llvm#162276)

Unconditionally eliding all `[[clang::coro_await_elidable]]` coroutines
is not good.

For example,

```
Task bar();
Task foo(bool b) {
     if (b) [[unlikely]] {
         co_await bar();
     }
}
```

Assume Task is marked with `[[clang::coro_await_elidable]]`, now we will
always elide the call to `bar()` into the frame of `foo()`. But this may
be a regression instead of an optimization if `b` is always false.

This patch tries to mitigate the problem by leveraging hot/cold
information. This can be optimized further in the future but at least
this patch makes things better.

This patch was originally written by ChuanqiXu9 (llvm#145831), but stalled
during PR review because the diagnostics were not integrated with
the existing optimization remarks. I rebased the original patch,
integrated it with the optimization remarks, and did a couple of smaller
cosmetic changes (e.g., made the test case expectations more
targeted, etc.)

Co-Authored-by: Chuanqi Xu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants