Skip to content

[LoopInfo] Pointer to stack object may not be loop invariant in a coroutine function #149936

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 9, 2025

Conversation

weiguozhi
Copy link
Contributor

A coroutine function may be split to ramp function and resume function, and they have different stack frames, so a pointer to stack objects may have different addresses depending on where it is used, so it's not a loop invariant.

It fixes #149604.

…outine function

A coroutine function may be split to ramp function and resume function,
and they have different stack frames, so a pointer to stack objects may
have different addresses depending on where it is used, so it's not a
loop invariant.
@llvmbot llvmbot added llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Jul 21, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 21, 2025

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: None (weiguozhi)

Changes

A coroutine function may be split to ramp function and resume function, and they have different stack frames, so a pointer to stack objects may have different addresses depending on where it is used, so it's not a loop invariant.

It fixes #149604.


Full diff: https://github.com/llvm/llvm-project/pull/149936.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/LoopInfo.cpp (+10-2)
  • (added) llvm/test/Transforms/LICM/licm-coroutine.ll (+78)
diff --git a/llvm/lib/Analysis/LoopInfo.cpp b/llvm/lib/Analysis/LoopInfo.cpp
index 518a634cdb363..9bf5b78d708ba 100644
--- a/llvm/lib/Analysis/LoopInfo.cpp
+++ b/llvm/lib/Analysis/LoopInfo.cpp
@@ -59,8 +59,16 @@ static cl::opt<bool, true>
 //
 
 bool Loop::isLoopInvariant(const Value *V) const {
-  if (const Instruction *I = dyn_cast<Instruction>(V))
-    return !contains(I);
+  if (const Instruction *I = dyn_cast<Instruction>(V)) {
+    // If V is a pointer to stack object and F is a coroutine function, then V
+    // may not be loop invariant because the ramp function and resume function
+    // have different stack frames.
+    if (isa<AllocaInst>(I) &&
+        I->getParent()->getParent()->isPresplitCoroutine())
+      return false;
+    else
+      return !contains(I);
+  }
   return true; // All non-instructions are loop invariant
 }
 
diff --git a/llvm/test/Transforms/LICM/licm-coroutine.ll b/llvm/test/Transforms/LICM/licm-coroutine.ll
new file mode 100644
index 0000000000000..a4765acfb93f8
--- /dev/null
+++ b/llvm/test/Transforms/LICM/licm-coroutine.ll
@@ -0,0 +1,78 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -passes=licm -S | FileCheck %s
+
+; %fca.0 and %fca.1 should not be hoisted out of the loop because the ramp
+; function and resume function have different stack frames, so %pointer1 and
+; %pointer2 have different values before and after @llvm.coro.suspend.
+
+define ptr @f(i32 %n) presplitcoroutine {
+; CHECK-LABEL: define ptr @f(
+; CHECK-SAME: i32 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    [[POINTER1:%.*]] = alloca ptr, align 8
+; CHECK-NEXT:    [[POINTER2:%.*]] = alloca ptr, align 8
+; CHECK-NEXT:    [[ID:%.*]] = call token @llvm.coro.id(i32 0, ptr null, ptr null, ptr null)
+; CHECK-NEXT:    [[SIZE:%.*]] = call i32 @llvm.coro.size.i32()
+; CHECK-NEXT:    [[ALLOC:%.*]] = call ptr @malloc(i32 [[SIZE]])
+; CHECK-NEXT:    [[HDL:%.*]] = call noalias ptr @llvm.coro.begin(token [[ID]], ptr [[ALLOC]])
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    [[N_VAL:%.*]] = phi i32 [ [[N]], %[[ENTRY]] ], [ [[INC:%.*]], %[[RESUME:.*]] ]
+; CHECK-NEXT:    [[INC]] = add nsw i32 [[N_VAL]], 1
+; CHECK-NEXT:    call void @print(i32 [[N_VAL]])
+; CHECK-NEXT:    [[TMP0:%.*]] = call i8 @llvm.coro.suspend(token none, i1 false)
+; CHECK-NEXT:    switch i8 [[TMP0]], label %[[SUSPEND_LOOPEXIT:.*]] [
+; CHECK-NEXT:      i8 0, label %[[RESUME]]
+; CHECK-NEXT:      i8 1, label %[[CLEANUP:.*]]
+; CHECK-NEXT:    ]
+; CHECK:       [[RESUME]]:
+; CHECK-NEXT:    [[FCA_0:%.*]] = insertvalue [2 x ptr] poison, ptr [[POINTER1]], 0
+; CHECK-NEXT:    [[FCA_1:%.*]] = insertvalue [2 x ptr] [[FCA_0]], ptr [[POINTER2]], 1
+; CHECK-NEXT:    call void @foo([2 x ptr] [[FCA_1]])
+; CHECK-NEXT:    br label %[[LOOP]]
+; CHECK:       [[CLEANUP]]:
+; CHECK-NEXT:    [[MEM:%.*]] = call ptr @llvm.coro.free(token [[ID]], ptr [[HDL]])
+; CHECK-NEXT:    call void @free(ptr [[MEM]])
+; CHECK-NEXT:    br label %[[SUSPEND:.*]]
+; CHECK:       [[SUSPEND_LOOPEXIT]]:
+; CHECK-NEXT:    br label %[[SUSPEND]]
+; CHECK:       [[SUSPEND]]:
+; CHECK-NEXT:    [[UNUSED:%.*]] = call i1 @llvm.coro.end(ptr [[HDL]], i1 false, token none)
+; CHECK-NEXT:    ret ptr [[HDL]]
+;
+entry:
+  %pointer1 = alloca ptr
+  %pointer2 = alloca ptr
+  %id = call token @llvm.coro.id(i32 0, ptr null, ptr null, ptr null)
+  %size = call i32 @llvm.coro.size.i32()
+  %alloc = call ptr @malloc(i32 %size)
+  %hdl = call noalias ptr @llvm.coro.begin(token %id, ptr %alloc)
+  br label %loop
+
+loop:
+  %n.val = phi i32 [ %n, %entry ], [ %inc, %resume ]
+  %inc = add nsw i32 %n.val, 1
+  call void @print(i32 %n.val)
+  %0 = call i8 @llvm.coro.suspend(token none, i1 false)
+  switch i8 %0, label %suspend [i8 0, label %resume
+  i8 1, label %cleanup]
+
+resume:
+  %fca.0 = insertvalue [2 x ptr] poison, ptr %pointer1, 0
+  %fca.1 = insertvalue [2 x ptr] %fca.0, ptr %pointer2, 1
+  call void @foo([2 x ptr] %fca.1)
+  br label %loop
+
+cleanup:
+  %mem = call ptr @llvm.coro.free(token %id, ptr %hdl)
+  call void @free(ptr %mem)
+  br label %suspend
+suspend:
+  %unused = call i1 @llvm.coro.end(ptr %hdl, i1 false, token none)
+  ret ptr %hdl
+}
+
+declare void @free(ptr)
+declare ptr @malloc(i32)
+declare void @print(i32)
+declare void @foo([2 x ptr])

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC @rnk @zmodem @ChuanqiXu9 @rkjnsn

My understanding from when we last looked at allocas, #127653, is that an alloca is either coro.outside.frame, in which case it doesn't exist after a suspend, or it's not, in which case the address of the alloca refers to its address in the coroutine frame.

If the address actually does need to change, you can't hack up LICM like this; breaking the rules of SSA form will have unmanageable effects on a bunch of optimizations. You need an intrinsic to mark where, exactly, the computation of the address is supposed to happen.

@rnk
Copy link
Collaborator

rnk commented Jul 28, 2025

Eli, you are 100% right, this is not a principled fix. However, I think we need to take it.

Ever since #135064 landed (an unrelated AAarch64 ABI lowering optimization), we've had to revert it locally internally because it triggers this coroutine bug, where LICM hoists instructions over the first suspend point, leading to UB later. If coroutines were not already generally available, I would just declare them unsupported and stand on the principle that they shouldn't break SSA, but we've shipped them and they have users. We need to deliver correctness fixes in days, not months.

We can carry this workaround patch internally, but that seems like it would be a bad outcome for other users of LLVM/Clang C++ coroutines, even if it means LICM doesn't have to carry this workaround.

I will ask @zmodem when he gets back from vacation if he can look into the idea discussed in #149604 , which is remapping allocas in the ramp function to use the heap-allocated frame.

@efriedma-quic
Copy link
Collaborator

If you put a comment "FIXME: this is semantically inconsistent; we're tracking a proper fix in issue #XXXXXX", or something, and you're planning to work on a proper fix, I'm okay with adding some temporary hack, I guess. Not exactly happy with it, but I understand the constraints you're working with, and I trust you understand why this isn't a long-term solution.

This particular hack is pretty nasty, though: we use isLoopInvariant all over the place, and I'm concerned you'll end up causing a cascade of issues in passes like SCEV. Can we contain this to LICM, specifically?

@weiguozhi
Copy link
Contributor Author

FIXME comment is added.

Copy link
Contributor

@NewSigma NewSigma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about the performance regression. If we do want to workaround it, maybe we can restrict it to loops that contain suspension points only?

@efriedma-quic
Copy link
Collaborator

This particular hack is pretty nasty, though: we use isLoopInvariant all over the place, and I'm concerned you'll end up causing a cascade of issues in passes like SCEV. Can we contain this to LICM, specifically?

Copy link

github-actions bot commented Aug 8, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@weiguozhi weiguozhi merged commit 5e87792 into llvm:main Aug 9, 2025
11 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong LICM of expression involving pointer to alloca object in coroutine function
5 participants