-
Notifications
You must be signed in to change notification settings - Fork 14.7k
[LoopInfo] Pointer to stack object may not be loop invariant in a coroutine function #149936
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…outine function A coroutine function may be split to ramp function and resume function, and they have different stack frames, so a pointer to stack objects may have different addresses depending on where it is used, so it's not a loop invariant.
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms Author: None (weiguozhi) ChangesA coroutine function may be split to ramp function and resume function, and they have different stack frames, so a pointer to stack objects may have different addresses depending on where it is used, so it's not a loop invariant. It fixes #149604. Full diff: https://github.com/llvm/llvm-project/pull/149936.diff 2 Files Affected:
diff --git a/llvm/lib/Analysis/LoopInfo.cpp b/llvm/lib/Analysis/LoopInfo.cpp
index 518a634cdb363..9bf5b78d708ba 100644
--- a/llvm/lib/Analysis/LoopInfo.cpp
+++ b/llvm/lib/Analysis/LoopInfo.cpp
@@ -59,8 +59,16 @@ static cl::opt<bool, true>
//
bool Loop::isLoopInvariant(const Value *V) const {
- if (const Instruction *I = dyn_cast<Instruction>(V))
- return !contains(I);
+ if (const Instruction *I = dyn_cast<Instruction>(V)) {
+ // If V is a pointer to stack object and F is a coroutine function, then V
+ // may not be loop invariant because the ramp function and resume function
+ // have different stack frames.
+ if (isa<AllocaInst>(I) &&
+ I->getParent()->getParent()->isPresplitCoroutine())
+ return false;
+ else
+ return !contains(I);
+ }
return true; // All non-instructions are loop invariant
}
diff --git a/llvm/test/Transforms/LICM/licm-coroutine.ll b/llvm/test/Transforms/LICM/licm-coroutine.ll
new file mode 100644
index 0000000000000..a4765acfb93f8
--- /dev/null
+++ b/llvm/test/Transforms/LICM/licm-coroutine.ll
@@ -0,0 +1,78 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -passes=licm -S | FileCheck %s
+
+; %fca.0 and %fca.1 should not be hoisted out of the loop because the ramp
+; function and resume function have different stack frames, so %pointer1 and
+; %pointer2 have different values before and after @llvm.coro.suspend.
+
+define ptr @f(i32 %n) presplitcoroutine {
+; CHECK-LABEL: define ptr @f(
+; CHECK-SAME: i32 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: [[ENTRY:.*]]:
+; CHECK-NEXT: [[POINTER1:%.*]] = alloca ptr, align 8
+; CHECK-NEXT: [[POINTER2:%.*]] = alloca ptr, align 8
+; CHECK-NEXT: [[ID:%.*]] = call token @llvm.coro.id(i32 0, ptr null, ptr null, ptr null)
+; CHECK-NEXT: [[SIZE:%.*]] = call i32 @llvm.coro.size.i32()
+; CHECK-NEXT: [[ALLOC:%.*]] = call ptr @malloc(i32 [[SIZE]])
+; CHECK-NEXT: [[HDL:%.*]] = call noalias ptr @llvm.coro.begin(token [[ID]], ptr [[ALLOC]])
+; CHECK-NEXT: br label %[[LOOP:.*]]
+; CHECK: [[LOOP]]:
+; CHECK-NEXT: [[N_VAL:%.*]] = phi i32 [ [[N]], %[[ENTRY]] ], [ [[INC:%.*]], %[[RESUME:.*]] ]
+; CHECK-NEXT: [[INC]] = add nsw i32 [[N_VAL]], 1
+; CHECK-NEXT: call void @print(i32 [[N_VAL]])
+; CHECK-NEXT: [[TMP0:%.*]] = call i8 @llvm.coro.suspend(token none, i1 false)
+; CHECK-NEXT: switch i8 [[TMP0]], label %[[SUSPEND_LOOPEXIT:.*]] [
+; CHECK-NEXT: i8 0, label %[[RESUME]]
+; CHECK-NEXT: i8 1, label %[[CLEANUP:.*]]
+; CHECK-NEXT: ]
+; CHECK: [[RESUME]]:
+; CHECK-NEXT: [[FCA_0:%.*]] = insertvalue [2 x ptr] poison, ptr [[POINTER1]], 0
+; CHECK-NEXT: [[FCA_1:%.*]] = insertvalue [2 x ptr] [[FCA_0]], ptr [[POINTER2]], 1
+; CHECK-NEXT: call void @foo([2 x ptr] [[FCA_1]])
+; CHECK-NEXT: br label %[[LOOP]]
+; CHECK: [[CLEANUP]]:
+; CHECK-NEXT: [[MEM:%.*]] = call ptr @llvm.coro.free(token [[ID]], ptr [[HDL]])
+; CHECK-NEXT: call void @free(ptr [[MEM]])
+; CHECK-NEXT: br label %[[SUSPEND:.*]]
+; CHECK: [[SUSPEND_LOOPEXIT]]:
+; CHECK-NEXT: br label %[[SUSPEND]]
+; CHECK: [[SUSPEND]]:
+; CHECK-NEXT: [[UNUSED:%.*]] = call i1 @llvm.coro.end(ptr [[HDL]], i1 false, token none)
+; CHECK-NEXT: ret ptr [[HDL]]
+;
+entry:
+ %pointer1 = alloca ptr
+ %pointer2 = alloca ptr
+ %id = call token @llvm.coro.id(i32 0, ptr null, ptr null, ptr null)
+ %size = call i32 @llvm.coro.size.i32()
+ %alloc = call ptr @malloc(i32 %size)
+ %hdl = call noalias ptr @llvm.coro.begin(token %id, ptr %alloc)
+ br label %loop
+
+loop:
+ %n.val = phi i32 [ %n, %entry ], [ %inc, %resume ]
+ %inc = add nsw i32 %n.val, 1
+ call void @print(i32 %n.val)
+ %0 = call i8 @llvm.coro.suspend(token none, i1 false)
+ switch i8 %0, label %suspend [i8 0, label %resume
+ i8 1, label %cleanup]
+
+resume:
+ %fca.0 = insertvalue [2 x ptr] poison, ptr %pointer1, 0
+ %fca.1 = insertvalue [2 x ptr] %fca.0, ptr %pointer2, 1
+ call void @foo([2 x ptr] %fca.1)
+ br label %loop
+
+cleanup:
+ %mem = call ptr @llvm.coro.free(token %id, ptr %hdl)
+ call void @free(ptr %mem)
+ br label %suspend
+suspend:
+ %unused = call i1 @llvm.coro.end(ptr %hdl, i1 false, token none)
+ ret ptr %hdl
+}
+
+declare void @free(ptr)
+declare ptr @malloc(i32)
+declare void @print(i32)
+declare void @foo([2 x ptr])
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC @rnk @zmodem @ChuanqiXu9 @rkjnsn
My understanding from when we last looked at allocas, #127653, is that an alloca is either coro.outside.frame, in which case it doesn't exist after a suspend, or it's not, in which case the address of the alloca refers to its address in the coroutine frame.
If the address actually does need to change, you can't hack up LICM like this; breaking the rules of SSA form will have unmanageable effects on a bunch of optimizations. You need an intrinsic to mark where, exactly, the computation of the address is supposed to happen.
Eli, you are 100% right, this is not a principled fix. However, I think we need to take it. Ever since #135064 landed (an unrelated AAarch64 ABI lowering optimization), we've had to revert it locally internally because it triggers this coroutine bug, where LICM hoists instructions over the first suspend point, leading to UB later. If coroutines were not already generally available, I would just declare them unsupported and stand on the principle that they shouldn't break SSA, but we've shipped them and they have users. We need to deliver correctness fixes in days, not months. We can carry this workaround patch internally, but that seems like it would be a bad outcome for other users of LLVM/Clang C++ coroutines, even if it means LICM doesn't have to carry this workaround. I will ask @zmodem when he gets back from vacation if he can look into the idea discussed in #149604 , which is remapping allocas in the ramp function to use the heap-allocated frame. |
If you put a comment "FIXME: this is semantically inconsistent; we're tracking a proper fix in issue #XXXXXX", or something, and you're planning to work on a proper fix, I'm okay with adding some temporary hack, I guess. Not exactly happy with it, but I understand the constraints you're working with, and I trust you understand why this isn't a long-term solution. This particular hack is pretty nasty, though: we use isLoopInvariant all over the place, and I'm concerned you'll end up causing a cascade of issues in passes like SCEV. Can we contain this to LICM, specifically? |
FIXME comment is added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about the performance regression. If we do want to workaround it, maybe we can restrict it to loops that contain suspension points only?
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
A coroutine function may be split to ramp function and resume function, and they have different stack frames, so a pointer to stack objects may have different addresses depending on where it is used, so it's not a loop invariant.
It fixes #149604.