Commit b70e6fc

[AMDGPU] Fix module split's assumption on kernels
Module split assumes that a kernel function must have external linkage; however, that isn't the case. For example, a static kernel function will have weak_odr linkage.

Change-Id: I1e5dee0de1fd866b365f4090a574e1b2961f8dca
1 parent 7b7ae72 commit b70e6fc

2 files changed: 8 additions and 9 deletions

llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp

Lines changed: 5 additions & 6 deletions
@@ -157,13 +157,12 @@ static auto formatRatioOf(CostType Num, CostType Dem) {
 /// Non-copyable functions cannot be cloned into multiple partitions, and only
 /// one copy of the function can be present across all partitions.
 ///
-/// External functions fall into this category. If we were to clone them, we
-/// would end up with multiple symbol definitions and a very unhappy linker.
+/// Kernel functions and external functions fall into this category. If we were
+/// to clone them, we would end up with multiple symbol definitions and a very
+/// unhappy linker.
 static bool isNonCopyable(const Function &F) {
-  assert(AMDGPU::isEntryFunctionCC(F.getCallingConv())
-             ? F.hasExternalLinkage()
-             : true && "Kernel w/o external linkage?");
-  return F.hasExternalLinkage() || !F.isDefinitionExact();
+  return AMDGPU::isEntryFunctionCC(F.getCallingConv()) ||
+         F.hasExternalLinkage() || !F.isDefinitionExact();
 }

 /// If \p GV has local linkage, make it external + hidden.
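
The effect of this hunk can be illustrated with a small standalone model. This is only a sketch: `FunctionModel`, `Linkage`, `CallingConv`, `oldAssumptionHolds` and `isNonCopyableModel` are hypothetical stand-ins invented here, not the LLVM API, and `DefinitionExact` is a plain input because how LLVM computes `isDefinitionExact()` for each linkage is not modeled.

```cpp
// Hypothetical stand-ins for the LLVM types used in the hunk above.
// None of these names come from LLVM itself.
enum class Linkage { External, Internal, WeakODR };
enum class CallingConv { C, AMDGPUKernel };

struct FunctionModel {
  CallingConv CC;
  Linkage L;
  bool DefinitionExact; // stands in for Function::isDefinitionExact()
};

// Stands in for AMDGPU::isEntryFunctionCC in this toy model.
bool isEntryFunctionCC(CallingConv CC) {
  return CC == CallingConv::AMDGPUKernel;
}

// The condition the removed assert checked: every kernel has external linkage.
bool oldAssumptionHolds(const FunctionModel &F) {
  return isEntryFunctionCC(F.CC) ? F.L == Linkage::External : true;
}

// The predicate after the change: kernels are non-copyable regardless of
// linkage, and the two earlier conditions are kept as they were.
bool isNonCopyableModel(const FunctionModel &F) {
  return isEntryFunctionCC(F.CC) || F.L == Linkage::External ||
         !F.DefinitionExact;
}
```

In this model, a `weak_odr` kernel makes `oldAssumptionHolds` return false, which corresponds to the removed assertion firing, while `isNonCopyableModel` returns true for it without consulting the linkage at all.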

llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging.ll

Lines changed: 3 additions & 3 deletions
@@ -19,7 +19,7 @@
 ; CHECK0: declare

 ; CHECK1: define internal void @HelperC()
-; CHECK1: define amdgpu_kernel void @C
+; CHECK1: define weak_odr amdgpu_kernel void @C

 ; CHECK2: define internal void @large2()
 ; CHECK2: define internal void @large1()
@@ -30,7 +30,7 @@
 ; CHECK2: define amdgpu_kernel void @B

 ; NOLARGEKERNELS-CHECK0: define internal void @HelperC()
-; NOLARGEKERNELS-CHECK0: define amdgpu_kernel void @C
+; NOLARGEKERNELS-CHECK0: define weak_odr amdgpu_kernel void @C

 ; NOLARGEKERNELS-CHECK1: define internal void @large2()
 ; NOLARGEKERNELS-CHECK1: define internal void @large1()
@@ -88,7 +88,7 @@ define internal void @HelperC() {
   ret void
 }

-define amdgpu_kernel void @C() {
+define weak_odr amdgpu_kernel void @C() {
   call void @HelperC()
   ret void
 }
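
The doc comment in AMDGPUSplitModule.cpp says a non-copyable function may be defined in only one partition, while other functions can be cloned. The toy model below sketches that rule in isolation. It is an assumption-laden illustration, not the pass's algorithm: `ToyFunction`, `PartitionContents` and `placeFunctions` are hypothetical names, and giving the single definition to the first partition that needs it is an arbitrary choice made for this sketch.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model; nothing here is taken from AMDGPUSplitModule.cpp.
struct ToyFunction {
  std::string Name;
  bool NonCopyable;
};

// Per partition: function name -> true if defined there, false if only declared.
using PartitionContents = std::map<std::string, bool>;

// Needed[P] lists the functions that partition P references.
std::vector<PartitionContents>
placeFunctions(const std::vector<ToyFunction> &Fns,
               const std::vector<std::set<std::string>> &Needed) {
  std::vector<PartitionContents> Out(Needed.size());
  for (const ToyFunction &F : Fns) {
    bool DefinedOnce = false;
    for (std::size_t P = 0; P < Needed.size(); ++P) {
      if (!Needed[P].count(F.Name))
        continue; // partition P does not reference F at all
      if (F.NonCopyable && DefinedOnce) {
        Out[P][F.Name] = false; // a second definition would clash at link time
      } else {
        Out[P][F.Name] = true; // copyable, or the first use of a non-copyable
        DefinedOnce = true;
      }
    }
  }
  return Out;
}
```

With one non-copyable function and one copyable helper that are both needed by two partitions, the model defines the helper in both partitions and the non-copyable function in only the first one, leaving a declaration in the second.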
