-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[OpenMP][clang][HIP][CUDA] fix weak alias emit on device compilation #164326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…when aliasee is no declared on device Add checks to skip the emitting of an alias on the device when the aliasee is not declared on the device. This change effects OpenMP, Hip and Cuda.
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
|
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang Author: None (Jason-VanBeusekom) ChangesThis PR adds checks for when emitting weak aliases in: For OpenMP the following case now works: > Failed before when compiling with device, ie: For HIP / Cuda: > For HIP: >Failed before on HIP, Cuda fails due to: Included are Codegen LIT tests for the above cases, and also cases for weak alias cases that currently work in clang. Fixes #117369 Full diff: https://github.com/llvm/llvm-project/pull/164326.diff 5 Files Affected:
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index c5eb14e329315..ac0de5a221ec7 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -4065,8 +4065,40 @@ void CodeGenModule::EmitGlobal(GlobalDecl GD) {
// If this is an alias definition (which otherwise looks like a declaration)
// emit it now.
- if (Global->hasAttr<AliasAttr>())
+ if (Global->hasAttr<AliasAttr>()) {
+ if (LangOpts.OpenMPIsTargetDevice || LangOpts.CUDA) {
+ const auto *AA = Global->getAttr<AliasAttr>();
+ assert(AA && "Not an alias?");
+ GlobalDecl AliaseeGD;
+ if (!lookupRepresentativeDecl(AA->getAliasee(), AliaseeGD)) {
+ if (LangOpts.CUDA)
+ // Failed to find aliasee on device side, skip emitting
+ return;
+ } else {
+ const auto *AliaseeDecl = dyn_cast<ValueDecl>(AliaseeGD.getDecl());
+ if (LangOpts.OpenMPIsTargetDevice) {
+ if (!AliaseeDecl ||
+ !OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(
+ AliaseeDecl))
+ // Not a target declaration, skip emitting
+ return;
+ } else {
+ // HIP/CUDA
+ const bool HasDeviceAttr = Global->hasAttr<CUDADeviceAttr>();
+ const bool AliaseeHasDeviceAttr =
+ AliaseeDecl && AliaseeDecl->hasAttr<CUDADeviceAttr>();
+ if (LangOpts.CUDAIsDevice) {
+ if (!HasDeviceAttr || !AliaseeHasDeviceAttr)
+ return;
+ } else if (HasDeviceAttr && AliaseeHasDeviceAttr) {
+ // Alias is only on device side, skip emitting on host side
+ return;
+ }
+ }
+ }
+ }
return EmitAliasDefinition(GD);
+ }
// IFunc like an alias whose value is resolved at runtime by calling resolver.
if (Global->hasAttr<IFuncAttr>())
diff --git a/clang/test/CodeGenCUDA/cuda_weak_alias.cu b/clang/test/CodeGenCUDA/cuda_weak_alias.cu
new file mode 100644
index 0000000000000..fda0ed7e5d74b
--- /dev/null
+++ b/clang/test/CodeGenCUDA/cuda_weak_alias.cu
@@ -0,0 +1,36 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 6
+// REQUIRES: nvptx-registered-target
+// RUN: %clang_cc1 -x cuda -triple x86_64-unknown-linux-gnu -aux-triple nvptx64-nvidia-cuda -emit-llvm %s -o - | FileCheck %s --check-prefix=HOST
+
+extern "C" {
+
+//.
+// HOST: @HostFunc = weak alias i32 (), ptr @__HostFunc
+//.
+// HOST-LABEL: define dso_local i32 @__HostFunc(
+// HOST-SAME: ) #[[ATTR0:[0-9]+]] {
+// HOST-NEXT: [[ENTRY:.*:]]
+// HOST-NEXT: ret i32 42
+//
+int __HostFunc(void) { return 42; }
+int HostFunc(void) __attribute__ ((weak, alias("__HostFunc")));
+
+}
+
+// HOST-LABEL: define dso_local noundef i32 @main(
+// HOST-SAME: ) #[[ATTR1:[0-9]+]] {
+// HOST-NEXT: [[ENTRY:.*:]]
+// HOST-NEXT: [[RETVAL:%.*]] = alloca i32, align 4
+// HOST-NEXT: store i32 0, ptr [[RETVAL]], align 4
+// HOST-NEXT: ret i32 0
+//
+int main() {
+ return 0;
+}
+//.
+// HOST: attributes #[[ATTR0]] = { mustprogress noinline nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
+// HOST: attributes #[[ATTR1]] = { mustprogress noinline norecurse nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
+//.
+// HOST: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// HOST: [[META1:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
diff --git a/clang/test/CodeGenHIP/hip_weak_alias.cpp b/clang/test/CodeGenHIP/hip_weak_alias.cpp
new file mode 100644
index 0000000000000..6a57ce1ab74c7
--- /dev/null
+++ b/clang/test/CodeGenHIP/hip_weak_alias.cpp
@@ -0,0 +1,63 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 6
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -x hip -triple x86_64-unknown-linux-gnu -aux-triple amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-host.bc
+// RUN: %clang_cc1 -x hip -triple x86_64-unknown-linux-gnu -aux-triple amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s --check-prefix=HOST
+// RUN: %clang_cc1 -x hip -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-linux-gnu -emit-llvm %s -fcuda-is-device -o - | FileCheck %s --check-prefix=DEVICE
+
+#define __device__ __attribute__((device))
+
+extern "C" {
+
+//.
+// HOST: @__hip_cuid_ = global i8 0
+// HOST: @llvm.compiler.used = appending global [1 x ptr] [ptr @__hip_cuid_], section "llvm.metadata"
+// HOST: @HostFunc = weak alias i32 (), ptr @__HostFunc
+//.
+// DEVICE: @__hip_cuid_ = addrspace(1) global i8 0
+// DEVICE: @llvm.compiler.used = appending addrspace(1) global [1 x ptr] [ptr addrspacecast (ptr addrspace(1) @__hip_cuid_ to ptr)], section "llvm.metadata"
+// DEVICE: @One = weak alias i32 (), ptr @__One
+//.
+// HOST-LABEL: define dso_local i32 @__HostFunc(
+// HOST-SAME: ) #[[ATTR0:[0-9]+]] {
+// HOST-NEXT: [[ENTRY:.*:]]
+// HOST-NEXT: ret i32 42
+//
+int __HostFunc(void) { return 42; }
+int HostFunc(void) __attribute__ ((weak, alias("__HostFunc")));
+
+// DEVICE-LABEL: define dso_local i32 @__One(
+// DEVICE-SAME: ) #[[ATTR0:[0-9]+]] {
+// DEVICE-NEXT: [[ENTRY:.*:]]
+// DEVICE-NEXT: [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
+// DEVICE-NEXT: [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// DEVICE-NEXT: ret i32 2
+//
+__device__ int __One(void) { return 2; }
+__device__ int One(void) __attribute__ ((weak, alias("__One")));
+
+}
+
+// HOST-LABEL: define dso_local noundef i32 @main(
+// HOST-SAME: ) #[[ATTR1:[0-9]+]] {
+// HOST-NEXT: [[ENTRY:.*:]]
+// HOST-NEXT: [[RETVAL:%.*]] = alloca i32, align 4
+// HOST-NEXT: store i32 0, ptr [[RETVAL]], align 4
+// HOST-NEXT: ret i32 0
+//
+int main() {
+ return 0;
+}
+//.
+// HOST: attributes #[[ATTR0]] = { mustprogress noinline nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
+// HOST: attributes #[[ATTR1]] = { mustprogress noinline norecurse nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
+//.
+// DEVICE: attributes #[[ATTR0]] = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+//.
+// HOST: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// HOST: [[META1:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
+// DEVICE: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 600}
+// DEVICE: [[META1:![0-9]+]] = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
+// DEVICE: [[META2:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// DEVICE: [[META3:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
diff --git a/clang/test/OpenMP/amdgcn_weak_alias.c b/clang/test/OpenMP/amdgcn_weak_alias.c
new file mode 100644
index 0000000000000..bf8645bef6d78
--- /dev/null
+++ b/clang/test/OpenMP/amdgcn_weak_alias.c
@@ -0,0 +1,90 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 6
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -o - | FileCheck %s --check-prefix=HOST
+// RUN: %clang_cc1 -fopenmp -x c -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix=DEVICE
+
+//.
+// HOST: @One = weak alias i32 (), ptr @__One
+// HOST: @Two = weak alias i32 (), ptr @__Two
+// HOST: @Three = weak alias i32 (), ptr @__Three
+//.
+// DEVICE: @__omp_rtl_debug_kind = weak_odr hidden addrspace(1) constant i32 0
+// DEVICE: @__omp_rtl_assume_teams_oversubscription = weak_odr hidden addrspace(1) constant i32 0
+// DEVICE: @__omp_rtl_assume_threads_oversubscription = weak_odr hidden addrspace(1) constant i32 0
+// DEVICE: @__omp_rtl_assume_no_thread_state = weak_odr hidden addrspace(1) constant i32 0
+// DEVICE: @__omp_rtl_assume_no_nested_parallelism = weak_odr hidden addrspace(1) constant i32 0
+// DEVICE: @Two = weak hidden alias i32 (), ptr @__Two
+// DEVICE: @Three = weak hidden alias i32 (), ptr @__Three
+// DEVICE: @Three.1 = weak hidden alias i32 (), ptr @__Three
+//.
+// HOST-LABEL: define dso_local i32 @__One(
+// HOST-SAME: ) #[[ATTR0:[0-9]+]] {
+// HOST-NEXT: [[ENTRY:.*:]]
+// HOST-NEXT: ret i32 1
+//
+int __One(void) { return 1; }
+int One(void) __attribute__ ((weak, alias("__One")));
+
+#pragma omp declare target
+// HOST-LABEL: define dso_local i32 @__Two(
+// HOST-SAME: ) #[[ATTR0]] {
+// HOST-NEXT: [[ENTRY:.*:]]
+// HOST-NEXT: ret i32 2
+//
+// DEVICE-LABEL: define hidden i32 @__Two(
+// DEVICE-SAME: ) #[[ATTR0:[0-9]+]] {
+// DEVICE-NEXT: [[ENTRY:.*:]]
+// DEVICE-NEXT: [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
+// DEVICE-NEXT: [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// DEVICE-NEXT: ret i32 2
+//
+int __Two(void) { return 2; }
+int Two(void) __attribute__ ((weak, alias("__Two")));
+#pragma omp end declare target
+
+#pragma omp declare target
+// HOST-LABEL: define dso_local i32 @__Three(
+// HOST-SAME: ) #[[ATTR0]] {
+// HOST-NEXT: [[ENTRY:.*:]]
+// HOST-NEXT: ret i32 3
+//
+// DEVICE-LABEL: define hidden i32 @__Three(
+// DEVICE-SAME: ) #[[ATTR0]] {
+// DEVICE-NEXT: [[ENTRY:.*:]]
+// DEVICE-NEXT: [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
+// DEVICE-NEXT: [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// DEVICE-NEXT: ret i32 3
+//
+int __Three(void) { return 3; }
+#pragma omp end declare target
+int Three(void) __attribute__ ((weak, alias("__Three")));
+
+
+// HOST-LABEL: define dso_local i32 @main(
+// HOST-SAME: ) #[[ATTR0]] {
+// HOST-NEXT: [[ENTRY:.*:]]
+// HOST-NEXT: [[RETVAL:%.*]] = alloca i32, align 4
+// HOST-NEXT: store i32 0, ptr [[RETVAL]], align 4
+// HOST-NEXT: ret i32 0
+//
+int main(){
+ return 0;
+}
+
+//.
+// HOST: attributes #[[ATTR0]] = { noinline nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
+//.
+// DEVICE: attributes #[[ATTR0]] = { convergent noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+//.
+// HOST: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// HOST: [[META1:![0-9]+]] = !{i32 7, !"openmp", i32 51}
+// HOST: [[META2:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
+// DEVICE: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 600}
+// DEVICE: [[META1:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// DEVICE: [[META2:![0-9]+]] = !{i32 7, !"openmp", i32 51}
+// DEVICE: [[META3:![0-9]+]] = !{i32 7, !"openmp-device", i32 51}
+// DEVICE: [[META4:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
diff --git a/clang/test/OpenMP/nvptx_weak_alias.c b/clang/test/OpenMP/nvptx_weak_alias.c
new file mode 100644
index 0000000000000..695bd7d0b8af9
--- /dev/null
+++ b/clang/test/OpenMP/nvptx_weak_alias.c
@@ -0,0 +1,34 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 6
+// REQUIRES: nvptx-registered-target
+
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -o - | FileCheck %s
+
+//.
+// CHECK: @One = weak alias i32 (), ptr @__One
+//.
+// CHECK-LABEL: define dso_local i32 @__One(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: ret i32 1
+//
+int __One(void) { return 1; }
+int One(void) __attribute__ ((weak, alias("__One")));
+
+
+// CHECK-LABEL: define dso_local i32 @main(
+// CHECK-SAME: ) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[RETVAL:%.*]] = alloca i32, align 4
+// CHECK-NEXT: store i32 0, ptr [[RETVAL]], align 4
+// CHECK-NEXT: ret i32 0
+//
+int main(){
+ return 0;
+}
+//.
+// CHECK: attributes #[[ATTR0]] = { noinline nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
+//.
+// CHECK: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// CHECK: [[META1:![0-9]+]] = !{i32 7, !"openmp", i32 51}
+// CHECK: [[META2:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
|
|
@jhuber6 @alexey-bataev, anyone else I should ping? |
clang/lib/CodeGen/CodeGenModule.cpp
Outdated
| if (LangOpts.CUDAIsDevice) { | ||
| if (!HasDeviceAttr || !AliaseeHasDeviceAttr) | ||
| return; | ||
| } else if (HasDeviceAttr && AliaseeHasDeviceAttr) { | ||
| // Alias is only on device side, skip emitting on host side | ||
| return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is up with the logic nesting here, we have an if, nesed if, and an else if, that all do the same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is needed as LangOpts.CUDA is true for both the host and device compilation, for the host compilation case (the else if) we need to skip if the alias / Aliasee is only on the device, and for the device side we need to make sure both the alias and aliasee are both on the device.
in 16c1a68 I refactored this and added comments to hopefully clear this up. I also added a comment below going into depth about the various states in the refactor and when they are triggered.
clang/lib/CodeGen/CodeGenModule.cpp
Outdated
| if (LangOpts.CUDA) | ||
| // Failed to find aliasee on device side, skip emitting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a hard error on CUDA we should honestly reject it at Sema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LangOpts.CUDA is true for both HIP and CUDA cases for both host and device compilation, this is not a hard error as this is triggered during host compilation for a device only alias and during device compilation for a host only alais
in 16c1a68 I refactored this and added comments to hopefully clear this up. I also added a comment below going into depth about the various states in the refactor and when they are triggered.
clang/lib/CodeGen/CodeGenModule.cpp
Outdated
| if (Global->hasAttr<AliasAttr>()) { | ||
| if (LangOpts.OpenMPIsTargetDevice || LangOpts.CUDA) { | ||
| const auto *AA = Global->getAttr<AliasAttr>(); | ||
| assert(AA && "Not an alias?"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we asserting something we know for a fact is true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in:16c1a68
I copied this from EmitAliasDefinition which does not check with Global->hasAttr<AliasAttr>()
clang/lib/CodeGen/CodeGenModule.cpp
Outdated
| // emit it now. | ||
| if (Global->hasAttr<AliasAttr>()) | ||
| if (Global->hasAttr<AliasAttr>()) { | ||
| if (LangOpts.OpenMPIsTargetDevice || LangOpts.CUDA) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this should be a helper function that returns a bool most likely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactor done in: 16c1a68
| // HOST-LABEL: define dso_local noundef i32 @main( | ||
| // HOST-SAME: ) #[[ATTR1:[0-9]+]] { | ||
| // HOST-NEXT: [[ENTRY:.*:]] | ||
| // HOST-NEXT: [[RETVAL:%.*]] = alloca i32, align 4 | ||
| // HOST-NEXT: store i32 0, ptr [[RETVAL]], align 4 | ||
| // HOST-NEXT: ret i32 0 | ||
| // | ||
| int main() { | ||
| return 0; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why main?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed in: 16c1a68
I'm still new to LIT tests, did not know main() was not needed.
| // | ||
| __device__ int __One(void) { return 2; } | ||
| __device__ int One(void) __attribute__ ((weak, alias("__One"))); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add lit tests for:
- aliasee being a host device function
- aliasee being a constexpr function
- aliasee being one of an overloaded device functions aliased by mangled name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the cases for HIP and amdgcn for OpenMP in 16c1a68,
The cases work without modification of the patch.
|
can we add a section "Alias Attribute Support" to https://clang.llvm.org/docs/HIPSupport.html, briefly describe the usecase of alias attribute in HIP with some examples? |
|
16c1a68 Addresses the above feedback (minus documentation). I refactored the code to be more readable about the states, while, also, reducing the nested conditionals and added comments to describe the states. Below I wrote a modified version of Of note: static bool shouldSkipAliasEmission(const CodeGenModule &CGM,
const ValueDecl *Global) {
const LangOptions &LangOpts = CGM.getLangOpts();
if (!(LangOpts.OpenMPIsTargetDevice || LangOpts.CUDA)) {
printf("Not cuda / openmp target returning false\n");
return false;
}
const auto *AA = Global->getAttr<AliasAttr>();
GlobalDecl AliaseeGD;
// Check if the aliasee exists.
if (!CGM.lookupRepresentativeDecl(AA->getAliasee(), AliaseeGD)) {
if (LangOpts.CUDA) {
printf("HIP / CUDA aliasse not found returning true\n");
// In CUDA/HIP, if the aliasee is not found, skip the alias emission.
// This is not a hard error as this branch is executed for both the host
// and device, with no respect to where the aliasee is defined.
return true;
}
printf("OpenMP aliasee not found error\n");
// For OpenMP, lookupRepresentativeDecl should always find the aliasee, this
// is an error
CGM.getDiags().Report(AA->getLocation(), diag::err_alias_to_undefined)
<< false << true;
return false;
}
const auto *AliaseeDecl = dyn_cast<ValueDecl>(AliaseeGD.getDecl());
if (LangOpts.OpenMPIsTargetDevice) {
if (!AliaseeDecl ||
!OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(AliaseeDecl)) {
printf("OpenMP aliasse not target decl returning true\n");
return true;
}
printf("OpenMP aliasse is target decl returning false\n");
return false;
}
// CUDA / HIP
const bool HasDeviceAttr = Global->hasAttr<CUDADeviceAttr>();
const bool AliaseeHasDeviceAttr =
AliaseeDecl && AliaseeDecl->hasAttr<CUDADeviceAttr>();
if (LangOpts.CUDAIsDevice) {
if (!HasDeviceAttr || !AliaseeHasDeviceAttr) {
// On device, skip alias emission if either the alias or the aliasee
// is not marked with __device__.
printf(
"Skipping HIP / CUDA alias emission on device for host only alias\n");
return true;
}
printf("Emitting HIP / CUDA alias on device\n");
return false;
}
printf("Emitting HIP / CUDA alias on host\n");
// CUDA / HIP Host
// we know that the aliasee exists from above, so we know to emit
return false;
}And prints out the following for each case: OpenMP:#pragma omp declare target
int __One(void) { return 1; }
#pragma omp end declare target
int One(void) __attribute__ ((weak, alias("__One")));Outputs:#pragma omp declare target
int __One(void) { return 1; }
int One(void) __attribute__ ((weak, alias("__One")));
#pragma omp end declare targetOutputs:int __One(void) { return 1; }
int One(void) __attribute__ ((weak, alias("__One")));Outputs:int __One(void) { return 1; }
#pragma omp declare target
int One(void) __attribute__ ((weak, alias("__One")));
#pragma omp end declare targetOutputs:Results in runtime error (expected):
HIP:Outputs:int __DevTwo(void) { return 2; }
int DevTwo(void) __attribute__ ((weak, alias("__DevTwo")));Outputs:int __DevTwo(void) { return 2; }
__device__ int DevTwo(void) __attribute__ ((weak, alias("__DevTwo")));Outputs:(compiles with no error, calling aliasee on host works, calling alias on device results in compiler error calling alaisee on host results in compiler error, calling on device results in runtime error [memory access fault]) we could add check to have error here, but not sure if we want to__device__ int __DevTwo(void) { return 2; }
int DevTwo(void) __attribute__ ((weak, alias("__DevTwo")));Outputs: |
jhuber6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nits
@Artem-B CUDA does not support weak on anything but non-kernel functions. Should we handle that in Sema or just let the backend die later.
clang/lib/CodeGen/CodeGenModule.cpp
Outdated
| // Check if the aliasee exists. | ||
| if (!CGM.lookupRepresentativeDecl(AA->getAliasee(), AliaseeGD)) { | ||
| // If the aliasee is not found, skip the alias emission. | ||
| // This is not a hard error as this branch is executed for both the host | ||
| // and device, with no respect to where the aliasee is defined. | ||
| // For some OpenMP cases (functions) this will return true even if the | ||
| // aliasee is not on the device, which is handled by the case below | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No braces around single block, make the comment less verbose and put it above the condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 2949485
clang/lib/CodeGen/CodeGenModule.cpp
Outdated
| if (!(LangOpts.OpenMPIsTargetDevice || LangOpts.CUDA)) | ||
| return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (!(LangOpts.OpenMPIsTargetDevice || LangOpts.CUDA)) | |
| return false; | |
| if (!LangOpts.OpenMPIsTargetDevice && !LangOpts.CUDA)) | |
| return false; |
Clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 2949485
clang/lib/CodeGen/CodeGenModule.cpp
Outdated
| if (!AliaseeDecl || | ||
| !OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(AliaseeDecl)) | ||
| // On OpenMP device, skip alias emission if the aliasee is not marked | ||
| // with declare target. | ||
| return true; | ||
| return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (!AliaseeDecl || | |
| !OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(AliaseeDecl)) | |
| // On OpenMP device, skip alias emission if the aliasee is not marked | |
| // with declare target. | |
| return true; | |
| return false; | |
| return !AliaseeDecl || !OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(AliaseeDecl); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 2949485
clang/lib/CodeGen/CodeGenModule.cpp
Outdated
| if (LangOpts.CUDAIsDevice) { | ||
| if (!HasDeviceAttr || !AliaseeHasDeviceAttr) | ||
| // On device, skip alias emission if either the alias or the aliasee | ||
| // is not marked with __device__. | ||
| return true; | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (LangOpts.CUDAIsDevice) { | |
| if (!HasDeviceAttr || !AliaseeHasDeviceAttr) | |
| // On device, skip alias emission if either the alias or the aliasee | |
| // is not marked with __device__. | |
| return true; | |
| return false; | |
| } | |
| if (LangOpts.CUDAIsDevice) | |
| return !HasDeviceAttr || !AliaseeHasDeviceAttr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 2949485
| @@ -0,0 +1,24 @@ | |||
| // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 6 | |||
| // REQUIRES: nvptx-registered-target | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // REQUIRES: nvptx-registered-target |
This shouldn't be necessary since we're not hitting PTX codegen, here and elsewhere. Also I know these tests are autogenerated, but we don't need to check the attributes. Totally find to trim things from the autogenerated output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trimmed up the checks, in 2949485, I kept the function definitions as they technically relate to the issue, but I also see an argument to also removing those.
This PR adds checks for when emitting weak aliases in:
void CodeGenModule::EmitGlobal(GlobalDecl GD), before for device compilation for OpenMP, HIP and Cuda, clang would look for the aliasee even if it was never marked for device compilation.For OpenMP the following case now works:
For HIP / Cuda:
For HIP:
Included are Codegen LIT tests for the above cases, and also cases for weak alias cases that currently work in clang.
Fixes #117369