-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[offload][OpenMP] Re-enable strict modifier for num_threads #163565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This commit re-enables the support for the strict modifier in num_threads within target regions. The message and severity clauses remain unsupported. When the strict check fails, the program terminates without displaying any meaningful message, but it is still compliant with the OpenMP standard (i.e., when no message clause is specified).
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-codegen Author: Kevin Sala Penades (kevinsala) ChangesThis commit re-enables the support for the Patch is 1.81 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/163565.diff 16 Files Affected:
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index 3613b6a143d42..ef1a47554ade2 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -922,13 +922,6 @@ void CGOpenMPRuntimeGPU::emitNumThreadsClause(
OpenMPNumThreadsClauseModifier Modifier, OpenMPSeverityClauseKind Severity,
SourceLocation SeverityLoc, const Expr *Message,
SourceLocation MessageLoc) {
- if (Modifier == OMPC_NUMTHREADS_strict) {
- CGM.getDiags().Report(Loc,
- diag::warn_omp_gpu_unsupported_modifier_for_clause)
- << "strict" << getOpenMPClauseName(OMPC_num_threads);
- return;
- }
-
// Nothing to do.
}
@@ -1236,9 +1229,9 @@ void CGOpenMPRuntimeGPU::emitParallelCall(
if (!CGF.HaveInsertPoint())
return;
- auto &&ParallelGen = [this, Loc, OutlinedFn, CapturedVars, IfCond,
- NumThreads](CodeGenFunction &CGF,
- PrePostActionTy &Action) {
+ auto &&ParallelGen = [this, Loc, OutlinedFn, CapturedVars, IfCond, NumThreads,
+ NumThreadsModifier](CodeGenFunction &CGF,
+ PrePostActionTy &Action) {
CGBuilderTy &Bld = CGF.Builder;
llvm::Value *NumThreadsVal = NumThreads;
llvm::Function *WFn = WrapperFunctionsMap[OutlinedFn];
@@ -1289,21 +1282,20 @@ void CGOpenMPRuntimeGPU::emitParallelCall(
NumThreadsVal = Bld.CreateZExtOrTrunc(NumThreadsVal, CGF.Int32Ty);
assert(IfCondVal && "Expected a value");
+ RuntimeFunction FnID = OMPRTL___kmpc_parallel_51;
llvm::Value *RTLoc = emitUpdateLocation(CGF, Loc);
- llvm::Value *Args[] = {
- RTLoc,
- getThreadID(CGF, Loc),
- IfCondVal,
- NumThreadsVal,
- llvm::ConstantInt::get(CGF.Int32Ty, -1),
- FnPtr,
- ID,
- Bld.CreateBitOrPointerCast(CapturedVarsAddrs.emitRawPointer(CGF),
- CGF.VoidPtrPtrTy),
- llvm::ConstantInt::get(CGM.SizeTy, CapturedVars.size())};
- CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
- CGM.getModule(), OMPRTL___kmpc_parallel_51),
- Args);
+ llvm::SmallVector<llvm::Value *, 10> Args(
+ {RTLoc, getThreadID(CGF, Loc), IfCondVal, NumThreadsVal,
+ llvm::ConstantInt::get(CGF.Int32Ty, -1), FnPtr, ID,
+ Bld.CreateBitOrPointerCast(CapturedVarsAddrs.emitRawPointer(CGF),
+ CGF.VoidPtrPtrTy),
+ llvm::ConstantInt::get(CGM.SizeTy, CapturedVars.size())});
+ if (NumThreadsModifier == OMPC_NUMTHREADS_strict) {
+ FnID = OMPRTL___kmpc_parallel_60;
+ Args.append({llvm::ConstantInt::get(CGM.Int32Ty, true)});
+ }
+ CGF.EmitRuntimeCall(
+ OMPBuilder.getOrCreateRuntimeFunction(CGM.getModule(), FnID), Args);
};
RegionCodeGenTy RCG(ParallelGen);
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
index 3a7ee5456a9d2..719ddf43b5cef 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
@@ -245,7 +245,7 @@ class CGOpenMPRuntimeGPU : public CGOpenMPRuntime {
/// \param NumThreads The value corresponding to the num_threads clause, if
/// any, or nullptr.
/// \param NumThreadsModifier The modifier of the num_threads clause, if
- /// any, ignored otherwise. Currently unused on the device.
+ /// any, ignored otherwise.
/// \param Severity The severity corresponding to the num_threads clause, if
/// any, ignored otherwise. Currently unused on the device.
/// \param Message The message string corresponding to the num_threads clause,
diff --git a/clang/test/AST/ByteCode/openmp.cpp b/clang/test/AST/ByteCode/openmp.cpp
index c7cccfdd1de9a..61a33c8aa8335 100644
--- a/clang/test/AST/ByteCode/openmp.cpp
+++ b/clang/test/AST/ByteCode/openmp.cpp
@@ -17,13 +17,12 @@ extern int omp_get_thread_num(void);
int test2() {
int x = 0;
- int result[N] = {0};
+ int device_result[N] = {0};
- #pragma omp parallel loop num_threads(strict: N) severity(warning) message("msg")
+ #pragma omp target parallel loop num_threads(strict: N)
for (int i = 0; i < N; i++) {
x = omp_get_thread_num();
- result[i] = i + x;
+ device_result[i] = i + x;
}
}
-
diff --git a/clang/test/OpenMP/amdgcn_parallel_num_threads_strict_messages.cpp b/clang/test/OpenMP/amdgcn_parallel_num_threads_strict_messages.cpp
deleted file mode 100644
index 513754b0bbad9..0000000000000
--- a/clang/test/OpenMP/amdgcn_parallel_num_threads_strict_messages.cpp
+++ /dev/null
@@ -1,108 +0,0 @@
-// RUN: %clang_cc1 -DF1 -verify -fopenmp -fopenmp-version=60 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-ppc.bc
-// RUN: %clang_cc1 -DF1 -DTARGET -verify -fopenmp -fopenmp-version=60 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host-ppc.bc -o /dev/null
-// RUN: %clang_cc1 -DF2 -verify -fopenmp -fopenmp-version=60 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-ppc.bc
-// RUN: %clang_cc1 -DF2 -DTARGET -verify -fopenmp -fopenmp-version=60 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host-ppc.bc -o /dev/null
-// RUN: %clang_cc1 -DF3 -verify -fopenmp -fopenmp-version=60 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-ppc.bc
-// RUN: %clang_cc1 -DF3 -DTARGET -verify -fopenmp -fopenmp-version=60 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host-ppc.bc -o /dev/null
-
-#ifndef TARGET
-// expected-no-diagnostics
-#endif
-
-#ifdef F3
-template<typename tx>
-tx ftemplate(int n) {
- tx a = 0;
-
-#ifdef TARGET
- // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
- #pragma omp parallel num_threads(strict: tx(20)) severity(fatal) message("msg")
- {
- }
-
- short b = 1;
-#ifdef TARGET
- // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
- #pragma omp parallel num_threads(strict: b) severity(warning) message("msg")
- {
- a += b;
- }
-
- return a;
-}
-#endif
-
-#ifdef F2
-static
-int fstatic(int n) {
-
-#ifdef TARGET
- // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
- #pragma omp target parallel num_threads(strict: n) message("msg")
- {
- }
-
-#ifdef TARGET
- // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
- #pragma omp target parallel num_threads(strict: 32+n) severity(warning)
- {
- }
-
- return n+1;
-}
-#endif
-
-#ifdef F1
-struct S1 {
- double a;
-
- int r1(int n){
- int b = 1;
-
-#ifdef TARGET
- // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
- #pragma omp parallel num_threads(strict: n-b) severity(warning) message("msg")
- {
- this->a = (double)b + 1.5;
- }
-
-#ifdef TARGET
- // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
- #pragma omp parallel num_threads(strict: 1024) severity(fatal)
- {
- this->a = 2.5;
- }
-
- return (int)a;
- }
-};
-#endif
-
-int bar(int n){
- int a = 0;
-
-#ifdef F1
- #pragma omp target
- {
- S1 S;
- a += S.r1(n);
- }
-#endif
-
-#ifdef F2
- a += fstatic(n);
-#endif
-
-#ifdef F3
- #pragma omp target
- a += ftemplate<int>(n);
-#endif
-
- return a;
-}
diff --git a/clang/test/OpenMP/amdgcn_target_parallel_num_threads_codegen.cpp b/clang/test/OpenMP/amdgcn_target_parallel_num_threads_codegen.cpp
new file mode 100644
index 0000000000000..5070b56f101bc
--- /dev/null
+++ b/clang/test/OpenMP/amdgcn_target_parallel_num_threads_codegen.cpp
@@ -0,0 +1,1067 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=OMP45_1
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=OMP45_2
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -fexceptions -fcxx-exceptions -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=OMP45_2
+
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefixes=OMP60_1
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefixes=OMP60_2
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -fexceptions -fcxx-exceptions -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefixes=OMP60_2
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=CHECK1
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=CHECK2
+// RUN: %clang_cc1 -verify -fopenmp -fexceptions -fcxx-exceptions -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=CHECK2
+
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+template<typename tx>
+tx ftemplate(int n) {
+ tx a = 0;
+ short aa = 0;
+ tx b[10];
+
+ #pragma omp target parallel map(tofrom: aa) num_threads(1024)
+ {
+ aa += 1;
+ }
+ #ifdef OMP60
+ #pragma omp target parallel map(tofrom: aa) num_threads(strict: 1024)
+ {
+ aa += 1;
+ }
+ #endif
+
+ #pragma omp target parallel map(tofrom:a, aa, b) if(target: n>40) num_threads(n)
+ {
+ a += 1;
+ aa += 1;
+ b[2] += 1;
+ }
+ #ifdef OMP60
+ #pragma omp target parallel map(tofrom:a, aa, b) if(target: n>40) num_threads(strict: n)
+ {
+ a += 1;
+ aa += 1;
+ b[2] += 1;
+ }
+ #endif
+
+ return a;
+}
+
+int bar(int n){
+ int a = 0;
+
+ a += ftemplate<int>(n);
+
+ return a;
+}
+
+#endif
+// OMP45_1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31
+// OMP45_1-SAME: (ptr noalias noundef [[DYN_PTR:%.*]], ptr noundef nonnull align 2 dereferenceable(2) [[AA:%.*]]) #[[ATTR0:[0-9]+]] {
+// OMP45_1-NEXT: entry:
+// OMP45_1-NEXT: [[DYN_PTR_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT: [[AA_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT: [[CAPTURED_VARS_ADDRS:%.*]] = alloca [1 x ptr], align 8, addrspace(5)
+// OMP45_1-NEXT: [[DYN_PTR_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DYN_PTR_ADDR]] to ptr
+// OMP45_1-NEXT: [[AA_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AA_ADDR]] to ptr
+// OMP45_1-NEXT: [[CAPTURED_VARS_ADDRS_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[CAPTURED_VARS_ADDRS]] to ptr
+// OMP45_1-NEXT: store ptr [[DYN_PTR]], ptr [[DYN_PTR_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: store ptr [[AA]], ptr [[AA_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: [[TMP0:%.*]] = load ptr, ptr [[AA_ADDR_ASCAST]], align 8, !nonnull [[META7:![0-9]+]], !align [[META8:![0-9]+]]
+// OMP45_1-NEXT: [[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31_kernel_environment to ptr), ptr [[DYN_PTR]])
+// OMP45_1-NEXT: [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// OMP45_1-NEXT: br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// OMP45_1: user_code.entry:
+// OMP45_1-NEXT: [[TMP2:%.*]] = call i32 @__kmpc_global_thread_num(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr))
+// OMP45_1-NEXT: [[TMP3:%.*]] = getelementptr inbounds [1 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 0
+// OMP45_1-NEXT: store ptr [[TMP0]], ptr [[TMP3]], align 8
+// OMP45_1-NEXT: call void @__kmpc_parallel_51(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr), i32 [[TMP2]], i32 1, i32 1024, i32 -1, ptr @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31_omp_outlined, ptr null, ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 1)
+// OMP45_1-NEXT: call void @__kmpc_target_deinit()
+// OMP45_1-NEXT: ret void
+// OMP45_1: worker.exit:
+// OMP45_1-NEXT: ret void
+//
+//
+// OMP45_1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31_omp_outlined
+// OMP45_1-SAME: (ptr noalias noundef [[DOTGLOBAL_TID_:%.*]], ptr noalias noundef [[DOTBOUND_TID_:%.*]], ptr noundef nonnull align 2 dereferenceable(2) [[AA:%.*]]) #[[ATTR1:[0-9]+]] {
+// OMP45_1-NEXT: entry:
+// OMP45_1-NEXT: [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT: [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT: [[AA_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT: [[DOTGLOBAL_TID__ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTGLOBAL_TID__ADDR]] to ptr
+// OMP45_1-NEXT: [[DOTBOUND_TID__ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTBOUND_TID__ADDR]] to ptr
+// OMP45_1-NEXT: [[AA_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AA_ADDR]] to ptr
+// OMP45_1-NEXT: store ptr [[DOTGLOBAL_TID_]], ptr [[DOTGLOBAL_TID__ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: store ptr [[DOTBOUND_TID_]], ptr [[DOTBOUND_TID__ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: store ptr [[AA]], ptr [[AA_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: [[TMP0:%.*]] = load ptr, ptr [[AA_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META8]]
+// OMP45_1-NEXT: [[TMP1:%.*]] = load i16, ptr [[TMP0]], align 2
+// OMP45_1-NEXT: [[CONV:%.*]] = sext i16 [[TMP1]] to i32
+// OMP45_1-NEXT: [[ADD:%.*]] = add nsw i32 [[CONV]], 1
+// OMP45_1-NEXT: [[CONV1:%.*]] = trunc i32 [[ADD]] to i16
+// OMP45_1-NEXT: store i16 [[CONV1]], ptr [[TMP0]], align 2
+// OMP45_1-NEXT: ret void
+//
+//
+// OMP45_1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l42
+// OMP45_1-SAME: (ptr noalias noundef [[DYN_PTR:%.*]], ptr noundef nonnull align 4 dereferenceable(4) [[A:%.*]], ptr noundef nonnull align 2 dereferenceable(2) [[AA:%.*]], ptr noundef nonnull align 4 dereferenceable(40) [[B:%.*]], i64 noundef [[DOTCAPTURE_EXPR_:%.*]]) #[[ATTR4:[0-9]+]] {
+// OMP45_1-NEXT: entry:
+// OMP45_1-NEXT: [[DYN_PTR_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT: [[A_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT: [[AA_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT: [[B_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT: [[DOTCAPTURE_EXPR__ADDR:%.*]] = alloca i64, align 8, addrspace(5)
+// OMP45_1-NEXT: [[CAPTURED_VARS_ADDRS:%.*]] = alloca [3 x ptr], align 8, addrspace(5)
+// OMP45_1-NEXT: [[DYN_PTR_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DYN_PTR_ADDR]] to ptr
+// OMP45_1-NEXT: [[A_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A_ADDR]] to ptr
+// OMP45_1-NEXT: [[AA_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AA_ADDR]] to ptr
+// OMP45_1-NEXT: [[B_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[B_ADDR]] to ptr
+// OMP45_1-NEXT: [[DOTCAPTURE_EXPR__ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR__ADDR]] to ptr
+// OMP45_1-NEXT: [[CAPTURED_VARS_ADDRS_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[CAPTURED_VARS_ADDRS]] to ptr
+// OMP45_1-NEXT: store ptr [[DYN_PTR]], ptr [[DYN_PTR_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: store ptr [[A]], ptr [[A_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: store ptr [[AA]], ptr [[AA_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: store ptr [[B]], ptr [[B_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: store i64 [[DOTCAPTURE_EXPR_]], ptr [[DOTCAPTURE_EXPR__ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: [[TMP0:%.*]] = load ptr, ptr [[A_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META9:![0-9]+]]
+// OMP45_1-NEXT: [[TMP1:%.*]] = load ptr, ptr [[AA_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META8]]
+// OMP45_1-NEXT: [[TMP2:%.*]] = load ptr, ptr [[B_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META9]]
+// OMP45_1-NEXT: [[TMP3:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l42_kernel_environment to ptr), ptr [[DYN_PTR]])
+// OMP45_1-NEXT: [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP3]], -1
+// OMP45_1-NEXT: br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// OMP45_1: user_code.entry:
+// OMP45_1-NEXT: [[TMP4:%.*]] = call i32 @__kmpc_global_thread_num(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr))
+// OMP45_1-NEXT: [[TMP5:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ADDR_ASCAST]], align 4
+// OMP45_1-NEXT: [[TMP6:%.*]] = getelementptr inbounds [3 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 0
+// OMP45_1-NEXT: store ptr [[TMP0]], ptr [[TMP6]], align 8
+// OMP45_1-NEXT: [[TMP7:%.*]] = getelementptr inbounds [3 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 1
+// OMP45_1-NEXT: store ptr [[TMP1]], ptr [[TMP7]], align 8
+// OMP45_1-NEXT: [[TMP8:%.*]] = getelementptr inbounds [3 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 2
+// OMP45_1-NEXT: store ptr [[TMP2]], ptr [[TMP8]], align 8
+// OMP45_1-NEXT: call void @__kmpc_parallel_51(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr), i32 [[TMP4]], i32 1, i32 [[TMP5]], i32 -1, ptr @{{__omp_offloading_[0-9a-z...
[truncated]
|
@llvm/pr-subscribers-flang-openmp Author: Kevin Sala Penades (kevinsala) ChangesThis commit re-enables the support for the Patch is 1.81 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/163565.diff 16 Files Affected:
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index 3613b6a143d42..ef1a47554ade2 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -922,13 +922,6 @@ void CGOpenMPRuntimeGPU::emitNumThreadsClause(
OpenMPNumThreadsClauseModifier Modifier, OpenMPSeverityClauseKind Severity,
SourceLocation SeverityLoc, const Expr *Message,
SourceLocation MessageLoc) {
- if (Modifier == OMPC_NUMTHREADS_strict) {
- CGM.getDiags().Report(Loc,
- diag::warn_omp_gpu_unsupported_modifier_for_clause)
- << "strict" << getOpenMPClauseName(OMPC_num_threads);
- return;
- }
-
// Nothing to do.
}
@@ -1236,9 +1229,9 @@ void CGOpenMPRuntimeGPU::emitParallelCall(
if (!CGF.HaveInsertPoint())
return;
- auto &&ParallelGen = [this, Loc, OutlinedFn, CapturedVars, IfCond,
- NumThreads](CodeGenFunction &CGF,
- PrePostActionTy &Action) {
+ auto &&ParallelGen = [this, Loc, OutlinedFn, CapturedVars, IfCond, NumThreads,
+ NumThreadsModifier](CodeGenFunction &CGF,
+ PrePostActionTy &Action) {
CGBuilderTy &Bld = CGF.Builder;
llvm::Value *NumThreadsVal = NumThreads;
llvm::Function *WFn = WrapperFunctionsMap[OutlinedFn];
@@ -1289,21 +1282,20 @@ void CGOpenMPRuntimeGPU::emitParallelCall(
NumThreadsVal = Bld.CreateZExtOrTrunc(NumThreadsVal, CGF.Int32Ty);
assert(IfCondVal && "Expected a value");
+ RuntimeFunction FnID = OMPRTL___kmpc_parallel_51;
llvm::Value *RTLoc = emitUpdateLocation(CGF, Loc);
- llvm::Value *Args[] = {
- RTLoc,
- getThreadID(CGF, Loc),
- IfCondVal,
- NumThreadsVal,
- llvm::ConstantInt::get(CGF.Int32Ty, -1),
- FnPtr,
- ID,
- Bld.CreateBitOrPointerCast(CapturedVarsAddrs.emitRawPointer(CGF),
- CGF.VoidPtrPtrTy),
- llvm::ConstantInt::get(CGM.SizeTy, CapturedVars.size())};
- CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
- CGM.getModule(), OMPRTL___kmpc_parallel_51),
- Args);
+ llvm::SmallVector<llvm::Value *, 10> Args(
+ {RTLoc, getThreadID(CGF, Loc), IfCondVal, NumThreadsVal,
+ llvm::ConstantInt::get(CGF.Int32Ty, -1), FnPtr, ID,
+ Bld.CreateBitOrPointerCast(CapturedVarsAddrs.emitRawPointer(CGF),
+ CGF.VoidPtrPtrTy),
+ llvm::ConstantInt::get(CGM.SizeTy, CapturedVars.size())});
+ if (NumThreadsModifier == OMPC_NUMTHREADS_strict) {
+ FnID = OMPRTL___kmpc_parallel_60;
+ Args.append({llvm::ConstantInt::get(CGM.Int32Ty, true)});
+ }
+ CGF.EmitRuntimeCall(
+ OMPBuilder.getOrCreateRuntimeFunction(CGM.getModule(), FnID), Args);
};
RegionCodeGenTy RCG(ParallelGen);
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
index 3a7ee5456a9d2..719ddf43b5cef 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
@@ -245,7 +245,7 @@ class CGOpenMPRuntimeGPU : public CGOpenMPRuntime {
/// \param NumThreads The value corresponding to the num_threads clause, if
/// any, or nullptr.
/// \param NumThreadsModifier The modifier of the num_threads clause, if
- /// any, ignored otherwise. Currently unused on the device.
+ /// any, ignored otherwise.
/// \param Severity The severity corresponding to the num_threads clause, if
/// any, ignored otherwise. Currently unused on the device.
/// \param Message The message string corresponding to the num_threads clause,
diff --git a/clang/test/AST/ByteCode/openmp.cpp b/clang/test/AST/ByteCode/openmp.cpp
index c7cccfdd1de9a..61a33c8aa8335 100644
--- a/clang/test/AST/ByteCode/openmp.cpp
+++ b/clang/test/AST/ByteCode/openmp.cpp
@@ -17,13 +17,12 @@ extern int omp_get_thread_num(void);
int test2() {
int x = 0;
- int result[N] = {0};
+ int device_result[N] = {0};
- #pragma omp parallel loop num_threads(strict: N) severity(warning) message("msg")
+ #pragma omp target parallel loop num_threads(strict: N)
for (int i = 0; i < N; i++) {
x = omp_get_thread_num();
- result[i] = i + x;
+ device_result[i] = i + x;
}
}
-
diff --git a/clang/test/OpenMP/amdgcn_parallel_num_threads_strict_messages.cpp b/clang/test/OpenMP/amdgcn_parallel_num_threads_strict_messages.cpp
deleted file mode 100644
index 513754b0bbad9..0000000000000
--- a/clang/test/OpenMP/amdgcn_parallel_num_threads_strict_messages.cpp
+++ /dev/null
@@ -1,108 +0,0 @@
-// RUN: %clang_cc1 -DF1 -verify -fopenmp -fopenmp-version=60 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-ppc.bc
-// RUN: %clang_cc1 -DF1 -DTARGET -verify -fopenmp -fopenmp-version=60 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host-ppc.bc -o /dev/null
-// RUN: %clang_cc1 -DF2 -verify -fopenmp -fopenmp-version=60 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-ppc.bc
-// RUN: %clang_cc1 -DF2 -DTARGET -verify -fopenmp -fopenmp-version=60 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host-ppc.bc -o /dev/null
-// RUN: %clang_cc1 -DF3 -verify -fopenmp -fopenmp-version=60 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-ppc.bc
-// RUN: %clang_cc1 -DF3 -DTARGET -verify -fopenmp -fopenmp-version=60 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host-ppc.bc -o /dev/null
-
-#ifndef TARGET
-// expected-no-diagnostics
-#endif
-
-#ifdef F3
-template<typename tx>
-tx ftemplate(int n) {
- tx a = 0;
-
-#ifdef TARGET
- // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
- #pragma omp parallel num_threads(strict: tx(20)) severity(fatal) message("msg")
- {
- }
-
- short b = 1;
-#ifdef TARGET
- // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
- #pragma omp parallel num_threads(strict: b) severity(warning) message("msg")
- {
- a += b;
- }
-
- return a;
-}
-#endif
-
-#ifdef F2
-static
-int fstatic(int n) {
-
-#ifdef TARGET
- // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
- #pragma omp target parallel num_threads(strict: n) message("msg")
- {
- }
-
-#ifdef TARGET
- // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
- #pragma omp target parallel num_threads(strict: 32+n) severity(warning)
- {
- }
-
- return n+1;
-}
-#endif
-
-#ifdef F1
-struct S1 {
- double a;
-
- int r1(int n){
- int b = 1;
-
-#ifdef TARGET
- // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
- #pragma omp parallel num_threads(strict: n-b) severity(warning) message("msg")
- {
- this->a = (double)b + 1.5;
- }
-
-#ifdef TARGET
- // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
- #pragma omp parallel num_threads(strict: 1024) severity(fatal)
- {
- this->a = 2.5;
- }
-
- return (int)a;
- }
-};
-#endif
-
-int bar(int n){
- int a = 0;
-
-#ifdef F1
- #pragma omp target
- {
- S1 S;
- a += S.r1(n);
- }
-#endif
-
-#ifdef F2
- a += fstatic(n);
-#endif
-
-#ifdef F3
- #pragma omp target
- a += ftemplate<int>(n);
-#endif
-
- return a;
-}
diff --git a/clang/test/OpenMP/amdgcn_target_parallel_num_threads_codegen.cpp b/clang/test/OpenMP/amdgcn_target_parallel_num_threads_codegen.cpp
new file mode 100644
index 0000000000000..5070b56f101bc
--- /dev/null
+++ b/clang/test/OpenMP/amdgcn_target_parallel_num_threads_codegen.cpp
@@ -0,0 +1,1067 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=OMP45_1
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=OMP45_2
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -fexceptions -fcxx-exceptions -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=OMP45_2
+
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefixes=OMP60_1
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefixes=OMP60_2
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -fexceptions -fcxx-exceptions -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefixes=OMP60_2
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=CHECK1
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=CHECK2
+// RUN: %clang_cc1 -verify -fopenmp -fexceptions -fcxx-exceptions -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=CHECK2
+
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+template<typename tx>
+tx ftemplate(int n) {
+ tx a = 0;
+ short aa = 0;
+ tx b[10];
+
+ #pragma omp target parallel map(tofrom: aa) num_threads(1024)
+ {
+ aa += 1;
+ }
+ #ifdef OMP60
+ #pragma omp target parallel map(tofrom: aa) num_threads(strict: 1024)
+ {
+ aa += 1;
+ }
+ #endif
+
+ #pragma omp target parallel map(tofrom:a, aa, b) if(target: n>40) num_threads(n)
+ {
+ a += 1;
+ aa += 1;
+ b[2] += 1;
+ }
+ #ifdef OMP60
+ #pragma omp target parallel map(tofrom:a, aa, b) if(target: n>40) num_threads(strict: n)
+ {
+ a += 1;
+ aa += 1;
+ b[2] += 1;
+ }
+ #endif
+
+ return a;
+}
+
+int bar(int n){
+ int a = 0;
+
+ a += ftemplate<int>(n);
+
+ return a;
+}
+
+#endif
+// OMP45_1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31
+// OMP45_1-SAME: (ptr noalias noundef [[DYN_PTR:%.*]], ptr noundef nonnull align 2 dereferenceable(2) [[AA:%.*]]) #[[ATTR0:[0-9]+]] {
+// OMP45_1-NEXT: entry:
+// OMP45_1-NEXT: [[DYN_PTR_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT: [[AA_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT: [[CAPTURED_VARS_ADDRS:%.*]] = alloca [1 x ptr], align 8, addrspace(5)
+// OMP45_1-NEXT: [[DYN_PTR_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DYN_PTR_ADDR]] to ptr
+// OMP45_1-NEXT: [[AA_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AA_ADDR]] to ptr
+// OMP45_1-NEXT: [[CAPTURED_VARS_ADDRS_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[CAPTURED_VARS_ADDRS]] to ptr
+// OMP45_1-NEXT: store ptr [[DYN_PTR]], ptr [[DYN_PTR_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: store ptr [[AA]], ptr [[AA_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: [[TMP0:%.*]] = load ptr, ptr [[AA_ADDR_ASCAST]], align 8, !nonnull [[META7:![0-9]+]], !align [[META8:![0-9]+]]
+// OMP45_1-NEXT: [[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31_kernel_environment to ptr), ptr [[DYN_PTR]])
+// OMP45_1-NEXT: [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// OMP45_1-NEXT: br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// OMP45_1: user_code.entry:
+// OMP45_1-NEXT: [[TMP2:%.*]] = call i32 @__kmpc_global_thread_num(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr))
+// OMP45_1-NEXT: [[TMP3:%.*]] = getelementptr inbounds [1 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 0
+// OMP45_1-NEXT: store ptr [[TMP0]], ptr [[TMP3]], align 8
+// OMP45_1-NEXT: call void @__kmpc_parallel_51(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr), i32 [[TMP2]], i32 1, i32 1024, i32 -1, ptr @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31_omp_outlined, ptr null, ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 1)
+// OMP45_1-NEXT: call void @__kmpc_target_deinit()
+// OMP45_1-NEXT: ret void
+// OMP45_1: worker.exit:
+// OMP45_1-NEXT: ret void
+//
+//
+// OMP45_1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31_omp_outlined
+// OMP45_1-SAME: (ptr noalias noundef [[DOTGLOBAL_TID_:%.*]], ptr noalias noundef [[DOTBOUND_TID_:%.*]], ptr noundef nonnull align 2 dereferenceable(2) [[AA:%.*]]) #[[ATTR1:[0-9]+]] {
+// OMP45_1-NEXT: entry:
+// OMP45_1-NEXT: [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT: [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT: [[AA_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT: [[DOTGLOBAL_TID__ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTGLOBAL_TID__ADDR]] to ptr
+// OMP45_1-NEXT: [[DOTBOUND_TID__ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTBOUND_TID__ADDR]] to ptr
+// OMP45_1-NEXT: [[AA_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AA_ADDR]] to ptr
+// OMP45_1-NEXT: store ptr [[DOTGLOBAL_TID_]], ptr [[DOTGLOBAL_TID__ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: store ptr [[DOTBOUND_TID_]], ptr [[DOTBOUND_TID__ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: store ptr [[AA]], ptr [[AA_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: [[TMP0:%.*]] = load ptr, ptr [[AA_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META8]]
+// OMP45_1-NEXT: [[TMP1:%.*]] = load i16, ptr [[TMP0]], align 2
+// OMP45_1-NEXT: [[CONV:%.*]] = sext i16 [[TMP1]] to i32
+// OMP45_1-NEXT: [[ADD:%.*]] = add nsw i32 [[CONV]], 1
+// OMP45_1-NEXT: [[CONV1:%.*]] = trunc i32 [[ADD]] to i16
+// OMP45_1-NEXT: store i16 [[CONV1]], ptr [[TMP0]], align 2
+// OMP45_1-NEXT: ret void
+//
+//
+// OMP45_1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l42
+// OMP45_1-SAME: (ptr noalias noundef [[DYN_PTR:%.*]], ptr noundef nonnull align 4 dereferenceable(4) [[A:%.*]], ptr noundef nonnull align 2 dereferenceable(2) [[AA:%.*]], ptr noundef nonnull align 4 dereferenceable(40) [[B:%.*]], i64 noundef [[DOTCAPTURE_EXPR_:%.*]]) #[[ATTR4:[0-9]+]] {
+// OMP45_1-NEXT: entry:
+// OMP45_1-NEXT: [[DYN_PTR_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT: [[A_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT: [[AA_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT: [[B_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT: [[DOTCAPTURE_EXPR__ADDR:%.*]] = alloca i64, align 8, addrspace(5)
+// OMP45_1-NEXT: [[CAPTURED_VARS_ADDRS:%.*]] = alloca [3 x ptr], align 8, addrspace(5)
+// OMP45_1-NEXT: [[DYN_PTR_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DYN_PTR_ADDR]] to ptr
+// OMP45_1-NEXT: [[A_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A_ADDR]] to ptr
+// OMP45_1-NEXT: [[AA_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AA_ADDR]] to ptr
+// OMP45_1-NEXT: [[B_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[B_ADDR]] to ptr
+// OMP45_1-NEXT: [[DOTCAPTURE_EXPR__ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR__ADDR]] to ptr
+// OMP45_1-NEXT: [[CAPTURED_VARS_ADDRS_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[CAPTURED_VARS_ADDRS]] to ptr
+// OMP45_1-NEXT: store ptr [[DYN_PTR]], ptr [[DYN_PTR_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: store ptr [[A]], ptr [[A_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: store ptr [[AA]], ptr [[AA_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: store ptr [[B]], ptr [[B_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: store i64 [[DOTCAPTURE_EXPR_]], ptr [[DOTCAPTURE_EXPR__ADDR_ASCAST]], align 8
+// OMP45_1-NEXT: [[TMP0:%.*]] = load ptr, ptr [[A_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META9:![0-9]+]]
+// OMP45_1-NEXT: [[TMP1:%.*]] = load ptr, ptr [[AA_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META8]]
+// OMP45_1-NEXT: [[TMP2:%.*]] = load ptr, ptr [[B_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META9]]
+// OMP45_1-NEXT: [[TMP3:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l42_kernel_environment to ptr), ptr [[DYN_PTR]])
+// OMP45_1-NEXT: [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP3]], -1
+// OMP45_1-NEXT: br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// OMP45_1: user_code.entry:
+// OMP45_1-NEXT: [[TMP4:%.*]] = call i32 @__kmpc_global_thread_num(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr))
+// OMP45_1-NEXT: [[TMP5:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ADDR_ASCAST]], align 4
+// OMP45_1-NEXT: [[TMP6:%.*]] = getelementptr inbounds [3 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 0
+// OMP45_1-NEXT: store ptr [[TMP0]], ptr [[TMP6]], align 8
+// OMP45_1-NEXT: [[TMP7:%.*]] = getelementptr inbounds [3 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 1
+// OMP45_1-NEXT: store ptr [[TMP1]], ptr [[TMP7]], align 8
+// OMP45_1-NEXT: [[TMP8:%.*]] = getelementptr inbounds [3 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 2
+// OMP45_1-NEXT: store ptr [[TMP2]], ptr [[TMP8]], align 8
+// OMP45_1-NEXT: call void @__kmpc_parallel_51(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr), i32 [[TMP4]], i32 1, i32 [[TMP5]], i32 -1, ptr @{{__omp_offloading_[0-9a-z...
[truncated]
|
} | ||
|
||
[[clang::always_inline]] void | ||
__kmpc_parallel_60(IdentTy *ident, int32_t id, int32_t if_expr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't the original issue that OpenMPOpt didn't recognize this anymore? I remember some complaints about the indirection. Not sure we need the old _51
anymore since we assume no backwards compat in device code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these wrappers are a no-no in general. I only introduced them as a temporary solution for a short time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, seems that OpenMPOPT handles __kmpc_parallel_51
but not __kmpc_parallel_60
. The only difference is the strict modifier. Do you think it's ok to still use __kmpc_parallel_51
and use the last parameter nt_strict
with default value false
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not without further ado, that's why I had introduced the wrappers as a workaround, see #156104
Just crashing the program does not seem very user-friendly to me, tbh. The user / dev would then need to debug to check why the program crashed. They could just debug the number of threads without the strict thing if they have to debug anyway. |
IMHO if we want to add this back, we will need to fix OpenMPOpt first or along side such that we'd not introduce these many wrappers. |
This PR re-enables the support for the
strict
modifier innum_threads
withintarget
regions. Themessage
andseverity
clauses remain unsupported. When the strict check fails, the program terminates without displaying any meaningful message, but it is still compliant with the OpenMP standard (i.e., when nomessage
clause is specified).The motivation behind this PR is to re-enable the strict check when requesting threads without incurring in the cost of supporting error messages or the warning severity (yet).
This is a partial revert of #157893 and #160659.