Skip to content

Conversation

kevinsala
Copy link
Contributor

@kevinsala kevinsala commented Oct 15, 2025

This PR re-enables the support for the strict modifier in num_threads within target regions. The message and severity clauses remain unsupported. When the strict check fails, the program terminates without displaying any meaningful message, but it is still compliant with the OpenMP standard (i.e., when no message clause is specified).

The motivation behind this PR is to re-enable the strict check when requesting threads without incurring in the cost of supporting error messages or the warning severity (yet).

This is a partial revert of #157893 and #160659.

This commit re-enables the support for the strict modifier in num_threads
within target regions. The message and severity clauses remain unsupported.
When the strict check fails, the program terminates without displaying any
meaningful message, but it is still compliant with the OpenMP standard (i.e.,
when no message clause is specified).
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. flang:openmp clang:openmp OpenMP related changes to Clang openmp:libomp OpenMP host runtime openmp:libomptarget OpenMP offload runtime clang:bytecode Issues for the clang bytecode constexpr interpreter labels Oct 15, 2025
@llvmbot
Copy link
Member

llvmbot commented Oct 15, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Kevin Sala Penades (kevinsala)

Changes

This commit re-enables the support for the strict modifier in num_threads within target regions. The message and severity clauses remain unsupported. When the strict check fails, the program terminates without displaying any meaningful message, but it is still compliant with the OpenMP standard (i.e., when no message clause is specified).


Patch is 1.81 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/163565.diff

16 Files Affected:

  • (modified) clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp (+16-24)
  • (modified) clang/lib/CodeGen/CGOpenMPRuntimeGPU.h (+1-1)
  • (modified) clang/test/AST/ByteCode/openmp.cpp (+3-4)
  • (removed) clang/test/OpenMP/amdgcn_parallel_num_threads_strict_messages.cpp (-108)
  • (added) clang/test/OpenMP/amdgcn_target_parallel_num_threads_codegen.cpp (+1067)
  • (modified) clang/test/OpenMP/distribute_parallel_for_num_threads_codegen.cpp (+5803-5676)
  • (added) clang/test/OpenMP/distribute_parallel_for_simd_num_threads_strict_codegen.cpp (+3417)
  • (removed) clang/test/OpenMP/nvptx_parallel_num_threads_strict_messages.cpp (-108)
  • (modified) clang/test/OpenMP/nvptx_target_codegen.cpp (+847-66)
  • (modified) clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp (+703-35)
  • (modified) clang/test/OpenMP/target_parallel_generic_loop_codegen.cpp (+383-18)
  • (modified) clang/test/OpenMP/target_parallel_num_threads_strict_codegen.cpp (+1889-1263)
  • (added) clang/test/OpenMP/teams_distribute_parallel_for_num_threads_strict_codegen.cpp (+1342)
  • (added) clang/test/OpenMP/teams_distribute_parallel_for_simd_num_threads_strict_codegen.cpp (+1789)
  • (modified) llvm/include/llvm/Frontend/OpenMP/OMPKinds.def (+6)
  • (modified) openmp/device/src/Parallelism.cpp (+64-17)
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index 3613b6a143d42..ef1a47554ade2 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -922,13 +922,6 @@ void CGOpenMPRuntimeGPU::emitNumThreadsClause(
     OpenMPNumThreadsClauseModifier Modifier, OpenMPSeverityClauseKind Severity,
     SourceLocation SeverityLoc, const Expr *Message,
     SourceLocation MessageLoc) {
-  if (Modifier == OMPC_NUMTHREADS_strict) {
-    CGM.getDiags().Report(Loc,
-                          diag::warn_omp_gpu_unsupported_modifier_for_clause)
-        << "strict" << getOpenMPClauseName(OMPC_num_threads);
-    return;
-  }
-
   // Nothing to do.
 }
 
@@ -1236,9 +1229,9 @@ void CGOpenMPRuntimeGPU::emitParallelCall(
   if (!CGF.HaveInsertPoint())
     return;
 
-  auto &&ParallelGen = [this, Loc, OutlinedFn, CapturedVars, IfCond,
-                        NumThreads](CodeGenFunction &CGF,
-                                    PrePostActionTy &Action) {
+  auto &&ParallelGen = [this, Loc, OutlinedFn, CapturedVars, IfCond, NumThreads,
+                        NumThreadsModifier](CodeGenFunction &CGF,
+                                            PrePostActionTy &Action) {
     CGBuilderTy &Bld = CGF.Builder;
     llvm::Value *NumThreadsVal = NumThreads;
     llvm::Function *WFn = WrapperFunctionsMap[OutlinedFn];
@@ -1289,21 +1282,20 @@ void CGOpenMPRuntimeGPU::emitParallelCall(
       NumThreadsVal = Bld.CreateZExtOrTrunc(NumThreadsVal, CGF.Int32Ty);
 
     assert(IfCondVal && "Expected a value");
+    RuntimeFunction FnID = OMPRTL___kmpc_parallel_51;
     llvm::Value *RTLoc = emitUpdateLocation(CGF, Loc);
-    llvm::Value *Args[] = {
-        RTLoc,
-        getThreadID(CGF, Loc),
-        IfCondVal,
-        NumThreadsVal,
-        llvm::ConstantInt::get(CGF.Int32Ty, -1),
-        FnPtr,
-        ID,
-        Bld.CreateBitOrPointerCast(CapturedVarsAddrs.emitRawPointer(CGF),
-                                   CGF.VoidPtrPtrTy),
-        llvm::ConstantInt::get(CGM.SizeTy, CapturedVars.size())};
-    CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
-                            CGM.getModule(), OMPRTL___kmpc_parallel_51),
-                        Args);
+    llvm::SmallVector<llvm::Value *, 10> Args(
+        {RTLoc, getThreadID(CGF, Loc), IfCondVal, NumThreadsVal,
+         llvm::ConstantInt::get(CGF.Int32Ty, -1), FnPtr, ID,
+         Bld.CreateBitOrPointerCast(CapturedVarsAddrs.emitRawPointer(CGF),
+                                    CGF.VoidPtrPtrTy),
+         llvm::ConstantInt::get(CGM.SizeTy, CapturedVars.size())});
+    if (NumThreadsModifier == OMPC_NUMTHREADS_strict) {
+      FnID = OMPRTL___kmpc_parallel_60;
+      Args.append({llvm::ConstantInt::get(CGM.Int32Ty, true)});
+    }
+    CGF.EmitRuntimeCall(
+        OMPBuilder.getOrCreateRuntimeFunction(CGM.getModule(), FnID), Args);
   };
 
   RegionCodeGenTy RCG(ParallelGen);
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
index 3a7ee5456a9d2..719ddf43b5cef 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
@@ -245,7 +245,7 @@ class CGOpenMPRuntimeGPU : public CGOpenMPRuntime {
   /// \param NumThreads The value corresponding to the num_threads clause, if
   /// any, or nullptr.
   /// \param NumThreadsModifier The modifier of the num_threads clause, if
-  /// any, ignored otherwise. Currently unused on the device.
+  /// any, ignored otherwise.
   /// \param Severity The severity corresponding to the num_threads clause, if
   /// any, ignored otherwise. Currently unused on the device.
   /// \param Message The message string corresponding to the num_threads clause,
diff --git a/clang/test/AST/ByteCode/openmp.cpp b/clang/test/AST/ByteCode/openmp.cpp
index c7cccfdd1de9a..61a33c8aa8335 100644
--- a/clang/test/AST/ByteCode/openmp.cpp
+++ b/clang/test/AST/ByteCode/openmp.cpp
@@ -17,13 +17,12 @@ extern int omp_get_thread_num(void);
 
 int test2() {
   int x = 0;
-  int result[N] = {0};
+  int device_result[N] = {0};
 
-  #pragma omp parallel loop num_threads(strict: N) severity(warning) message("msg")
+  #pragma omp target parallel loop num_threads(strict: N)
   for (int i = 0; i < N; i++) {
     x = omp_get_thread_num();
-    result[i] = i + x;
+    device_result[i] = i + x;
   }
 }
 
-
diff --git a/clang/test/OpenMP/amdgcn_parallel_num_threads_strict_messages.cpp b/clang/test/OpenMP/amdgcn_parallel_num_threads_strict_messages.cpp
deleted file mode 100644
index 513754b0bbad9..0000000000000
--- a/clang/test/OpenMP/amdgcn_parallel_num_threads_strict_messages.cpp
+++ /dev/null
@@ -1,108 +0,0 @@
-// RUN: %clang_cc1 -DF1 -verify -fopenmp -fopenmp-version=60 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-ppc.bc
-// RUN: %clang_cc1 -DF1 -DTARGET -verify -fopenmp -fopenmp-version=60 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host-ppc.bc -o /dev/null
-// RUN: %clang_cc1 -DF2 -verify -fopenmp -fopenmp-version=60 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-ppc.bc
-// RUN: %clang_cc1 -DF2 -DTARGET -verify -fopenmp -fopenmp-version=60 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host-ppc.bc -o /dev/null
-// RUN: %clang_cc1 -DF3 -verify -fopenmp -fopenmp-version=60 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-ppc.bc
-// RUN: %clang_cc1 -DF3 -DTARGET -verify -fopenmp -fopenmp-version=60 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host-ppc.bc -o /dev/null
-
-#ifndef TARGET
-// expected-no-diagnostics
-#endif
-
-#ifdef F3
-template<typename tx>
-tx ftemplate(int n) {
-  tx a = 0;
-
-#ifdef TARGET
-  // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
-  #pragma omp parallel num_threads(strict: tx(20)) severity(fatal) message("msg")
-  {
-  }
-
-  short b = 1;
-#ifdef TARGET
-  // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
-  #pragma omp parallel num_threads(strict: b) severity(warning) message("msg")
-  {
-    a += b;
-  }
-
-  return a;
-}
-#endif
-
-#ifdef F2
-static
-int fstatic(int n) {
-
-#ifdef TARGET
-  // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
-  #pragma omp target parallel num_threads(strict: n) message("msg")
-  {
-  }
-
-#ifdef TARGET
-  // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
-  #pragma omp target parallel num_threads(strict: 32+n) severity(warning)
-  {
-  }
-
-  return n+1;
-}
-#endif
-
-#ifdef F1
-struct S1 {
-  double a;
-
-  int r1(int n){
-    int b = 1;
-
-#ifdef TARGET
-    // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
-    #pragma omp parallel num_threads(strict: n-b) severity(warning) message("msg")
-    {
-      this->a = (double)b + 1.5;
-    }
-
-#ifdef TARGET
-    // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
-    #pragma omp parallel num_threads(strict: 1024) severity(fatal)
-    {
-      this->a = 2.5;
-    }
-
-    return (int)a;
-  }
-};
-#endif
-
-int bar(int n){
-  int a = 0;
-
-#ifdef F1
-  #pragma omp target
-  {
-    S1 S;
-    a += S.r1(n);
-  }
-#endif
-
-#ifdef F2
-  a += fstatic(n);
-#endif
-
-#ifdef F3
-  #pragma omp target
-  a += ftemplate<int>(n);
-#endif
-
-  return a;
-}
diff --git a/clang/test/OpenMP/amdgcn_target_parallel_num_threads_codegen.cpp b/clang/test/OpenMP/amdgcn_target_parallel_num_threads_codegen.cpp
new file mode 100644
index 0000000000000..5070b56f101bc
--- /dev/null
+++ b/clang/test/OpenMP/amdgcn_target_parallel_num_threads_codegen.cpp
@@ -0,0 +1,1067 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=OMP45_1
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=OMP45_2
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -fexceptions -fcxx-exceptions -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=OMP45_2
+
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefixes=OMP60_1
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefixes=OMP60_2
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -fexceptions -fcxx-exceptions -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefixes=OMP60_2
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=CHECK1
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=CHECK2
+// RUN: %clang_cc1 -verify -fopenmp -fexceptions -fcxx-exceptions -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=CHECK2
+
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+template<typename tx>
+tx ftemplate(int n) {
+  tx a = 0;
+  short aa = 0;
+  tx b[10];
+
+  #pragma omp target parallel map(tofrom: aa) num_threads(1024)
+  {
+    aa += 1;
+  }
+  #ifdef OMP60
+  #pragma omp target parallel map(tofrom: aa) num_threads(strict: 1024)
+  {
+    aa += 1;
+  }
+  #endif
+
+  #pragma omp target parallel map(tofrom:a, aa, b) if(target: n>40) num_threads(n)
+  {
+    a += 1;
+    aa += 1;
+    b[2] += 1;
+  }
+  #ifdef OMP60
+  #pragma omp target parallel map(tofrom:a, aa, b) if(target: n>40) num_threads(strict: n)
+  {
+    a += 1;
+    aa += 1;
+    b[2] += 1;
+  }
+  #endif
+
+  return a;
+}
+
+int bar(int n){
+  int a = 0;
+
+  a += ftemplate<int>(n);
+
+  return a;
+}
+
+#endif
+// OMP45_1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31
+// OMP45_1-SAME: (ptr noalias noundef [[DYN_PTR:%.*]], ptr noundef nonnull align 2 dereferenceable(2) [[AA:%.*]]) #[[ATTR0:[0-9]+]] {
+// OMP45_1-NEXT:  entry:
+// OMP45_1-NEXT:    [[DYN_PTR_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[AA_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[CAPTURED_VARS_ADDRS:%.*]] = alloca [1 x ptr], align 8, addrspace(5)
+// OMP45_1-NEXT:    [[DYN_PTR_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DYN_PTR_ADDR]] to ptr
+// OMP45_1-NEXT:    [[AA_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AA_ADDR]] to ptr
+// OMP45_1-NEXT:    [[CAPTURED_VARS_ADDRS_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[CAPTURED_VARS_ADDRS]] to ptr
+// OMP45_1-NEXT:    store ptr [[DYN_PTR]], ptr [[DYN_PTR_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[AA]], ptr [[AA_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[AA_ADDR_ASCAST]], align 8, !nonnull [[META7:![0-9]+]], !align [[META8:![0-9]+]]
+// OMP45_1-NEXT:    [[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31_kernel_environment to ptr), ptr [[DYN_PTR]])
+// OMP45_1-NEXT:    [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// OMP45_1-NEXT:    br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// OMP45_1:       user_code.entry:
+// OMP45_1-NEXT:    [[TMP2:%.*]] = call i32 @__kmpc_global_thread_num(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr))
+// OMP45_1-NEXT:    [[TMP3:%.*]] = getelementptr inbounds [1 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 0
+// OMP45_1-NEXT:    store ptr [[TMP0]], ptr [[TMP3]], align 8
+// OMP45_1-NEXT:    call void @__kmpc_parallel_51(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr), i32 [[TMP2]], i32 1, i32 1024, i32 -1, ptr @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31_omp_outlined, ptr null, ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 1)
+// OMP45_1-NEXT:    call void @__kmpc_target_deinit()
+// OMP45_1-NEXT:    ret void
+// OMP45_1:       worker.exit:
+// OMP45_1-NEXT:    ret void
+//
+//
+// OMP45_1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31_omp_outlined
+// OMP45_1-SAME: (ptr noalias noundef [[DOTGLOBAL_TID_:%.*]], ptr noalias noundef [[DOTBOUND_TID_:%.*]], ptr noundef nonnull align 2 dereferenceable(2) [[AA:%.*]]) #[[ATTR1:[0-9]+]] {
+// OMP45_1-NEXT:  entry:
+// OMP45_1-NEXT:    [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[AA_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[DOTGLOBAL_TID__ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTGLOBAL_TID__ADDR]] to ptr
+// OMP45_1-NEXT:    [[DOTBOUND_TID__ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTBOUND_TID__ADDR]] to ptr
+// OMP45_1-NEXT:    [[AA_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AA_ADDR]] to ptr
+// OMP45_1-NEXT:    store ptr [[DOTGLOBAL_TID_]], ptr [[DOTGLOBAL_TID__ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[DOTBOUND_TID_]], ptr [[DOTBOUND_TID__ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[AA]], ptr [[AA_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[AA_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META8]]
+// OMP45_1-NEXT:    [[TMP1:%.*]] = load i16, ptr [[TMP0]], align 2
+// OMP45_1-NEXT:    [[CONV:%.*]] = sext i16 [[TMP1]] to i32
+// OMP45_1-NEXT:    [[ADD:%.*]] = add nsw i32 [[CONV]], 1
+// OMP45_1-NEXT:    [[CONV1:%.*]] = trunc i32 [[ADD]] to i16
+// OMP45_1-NEXT:    store i16 [[CONV1]], ptr [[TMP0]], align 2
+// OMP45_1-NEXT:    ret void
+//
+//
+// OMP45_1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l42
+// OMP45_1-SAME: (ptr noalias noundef [[DYN_PTR:%.*]], ptr noundef nonnull align 4 dereferenceable(4) [[A:%.*]], ptr noundef nonnull align 2 dereferenceable(2) [[AA:%.*]], ptr noundef nonnull align 4 dereferenceable(40) [[B:%.*]], i64 noundef [[DOTCAPTURE_EXPR_:%.*]]) #[[ATTR4:[0-9]+]] {
+// OMP45_1-NEXT:  entry:
+// OMP45_1-NEXT:    [[DYN_PTR_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[A_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[AA_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[B_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[DOTCAPTURE_EXPR__ADDR:%.*]] = alloca i64, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[CAPTURED_VARS_ADDRS:%.*]] = alloca [3 x ptr], align 8, addrspace(5)
+// OMP45_1-NEXT:    [[DYN_PTR_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DYN_PTR_ADDR]] to ptr
+// OMP45_1-NEXT:    [[A_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A_ADDR]] to ptr
+// OMP45_1-NEXT:    [[AA_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AA_ADDR]] to ptr
+// OMP45_1-NEXT:    [[B_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[B_ADDR]] to ptr
+// OMP45_1-NEXT:    [[DOTCAPTURE_EXPR__ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR__ADDR]] to ptr
+// OMP45_1-NEXT:    [[CAPTURED_VARS_ADDRS_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[CAPTURED_VARS_ADDRS]] to ptr
+// OMP45_1-NEXT:    store ptr [[DYN_PTR]], ptr [[DYN_PTR_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[A]], ptr [[A_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[AA]], ptr [[AA_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[B]], ptr [[B_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store i64 [[DOTCAPTURE_EXPR_]], ptr [[DOTCAPTURE_EXPR__ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[A_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META9:![0-9]+]]
+// OMP45_1-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[AA_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META8]]
+// OMP45_1-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[B_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META9]]
+// OMP45_1-NEXT:    [[TMP3:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l42_kernel_environment to ptr), ptr [[DYN_PTR]])
+// OMP45_1-NEXT:    [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP3]], -1
+// OMP45_1-NEXT:    br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// OMP45_1:       user_code.entry:
+// OMP45_1-NEXT:    [[TMP4:%.*]] = call i32 @__kmpc_global_thread_num(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr))
+// OMP45_1-NEXT:    [[TMP5:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ADDR_ASCAST]], align 4
+// OMP45_1-NEXT:    [[TMP6:%.*]] = getelementptr inbounds [3 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 0
+// OMP45_1-NEXT:    store ptr [[TMP0]], ptr [[TMP6]], align 8
+// OMP45_1-NEXT:    [[TMP7:%.*]] = getelementptr inbounds [3 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 1
+// OMP45_1-NEXT:    store ptr [[TMP1]], ptr [[TMP7]], align 8
+// OMP45_1-NEXT:    [[TMP8:%.*]] = getelementptr inbounds [3 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 2
+// OMP45_1-NEXT:    store ptr [[TMP2]], ptr [[TMP8]], align 8
+// OMP45_1-NEXT:    call void @__kmpc_parallel_51(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr), i32 [[TMP4]], i32 1, i32 [[TMP5]], i32 -1, ptr @{{__omp_offloading_[0-9a-z...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Oct 15, 2025

@llvm/pr-subscribers-flang-openmp

Author: Kevin Sala Penades (kevinsala)

Changes

This commit re-enables the support for the strict modifier in num_threads within target regions. The message and severity clauses remain unsupported. When the strict check fails, the program terminates without displaying any meaningful message, but it is still compliant with the OpenMP standard (i.e., when no message clause is specified).


Patch is 1.81 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/163565.diff

16 Files Affected:

  • (modified) clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp (+16-24)
  • (modified) clang/lib/CodeGen/CGOpenMPRuntimeGPU.h (+1-1)
  • (modified) clang/test/AST/ByteCode/openmp.cpp (+3-4)
  • (removed) clang/test/OpenMP/amdgcn_parallel_num_threads_strict_messages.cpp (-108)
  • (added) clang/test/OpenMP/amdgcn_target_parallel_num_threads_codegen.cpp (+1067)
  • (modified) clang/test/OpenMP/distribute_parallel_for_num_threads_codegen.cpp (+5803-5676)
  • (added) clang/test/OpenMP/distribute_parallel_for_simd_num_threads_strict_codegen.cpp (+3417)
  • (removed) clang/test/OpenMP/nvptx_parallel_num_threads_strict_messages.cpp (-108)
  • (modified) clang/test/OpenMP/nvptx_target_codegen.cpp (+847-66)
  • (modified) clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp (+703-35)
  • (modified) clang/test/OpenMP/target_parallel_generic_loop_codegen.cpp (+383-18)
  • (modified) clang/test/OpenMP/target_parallel_num_threads_strict_codegen.cpp (+1889-1263)
  • (added) clang/test/OpenMP/teams_distribute_parallel_for_num_threads_strict_codegen.cpp (+1342)
  • (added) clang/test/OpenMP/teams_distribute_parallel_for_simd_num_threads_strict_codegen.cpp (+1789)
  • (modified) llvm/include/llvm/Frontend/OpenMP/OMPKinds.def (+6)
  • (modified) openmp/device/src/Parallelism.cpp (+64-17)
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index 3613b6a143d42..ef1a47554ade2 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -922,13 +922,6 @@ void CGOpenMPRuntimeGPU::emitNumThreadsClause(
     OpenMPNumThreadsClauseModifier Modifier, OpenMPSeverityClauseKind Severity,
     SourceLocation SeverityLoc, const Expr *Message,
     SourceLocation MessageLoc) {
-  if (Modifier == OMPC_NUMTHREADS_strict) {
-    CGM.getDiags().Report(Loc,
-                          diag::warn_omp_gpu_unsupported_modifier_for_clause)
-        << "strict" << getOpenMPClauseName(OMPC_num_threads);
-    return;
-  }
-
   // Nothing to do.
 }
 
@@ -1236,9 +1229,9 @@ void CGOpenMPRuntimeGPU::emitParallelCall(
   if (!CGF.HaveInsertPoint())
     return;
 
-  auto &&ParallelGen = [this, Loc, OutlinedFn, CapturedVars, IfCond,
-                        NumThreads](CodeGenFunction &CGF,
-                                    PrePostActionTy &Action) {
+  auto &&ParallelGen = [this, Loc, OutlinedFn, CapturedVars, IfCond, NumThreads,
+                        NumThreadsModifier](CodeGenFunction &CGF,
+                                            PrePostActionTy &Action) {
     CGBuilderTy &Bld = CGF.Builder;
     llvm::Value *NumThreadsVal = NumThreads;
     llvm::Function *WFn = WrapperFunctionsMap[OutlinedFn];
@@ -1289,21 +1282,20 @@ void CGOpenMPRuntimeGPU::emitParallelCall(
       NumThreadsVal = Bld.CreateZExtOrTrunc(NumThreadsVal, CGF.Int32Ty);
 
     assert(IfCondVal && "Expected a value");
+    RuntimeFunction FnID = OMPRTL___kmpc_parallel_51;
     llvm::Value *RTLoc = emitUpdateLocation(CGF, Loc);
-    llvm::Value *Args[] = {
-        RTLoc,
-        getThreadID(CGF, Loc),
-        IfCondVal,
-        NumThreadsVal,
-        llvm::ConstantInt::get(CGF.Int32Ty, -1),
-        FnPtr,
-        ID,
-        Bld.CreateBitOrPointerCast(CapturedVarsAddrs.emitRawPointer(CGF),
-                                   CGF.VoidPtrPtrTy),
-        llvm::ConstantInt::get(CGM.SizeTy, CapturedVars.size())};
-    CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
-                            CGM.getModule(), OMPRTL___kmpc_parallel_51),
-                        Args);
+    llvm::SmallVector<llvm::Value *, 10> Args(
+        {RTLoc, getThreadID(CGF, Loc), IfCondVal, NumThreadsVal,
+         llvm::ConstantInt::get(CGF.Int32Ty, -1), FnPtr, ID,
+         Bld.CreateBitOrPointerCast(CapturedVarsAddrs.emitRawPointer(CGF),
+                                    CGF.VoidPtrPtrTy),
+         llvm::ConstantInt::get(CGM.SizeTy, CapturedVars.size())});
+    if (NumThreadsModifier == OMPC_NUMTHREADS_strict) {
+      FnID = OMPRTL___kmpc_parallel_60;
+      Args.append({llvm::ConstantInt::get(CGM.Int32Ty, true)});
+    }
+    CGF.EmitRuntimeCall(
+        OMPBuilder.getOrCreateRuntimeFunction(CGM.getModule(), FnID), Args);
   };
 
   RegionCodeGenTy RCG(ParallelGen);
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
index 3a7ee5456a9d2..719ddf43b5cef 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
@@ -245,7 +245,7 @@ class CGOpenMPRuntimeGPU : public CGOpenMPRuntime {
   /// \param NumThreads The value corresponding to the num_threads clause, if
   /// any, or nullptr.
   /// \param NumThreadsModifier The modifier of the num_threads clause, if
-  /// any, ignored otherwise. Currently unused on the device.
+  /// any, ignored otherwise.
   /// \param Severity The severity corresponding to the num_threads clause, if
   /// any, ignored otherwise. Currently unused on the device.
   /// \param Message The message string corresponding to the num_threads clause,
diff --git a/clang/test/AST/ByteCode/openmp.cpp b/clang/test/AST/ByteCode/openmp.cpp
index c7cccfdd1de9a..61a33c8aa8335 100644
--- a/clang/test/AST/ByteCode/openmp.cpp
+++ b/clang/test/AST/ByteCode/openmp.cpp
@@ -17,13 +17,12 @@ extern int omp_get_thread_num(void);
 
 int test2() {
   int x = 0;
-  int result[N] = {0};
+  int device_result[N] = {0};
 
-  #pragma omp parallel loop num_threads(strict: N) severity(warning) message("msg")
+  #pragma omp target parallel loop num_threads(strict: N)
   for (int i = 0; i < N; i++) {
     x = omp_get_thread_num();
-    result[i] = i + x;
+    device_result[i] = i + x;
   }
 }
 
-
diff --git a/clang/test/OpenMP/amdgcn_parallel_num_threads_strict_messages.cpp b/clang/test/OpenMP/amdgcn_parallel_num_threads_strict_messages.cpp
deleted file mode 100644
index 513754b0bbad9..0000000000000
--- a/clang/test/OpenMP/amdgcn_parallel_num_threads_strict_messages.cpp
+++ /dev/null
@@ -1,108 +0,0 @@
-// RUN: %clang_cc1 -DF1 -verify -fopenmp -fopenmp-version=60 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-ppc.bc
-// RUN: %clang_cc1 -DF1 -DTARGET -verify -fopenmp -fopenmp-version=60 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host-ppc.bc -o /dev/null
-// RUN: %clang_cc1 -DF2 -verify -fopenmp -fopenmp-version=60 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-ppc.bc
-// RUN: %clang_cc1 -DF2 -DTARGET -verify -fopenmp -fopenmp-version=60 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host-ppc.bc -o /dev/null
-// RUN: %clang_cc1 -DF3 -verify -fopenmp -fopenmp-version=60 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host-ppc.bc
-// RUN: %clang_cc1 -DF3 -DTARGET -verify -fopenmp -fopenmp-version=60 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host-ppc.bc -o /dev/null
-
-#ifndef TARGET
-// expected-no-diagnostics
-#endif
-
-#ifdef F3
-template<typename tx>
-tx ftemplate(int n) {
-  tx a = 0;
-
-#ifdef TARGET
-  // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
-  #pragma omp parallel num_threads(strict: tx(20)) severity(fatal) message("msg")
-  {
-  }
-
-  short b = 1;
-#ifdef TARGET
-  // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
-  #pragma omp parallel num_threads(strict: b) severity(warning) message("msg")
-  {
-    a += b;
-  }
-
-  return a;
-}
-#endif
-
-#ifdef F2
-static
-int fstatic(int n) {
-
-#ifdef TARGET
-  // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
-  #pragma omp target parallel num_threads(strict: n) message("msg")
-  {
-  }
-
-#ifdef TARGET
-  // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
-  #pragma omp target parallel num_threads(strict: 32+n) severity(warning)
-  {
-  }
-
-  return n+1;
-}
-#endif
-
-#ifdef F1
-struct S1 {
-  double a;
-
-  int r1(int n){
-    int b = 1;
-
-#ifdef TARGET
-    // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
-    #pragma omp parallel num_threads(strict: n-b) severity(warning) message("msg")
-    {
-      this->a = (double)b + 1.5;
-    }
-
-#ifdef TARGET
-    // expected-warning@+2 {{modifier 'strict' is currently not supported on a GPU for the 'num_threads' clause; modifier ignored}}
-#endif
-    #pragma omp parallel num_threads(strict: 1024) severity(fatal)
-    {
-      this->a = 2.5;
-    }
-
-    return (int)a;
-  }
-};
-#endif
-
-int bar(int n){
-  int a = 0;
-
-#ifdef F1
-  #pragma omp target
-  {
-    S1 S;
-    a += S.r1(n);
-  }
-#endif
-
-#ifdef F2
-  a += fstatic(n);
-#endif
-
-#ifdef F3
-  #pragma omp target
-  a += ftemplate<int>(n);
-#endif
-
-  return a;
-}
diff --git a/clang/test/OpenMP/amdgcn_target_parallel_num_threads_codegen.cpp b/clang/test/OpenMP/amdgcn_target_parallel_num_threads_codegen.cpp
new file mode 100644
index 0000000000000..5070b56f101bc
--- /dev/null
+++ b/clang/test/OpenMP/amdgcn_target_parallel_num_threads_codegen.cpp
@@ -0,0 +1,1067 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=OMP45_1
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=OMP45_2
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -fexceptions -fcxx-exceptions -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=OMP45_2
+
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefixes=OMP60_1
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefixes=OMP60_2
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -fexceptions -fcxx-exceptions -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefixes=OMP60_2
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=CHECK1
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=CHECK2
+// RUN: %clang_cc1 -verify -fopenmp -fexceptions -fcxx-exceptions -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=CHECK2
+
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+template<typename tx>
+tx ftemplate(int n) {
+  tx a = 0;
+  short aa = 0;
+  tx b[10];
+
+  #pragma omp target parallel map(tofrom: aa) num_threads(1024)
+  {
+    aa += 1;
+  }
+  #ifdef OMP60
+  #pragma omp target parallel map(tofrom: aa) num_threads(strict: 1024)
+  {
+    aa += 1;
+  }
+  #endif
+
+  #pragma omp target parallel map(tofrom:a, aa, b) if(target: n>40) num_threads(n)
+  {
+    a += 1;
+    aa += 1;
+    b[2] += 1;
+  }
+  #ifdef OMP60
+  #pragma omp target parallel map(tofrom:a, aa, b) if(target: n>40) num_threads(strict: n)
+  {
+    a += 1;
+    aa += 1;
+    b[2] += 1;
+  }
+  #endif
+
+  return a;
+}
+
+int bar(int n){
+  int a = 0;
+
+  a += ftemplate<int>(n);
+
+  return a;
+}
+
+#endif
+// OMP45_1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31
+// OMP45_1-SAME: (ptr noalias noundef [[DYN_PTR:%.*]], ptr noundef nonnull align 2 dereferenceable(2) [[AA:%.*]]) #[[ATTR0:[0-9]+]] {
+// OMP45_1-NEXT:  entry:
+// OMP45_1-NEXT:    [[DYN_PTR_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[AA_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[CAPTURED_VARS_ADDRS:%.*]] = alloca [1 x ptr], align 8, addrspace(5)
+// OMP45_1-NEXT:    [[DYN_PTR_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DYN_PTR_ADDR]] to ptr
+// OMP45_1-NEXT:    [[AA_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AA_ADDR]] to ptr
+// OMP45_1-NEXT:    [[CAPTURED_VARS_ADDRS_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[CAPTURED_VARS_ADDRS]] to ptr
+// OMP45_1-NEXT:    store ptr [[DYN_PTR]], ptr [[DYN_PTR_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[AA]], ptr [[AA_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[AA_ADDR_ASCAST]], align 8, !nonnull [[META7:![0-9]+]], !align [[META8:![0-9]+]]
+// OMP45_1-NEXT:    [[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31_kernel_environment to ptr), ptr [[DYN_PTR]])
+// OMP45_1-NEXT:    [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// OMP45_1-NEXT:    br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// OMP45_1:       user_code.entry:
+// OMP45_1-NEXT:    [[TMP2:%.*]] = call i32 @__kmpc_global_thread_num(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr))
+// OMP45_1-NEXT:    [[TMP3:%.*]] = getelementptr inbounds [1 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 0
+// OMP45_1-NEXT:    store ptr [[TMP0]], ptr [[TMP3]], align 8
+// OMP45_1-NEXT:    call void @__kmpc_parallel_51(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr), i32 [[TMP2]], i32 1, i32 1024, i32 -1, ptr @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31_omp_outlined, ptr null, ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 1)
+// OMP45_1-NEXT:    call void @__kmpc_target_deinit()
+// OMP45_1-NEXT:    ret void
+// OMP45_1:       worker.exit:
+// OMP45_1-NEXT:    ret void
+//
+//
+// OMP45_1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31_omp_outlined
+// OMP45_1-SAME: (ptr noalias noundef [[DOTGLOBAL_TID_:%.*]], ptr noalias noundef [[DOTBOUND_TID_:%.*]], ptr noundef nonnull align 2 dereferenceable(2) [[AA:%.*]]) #[[ATTR1:[0-9]+]] {
+// OMP45_1-NEXT:  entry:
+// OMP45_1-NEXT:    [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[AA_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[DOTGLOBAL_TID__ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTGLOBAL_TID__ADDR]] to ptr
+// OMP45_1-NEXT:    [[DOTBOUND_TID__ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTBOUND_TID__ADDR]] to ptr
+// OMP45_1-NEXT:    [[AA_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AA_ADDR]] to ptr
+// OMP45_1-NEXT:    store ptr [[DOTGLOBAL_TID_]], ptr [[DOTGLOBAL_TID__ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[DOTBOUND_TID_]], ptr [[DOTBOUND_TID__ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[AA]], ptr [[AA_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[AA_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META8]]
+// OMP45_1-NEXT:    [[TMP1:%.*]] = load i16, ptr [[TMP0]], align 2
+// OMP45_1-NEXT:    [[CONV:%.*]] = sext i16 [[TMP1]] to i32
+// OMP45_1-NEXT:    [[ADD:%.*]] = add nsw i32 [[CONV]], 1
+// OMP45_1-NEXT:    [[CONV1:%.*]] = trunc i32 [[ADD]] to i16
+// OMP45_1-NEXT:    store i16 [[CONV1]], ptr [[TMP0]], align 2
+// OMP45_1-NEXT:    ret void
+//
+//
+// OMP45_1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l42
+// OMP45_1-SAME: (ptr noalias noundef [[DYN_PTR:%.*]], ptr noundef nonnull align 4 dereferenceable(4) [[A:%.*]], ptr noundef nonnull align 2 dereferenceable(2) [[AA:%.*]], ptr noundef nonnull align 4 dereferenceable(40) [[B:%.*]], i64 noundef [[DOTCAPTURE_EXPR_:%.*]]) #[[ATTR4:[0-9]+]] {
+// OMP45_1-NEXT:  entry:
+// OMP45_1-NEXT:    [[DYN_PTR_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[A_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[AA_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[B_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[DOTCAPTURE_EXPR__ADDR:%.*]] = alloca i64, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[CAPTURED_VARS_ADDRS:%.*]] = alloca [3 x ptr], align 8, addrspace(5)
+// OMP45_1-NEXT:    [[DYN_PTR_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DYN_PTR_ADDR]] to ptr
+// OMP45_1-NEXT:    [[A_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A_ADDR]] to ptr
+// OMP45_1-NEXT:    [[AA_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AA_ADDR]] to ptr
+// OMP45_1-NEXT:    [[B_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[B_ADDR]] to ptr
+// OMP45_1-NEXT:    [[DOTCAPTURE_EXPR__ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR__ADDR]] to ptr
+// OMP45_1-NEXT:    [[CAPTURED_VARS_ADDRS_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[CAPTURED_VARS_ADDRS]] to ptr
+// OMP45_1-NEXT:    store ptr [[DYN_PTR]], ptr [[DYN_PTR_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[A]], ptr [[A_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[AA]], ptr [[AA_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[B]], ptr [[B_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store i64 [[DOTCAPTURE_EXPR_]], ptr [[DOTCAPTURE_EXPR__ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[A_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META9:![0-9]+]]
+// OMP45_1-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[AA_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META8]]
+// OMP45_1-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[B_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META9]]
+// OMP45_1-NEXT:    [[TMP3:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l42_kernel_environment to ptr), ptr [[DYN_PTR]])
+// OMP45_1-NEXT:    [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP3]], -1
+// OMP45_1-NEXT:    br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// OMP45_1:       user_code.entry:
+// OMP45_1-NEXT:    [[TMP4:%.*]] = call i32 @__kmpc_global_thread_num(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr))
+// OMP45_1-NEXT:    [[TMP5:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ADDR_ASCAST]], align 4
+// OMP45_1-NEXT:    [[TMP6:%.*]] = getelementptr inbounds [3 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 0
+// OMP45_1-NEXT:    store ptr [[TMP0]], ptr [[TMP6]], align 8
+// OMP45_1-NEXT:    [[TMP7:%.*]] = getelementptr inbounds [3 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 1
+// OMP45_1-NEXT:    store ptr [[TMP1]], ptr [[TMP7]], align 8
+// OMP45_1-NEXT:    [[TMP8:%.*]] = getelementptr inbounds [3 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 2
+// OMP45_1-NEXT:    store ptr [[TMP2]], ptr [[TMP8]], align 8
+// OMP45_1-NEXT:    call void @__kmpc_parallel_51(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr), i32 [[TMP4]], i32 1, i32 [[TMP5]], i32 -1, ptr @{{__omp_offloading_[0-9a-z...
[truncated]

@kevinsala kevinsala requested review from jhuber6 and ro-i October 17, 2025 16:07
}

[[clang::always_inline]] void
__kmpc_parallel_60(IdentTy *ident, int32_t id, int32_t if_expr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't the original issue that OpenMPOpt didn't recognize this anymore? I remember some complaints about the indirection. Not sure we need the old _51 anymore since we assume no backwards compat in device code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these wrappers are a no-no in general. I only introduced them as a temporary solution for a short time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, seems that OpenMPOPT handles __kmpc_parallel_51 but not __kmpc_parallel_60. The only difference is the strict modifier. Do you think it's ok to still use __kmpc_parallel_51 and use the last parameter nt_strict with default value false?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not without further ado, that's why I had introduced the wrappers as a workaround, see #156104

@kevinsala kevinsala requested a review from shiltian October 17, 2025 16:14
@ro-i
Copy link
Contributor

ro-i commented Oct 17, 2025

Just crashing the program does not seem very user-friendly to me, tbh. The user / dev would then need to debug to check why the program crashed. They could just debug the number of threads without the strict thing if they have to debug anyway.

@shiltian
Copy link
Contributor

IMHO if we want to add this back, we will need to fix OpenMPOpt first or along side such that we'd not introduce these many wrappers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:bytecode Issues for the clang bytecode constexpr interpreter clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category flang:openmp openmp:libomp OpenMP host runtime openmp:libomptarget OpenMP offload runtime

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants