Skip to content

Conversation

@kevinsala
Copy link
Contributor

Recover strict modifier for num_threads clause. Support for message and severity clauses will come later. For now, the strict errors are fatal and don't provide runtime information. But still, it's better than nothing for debugging purposes.

@llvmbot llvmbot added clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang openmp:libomp OpenMP host runtime clang:bytecode Issues for the clang bytecode constexpr interpreter labels Dec 8, 2025
@llvmbot
Copy link
Member

llvmbot commented Dec 8, 2025

@llvm/pr-subscribers-clang-codegen

Author: Kevin Sala Penades (kevinsala)

Changes

Recover strict modifier for num_threads clause. Support for message and severity clauses will come later. For now, the strict errors are fatal and don't provide runtime information. But still, it's better than nothing for debugging purposes.


Patch is 1.73 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/171201.diff

14 Files Affected:

  • (modified) clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp (+5-12)
  • (modified) clang/lib/CodeGen/CGOpenMPRuntimeGPU.h (+1-1)
  • (modified) clang/test/AST/ByteCode/openmp.cpp (+3-4)
  • (added) clang/test/OpenMP/amdgcn_target_parallel_num_threads_codegen.cpp (+1067)
  • (modified) clang/test/OpenMP/distribute_parallel_for_num_threads_codegen.cpp (+5803-5676)
  • (added) clang/test/OpenMP/distribute_parallel_for_simd_num_threads_strict_codegen.cpp (+3417)
  • (modified) clang/test/OpenMP/nvptx_parallel_num_threads_strict_messages.cpp (+4-23)
  • (modified) clang/test/OpenMP/nvptx_target_parallel_num_threads_codegen.cpp (+703-35)
  • (modified) clang/test/OpenMP/target_parallel_generic_loop_codegen.cpp (+383-18)
  • (modified) clang/test/OpenMP/target_parallel_num_threads_strict_codegen.cpp (+1888-1262)
  • (modified) clang/test/OpenMP/target_parallel_num_threads_strict_messages.cpp (+6-27)
  • (added) clang/test/OpenMP/teams_distribute_parallel_for_num_threads_strict_codegen.cpp (+1342)
  • (added) clang/test/OpenMP/teams_distribute_parallel_for_simd_num_threads_strict_codegen.cpp (+1789)
  • (modified) openmp/device/src/Parallelism.cpp (+23-11)
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index c2b87924457a1..a03864d2b81a8 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -924,13 +924,6 @@ void CGOpenMPRuntimeGPU::emitNumThreadsClause(
     OpenMPNumThreadsClauseModifier Modifier, OpenMPSeverityClauseKind Severity,
     SourceLocation SeverityLoc, const Expr *Message,
     SourceLocation MessageLoc) {
-  if (Modifier == OMPC_NUMTHREADS_strict) {
-    CGM.getDiags().Report(Loc,
-                          diag::warn_omp_gpu_unsupported_modifier_for_clause)
-        << "strict" << getOpenMPClauseName(OMPC_num_threads);
-    return;
-  }
-
   // Nothing to do.
 }
 
@@ -1238,9 +1231,9 @@ void CGOpenMPRuntimeGPU::emitParallelCall(
   if (!CGF.HaveInsertPoint())
     return;
 
-  auto &&ParallelGen = [this, Loc, OutlinedFn, CapturedVars, IfCond,
-                        NumThreads](CodeGenFunction &CGF,
-                                    PrePostActionTy &Action) {
+  auto &&ParallelGen = [this, Loc, OutlinedFn, CapturedVars, IfCond, NumThreads,
+                        NumThreadsModifier](CodeGenFunction &CGF,
+                                            PrePostActionTy &Action) {
     CGBuilderTy &Bld = CGF.Builder;
     llvm::Value *NumThreadsVal = NumThreads;
     llvm::Function *WFn = WrapperFunctionsMap[OutlinedFn];
@@ -1291,8 +1284,8 @@ void CGOpenMPRuntimeGPU::emitParallelCall(
     else
       NumThreadsVal = Bld.CreateZExtOrTrunc(NumThreadsVal, CGF.Int32Ty);
 
-    // No strict prescriptiveness for the number of threads.
-    llvm::Value *StrictNumThreadsVal = llvm::ConstantInt::get(CGF.Int32Ty, 0);
+    // Forward whether the strict modifier is specified.
+    llvm::Value *StrictNumThreadsVal = llvm::ConstantInt::get(CGM.Int32Ty, NumThreadsModifier == OMPC_NUMTHREADS_strict);
 
     assert(IfCondVal && "Expected a value");
     llvm::Value *RTLoc = emitUpdateLocation(CGF, Loc);
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
index 3a7ee5456a9d2..719ddf43b5cef 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
@@ -245,7 +245,7 @@ class CGOpenMPRuntimeGPU : public CGOpenMPRuntime {
   /// \param NumThreads The value corresponding to the num_threads clause, if
   /// any, or nullptr.
   /// \param NumThreadsModifier The modifier of the num_threads clause, if
-  /// any, ignored otherwise. Currently unused on the device.
+  /// any, ignored otherwise.
   /// \param Severity The severity corresponding to the num_threads clause, if
   /// any, ignored otherwise. Currently unused on the device.
   /// \param Message The message string corresponding to the num_threads clause,
diff --git a/clang/test/AST/ByteCode/openmp.cpp b/clang/test/AST/ByteCode/openmp.cpp
index c7cccfdd1de9a..61a33c8aa8335 100644
--- a/clang/test/AST/ByteCode/openmp.cpp
+++ b/clang/test/AST/ByteCode/openmp.cpp
@@ -17,13 +17,12 @@ extern int omp_get_thread_num(void);
 
 int test2() {
   int x = 0;
-  int result[N] = {0};
+  int device_result[N] = {0};
 
-  #pragma omp parallel loop num_threads(strict: N) severity(warning) message("msg")
+  #pragma omp target parallel loop num_threads(strict: N)
   for (int i = 0; i < N; i++) {
     x = omp_get_thread_num();
-    result[i] = i + x;
+    device_result[i] = i + x;
   }
 }
 
-
diff --git a/clang/test/OpenMP/amdgcn_target_parallel_num_threads_codegen.cpp b/clang/test/OpenMP/amdgcn_target_parallel_num_threads_codegen.cpp
new file mode 100644
index 0000000000000..a57f72486c31f
--- /dev/null
+++ b/clang/test/OpenMP/amdgcn_target_parallel_num_threads_codegen.cpp
@@ -0,0 +1,1067 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=OMP45_1
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=OMP45_2
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -fexceptions -fcxx-exceptions -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=OMP45_2
+
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefixes=OMP60_1
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefixes=OMP60_2
+// RUN: %clang_cc1 -DOMP60 -verify -fopenmp -fopenmp-version=60 -fexceptions -fcxx-exceptions -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefixes=OMP60_2
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=CHECK1
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=CHECK2
+// RUN: %clang_cc1 -verify -fopenmp -fexceptions -fcxx-exceptions -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix=CHECK2
+
+// expected-no-diagnostics
+#ifndef HEADER
+#define HEADER
+
+template<typename tx>
+tx ftemplate(int n) {
+  tx a = 0;
+  short aa = 0;
+  tx b[10];
+
+  #pragma omp target parallel map(tofrom: aa) num_threads(1024)
+  {
+    aa += 1;
+  }
+  #ifdef OMP60
+  #pragma omp target parallel map(tofrom: aa) num_threads(strict: 1024)
+  {
+    aa += 1;
+  }
+  #endif
+
+  #pragma omp target parallel map(tofrom:a, aa, b) if(target: n>40) num_threads(n)
+  {
+    a += 1;
+    aa += 1;
+    b[2] += 1;
+  }
+  #ifdef OMP60
+  #pragma omp target parallel map(tofrom:a, aa, b) if(target: n>40) num_threads(strict: n)
+  {
+    a += 1;
+    aa += 1;
+    b[2] += 1;
+  }
+  #endif
+
+  return a;
+}
+
+int bar(int n){
+  int a = 0;
+
+  a += ftemplate<int>(n);
+
+  return a;
+}
+
+#endif
+// OMP45_1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31
+// OMP45_1-SAME: (ptr noalias noundef [[DYN_PTR:%.*]], ptr noundef nonnull align 2 dereferenceable(2) [[AA:%.*]]) #[[ATTR0:[0-9]+]] {
+// OMP45_1-NEXT:  entry:
+// OMP45_1-NEXT:    [[DYN_PTR_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[AA_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[CAPTURED_VARS_ADDRS:%.*]] = alloca [1 x ptr], align 8, addrspace(5)
+// OMP45_1-NEXT:    [[DYN_PTR_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DYN_PTR_ADDR]] to ptr
+// OMP45_1-NEXT:    [[AA_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AA_ADDR]] to ptr
+// OMP45_1-NEXT:    [[CAPTURED_VARS_ADDRS_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[CAPTURED_VARS_ADDRS]] to ptr
+// OMP45_1-NEXT:    store ptr [[DYN_PTR]], ptr [[DYN_PTR_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[AA]], ptr [[AA_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[AA_ADDR_ASCAST]], align 8, !nonnull [[META7:![0-9]+]], !align [[META8:![0-9]+]]
+// OMP45_1-NEXT:    [[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31_kernel_environment to ptr), ptr [[DYN_PTR]])
+// OMP45_1-NEXT:    [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// OMP45_1-NEXT:    br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// OMP45_1:       user_code.entry:
+// OMP45_1-NEXT:    [[TMP2:%.*]] = call i32 @__kmpc_global_thread_num(ptr addrspacecast (ptr addrspace(1) @[[GLOB1:[0-9]+]] to ptr))
+// OMP45_1-NEXT:    [[TMP3:%.*]] = getelementptr inbounds [1 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 0
+// OMP45_1-NEXT:    store ptr [[TMP0]], ptr [[TMP3]], align 8
+// OMP45_1-NEXT:    call void @__kmpc_parallel_60(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr), i32 [[TMP2]], i32 1, i32 1024, i32 -1, ptr @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31_omp_outlined, ptr null, ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 1, i32 0)
+// OMP45_1-NEXT:    call void @__kmpc_target_deinit()
+// OMP45_1-NEXT:    ret void
+// OMP45_1:       worker.exit:
+// OMP45_1-NEXT:    ret void
+//
+//
+// OMP45_1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31_omp_outlined
+// OMP45_1-SAME: (ptr noalias noundef [[DOTGLOBAL_TID_:%.*]], ptr noalias noundef [[DOTBOUND_TID_:%.*]], ptr noundef nonnull align 2 dereferenceable(2) [[AA:%.*]]) #[[ATTR1:[0-9]+]] {
+// OMP45_1-NEXT:  entry:
+// OMP45_1-NEXT:    [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[AA_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[DOTGLOBAL_TID__ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTGLOBAL_TID__ADDR]] to ptr
+// OMP45_1-NEXT:    [[DOTBOUND_TID__ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTBOUND_TID__ADDR]] to ptr
+// OMP45_1-NEXT:    [[AA_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AA_ADDR]] to ptr
+// OMP45_1-NEXT:    store ptr [[DOTGLOBAL_TID_]], ptr [[DOTGLOBAL_TID__ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[DOTBOUND_TID_]], ptr [[DOTBOUND_TID__ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[AA]], ptr [[AA_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[AA_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META8]]
+// OMP45_1-NEXT:    [[TMP1:%.*]] = load i16, ptr [[TMP0]], align 2
+// OMP45_1-NEXT:    [[CONV:%.*]] = sext i16 [[TMP1]] to i32
+// OMP45_1-NEXT:    [[ADD:%.*]] = add nsw i32 [[CONV]], 1
+// OMP45_1-NEXT:    [[CONV1:%.*]] = trunc i32 [[ADD]] to i16
+// OMP45_1-NEXT:    store i16 [[CONV1]], ptr [[TMP0]], align 2
+// OMP45_1-NEXT:    ret void
+//
+//
+// OMP45_1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l42
+// OMP45_1-SAME: (ptr noalias noundef [[DYN_PTR:%.*]], ptr noundef nonnull align 4 dereferenceable(4) [[A:%.*]], ptr noundef nonnull align 2 dereferenceable(2) [[AA:%.*]], ptr noundef nonnull align 4 dereferenceable(40) [[B:%.*]], i64 noundef [[DOTCAPTURE_EXPR_:%.*]]) #[[ATTR4:[0-9]+]] {
+// OMP45_1-NEXT:  entry:
+// OMP45_1-NEXT:    [[DYN_PTR_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[A_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[AA_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[B_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[DOTCAPTURE_EXPR__ADDR:%.*]] = alloca i64, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[CAPTURED_VARS_ADDRS:%.*]] = alloca [3 x ptr], align 8, addrspace(5)
+// OMP45_1-NEXT:    [[DYN_PTR_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DYN_PTR_ADDR]] to ptr
+// OMP45_1-NEXT:    [[A_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A_ADDR]] to ptr
+// OMP45_1-NEXT:    [[AA_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AA_ADDR]] to ptr
+// OMP45_1-NEXT:    [[B_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[B_ADDR]] to ptr
+// OMP45_1-NEXT:    [[DOTCAPTURE_EXPR__ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTCAPTURE_EXPR__ADDR]] to ptr
+// OMP45_1-NEXT:    [[CAPTURED_VARS_ADDRS_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[CAPTURED_VARS_ADDRS]] to ptr
+// OMP45_1-NEXT:    store ptr [[DYN_PTR]], ptr [[DYN_PTR_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[A]], ptr [[A_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[AA]], ptr [[AA_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[B]], ptr [[B_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store i64 [[DOTCAPTURE_EXPR_]], ptr [[DOTCAPTURE_EXPR__ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[A_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META9:![0-9]+]]
+// OMP45_1-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[AA_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META8]]
+// OMP45_1-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[B_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META9]]
+// OMP45_1-NEXT:    [[TMP3:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l42_kernel_environment to ptr), ptr [[DYN_PTR]])
+// OMP45_1-NEXT:    [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP3]], -1
+// OMP45_1-NEXT:    br i1 [[EXEC_USER_CODE]], label [[USER_CODE_ENTRY:%.*]], label [[WORKER_EXIT:%.*]]
+// OMP45_1:       user_code.entry:
+// OMP45_1-NEXT:    [[TMP4:%.*]] = call i32 @__kmpc_global_thread_num(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr))
+// OMP45_1-NEXT:    [[TMP5:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR__ADDR_ASCAST]], align 4
+// OMP45_1-NEXT:    [[TMP6:%.*]] = getelementptr inbounds [3 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 0
+// OMP45_1-NEXT:    store ptr [[TMP0]], ptr [[TMP6]], align 8
+// OMP45_1-NEXT:    [[TMP7:%.*]] = getelementptr inbounds [3 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 1
+// OMP45_1-NEXT:    store ptr [[TMP1]], ptr [[TMP7]], align 8
+// OMP45_1-NEXT:    [[TMP8:%.*]] = getelementptr inbounds [3 x ptr], ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 0, i64 2
+// OMP45_1-NEXT:    store ptr [[TMP2]], ptr [[TMP8]], align 8
+// OMP45_1-NEXT:    call void @__kmpc_parallel_60(ptr addrspacecast (ptr addrspace(1) @[[GLOB1]] to ptr), i32 [[TMP4]], i32 1, i32 [[TMP5]], i32 -1, ptr @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l42_omp_outlined, ptr null, ptr [[CAPTURED_VARS_ADDRS_ASCAST]], i64 3, i32 0)
+// OMP45_1-NEXT:    call void @__kmpc_target_deinit()
+// OMP45_1-NEXT:    ret void
+// OMP45_1:       worker.exit:
+// OMP45_1-NEXT:    ret void
+//
+//
+// OMP45_1-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l42_omp_outlined
+// OMP45_1-SAME: (ptr noalias noundef [[DOTGLOBAL_TID_:%.*]], ptr noalias noundef [[DOTBOUND_TID_:%.*]], ptr noundef nonnull align 4 dereferenceable(4) [[A:%.*]], ptr noundef nonnull align 2 dereferenceable(2) [[AA:%.*]], ptr noundef nonnull align 4 dereferenceable(40) [[B:%.*]]) #[[ATTR1]] {
+// OMP45_1-NEXT:  entry:
+// OMP45_1-NEXT:    [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[A_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[AA_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[B_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_1-NEXT:    [[DOTGLOBAL_TID__ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTGLOBAL_TID__ADDR]] to ptr
+// OMP45_1-NEXT:    [[DOTBOUND_TID__ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTBOUND_TID__ADDR]] to ptr
+// OMP45_1-NEXT:    [[A_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A_ADDR]] to ptr
+// OMP45_1-NEXT:    [[AA_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AA_ADDR]] to ptr
+// OMP45_1-NEXT:    [[B_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[B_ADDR]] to ptr
+// OMP45_1-NEXT:    store ptr [[DOTGLOBAL_TID_]], ptr [[DOTGLOBAL_TID__ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[DOTBOUND_TID_]], ptr [[DOTBOUND_TID__ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[A]], ptr [[A_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[AA]], ptr [[AA_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    store ptr [[B]], ptr [[B_ADDR_ASCAST]], align 8
+// OMP45_1-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[A_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META9]]
+// OMP45_1-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[AA_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META8]]
+// OMP45_1-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[B_ADDR_ASCAST]], align 8, !nonnull [[META7]], !align [[META9]]
+// OMP45_1-NEXT:    [[TMP3:%.*]] = load i32, ptr [[TMP0]], align 4
+// OMP45_1-NEXT:    [[ADD:%.*]] = add nsw i32 [[TMP3]], 1
+// OMP45_1-NEXT:    store i32 [[ADD]], ptr [[TMP0]], align 4
+// OMP45_1-NEXT:    [[TMP4:%.*]] = load i16, ptr [[TMP1]], align 2
+// OMP45_1-NEXT:    [[CONV:%.*]] = sext i16 [[TMP4]] to i32
+// OMP45_1-NEXT:    [[ADD1:%.*]] = add nsw i32 [[CONV]], 1
+// OMP45_1-NEXT:    [[CONV2:%.*]] = trunc i32 [[ADD1]] to i16
+// OMP45_1-NEXT:    store i16 [[CONV2]], ptr [[TMP1]], align 2
+// OMP45_1-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [10 x i32], ptr [[TMP2]], i64 0, i64 2
+// OMP45_1-NEXT:    [[TMP5:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+// OMP45_1-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP5]], 1
+// OMP45_1-NEXT:    store i32 [[ADD3]], ptr [[ARRAYIDX]], align 4
+// OMP45_1-NEXT:    ret void
+//
+//
+// OMP45_2-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31
+// OMP45_2-SAME: (ptr noalias noundef [[DYN_PTR:%.*]], ptr noundef nonnull align 2 dereferenceable(2) [[AA:%.*]]) #[[ATTR0:[0-9]+]] {
+// OMP45_2-NEXT:  entry:
+// OMP45_2-NEXT:    [[DYN_PTR_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_2-NEXT:    [[AA_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OMP45_2-NEXT:    [[CAPTURED_VARS_ADDRS:%.*]] = alloca [1 x ptr], align 8, addrspace(5)
+// OMP45_2-NEXT:    [[DYN_PTR_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DYN_PTR_ADDR]] to ptr
+// OMP45_2-NEXT:    [[AA_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AA_ADDR]] to ptr
+// OMP45_2-NEXT:    [[CAPTURED_VARS_ADDRS_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[CAPTURED_VARS_ADDRS]] to ptr
+// OMP45_2-NEXT:    store ptr [[DYN_PTR]], ptr [[DYN_PTR_ADDR_ASCAST]], align 8
+// OMP45_2-NEXT:    store ptr [[AA]], ptr [[AA_ADDR_ASCAST]], align 8
+// OMP45_2-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[AA_ADDR_ASCAST]], align 8, !nonnull [[META7:![0-9]+]], !align [[META8:![0-9]+]]
+// OMP45_2-NEXT:    [[TMP1:%.*]] = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z9ftemplateIiET_i_l31_kernel_environment to ptr), ptr [[DYN_PTR]])
+// OMP45_2-NEXT:    [[EXEC_USER_CODE:%.*]] = icmp eq i32 [[TMP1]], -1
+// OMP45_2-NEXT:    br i1 [[E...
[truncated]

@github-actions
Copy link

github-actions bot commented Dec 8, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@kevinsala kevinsala changed the title [Clang][OpenMP] Recover strict modifier for num_threads [Clang][OpenMP] Recover strict modifier for num_threads in target regions Dec 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:bytecode Issues for the clang bytecode constexpr interpreter clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang openmp:libomp OpenMP host runtime

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants