@clementval
Contributor

No description provided.

@clementval clementval requested a review from wangzpgi November 24, 2025 22:28
@llvmbot llvmbot added the flang (Flang issues not falling into any other category) and flang:fir-hlfir labels Nov 24, 2025
@llvmbot
Member

llvmbot commented Nov 24, 2025

@llvm/pr-subscribers-flang-fir-hlfir

Author: Valentin Clement (バレンタイン クレメン) (clementval)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/169417.diff

3 Files Affected:

  • (modified) flang/include/flang/Optimizer/Builder/CUDAIntrinsicCall.h (+1)
  • (modified) flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp (+37)
  • (modified) flang/module/cooperative_groups.f90 (+7)
diff --git a/flang/include/flang/Optimizer/Builder/CUDAIntrinsicCall.h b/flang/include/flang/Optimizer/Builder/CUDAIntrinsicCall.h
index 027bd3b79a1df..cedc7a9437eb5 100644
--- a/flang/include/flang/Optimizer/Builder/CUDAIntrinsicCall.h
+++ b/flang/include/flang/Optimizer/Builder/CUDAIntrinsicCall.h
@@ -47,6 +47,7 @@ struct CUDAIntrinsicLibrary : IntrinsicLibrary {
   void genBarrierInit(llvm::ArrayRef<fir::ExtendedValue>);
   mlir::Value genBarrierTryWait(mlir::Type, llvm::ArrayRef<mlir::Value>);
   mlir::Value genBarrierTryWaitSleep(mlir::Type, llvm::ArrayRef<mlir::Value>);
+  mlir::Value genClusterDimBlocks(mlir::Type, llvm::ArrayRef<mlir::Value>);
   void genFenceProxyAsync(llvm::ArrayRef<fir::ExtendedValue>);
   template <const char *fctName, int extent>
   fir::ExtendedValue genLDXXFunc(mlir::Type,
diff --git a/flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp b/flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp
index c560c53033780..a770e2d9cdeff 100644
--- a/flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp
+++ b/flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp
@@ -368,6 +368,11 @@ static constexpr IntrinsicHandler cudaHandlers[]{
          &CI::genNVVMTime<mlir::NVVM::Clock64Op>),
      {},
      /*isElemental=*/false},
+    {"cluster_dim_blocks",
+     static_cast<CUDAIntrinsicLibrary::ElementalGenerator>(
+         &CI::genClusterDimBlocks),
+     {},
+     /*isElemental=*/false},
     {"fence_proxy_async",
      static_cast<CUDAIntrinsicLibrary::SubroutineGenerator>(
          &CI::genFenceProxyAsync),
@@ -985,6 +990,38 @@ CUDAIntrinsicLibrary::genBarrierTryWaitSleep(mlir::Type resultType,
       .getResult(0);
 }
 
+// CLUSTER_DIM_BLOCKS
+mlir::Value
+CUDAIntrinsicLibrary::genClusterDimBlocks(mlir::Type resultType,
+                                          llvm::ArrayRef<mlir::Value> args) {
+  assert(args.size() == 0);
+  auto recTy = mlir::cast<fir::RecordType>(resultType);
+  assert(recTy && "RecordType expected");
+  mlir::Value res = fir::AllocaOp::create(builder, loc, resultType);
+
+  auto insertDim = [&](mlir::Value dim, unsigned fieldPos) {
+    auto fieldName = recTy.getTypeList()[fieldPos].first;
+    mlir::Type fieldTy = recTy.getTypeList()[fieldPos].second;
+    mlir::Type fieldIndexType = fir::FieldType::get(resultType.getContext());
+    mlir::Value fieldIndex = fir::FieldIndexOp::create(
+        builder, loc, fieldIndexType, fieldName, recTy,
+        /*typeParams=*/mlir::ValueRange{});
+    mlir::Value coord = fir::CoordinateOp::create(
+        builder, loc, builder.getRefType(fieldTy), res, fieldIndex);
+    fir::StoreOp::create(builder, loc, dim, coord);
+  };
+
+  mlir::Type i32Ty = builder.getI32Type();
+  mlir::Value x = mlir::NVVM::ClusterDimBlocksXOp::create(builder, loc, i32Ty);
+  insertDim(x, 0);
+  mlir::Value y = mlir::NVVM::ClusterDimBlocksYOp::create(builder, loc, i32Ty);
+  insertDim(y, 1);
+  mlir::Value z = mlir::NVVM::ClusterDimBlocksZOp::create(builder, loc, i32Ty);
+  insertDim(z, 2);
+
+  return res;
+}
+
 // FENCE_PROXY_ASYNC
 void CUDAIntrinsicLibrary::genFenceProxyAsync(
     llvm::ArrayRef<fir::ExtendedValue> args) {
diff --git a/flang/module/cooperative_groups.f90 b/flang/module/cooperative_groups.f90
index 1c89866f9c84a..2631975837a5b 100644
--- a/flang/module/cooperative_groups.f90
+++ b/flang/module/cooperative_groups.f90
@@ -38,6 +38,13 @@ module cooperative_groups
   integer(4) :: rank
 end type thread_group
 
+interface
+  attributes(device) function cluster_dim_blocks()
+    import
+    type(dim3) :: cluster_dim_blocks
+  end function
+end interface
+
 interface
   attributes(device) function this_cluster()
     import
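
For readers unfamiliar with the FIR builder calls, the pattern in `genClusterDimBlocks` above (allocate storage for the record result, then store one value per field position) can be sketched in plain C++. This is only an analogy under stated assumptions: `RecordModel`, `RecordStorage`, `ClusterDimSource`, and `buildDim3` are hypothetical names invented for illustration, not MLIR, FIR, or NVVM APIs, and the three integers stand in for the values the `ClusterDimBlocks{X,Y,Z}Op` operations would produce on the device.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a record type: an ordered list of
// (field name, field type) pairs, loosely mirroring getTypeList().
struct RecordModel {
  std::vector<std::pair<std::string, std::string>> typeList;
};

// Hypothetical stand-in for the stack slot created for the result:
// one 32-bit integer per record field.
struct RecordStorage {
  std::vector<std::int32_t> slots;
};

// Hypothetical stand-in for the three per-axis values; in the real
// lowering these come from NVVM operations, not from a struct.
struct ClusterDimSource {
  std::int32_t x, y, z;
};

// Builds a dim3-like record by storing each axis at its field position,
// in the same order as the insertDim(x, 0), (y, 1), (z, 2) calls above.
RecordStorage buildDim3(const RecordModel &recTy,
                        const ClusterDimSource &src) {
  assert(recTy.typeList.size() == 3 && "dim3-like record expected");
  RecordStorage res{std::vector<std::int32_t>(recTy.typeList.size(), 0)};

  auto insertDim = [&](std::int32_t dim, unsigned fieldPos) {
    // Look the field up by position, check its type, then store into it.
    assert(fieldPos < recTy.typeList.size() && "field position in range");
    assert(recTy.typeList[fieldPos].second == "i32" && "i32 field expected");
    res.slots[fieldPos] = dim;
  };

  insertDim(src.x, 0);
  insertDim(src.y, 1);
  insertDim(src.z, 2);
  return res;
}
```

The sketch addresses fields by position rather than by a hard-coded name, as the lambda in the patch does, so it relies on the record declaring its components in x, y, z order.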

@wangzpgi
Contributor

Is it possible to add a test?

@clementval
Contributor Author

> Is it possible to add a test?

Yes. I actually had a test file for the previous PR as well, but I forgot to git add it, so it was not sent. I have just added it now.

@clementval clementval enabled auto-merge (squash) November 24, 2025 22:49
@clementval clementval merged commit ab2a302 into llvm:main Nov 24, 2025
10 checks passed
@clementval clementval deleted the cuf_cluster_dims branch November 24, 2025 22:56
aadeshps-mcw pushed a commit to aadeshps-mcw/llvm-project that referenced this pull request Nov 26, 2025