
@clementval
Contributor

The simple form of Barrier0Op is available in the NVVM dialect. Since #140615, it needs to be used instead of the string-based version (a fir.call to `llvm.nvvm.barrier0`).

@clementval clementval requested review from Renaud-K and wangzpgi May 21, 2025 18:48
@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels May 21, 2025
@llvmbot
Member

llvmbot commented May 21, 2025

@llvm/pr-subscribers-flang-fir-hlfir

Author: Valentin Clement (バレンタイン クレメン) (clementval)

Changes

The simple form of Barrier0Op is available in the NVVM dialect. Since #140615, it needs to be used instead of the string-based version (a fir.call to `llvm.nvvm.barrier0`).


Full diff: https://github.com/llvm/llvm-project/pull/140947.diff

2 Files Affected:

  • (modified) flang/lib/Optimizer/Builder/IntrinsicCall.cpp (+1-6)
  • (modified) flang/test/Lower/CUDA/cuda-device-proc.cuf (+2-2)
diff --git a/flang/lib/Optimizer/Builder/IntrinsicCall.cpp b/flang/lib/Optimizer/Builder/IntrinsicCall.cpp
index 1ac0627da9524..178b6770d6b53 100644
--- a/flang/lib/Optimizer/Builder/IntrinsicCall.cpp
+++ b/flang/lib/Optimizer/Builder/IntrinsicCall.cpp
@@ -8332,12 +8332,7 @@ IntrinsicLibrary::genSum(mlir::Type resultType,
 
 // SYNCTHREADS
 void IntrinsicLibrary::genSyncThreads(llvm::ArrayRef<fir::ExtendedValue> args) {
-  constexpr llvm::StringLiteral funcName = "llvm.nvvm.barrier0";
-  mlir::FunctionType funcType =
-      mlir::FunctionType::get(builder.getContext(), {}, {});
-  auto funcOp = builder.createFunction(loc, funcName, funcType);
-  llvm::SmallVector<mlir::Value> noArgs;
-  builder.create<fir::CallOp>(loc, funcOp, noArgs);
+  builder.create<mlir::NVVM::Barrier0Op>(loc);
 }
 
 // SYNCTHREADS_AND
diff --git a/flang/test/Lower/CUDA/cuda-device-proc.cuf b/flang/test/Lower/CUDA/cuda-device-proc.cuf
index 8f5e6dd36da4e..42ee7657966e2 100644
--- a/flang/test/Lower/CUDA/cuda-device-proc.cuf
+++ b/flang/test/Lower/CUDA/cuda-device-proc.cuf
@@ -49,7 +49,7 @@ attributes(global) subroutine devsub()
 end
 
 ! CHECK-LABEL: func.func @_QPdevsub() attributes {cuf.proc_attr = #cuf.cuda_proc<global>}
-! CHECK: fir.call @llvm.nvvm.barrier0() fastmath<contract> : () -> ()
+! CHECK: nvvm.barrier0
 ! CHECK: fir.call @llvm.nvvm.bar.warp.sync(%c1{{.*}}) fastmath<contract> : (i32) -> ()
 ! CHECK: fir.call @llvm.nvvm.membar.gl() fastmath<contract> : () -> ()
 ! CHECK: fir.call @llvm.nvvm.membar.cta() fastmath<contract> : () -> ()
@@ -106,7 +106,7 @@ end
 
 ! CHECK-LABEL: func.func @_QPhost1()
 ! CHECK: cuf.kernel
-! CHECK: fir.call @llvm.nvvm.barrier0() fastmath<contract> : () -> ()
+! CHECK: nvvm.barrier0
 ! CHECK: fir.call @llvm.nvvm.bar.warp.sync(%c1{{.*}}) fastmath<contract> : (i32) -> ()
 ! CHECK: fir.call @llvm.nvvm.barrier0.and(%c1{{.*}}) fastmath<contract> : (i32) -> i32
 ! CHECK: fir.call @llvm.nvvm.barrier0.popc(%c1{{.*}}) fastmath<contract> : (i32) -> i32
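For readers less familiar with this lowering, the fragment below sketches roughly what a `syncthreads()` call in a global kernel lowers to before and after this change. The two op lines are taken from the CHECK lines in the test diff. The surrounding kernel, the `return`, and the private declaration of `@llvm.nvvm.barrier0` are assumptions about the rest of flang's output, and flang's exact printing may differ.

```mlir
// Two alternative outputs for the same kernel, shown for comparison
// (they are not meant to live in one module).

// Before (approximate): syncthreads() became a call to an external
// function named after the LLVM intrinsic, so a declaration was emitted too.
func.func private @llvm.nvvm.barrier0()
func.func @_QPdevsub() attributes {cuf.proc_attr = #cuf.cuda_proc<global>} {
  fir.call @llvm.nvvm.barrier0() fastmath<contract> : () -> ()
  return
}

// After: the NVVM dialect op is emitted directly; no declaration is needed.
func.func @_QPdevsub() attributes {cuf.proc_attr = #cuf.cuda_proc<global>} {
  nvvm.barrier0
  return
}
```

The other barrier variants in the test (`bar.warp.sync`, `barrier0.and`, `barrier0.popc`) remain `fir.call`s; only the plain barrier changes.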

Contributor

@Renaud-K Renaud-K left a comment


I should mention that in flows that are not strictly CUF (OpenACC), I am considering using gpu.barrier, as it can target different back-ends.

@clementval
Contributor Author

> I should mention that in flows that are not strictly CUF (OpenACC), I am considering using gpu.barrier, as it can target different back-ends.

syncthreads is a cudadevice procedure, so I don't think it needs to be portable.
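As a rough sketch of the trade-off discussed in this thread, the two lines below contrast the portable GPU-dialect barrier with the NVVM-specific op that genSyncThreads now emits. This assumes the standard upstream MLIR conversions (for example `-convert-gpu-to-nvvm`) are used for NVIDIA targets; they are expected to rewrite `gpu.barrier` into `nvvm.barrier0`. Emitting `nvvm.barrier0` directly for the CUDA-only `syncthreads` skips that extra step.

```mlir
// Portable form: a GPU dialect barrier that each back-end's conversion
// pass lowers to its own barrier operation.
gpu.barrier

// NVVM-specific form, emitted for syncthreads after this change.
nvvm.barrier0
```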

@clementval clementval merged commit 89d9a83 into llvm:main May 21, 2025
12 of 13 checks passed
@clementval clementval deleted the cuf_nvvm_barrier0 branch May 21, 2025 20:05

Labels

flang:fir-hlfir flang Flang issues not falling into any other category


3 participants