Conversation

@clementval
Contributor

@clementval commented Nov 6, 2025

Add the `nvvm.membar` operation with a `level` attribute, as defined in https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-membar

This will be used to replace the direct intrinsic calls in CUDA Fortran for `threadfence()`, `threadfence_block()`, and `threadfence_system()`, currently lowered here:

void CUDAIntrinsicLibrary::genThreadFence(

The nvvm membar intrinsics are also used in CUDA C/C++:

__DEVICE__ void __threadfence(void) { __nvvm_membar_gl(); }
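
For illustration, here is a minimal sketch of how such a lowering could create the new op from C++. The helper name `emitThreadFence` and its signature are hypothetical (this is not the actual `genThreadFence` body); it assumes an MLIR `OpBuilder` and `Location` are in scope, and uses the standard CUDA mapping of the fence intrinsics to membar levels:

#include "mlir/Dialect/LLVMIR/NVVMDialect.h"
#include "mlir/IR/Builders.h"

// Hypothetical helper (a sketch, not the actual flang code): emit the new
// op for each CUDA Fortran fence intrinsic instead of calling the NVVM
// intrinsic directly.
//   threadfence()        -> gl  (device-wide)
//   threadfence_block()  -> cta (block-wide)
//   threadfence_system() -> sys (system-wide)
static void emitThreadFence(mlir::OpBuilder &builder, mlir::Location loc,
                            mlir::NVVM::MemLevelKind level) {
  builder.create<mlir::NVVM::MembarOp>(
      loc, mlir::NVVM::MemLevelKindAttr::get(builder.getContext(), level));
}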

@llvmbot
Member

llvmbot commented Nov 6, 2025

@llvm/pr-subscribers-mlir

Author: Valentin Clement (バレンタイン クレメン) (clementval)

Changes

Add the `nvvm.membar` operation with a `level` attribute, as defined in https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-membar

This will be used to replace the direct intrinsic calls in CUDA Fortran for `threadfence()`, `threadfence_block()`, and `threadfence_system()`, currently lowered here:

void CUDAIntrinsicLibrary::genThreadFence(


Full diff: https://github.com/llvm/llvm-project/pull/166698.diff

3 Files Affected:

  • (modified) mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td (+33)
  • (modified) mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp (+15)
  • (modified) mlir/test/Target/LLVMIR/nvvmir.mlir (+13)
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 80bc0e5986e51..f00aba15bfcae 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -1236,6 +1236,39 @@ def NVVM_FenceProxyAcquireOp : NVVM_Op<"fence.proxy.acquire">,
   let hasVerifier = 1;
 }
 
+// Attrs describing the level of the Memory Operation
+def MemLevelCTA : I32EnumAttrCase<"CTA", 0, "cta">;
+def MemLevelGL : I32EnumAttrCase<"GL", 1, "gl">;
+def MemLevelSys : I32EnumAttrCase<"SYS", 2, "sys">;
+
+def MemLevelKind
+    : I32EnumAttr<
+          "MemLevelKind",
+          "NVVM Memory Level kind", [MemLevelCTA, MemLevelGL, MemLevelSys]> {
+  let genSpecializedAttr = 0;
+  let cppNamespace = "::mlir::NVVM";
+}
+def MemLevelKindAttr : EnumAttr<NVVM_Dialect, MemLevelKind, "mem_level"> {
+  let assemblyFormat = "`<` $value `>`";
+}
+
+def NVVM_MembarOp : NVVM_Op<"membar">,
+                    Arguments<(ins MemLevelKindAttr:$level)> {
+  let summary = "Memory barrier operation";
+  let description = [{
+    The `membar` operation guarantees that prior memory accesses requested by this
+    thread are performed at the specified `level`, before later memory
+    operations requested by this thread following the membar instruction.
+
+    [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-membar)
+  }];
+
+  let assemblyFormat = "$level attr-dict";
+  let llvmBuilder = [{
+    createIntrinsicCall(builder, getMembarLevelID($level), {});
+  }];
+}
+
 def NVVM_FenceProxyReleaseOp : NVVM_Op<"fence.proxy.release">,
       Arguments<(ins MemScopeKindAttr:$scope,
                      DefaultValuedAttr<ProxyKindAttr,
diff --git a/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
index 0964e1b8c5ef3..9d6ccd90b2060 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
@@ -291,6 +291,21 @@ static unsigned getUnidirectionalFenceProxyID(NVVM::ProxyKind fromProxy,
   llvm_unreachable("Unsupported proxy kinds");
 }
 
+static unsigned getMembarLevelID(NVVM::MemLevelKind level) {
+  switch (level) {
+  case NVVM::MemLevelKind::CTA: {
+    return llvm::Intrinsic::nvvm_membar_cta;
+  }
+  case NVVM::MemLevelKind::GL: {
+    return llvm::Intrinsic::nvvm_membar_gl;
+  }
+  case NVVM::MemLevelKind::SYS: {
+    return llvm::Intrinsic::nvvm_membar_sys;
+  }
+  }
+  llvm_unreachable("Unknown level for memory barrier");
+}
+
 #define TCGEN05LD(SHAPE, NUM) llvm::Intrinsic::nvvm_tcgen05_ld_##SHAPE##_##NUM
 
 static llvm::Intrinsic::ID
diff --git a/mlir/test/Target/LLVMIR/nvvmir.mlir b/mlir/test/Target/LLVMIR/nvvmir.mlir
index 1ec55408e97a5..04b2d791188c1 100644
--- a/mlir/test/Target/LLVMIR/nvvmir.mlir
+++ b/mlir/test/Target/LLVMIR/nvvmir.mlir
@@ -975,3 +975,16 @@ llvm.func @nanosleep() {
   nvvm.nanosleep 4000
   llvm.return
 }
+
+// -----
+
+// CHECK-LABEL: @memorybarrier()
+llvm.func @memorybarrier() {
+  // CHECK: call void @llvm.nvvm.membar.cta()
+  nvvm.membar #nvvm.mem_level<cta>
+  // CHECK: call void @llvm.nvvm.membar.gl()
+  nvvm.membar #nvvm.mem_level<gl>
+  // CHECK: call void @llvm.nvvm.membar.sys()
+  nvvm.membar #nvvm.mem_level<sys>
+  llvm.return
+}
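
For reference, the `llvmBuilder` body above goes through the existing `createIntrinsicCall` helper in the NVVM translation. Conceptually, for `nvvm.membar #nvvm.mem_level<gl>` the translation amounts to roughly the following LLVM IR builder calls (a sketch, assuming an `llvm::IRBuilderBase &builder` positioned at the insertion point, not the literal helper implementation):

// Sketch of what the translation does for nvvm.membar #nvvm.mem_level<gl>:
// get-or-insert the zero-argument membar intrinsic declaration, then call it.
// (Intrinsic::getOrInsertDeclaration is the current name; older LLVM
// releases spell it Intrinsic::getDeclaration.)
llvm::Module *m = builder.GetInsertBlock()->getModule();
llvm::Function *fn = llvm::Intrinsic::getOrInsertDeclaration(
    m, llvm::Intrinsic::nvvm_membar_gl);
builder.CreateCall(fn, {});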

@llvmbot
Member

llvmbot commented Nov 6, 2025

@llvm/pr-subscribers-mlir-llvm

@grypp requested a review from durga4github November 6, 2025 15:34
Contributor

@durga4github left a comment


LGTM except for a few nits.

@clementval merged commit b4d7d3f into llvm:main Nov 7, 2025
8 of 9 checks passed
@clementval deleted the nvvm_membar branch November 7, 2025 18:39
clementval added a commit that referenced this pull request Nov 7, 2025
Use the operation introduced in #166698. Also split the test into a new
file since `flang/test/Lower/CUDA/cuda-device-proc.cuf` is getting too
big. I'm planning to reorganize this file to have better separation of
the tests.
vinay-deshmukh pushed a commit to vinay-deshmukh/llvm-project that referenced this pull request Nov 8, 2025
vinay-deshmukh pushed a commit to vinay-deshmukh/llvm-project that referenced this pull request Nov 8, 2025
Use the operation introduced in llvm#166698. Also split the test into a new
file since `flang/test/Lower/CUDA/cuda-device-proc.cuf` is getting too
big. I'm planning to reorganize this file to have better separation of
the tests.
