Conversation

@clementval
Contributor

@clementval commented Nov 6, 2025

Add the `nvvm.membar` operation with a `level` attribute, as defined in https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-membar

This will be used to replace the direct intrinsic calls in CUDA Fortran for `threadfence()`, `threadfence_block()`, and `threadfence_system()`, currently lowered here:

void CUDAIntrinsicLibrary::genThreadFence(

The nvvm membar intrinsics are also used in CUDA C/C++:

__DEVICE__ void __threadfence(void) { __nvvm_membar_gl(); }
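
For illustration, here is a minimal sketch of how such a lowering could create the new op from C++. The helper name `emitThreadFence` and its signature are hypothetical (this is not the actual `genThreadFence` body); it assumes an MLIR `OpBuilder` and `Location` are in scope, and uses the standard CUDA mapping of the fence intrinsics to membar levels:

#include "mlir/Dialect/LLVMIR/NVVMDialect.h"
#include "mlir/IR/Builders.h"

// Hypothetical helper (a sketch, not the actual flang code): emit the new
// op for each CUDA Fortran fence intrinsic instead of calling the NVVM
// intrinsic directly.
//   threadfence()        -> gl  (device-wide)
//   threadfence_block()  -> cta (block-wide)
//   threadfence_system() -> sys (system-wide)
static void emitThreadFence(mlir::OpBuilder &builder, mlir::Location loc,
                            mlir::NVVM::MemLevelKind level) {
  builder.create<mlir::NVVM::MembarOp>(
      loc, mlir::NVVM::MemLevelKindAttr::get(builder.getContext(), level));
}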

@llvmbot
Member

llvmbot commented Nov 6, 2025

@llvm/pr-subscribers-mlir

Author: Valentin Clement (バレンタイン クレメン) (clementval)

Changes

Add the `nvvm.membar` operation with a `level` attribute, as defined in https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-membar

This will be used to replace the direct intrinsic calls in CUDA Fortran for `threadfence()`, `threadfence_block()`, and `threadfence_system()`, currently lowered here:

void CUDAIntrinsicLibrary::genThreadFence(


Full diff: https://github.com/llvm/llvm-project/pull/166698.diff

3 Files Affected:

  • (modified) mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td (+33)
  • (modified) mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp (+15)
  • (modified) mlir/test/Target/LLVMIR/nvvmir.mlir (+13)
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 80bc0e5986e51..f00aba15bfcae 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -1236,6 +1236,39 @@ def NVVM_FenceProxyAcquireOp : NVVM_Op<"fence.proxy.acquire">,
   let hasVerifier = 1;
 }
 
+// Attrs describing the level of the Memory Operation
+def MemLevelCTA : I32EnumAttrCase<"CTA", 0, "cta">;
+def MemLevelGL : I32EnumAttrCase<"GL", 1, "gl">;
+def MemLevelSys : I32EnumAttrCase<"SYS", 2, "sys">;
+
+def MemLevelKind
+    : I32EnumAttr<
+          "MemLevelKind",
+          "NVVM Memory Level kind", [MemLevelCTA, MemLevelGL, MemLevelSys]> {
+  let genSpecializedAttr = 0;
+  let cppNamespace = "::mlir::NVVM";
+}
+def MemLevelKindAttr : EnumAttr<NVVM_Dialect, MemLevelKind, "mem_level"> {
+  let assemblyFormat = "`<` $value `>`";
+}
+
+def NVVM_MembarOp : NVVM_Op<"membar">,
+                    Arguments<(ins MemLevelKindAttr:$level)> {
+  let summary = "Memory barrier operation";
+  let description = [{
+    The `membar` operation guarantees that prior memory accesses requested by this
+    thread are performed at the specified `level`, before later memory
+    operations requested by this thread following the membar instruction.
+
+    [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-membar)
+  }];
+
+  let assemblyFormat = "$level attr-dict";
+  let llvmBuilder = [{
+    createIntrinsicCall(builder, getMembarLevelID($level), {});
+  }];
+}
+
 def NVVM_FenceProxyReleaseOp : NVVM_Op<"fence.proxy.release">,
       Arguments<(ins MemScopeKindAttr:$scope,
                      DefaultValuedAttr<ProxyKindAttr,
diff --git a/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
index 0964e1b8c5ef3..9d6ccd90b2060 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
@@ -291,6 +291,21 @@ static unsigned getUnidirectionalFenceProxyID(NVVM::ProxyKind fromProxy,
   llvm_unreachable("Unsupported proxy kinds");
 }
 
+static unsigned getMembarLevelID(NVVM::MemLevelKind level) {
+  switch (level) {
+  case NVVM::MemLevelKind::CTA: {
+    return llvm::Intrinsic::nvvm_membar_cta;
+  }
+  case NVVM::MemLevelKind::GL: {
+    return llvm::Intrinsic::nvvm_membar_gl;
+  }
+  case NVVM::MemLevelKind::SYS: {
+    return llvm::Intrinsic::nvvm_membar_sys;
+  }
+  }
+  llvm_unreachable("Unknown level for memory barrier");
+}
+
 #define TCGEN05LD(SHAPE, NUM) llvm::Intrinsic::nvvm_tcgen05_ld_##SHAPE##_##NUM
 
 static llvm::Intrinsic::ID
diff --git a/mlir/test/Target/LLVMIR/nvvmir.mlir b/mlir/test/Target/LLVMIR/nvvmir.mlir
index 1ec55408e97a5..04b2d791188c1 100644
--- a/mlir/test/Target/LLVMIR/nvvmir.mlir
+++ b/mlir/test/Target/LLVMIR/nvvmir.mlir
@@ -975,3 +975,16 @@ llvm.func @nanosleep() {
   nvvm.nanosleep 4000
   llvm.return
 }
+
+// -----
+
+// CHECK-LABEL: @memorybarrier()
+llvm.func @memorybarrier() {
+  // CHECK: call void @llvm.nvvm.membar.cta()
+  nvvm.membar #nvvm.mem_level<cta>
+  // CHECK: call void @llvm.nvvm.membar.gl()
+  nvvm.membar #nvvm.mem_level<gl>
+  // CHECK: call void @llvm.nvvm.membar.sys()
+  nvvm.membar #nvvm.mem_level<sys>
+  llvm.return
+}
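
For reference, the `llvmBuilder` body above goes through the existing `createIntrinsicCall` helper in the NVVM translation. Conceptually, for `nvvm.membar #nvvm.mem_level<gl>` the translation amounts to roughly the following LLVM IR builder calls (a sketch, assuming an `llvm::IRBuilderBase &builder` positioned at the insertion point, not the literal helper implementation):

// Sketch of what the translation does for nvvm.membar #nvvm.mem_level<gl>:
// get-or-insert the zero-argument membar intrinsic declaration, then call it.
// (Intrinsic::getOrInsertDeclaration is the current name; older LLVM
// releases spell it Intrinsic::getDeclaration.)
llvm::Module *m = builder.GetInsertBlock()->getModule();
llvm::Function *fn = llvm::Intrinsic::getOrInsertDeclaration(
    m, llvm::Intrinsic::nvvm_membar_gl);
builder.CreateCall(fn, {});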

@llvmbot
Member

llvmbot commented Nov 6, 2025

@llvm/pr-subscribers-mlir-llvm

@grypp requested a review from durga4github November 6, 2025 15:34
Contributor

@durga4github left a comment


LGTM except for a few nits.

@clementval merged commit b4d7d3f into llvm:main Nov 7, 2025
8 of 9 checks passed
@clementval deleted the nvvm_membar branch November 7, 2025 18:39
clementval added a commit that referenced this pull request Nov 7, 2025
Use the operation introduced in #166698. Also split the test into a new
file since `flang/test/Lower/CUDA/cuda-device-proc.cuf` is getting too
big. I'm planning to reorganize this file to have better separation of
the tests.
vinay-deshmukh pushed a commit to vinay-deshmukh/llvm-project that referenced this pull request Nov 8, 2025
vinay-deshmukh pushed a commit to vinay-deshmukh/llvm-project that referenced this pull request Nov 8, 2025
Use the operation introduced in llvm#166698. Also split the test into a new
file since `flang/test/Lower/CUDA/cuda-device-proc.cuf` is getting too
big. I'm planning to reorganize this file to have better separation of
the tests.
