-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[mlir][NVVM] Add nvvm.membar operation #166698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-mlir Author: Valentin Clement (バレンタイン クレメン) (clementval) ChangesAdd nvvm.membar operation with level as defined in https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-membar This will be used to replace direct intrinsic call in CUDA Fortran for
Full diff: https://github.com/llvm/llvm-project/pull/166698.diff 3 Files Affected:
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 80bc0e5986e51..f00aba15bfcae 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -1236,6 +1236,39 @@ def NVVM_FenceProxyAcquireOp : NVVM_Op<"fence.proxy.acquire">,
let hasVerifier = 1;
}
+// Attrs describing the level of the Memory Operation
+def MemLevelCTA : I32EnumAttrCase<"CTA", 0, "cta">;
+def MemLevelGL : I32EnumAttrCase<"GL", 1, "gl">;
+def MemLevelSys : I32EnumAttrCase<"SYS", 2, "sys">;
+
+def MemLevelKind
+ : I32EnumAttr<
+ "MemLevelKind",
+ "NVVM Memory Level kind", [MemLevelCTA, MemLevelGL, MemLevelSys]> {
+ let genSpecializedAttr = 0;
+ let cppNamespace = "::mlir::NVVM";
+}
+def MemLevelKindAttr : EnumAttr<NVVM_Dialect, MemLevelKind, "mem_level"> {
+ let assemblyFormat = "`<` $value `>`";
+}
+
+def NVVM_MembarOp : NVVM_Op<"membar">,
+ Arguments<(ins MemLevelKindAttr:$level)> {
+ let summary = "Memory barrier operation";
+ let description = [{
+ `member` operation guarantees that prior memory accesses requested by this
+ thread are performed at the specified `level`, before later memory
+ operations requested by this thread following the membar instruction.
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-membar)
+ }];
+
+ let assemblyFormat = "$level attr-dict";
+ let llvmBuilder = [{
+ createIntrinsicCall(builder, getMembarLevelID($level), {});
+ }];
+}
+
def NVVM_FenceProxyReleaseOp : NVVM_Op<"fence.proxy.release">,
Arguments<(ins MemScopeKindAttr:$scope,
DefaultValuedAttr<ProxyKindAttr,
diff --git a/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
index 0964e1b8c5ef3..9d6ccd90b2060 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
@@ -291,6 +291,21 @@ static unsigned getUnidirectionalFenceProxyID(NVVM::ProxyKind fromProxy,
llvm_unreachable("Unsupported proxy kinds");
}
+static unsigned getMembarLevelID(NVVM::MemLevelKind level) {
+ switch (level) {
+ case NVVM::MemLevelKind::CTA: {
+ return llvm::Intrinsic::nvvm_membar_cta;
+ }
+ case NVVM::MemLevelKind::GL: {
+ return llvm::Intrinsic::nvvm_membar_gl;
+ }
+ case NVVM::MemLevelKind::SYS: {
+ return llvm::Intrinsic::nvvm_membar_sys;
+ }
+ }
+ llvm_unreachable("Unknown level for memory barrier");
+}
+
#define TCGEN05LD(SHAPE, NUM) llvm::Intrinsic::nvvm_tcgen05_ld_##SHAPE##_##NUM
static llvm::Intrinsic::ID
diff --git a/mlir/test/Target/LLVMIR/nvvmir.mlir b/mlir/test/Target/LLVMIR/nvvmir.mlir
index 1ec55408e97a5..04b2d791188c1 100644
--- a/mlir/test/Target/LLVMIR/nvvmir.mlir
+++ b/mlir/test/Target/LLVMIR/nvvmir.mlir
@@ -975,3 +975,16 @@ llvm.func @nanosleep() {
nvvm.nanosleep 4000
llvm.return
}
+
+// -----
+
+// CHECK-lABEL: @memorybarrier()
+llvm.func @memorybarrier() {
+ // CHECK: call void @llvm.nvvm.membar.cta()
+ nvvm.membar #nvvm.mem_level<cta>
+ // CHECK: call void @llvm.nvvm.membar.gl()
+ nvvm.membar #nvvm.mem_level<gl>
+ // CHECK: call void @llvm.nvvm.membar.sys()
+ nvvm.membar #nvvm.mem_level<sys>
+ llvm.return
+}
|
|
@llvm/pr-subscribers-mlir-llvm Author: Valentin Clement (バレンタイン クレメン) (clementval) ChangesAdd nvvm.membar operation with level as defined in https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-membar This will be used to replace direct intrinsic call in CUDA Fortran for
Full diff: https://github.com/llvm/llvm-project/pull/166698.diff 3 Files Affected:
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 80bc0e5986e51..f00aba15bfcae 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -1236,6 +1236,39 @@ def NVVM_FenceProxyAcquireOp : NVVM_Op<"fence.proxy.acquire">,
let hasVerifier = 1;
}
+// Attrs describing the level of the Memory Operation
+def MemLevelCTA : I32EnumAttrCase<"CTA", 0, "cta">;
+def MemLevelGL : I32EnumAttrCase<"GL", 1, "gl">;
+def MemLevelSys : I32EnumAttrCase<"SYS", 2, "sys">;
+
+def MemLevelKind
+ : I32EnumAttr<
+ "MemLevelKind",
+ "NVVM Memory Level kind", [MemLevelCTA, MemLevelGL, MemLevelSys]> {
+ let genSpecializedAttr = 0;
+ let cppNamespace = "::mlir::NVVM";
+}
+def MemLevelKindAttr : EnumAttr<NVVM_Dialect, MemLevelKind, "mem_level"> {
+ let assemblyFormat = "`<` $value `>`";
+}
+
+def NVVM_MembarOp : NVVM_Op<"membar">,
+ Arguments<(ins MemLevelKindAttr:$level)> {
+ let summary = "Memory barrier operation";
+ let description = [{
+ `member` operation guarantees that prior memory accesses requested by this
+ thread are performed at the specified `level`, before later memory
+ operations requested by this thread following the membar instruction.
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-membar)
+ }];
+
+ let assemblyFormat = "$level attr-dict";
+ let llvmBuilder = [{
+ createIntrinsicCall(builder, getMembarLevelID($level), {});
+ }];
+}
+
def NVVM_FenceProxyReleaseOp : NVVM_Op<"fence.proxy.release">,
Arguments<(ins MemScopeKindAttr:$scope,
DefaultValuedAttr<ProxyKindAttr,
diff --git a/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
index 0964e1b8c5ef3..9d6ccd90b2060 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
@@ -291,6 +291,21 @@ static unsigned getUnidirectionalFenceProxyID(NVVM::ProxyKind fromProxy,
llvm_unreachable("Unsupported proxy kinds");
}
+static unsigned getMembarLevelID(NVVM::MemLevelKind level) {
+ switch (level) {
+ case NVVM::MemLevelKind::CTA: {
+ return llvm::Intrinsic::nvvm_membar_cta;
+ }
+ case NVVM::MemLevelKind::GL: {
+ return llvm::Intrinsic::nvvm_membar_gl;
+ }
+ case NVVM::MemLevelKind::SYS: {
+ return llvm::Intrinsic::nvvm_membar_sys;
+ }
+ }
+ llvm_unreachable("Unknown level for memory barrier");
+}
+
#define TCGEN05LD(SHAPE, NUM) llvm::Intrinsic::nvvm_tcgen05_ld_##SHAPE##_##NUM
static llvm::Intrinsic::ID
diff --git a/mlir/test/Target/LLVMIR/nvvmir.mlir b/mlir/test/Target/LLVMIR/nvvmir.mlir
index 1ec55408e97a5..04b2d791188c1 100644
--- a/mlir/test/Target/LLVMIR/nvvmir.mlir
+++ b/mlir/test/Target/LLVMIR/nvvmir.mlir
@@ -975,3 +975,16 @@ llvm.func @nanosleep() {
nvvm.nanosleep 4000
llvm.return
}
+
+// -----
+
+// CHECK-lABEL: @memorybarrier()
+llvm.func @memorybarrier() {
+ // CHECK: call void @llvm.nvvm.membar.cta()
+ nvvm.membar #nvvm.mem_level<cta>
+ // CHECK: call void @llvm.nvvm.membar.gl()
+ nvvm.membar #nvvm.mem_level<gl>
+ // CHECK: call void @llvm.nvvm.membar.sys()
+ nvvm.membar #nvvm.mem_level<sys>
+ llvm.return
+}
|
mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
Outdated
Show resolved
Hide resolved
mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
Outdated
Show resolved
Hide resolved
durga4github
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for a few nits.
eb79274 to
8aaf2d2
Compare
Use the operation introduced in #166698. Also split the test into a new file since `flang/test/Lower/CUDA/cuda-device-proc.cuf` is getting to big. I'm planning to reorganize this file to have better separation of the tests
Add nvvm.membar operation with level as defined in https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-membar This will be used to replace direct intrinsic call in CUDA Fortran for `threadfence()`, `threadfence_block` and `thread fence_system()` currently lowered here: https://github.com/llvm/llvm-project/blob/e700f157026bf8b4d58f936c5db8f152e269d77f/flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp#L1310 The nvvm membar intrsinsic are also used in CUDA C/C++ (https://github.com/llvm/llvm-project/blob/49f55f4991227f3c7a2b8161bbf45c74b7023944/clang/lib/Headers/__clang_cuda_device_functions.h#L528)
Use the operation introduced in llvm#166698. Also split the test into a new file since `flang/test/Lower/CUDA/cuda-device-proc.cuf` is getting to big. I'm planning to reorganize this file to have better separation of the tests
Add nvvm.membar operation with level as defined in https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-membar
This will be used to replace direct intrinsic call in CUDA Fortran for
threadfence(),threadfence_blockandthread fence_system()currently lowered here:llvm-project/flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp
Line 1310 in e700f15
The nvvm membar intrsinsic are also used in CUDA C/C++ (
llvm-project/clang/lib/Headers/__clang_cuda_device_functions.h
Line 528 in 49f55f4