@schwarzschild-radius
Contributor

This commit removes the Pure trait from the clock, clock64, and globaltimer ops by creating an NVVM_NCSpecialRegisterOp class to represent ops that return non-constant values. This prevents the CSE pass from merging repeated reads of these registers, which only look redundant.

@llvmbot
Member

llvmbot commented Jul 8, 2025

@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-mlir

Author: Pradeep Kumar (schwarzschild-radius)

Changes

This commit removes the Pure trait from the clock, clock64, and globaltimer ops by creating an NVVM_NCSpecialRegisterOp class to represent ops that return non-constant values. This prevents the CSE pass from merging repeated reads of these registers, which only look redundant.


Full diff: https://github.com/llvm/llvm-project/pull/147608.diff

2 Files Affected:

  • (modified) mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td (+10-3)
  • (added) mlir/test/Dialect/LLVMIR/cse-nvvm.mlir (+37)
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 6895e946b8a45..a0d23853a52dd 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -159,6 +159,13 @@ class NVVM_SpecialRegisterOp<string mnemonic, list<Trait> traits = []> :
   let assemblyFormat = "attr-dict `:` type($res)";
 }
 
+// NVVM_NCSpecialRegisterOp represents a non-constant special register
+class NVVM_NCSpecialRegisterOp<string mnemonic, list<Trait> traits = []> :
+  NVVM_IntrOp<mnemonic, traits, 1> {
+  let arguments = (ins);
+  let assemblyFormat = "attr-dict `:` type($res)";
+}
+
 class NVVM_SpecialRangeableRegisterOp<string mnemonic, list<Trait> traits = []> :
   NVVM_SpecialRegisterOp<mnemonic,
     !listconcat(traits,
@@ -249,9 +256,9 @@ def NVVM_ClusterDim : NVVM_SpecialRangeableRegisterOp<"read.ptx.sreg.cluster.nct
 
 //===----------------------------------------------------------------------===//
 // Clock registers
-def NVVM_ClockOp : NVVM_SpecialRegisterOp<"read.ptx.sreg.clock">;
-def NVVM_Clock64Op : NVVM_SpecialRegisterOp<"read.ptx.sreg.clock64">;
-def NVVM_GlobalTimerOp : NVVM_SpecialRegisterOp<"read.ptx.sreg.globaltimer">;
+def NVVM_ClockOp : NVVM_NCSpecialRegisterOp<"read.ptx.sreg.clock">;
+def NVVM_Clock64Op : NVVM_NCSpecialRegisterOp<"read.ptx.sreg.clock64">;
+def NVVM_GlobalTimerOp : NVVM_NCSpecialRegisterOp<"read.ptx.sreg.globaltimer">;
 
 //===----------------------------------------------------------------------===//
 // envreg registers
diff --git a/mlir/test/Dialect/LLVMIR/cse-nvvm.mlir b/mlir/test/Dialect/LLVMIR/cse-nvvm.mlir
new file mode 100644
index 0000000000000..8d24c3846f178
--- /dev/null
+++ b/mlir/test/Dialect/LLVMIR/cse-nvvm.mlir
@@ -0,0 +1,37 @@
+// RUN: mlir-opt %s -cse -split-input-file -verify-diagnostics | FileCheck %s
+
+// CHECK-LABEL: @nvvm_special_regs_clock
+llvm.func @nvvm_special_regs_clock() -> !llvm.struct<(i32, i32)> {
+  %0 = llvm.mlir.zero: !llvm.struct<(i32, i32)>
+  // CHECK:  {{.*}} = nvvm.read.ptx.sreg.clock
+  %1 = nvvm.read.ptx.sreg.clock : i32
+  // CHECK:  {{.*}} = nvvm.read.ptx.sreg.clock
+  %2 = nvvm.read.ptx.sreg.clock : i32
+  %4 = llvm.insertvalue %1, %0[0]: !llvm.struct<(i32, i32)>
+  %5 = llvm.insertvalue %2, %4[1]: !llvm.struct<(i32, i32)>
+  llvm.return %5: !llvm.struct<(i32, i32)>
+}
+
+// CHECK-LABEL: @nvvm_special_regs_clock64
+llvm.func @nvvm_special_regs_clock64() -> !llvm.struct<(i64, i64)> {
+  %0 = llvm.mlir.zero: !llvm.struct<(i64, i64)>
+  // CHECK:  {{.*}} = nvvm.read.ptx.sreg.clock64
+  %1 = nvvm.read.ptx.sreg.clock64 : i64
+  // CHECK:  {{.*}} = nvvm.read.ptx.sreg.clock64
+  %2 = nvvm.read.ptx.sreg.clock64 : i64
+  %4 = llvm.insertvalue %1, %0[0]: !llvm.struct<(i64, i64)>
+  %5 = llvm.insertvalue %2, %4[1]: !llvm.struct<(i64, i64)>
+  llvm.return %5: !llvm.struct<(i64, i64)>
+}
+
+// CHECK-LABEL: @nvvm_special_regs_globaltimer
+llvm.func @nvvm_special_regs_globaltimer() -> !llvm.struct<(i64, i64)> {
+  %0 = llvm.mlir.zero: !llvm.struct<(i64, i64)>
+  // CHECK:  {{.*}} = nvvm.read.ptx.sreg.globaltimer
+  %1 = nvvm.read.ptx.sreg.globaltimer : i64
+  // CHECK:  {{.*}} = nvvm.read.ptx.sreg.globaltimer
+  %2 = nvvm.read.ptx.sreg.globaltimer : i64
+  %4 = llvm.insertvalue %1, %0[0]: !llvm.struct<(i64, i64)>
+  %5 = llvm.insertvalue %2, %4[1]: !llvm.struct<(i64, i64)>
+  llvm.return %5: !llvm.struct<(i64, i64)>
+}
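
For readers skimming the test above, the following hand-written MLIR sketch illustrates the behaviour the change is meant to prevent. It is not captured `mlir-opt` output, and the function names `@before_cse` and `@after_cse_if_pure` are hypothetical. It assumes the clock op still carried the Pure trait, in which case CSE would be free to treat the two operand-less reads as identical and keep only one of them.

```mlir
// Input: two separate clock reads, as in @nvvm_special_regs_clock.
llvm.func @before_cse() -> !llvm.struct<(i32, i32)> {
  %0 = llvm.mlir.zero : !llvm.struct<(i32, i32)>
  %1 = nvvm.read.ptx.sreg.clock : i32
  %2 = nvvm.read.ptx.sreg.clock : i32
  %3 = llvm.insertvalue %1, %0[0] : !llvm.struct<(i32, i32)>
  %4 = llvm.insertvalue %2, %3[1] : !llvm.struct<(i32, i32)>
  llvm.return %4 : !llvm.struct<(i32, i32)>
}

// Sketch of what -cse could produce if the op were still marked Pure:
// the second read is folded into the first, so both struct fields
// carry the same timestamp.
llvm.func @after_cse_if_pure() -> !llvm.struct<(i32, i32)> {
  %0 = llvm.mlir.zero : !llvm.struct<(i32, i32)>
  %1 = nvvm.read.ptx.sreg.clock : i32
  %2 = llvm.insertvalue %1, %0[0] : !llvm.struct<(i32, i32)>
  %3 = llvm.insertvalue %1, %2[1] : !llvm.struct<(i32, i32)>
  llvm.return %3 : !llvm.struct<(i32, i32)>
}
```

With the Pure trait removed, the op is no longer known to be side-effect free, so CSE should leave both reads in place; this is what the paired CHECK lines in each test function verify.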

@grypp
Member

grypp commented Jul 9, 2025

Thanks, good catch!

@schwarzschild-radius schwarzschild-radius force-pushed the update_traits_for_clock_ops branch from bd6b00c to ef6165d Compare July 10, 2025 07:04
Contributor

@durga4github durga4github left a comment

LGTM with a minor nit.

@schwarzschild-radius schwarzschild-radius force-pushed the update_traits_for_clock_ops branch 2 times, most recently from ddab115 to f49d36d Compare July 11, 2025 06:18
Comment on lines 156 to 158
Member

I don't think we need an extra comment here. Pure is self-explanatory.

Contributor Author

Got it. Reverted the comment.

Member

@grypp grypp left a comment

Just a nit, all good.

@schwarzschild-radius schwarzschild-radius force-pushed the update_traits_for_clock_ops branch from f49d36d to 9c0d0f6 Compare July 11, 2025 06:36
@schwarzschild-radius schwarzschild-radius merged commit 5cd56c9 into llvm:main Jul 11, 2025
9 checks passed