Skip to content

Conversation

@broxigarchen
Copy link
Contributor

@broxigarchen broxigarchen commented Feb 26, 2025

True16 Add OpSel when optimizing exec mask

True16 VOPCX have the opsel argument. Add it when we create these
instructions in SIOptimizeExecMasking.

True16 VOPCX have the opsel argument. Add it when we create these
instructions in SIOptimizeExecMasking.
@broxigarchen broxigarchen marked this pull request as ready for review February 27, 2025 21:28
@broxigarchen broxigarchen changed the title True16 Add OpSel when optimizing exec mask [AMDGPU][True16][CodeGen] True16 Add OpSel when optimizing exec mask Feb 27, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 27, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

True16 Add OpSel when optimizing exec mask

True16 VOPCX have the opsel argument. Add it when we create these
instructions in SIOptimizeExecMasking.


Full diff: https://github.com/llvm/llvm-project/pull/128928.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp (+2)
  • (added) llvm/test/CodeGen/AMDGPU/true16-saveexec.mir (+64)
diff --git a/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp b/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
index 920c3e11e4718..745e4086bc7fe 100644
--- a/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
+++ b/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
@@ -632,6 +632,8 @@ bool SIOptimizeExecMasking::optimizeVCMPSaveExecSequence(
 
   TryAddImmediateValueFromNamedOperand(AMDGPU::OpName::clamp);
 
+  TryAddImmediateValueFromNamedOperand(AMDGPU::OpName::op_sel);
+
   // The kill flags may no longer be correct.
   if (Src0->isReg())
     MRI->clearKillFlags(Src0->getReg());
diff --git a/llvm/test/CodeGen/AMDGPU/true16-saveexec.mir b/llvm/test/CodeGen/AMDGPU/true16-saveexec.mir
new file mode 100644
index 0000000000000..07259a1d031f3
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/true16-saveexec.mir
@@ -0,0 +1,64 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2
+# RUN: llc -march=amdgcn -mcpu=gfx1100 -mattr=+wavefrontsize64 -run-pass=si-optimize-exec-masking -verify-machineinstrs -o - %s | FileCheck %s
+
+---
+name:            int
+tracksRegLiveness: true
+body:             |
+  ; CHECK-LABEL: name: int
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT:   liveins: $vgpr20
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   $sgpr0_sgpr1 = S_MOV_B64 $exec
+  ; CHECK-NEXT:   V_CMPX_LT_I16_t16_nosdst_e64 0, 15, 0, $vgpr20_lo16, 0, implicit-def $exec, implicit $exec
+  ; CHECK-NEXT:   renamable $sgpr0_sgpr1 = S_XOR_B64 $exec, killed renamable $sgpr0_sgpr1, implicit-def dead $scc
+  ; CHECK-NEXT:   S_CBRANCH_EXECZ %bb.1, implicit $exec
+  ; CHECK-NEXT:   S_BRANCH %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   S_ENDPGM 0
+  bb.1:
+    liveins: $vgpr20
+    $vcc = V_CMP_LT_I16_t16_e64 0, 15, 0, $vgpr20_lo16, 0, implicit $exec
+    renamable $sgpr0_sgpr1 = COPY $exec, implicit-def $exec
+    renamable $sgpr2_sgpr3 = S_AND_B64 renamable $sgpr0_sgpr1, killed $vcc, implicit-def dead $scc
+    renamable $sgpr0_sgpr1 = S_XOR_B64 renamable $sgpr2_sgpr3, killed renamable $sgpr0_sgpr1, implicit-def dead $scc
+    $exec = S_MOV_B64_term killed renamable $sgpr2_sgpr3
+    S_CBRANCH_EXECZ %bb.2, implicit $exec
+    S_BRANCH %bb.2
+
+  bb.2:
+    S_ENDPGM 0
+...
+
+---
+name:            float
+tracksRegLiveness: true
+body:             |
+  ; CHECK-LABEL: name: float
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT:   liveins: $vgpr20
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   $sgpr0_sgpr1 = S_MOV_B64 $exec
+  ; CHECK-NEXT:   V_CMPX_LT_F16_t16_nosdst_e64 0, 15, 0, $vgpr20_lo16, 1, 0, implicit-def $exec, implicit $mode, implicit $exec
+  ; CHECK-NEXT:   renamable $sgpr0_sgpr1 = S_XOR_B64 $exec, killed renamable $sgpr0_sgpr1, implicit-def dead $scc
+  ; CHECK-NEXT:   S_CBRANCH_EXECZ %bb.1, implicit $exec
+  ; CHECK-NEXT:   S_BRANCH %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   S_ENDPGM 0
+  bb.1:
+    liveins: $vgpr20
+    $vcc = V_CMP_LT_F16_t16_e64 0, 15, 0, $vgpr20_lo16, 1, 0, implicit $exec, implicit $mode
+    renamable $sgpr0_sgpr1 = COPY $exec, implicit-def $exec
+    renamable $sgpr2_sgpr3 = S_AND_B64 renamable $sgpr0_sgpr1, killed $vcc, implicit-def dead $scc
+    renamable $sgpr0_sgpr1 = S_XOR_B64 renamable $sgpr2_sgpr3, killed renamable $sgpr0_sgpr1, implicit-def dead $scc
+    $exec = S_MOV_B64_term killed renamable $sgpr2_sgpr3
+    S_CBRANCH_EXECZ %bb.2, implicit $exec
+    S_BRANCH %bb.2
+
+  bb.2:
+    S_ENDPGM 0
+...

@broxigarchen broxigarchen merged commit db973ce into llvm:main Feb 28, 2025
5 of 9 checks passed
cheezeburglar pushed a commit to cheezeburglar/llvm-project that referenced this pull request Feb 28, 2025
…lvm#128928)

True16 Add OpSel when optimizing exec mask

True16 VOPCX have the opsel argument. Add it when we create these
instructions in SIOptimizeExecMasking.

---------

Co-authored-by: Matt Arsenault <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants