[AMDGPU] Recognise bitmask operations as srcmods #149110

Merged

chrisjbris
Contributor

@chrisjbris chrisjbris commented Jul 16, 2025

Extend the VOP source-modifier patterns to recognise when an or/xor/and modifies only the sign bit, and replace it with the appropriate source modifier (srcmod). This also enables the use of srcmods on integer operands.

I think this method is preferable to #147325: it is simpler, avoids the workarounds for SimplifyDemandedBits, and does not have to be awkwardly imposed on the target-independent combine.

I was uncertain about the status of srcmods for the 16-bit types, so these types are intentionally left unaffected.
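
As background for why these three masks correspond to srcmods, here is a minimal standalone sketch (not part of the patch) that checks the bit-level identities on 32-bit IEEE-754 floats. The helper names `toBits` and `signMasksMatchFloatOps` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: view the bits of a float as a 32-bit integer.
static uint32_t toBits(float F) {
  uint32_t U;
  std::memcpy(&U, &F, sizeof(U));
  return U;
}

// Returns true if the three sign-bit mask operations agree with
// fneg, fabs and fneg(fabs) on the bit pattern of X.
static bool signMasksMatchFloatOps(float X) {
  const uint32_t SignMask = 0x80000000u; // only the sign bit set
  const uint32_t AbsMask = 0x7fffffffu;  // every bit except the sign bit
  const uint32_t Bits = toBits(X);
  const bool NegOk = (Bits ^ SignMask) == toBits(-X);                // xor -> neg
  const bool AbsOk = (Bits & AbsMask) == toBits(std::fabs(X));       // and -> abs
  const bool NegAbsOk = (Bits | SignMask) == toBits(-std::fabs(X));  // or  -> neg(abs)
  return NegOk && AbsOk && NegAbsOk;
}
```

Calling `signMasksMatchFloatOps` on ordinary values and on both signed zeros should return true, which is the property the folding relies on. NaN inputs are deliberately left out of this sketch.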

@chrisjbris chrisjbris requested a review from JanekvO July 16, 2025 14:50
@chrisjbris chrisjbris self-assigned this Jul 16, 2025
@llvmbot
Member

llvmbot commented Jul 16, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Chris Jackson (chrisjbris)

Changes

Add to the VOP patterns to recognise when or/xor/and are modifying only the sign bit and replace with the appropriate srcmod.
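
The matching rule summarised above can be modelled without any LLVM types. The following is a rough sketch under stated assumptions, not the code from the patch: `SrcMod`, `BitOp` and `classifySignBitMask` are hypothetical names, the flag values merely stand in for `SISrcMods::NEG` and `SISrcMods::ABS`, and only a single operation on a 32- or 64-bit value is considered.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the SISrcMods::NEG / SISrcMods::ABS flags.
enum SrcMod : unsigned { None = 0, Neg = 1u << 0, Abs = 1u << 1 };
enum class BitOp { Xor, And, Or };

// Only the top bit of a Bits-wide value set (Bits assumed to be in 1..64).
static uint64_t signMask(unsigned Bits) { return uint64_t(1) << (Bits - 1); }
// Every bit except the top bit set.
static uint64_t maxSigned(unsigned Bits) { return signMask(Bits) - 1; }

// Classify (Op a, Mask) as a combination of source modifiers, or None if
// the operation must stay as an explicit instruction.
static unsigned classifySignBitMask(BitOp Op, uint64_t Mask, unsigned Bits,
                                    bool AllowAbs) {
  if (Bits == 16)
    return None; // 16-bit types are left alone, mirroring the patch.
  switch (Op) {
  case BitOp::Xor: // flips the sign bit
    return Mask == signMask(Bits) ? Neg : None;
  case BitOp::And: // clears the sign bit
    return (AllowAbs && Mask == maxSigned(Bits)) ? Abs : None;
  case BitOp::Or: // sets the sign bit
    return (AllowAbs && Mask == signMask(Bits)) ? (Neg | Abs) : None;
  }
  return None;
}
```

For example, `classifySignBitMask(BitOp::Or, 0x80000000u, 32, true)` yields `Neg | Abs`, while the same call with `AllowAbs` set to false yields `None`, matching the `AllowAbs` guard on the AND and OR cases in the patch.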


Patch is 70.48 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/149110.diff

5 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp (+30)
  • (modified) llvm/test/CodeGen/AMDGPU/fneg-modifier-casting.ll (+15-26)
  • (added) llvm/test/CodeGen/AMDGPU/integer-select-src-modifiers.ll (+612)
  • (modified) llvm/test/CodeGen/AMDGPU/saddsat.ll (+21-31)
  • (modified) llvm/test/CodeGen/AMDGPU/ssubsat.ll (+153-225)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
index 25672a52345cb..3b10f12e984dc 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
@@ -3036,6 +3036,36 @@ bool AMDGPUDAGToDAGISel::SelectVOP3ModsImpl(SDValue In, SDValue &Src,
     Src = Src.getOperand(0);
   }
 
+  // Convert various sign-bit masks to src mods. Currently disabled for 16-bit
+  // types as the codegen replaces the operand without adding a srcmod.
+  // Recognise (xor a, 0x80000000) as NEG SrcMod.
+  if (Src->getOpcode() == ISD::XOR &&
+      Src.getValueType().getFixedSizeInBits() != 16)
+    if (ConstantSDNode *CRHS = dyn_cast<ConstantSDNode>(Src->getOperand(1)))
+      if (CRHS->getAPIntValue().isSignMask()) {
+        Mods |= SISrcMods::NEG;
+        Src = Src.getOperand(0);
+      }
+
+  // Recognise (and a, 0x7fffffff) as ABS SrcMod.
+  if (Src->getOpcode() == ISD::AND && AllowAbs &&
+      Src.getValueType().getFixedSizeInBits() != 16)
+    if (ConstantSDNode *CRHS = dyn_cast<ConstantSDNode>(Src->getOperand(1)))
+      if (CRHS->getAPIntValue().isMaxSignedValue()) {
+        Mods |= SISrcMods::ABS;
+        Src = Src.getOperand(0);
+      }
+
+  // Recognise (or a, 0x80000000) as NEG+ABS SrcModifiers.
+  if (Src->getOpcode() == ISD::OR && AllowAbs &&
+      Src.getValueType().getFixedSizeInBits() != 16)
+    if (ConstantSDNode *CRHS = dyn_cast<ConstantSDNode>(Src->getOperand(1)))
+      if (CRHS->getAPIntValue().isSignMask()) {
+        Mods |= SISrcMods::ABS;
+        Mods |= SISrcMods::NEG;
+        Src = Src.getOperand(0);
+      }
+
   return true;
 }
 
diff --git a/llvm/test/CodeGen/AMDGPU/fneg-modifier-casting.ll b/llvm/test/CodeGen/AMDGPU/fneg-modifier-casting.ll
index 1b092b283290a..5674ae328406d 100644
--- a/llvm/test/CodeGen/AMDGPU/fneg-modifier-casting.ll
+++ b/llvm/test/CodeGen/AMDGPU/fneg-modifier-casting.ll
@@ -349,29 +349,24 @@ define i32 @select_fneg_xor_select_i32(i1 %cond0, i1 %cond1, i32 %arg0, i32 %arg
 ; GCN:       ; %bb.0:
 ; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GCN-NEXT:    v_and_b32_e32 v0, 1, v0
-; GCN-NEXT:    v_xor_b32_e32 v2, 0x80000000, v2
-; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 1, v0
 ; GCN-NEXT:    v_and_b32_e32 v1, 1, v1
-; GCN-NEXT:    v_cndmask_b32_e32 v0, v2, v3, vcc
-; GCN-NEXT:    v_xor_b32_e32 v2, 0x80000000, v0
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 1, v0
+; GCN-NEXT:    v_cndmask_b32_e64 v0, -v2, v3, vcc
 ; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 1, v1
-; GCN-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; GCN-NEXT:    v_cndmask_b32_e64 v0, v0, -v0, vcc
 ; GCN-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: select_fneg_xor_select_i32:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-NEXT:    v_and_b32_e32 v0, 1, v0
-; GFX11-NEXT:    v_xor_b32_e32 v2, 0x80000000, v2
 ; GFX11-NEXT:    v_and_b32_e32 v1, 1, v1
-; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
+; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
 ; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 1, v0
-; GFX11-NEXT:    v_cndmask_b32_e32 v0, v2, v3, vcc_lo
-; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT:    v_cndmask_b32_e64 v0, -v2, v3, vcc_lo
 ; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 1, v1
-; GFX11-NEXT:    v_xor_b32_e32 v2, 0x80000000, v0
-; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc_lo
+; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT:    v_cndmask_b32_e64 v0, v0, -v0, vcc_lo
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
   %fneg0 = xor i32 %arg0, -2147483648
   %select0 = select i1 %cond0, i32 %arg1, i32 %fneg0
@@ -550,31 +545,25 @@ define i64 @select_fneg_xor_select_i64(i1 %cond0, i1 %cond1, i64 %arg0, i64 %arg
 ; GCN:       ; %bb.0:
 ; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GCN-NEXT:    v_and_b32_e32 v0, 1, v0
-; GCN-NEXT:    v_xor_b32_e32 v3, 0x80000000, v3
-; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 1, v0
 ; GCN-NEXT:    v_and_b32_e32 v1, 1, v1
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 1, v0
 ; GCN-NEXT:    v_cndmask_b32_e32 v0, v2, v4, vcc
-; GCN-NEXT:    v_cndmask_b32_e32 v2, v3, v5, vcc
-; GCN-NEXT:    v_xor_b32_e32 v3, 0x80000000, v2
+; GCN-NEXT:    v_cndmask_b32_e64 v2, -v3, v5, vcc
 ; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 1, v1
-; GCN-NEXT:    v_cndmask_b32_e32 v1, v2, v3, vcc
+; GCN-NEXT:    v_cndmask_b32_e64 v1, v2, -v2, vcc
 ; GCN-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: select_fneg_xor_select_i64:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-NEXT:    v_and_b32_e32 v0, 1, v0
-; GFX11-NEXT:    v_xor_b32_e32 v3, 0x80000000, v3
-; GFX11-NEXT:    v_and_b32_e32 v1, 1, v1
-; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_4)
+; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_2) | instid1(VALU_DEP_2)
 ; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 1, v0
-; GFX11-NEXT:    v_cndmask_b32_e32 v0, v2, v4, vcc_lo
-; GFX11-NEXT:    v_cndmask_b32_e32 v2, v3, v5, vcc_lo
-; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT:    v_dual_cndmask_b32 v0, v2, v4 :: v_dual_and_b32 v1, 1, v1
+; GFX11-NEXT:    v_cndmask_b32_e64 v2, -v3, v5, vcc_lo
 ; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 1, v1
-; GFX11-NEXT:    v_xor_b32_e32 v3, 0x80000000, v2
-; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-NEXT:    v_cndmask_b32_e32 v1, v2, v3, vcc_lo
+; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT:    v_cndmask_b32_e64 v1, v2, -v2, vcc_lo
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
   %fneg0 = xor i64 %arg0, 9223372036854775808
   %select0 = select i1 %cond0, i64 %arg1, i64 %fneg0
diff --git a/llvm/test/CodeGen/AMDGPU/integer-select-src-modifiers.ll b/llvm/test/CodeGen/AMDGPU/integer-select-src-modifiers.ll
new file mode 100644
index 0000000000000..8b8ddc6a99b75
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/integer-select-src-modifiers.ll
@@ -0,0 +1,612 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 < %s | FileCheck -check-prefixes=GCN,GFX7 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefixes=GCN,GFX9 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -mattr=+real-true16 < %s | FileCheck -check-prefixes=GFX11,GFX11-TRUE16 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -mattr=-real-true16 < %s | FileCheck -check-prefixes=GFX11,GFX11-FAKE16 %s
+
+define i32 @fneg_select_i32_1(i32 %cond, i32 %a, i32 %b) {
+; GCN-LABEL: fneg_select_i32_1:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT:    v_cndmask_b32_e64 v0, v2, -v1, vcc
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_select_i32_1:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT:    v_cndmask_b32_e64 v0, v2, -v1, vcc_lo
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  %neg.a = xor i32 %a, u0x80000000
+  %cmp = icmp eq i32 %cond, zeroinitializer
+  %select = select i1 %cmp, i32 %neg.a, i32 %b
+  ret i32 %select
+}
+
+define i32 @fneg_select_i32_2(i32 %cond, i32 %a, i32 %b) {
+; GCN-LABEL: fneg_select_i32_2:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT:    v_cndmask_b32_e64 v0, -v1, v2, vcc
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_select_i32_2:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT:    v_cndmask_b32_e64 v0, -v1, v2, vcc_lo
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  %neg.a = xor i32 %a, u0x80000000
+  %cmp = icmp eq i32 %cond, zeroinitializer
+  %select = select i1 %cmp, i32 %b, i32 %neg.a
+  ret i32 %select
+}
+
+define i32 @fneg_select_i32_both(i32 %cond, i32 %a, i32 %b) {
+; GCN-LABEL: fneg_select_i32_both:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT:    v_cndmask_b32_e64 v0, -v2, -v1, vcc
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_select_i32_both:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT:    v_cndmask_b32_e64 v0, -v2, -v1, vcc_lo
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  %neg.a = xor i32 %a, u0x80000000
+  %neg.b = xor i32 %b, u0x80000000
+  %cmp = icmp eq i32 %cond, zeroinitializer
+  %select = select i1 %cmp, i32 %neg.a, i32 %neg.b
+  ret i32 %select
+}
+
+define i32 @fneg_1_fabs_2_select_i32(i32 %cond, i32 %a, i32 %b) {
+; GCN-LABEL: fneg_1_fabs_2_select_i32:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT:    v_cndmask_b32_e64 v0, |v1|, -v1, vcc
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_1_fabs_2_select_i32:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT:    v_cndmask_b32_e64 v0, |v1|, -v1, vcc_lo
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  %neg.a = xor i32 %a, u0x80000000
+  %abs.b = and i32 %a, u0x7fffffff
+  %cmp = icmp eq i32 %cond, zeroinitializer
+  %select = select i1 %cmp, i32 %neg.a, i32 %abs.b
+  ret i32 %select
+}
+
+define <2 x i32> @fneg_select_v2i32_1(<2 x i32> %cond, <2 x i32> %a, <2 x i32> %b) {
+; GCN-LABEL: fneg_select_v2i32_1:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT:    v_cndmask_b32_e64 v0, v4, -v2, vcc
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v1
+; GCN-NEXT:    v_cndmask_b32_e64 v1, v5, -v3, vcc
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_select_v2i32_1:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT:    v_cndmask_b32_e64 v0, v4, -v2, vcc_lo
+; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v1
+; GFX11-NEXT:    v_cndmask_b32_e64 v1, v5, -v3, vcc_lo
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  %neg.a = xor <2 x i32> %a, splat (i32 u0x80000000)
+  %cmp = icmp eq <2 x i32> %cond, zeroinitializer
+  %select = select <2 x i1> %cmp, <2 x i32> %neg.a, <2 x i32> %b
+  ret <2 x i32> %select
+}
+
+define <2 x i32> @fneg_select_v2i32_2(<2 x i32> %cond, <2 x i32> %a, <2 x i32> %b) {
+; GCN-LABEL: fneg_select_v2i32_2:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT:    v_cndmask_b32_e64 v0, -v2, v4, vcc
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v1
+; GCN-NEXT:    v_cndmask_b32_e64 v1, -v3, v5, vcc
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_select_v2i32_2:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT:    v_cndmask_b32_e64 v0, -v2, v4, vcc_lo
+; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v1
+; GFX11-NEXT:    v_cndmask_b32_e64 v1, -v3, v5, vcc_lo
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  %neg.a = xor <2 x i32> %a, splat (i32 u0x80000000)
+  %cmp = icmp eq <2 x i32> %cond, zeroinitializer
+  %select = select <2 x i1> %cmp, <2 x i32> %b, <2 x i32> %neg.a
+  ret <2 x i32> %select
+}
+
+define i32 @fabs_select_i32_1(i32 %cond, i32 %a, i32 %b) {
+; GCN-LABEL: fabs_select_i32_1:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT:    v_cndmask_b32_e64 v0, v2, |v1|, vcc
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fabs_select_i32_1:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT:    v_cndmask_b32_e64 v0, v2, |v1|, vcc_lo
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  %neg.a = and i32 %a, u0x7fffffff
+  %cmp = icmp eq i32 %cond, zeroinitializer
+  %select = select i1 %cmp, i32 %neg.a, i32 %b
+  ret i32 %select
+}
+
+define i32 @fabs_select_i32_2(i32 %cond, i32 %a, i32 %b) {
+; GCN-LABEL: fabs_select_i32_2:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT:    v_cndmask_b32_e64 v0, |v1|, v2, vcc
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fabs_select_i32_2:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT:    v_cndmask_b32_e64 v0, |v1|, v2, vcc_lo
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  %neg.a = and i32 %a, u0x7fffffff
+  %cmp = icmp eq i32 %cond, zeroinitializer
+  %select = select i1 %cmp, i32 %b, i32 %neg.a
+  ret i32 %select
+}
+
+define <2 x i32> @fneg_1_fabs_2_select_v2i32(<2 x i32> %cond, <2 x i32> %a, <2 x i32> %b) {
+; GCN-LABEL: fneg_1_fabs_2_select_v2i32:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT:    v_cndmask_b32_e64 v0, -v2, |v2|, vcc
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v1
+; GCN-NEXT:    v_cndmask_b32_e64 v1, -v3, |v3|, vcc
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_1_fabs_2_select_v2i32:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT:    v_cndmask_b32_e64 v0, -v2, |v2|, vcc_lo
+; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v1
+; GFX11-NEXT:    v_cndmask_b32_e64 v1, -v3, |v3|, vcc_lo
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  %neg.a = xor <2 x i32> %a, splat (i32 u0x80000000)
+  %abs.b = and <2 x i32> %a, splat (i32 u0x7fffffff)
+  %cmp = icmp eq <2 x i32> %cond, zeroinitializer
+  %select = select <2 x i1> %cmp, <2 x i32> %abs.b, <2 x i32> %neg.a
+  ret <2 x i32> %select
+}
+
+define i32 @fneg_fabs_select_i32_1(i32 %cond, i32 %a, i32 %b) {
+; GCN-LABEL: fneg_fabs_select_i32_1:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT:    v_cndmask_b32_e64 v0, v2, -|v1|, vcc
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_fabs_select_i32_1:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT:    v_cndmask_b32_e64 v0, v2, -|v1|, vcc_lo
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  %neg.a = or i32 %a, u0x80000000
+  %cmp = icmp eq i32 %cond, zeroinitializer
+  %select = select i1 %cmp, i32 %neg.a, i32 %b
+  ret i32 %select
+}
+
+define i32 @fneg_fabs_select_i32_2(i32 %cond, i32 %a, i32 %b) {
+; GCN-LABEL: fneg_fabs_select_i32_2:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT:    v_cndmask_b32_e64 v0, -|v1|, v2, vcc
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_fabs_select_i32_2:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT:    v_cndmask_b32_e64 v0, -|v1|, v2, vcc_lo
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  %neg.a = or i32 %a, u0x80000000
+  %cmp = icmp eq i32 %cond, zeroinitializer
+  %select = select i1 %cmp, i32 %b, i32 %neg.a
+  ret i32 %select
+}
+
+define <2 x i32> @fneg_fabs_select_v2i32_1(<2 x i32> %cond, <2 x i32> %a, <2 x i32> %b) {
+; GCN-LABEL: fneg_fabs_select_v2i32_1:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT:    v_cndmask_b32_e64 v0, v4, -|v2|, vcc
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v1
+; GCN-NEXT:    v_cndmask_b32_e64 v1, v5, -|v3|, vcc
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_fabs_select_v2i32_1:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT:    v_cndmask_b32_e64 v0, v4, -|v2|, vcc_lo
+; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v1
+; GFX11-NEXT:    v_cndmask_b32_e64 v1, v5, -|v3|, vcc_lo
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  %neg.a = or <2 x i32> %a, splat (i32 u0x80000000)
+  %cmp = icmp eq <2 x i32> %cond, zeroinitializer
+  %select = select <2 x i1> %cmp, <2 x i32> %neg.a, <2 x i32> %b
+  ret <2 x i32> %select
+}
+
+define <2 x i32> @fneg_fabs_select_v2i32_2(<2 x i32> %cond, <2 x i32> %a, <2 x i32> %b) {
+; GCN-LABEL: fneg_fabs_select_v2i32_2:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT:    v_cndmask_b32_e64 v0, -|v2|, v4, vcc
+; GCN-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v1
+; GCN-NEXT:    v_cndmask_b32_e64 v1, -|v3|, v5, vcc
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_fabs_select_v2i32_2:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT:    v_cndmask_b32_e64 v0, -|v2|, v4, vcc_lo
+; GFX11-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v1
+; GFX11-NEXT:    v_cndmask_b32_e64 v1, -|v3|, v5, vcc_lo
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  %neg.a = or <2 x i32> %a, splat (i32 u0x80000000)
+  %cmp = icmp eq <2 x i32> %cond, zeroinitializer
+  %select = select <2 x i1> %cmp, <2 x i32> %b, <2 x i32> %neg.a
+  ret <2 x i32> %select
+}
+
+define i64 @fneg_select_i64_1(i64 %cond, i64 %a, i64 %b) {
+; GCN-LABEL: fneg_select_i64_1:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cmp_eq_u64_e32 vcc, 0, v[0:1]
+; GCN-NEXT:    v_cndmask_b32_e32 v0, v4, v2, vcc
+; GCN-NEXT:    v_cndmask_b32_e64 v1, v5, -v3, vcc
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_select_i64_1:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_cmp_eq_u64_e32 vcc_lo, 0, v[0:1]
+; GFX11-NEXT:    v_cndmask_b32_e32 v0, v4, v2, vcc_lo
+; GFX11-NEXT:    v_cndmask_b32_e64 v1, v5, -v3, vcc_lo
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  %neg.a = xor i64 %a, u0x8000000000000000
+  %cmp = icmp eq i64 %cond, zeroinitializer
+  %select = select i1 %cmp, i64 %neg.a, i64 %b
+  ret i64 %select
+}
+
+define i64 @fneg_select_i64_2(i64 %cond, i64 %a, i64 %b) {
+; GCN-LABEL: fneg_select_i64_2:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cmp_eq_u64_e32 vcc, 0, v[0:1]
+; GCN-NEXT:    v_cndmask_b32_e32 v0, v2, v4, vcc
+; GCN-NEXT:    v_cndmask_b32_e64 v1, -v3, v5, vcc
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_select_i64_2:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_cmp_eq_u64_e32 vcc_lo, 0, v[0:1]
+; GFX11-NEXT:    v_cndmask_b32_e32 v0, v2, v4, vcc_lo
+; GFX11-NEXT:    v_cndmask_b32_e64 v1, -v3, v5, vcc_lo
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  %neg.a = xor i64 %a, u0x8000000000000000
+  %cmp = icmp eq i64 %cond, zeroinitializer
+  %select = select i1 %cmp, i64 %b, i64 %neg.a
+  ret i64 %select
+}
+
+define i64 @fneg_1_fabs_2_select_i64(i64 %cond, i64 %a, i64 %b) {
+; GCN-LABEL: fneg_1_fabs_2_select_i64:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cmp_eq_u64_e32 vcc, 0, v[0:1]
+; GCN-NEXT:    v_cndmask_b32_e32 v0, v4, v2, vcc
+; GCN-NEXT:    v_cndmask_b32_e64 v1, |v5|, -v3, vcc
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_1_fabs_2_select_i64:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_cmp_eq_u64_e32 vcc_lo, 0, v[0:1]
+; GFX11-NEXT:    v_cndmask_b32_e32 v0, v4, v2, vcc_lo
+; GFX11-NEXT:    v_cndmask_b32_e64 v1, |v5|, -v3, vcc_lo
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  %neg.a = xor i64 %a, u0x8000000000000000
+  %abs.b = and i64 %b, u0x7fffffffffffffff
+  %cmp = icmp eq i64 %cond, zeroinitializer
+  %select = select i1 %cmp, i64 %neg.a, i64 %abs.b
+  ret i64 %select
+}
+
+define i64 @fabs_select_i64_1(i64 %cond, i64 %a, i64 %b) {
+; GCN-LABEL: fabs_select_i64_1:
+; GCN:       ; %b...
[truncated]

Contributor

@JanekvO JanekvO left a comment

Only nits, really

Contributor

@arsenm arsenm left a comment

Can you add some scalar test cases, particularly for the 64-bit cases? The SALU versions do not support source modifiers, so these should not fold when the operation is a scalar candidate.

@chrisjbris
Contributor Author

chrisjbris commented Jul 18, 2025

Can you add some scalar test cases, particularly for the 64-bit cases? The SALU versions do not support source modifiers, so these should not fold when the operation is a scalar candidate.

Yes, I've added some tests to integer-select-src-modifiers.ll.

Contributor

@JanekvO JanekvO left a comment

LGTM

@chrisjbris chrisjbris merged commit c51b48b into llvm:main Jul 22, 2025
9 checks passed
@llvm-ci
Collaborator

llvm-ci commented Jul 22, 2025

LLVM Buildbot has detected a new failure on builder clang-hip-vega20 running on hip-vega20-0 while building llvm at step 3 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/23844

Here is the relevant piece of the build log for the reference
Step 3 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/hip-build.sh --jobs=' (failure)
...
[57/59] Linking CXX executable External/HIP/math_h-hip-6.3.0
[58/59] Building CXX object External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o
[59/59] Linking CXX executable External/HIP/TheNextWeek-hip-6.3.0
+ build_step 'Testing HIP test-suite'
+ echo '@@@BUILD_STEP Testing HIP test-suite@@@'
+ ninja check-hip-simple
Step 12 (Testing HIP test-suite) failure: Testing HIP test-suite (failure)
@@@BUILD_STEP Testing HIP test-suite@@@
[0/1] cd /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP && /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/llvm/bin/llvm-lit -sv array-hip-6.3.0.test empty-hip-6.3.0.test with-fopenmp-hip-6.3.0.test saxpy-hip-6.3.0.test memmove-hip-6.3.0.test split-kernel-args-hip-6.3.0.test builtin-logb-scalbn-hip-6.3.0.test TheNextWeek-hip-6.3.0.test algorithm-hip-6.3.0.test cmath-hip-6.3.0.test complex-hip-6.3.0.test math_h-hip-6.3.0.test new-hip-6.3.0.test blender.test
-- Testing: 14 tests, 14 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70..
FAIL: test-suite :: External/HIP/TheNextWeek-hip-6.3.0.test (12 of 14)
******************** TEST 'test-suite :: External/HIP/TheNextWeek-hip-6.3.0.test' FAILED ********************

/home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/TheNextWeek-hip-6.3.0.test.out --redirect-input /dev/null --summary /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/TheNextWeek-hip-6.3.0.test.time /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/TheNextWeek-hip-6.3.0
cd /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP ; /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/tools/fpcmp-target /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/TheNextWeek-hip-6.3.0.test.out TheNextWeek.reference_output-hip-6.3.0

+ cd /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP
+ /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/tools/fpcmp-target /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/TheNextWeek-hip-6.3.0.test.out TheNextWeek.reference_output-hip-6.3.0
/home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/tools/fpcmp-target: Comparison failed, textual difference between 'T' and 'q'

Input 1:
Running quads
image width = 400 height = 400
block size = (16, 16) grid size = (25, 25)
Start rendering by GPU.
Done.
Top Differences between quads_gpu.ppm and quads_ref.ppm:
Location (82, 20), Difference: 64, quads_gpu.ppm: (207, 197, 191), quads_ref.ppm: (214, 228, 255)
Location (87, 20), Difference: 59, quads_gpu.ppm: (205, 199, 196), quads_ref.ppm: (214, 228, 255)
Location (81, 20), Difference: 57, quads_gpu.ppm: (201, 196, 198), quads_ref.ppm: (214, 228, 255)
Location (84, 20), Difference: 57, quads_gpu.ppm: (207, 199, 198), quads_ref.ppm: (214, 228, 255)
Location (89, 20), Difference: 54, quads_gpu.ppm: (208, 201, 201), quads_ref.ppm: (214, 228, 255)
Location (83, 20), Difference: 51, quads_gpu.ppm: (205, 202, 204), quads_ref.ppm: (214, 228, 255)
Location (88, 20), Difference: 51, quads_gpu.ppm: (206, 201, 204), quads_ref.ppm: (214, 228, 255)
Location (86, 20), Difference: 49, quads_gpu.ppm: (206, 202, 206), quads_ref.ppm: (214, 228, 255)
Location (90, 20), Difference: 49, quads_gpu.ppm: (210, 205, 206), quads_ref.ppm: (214, 228, 255)
Location (85, 20), Difference: 43, quads_gpu.ppm: (207, 207, 212), quads_ref.ppm: (214, 228, 255)
Running earth
image width = 400 height = 225
block size = (16, 16) grid size = (25, 15)
Start rendering by GPU.
Done.
Top Differences between earth_gpu.ppm and earth_ref.ppm:
Location (202, 3), Difference: 11, earth_gpu.ppm: (205, 219, 244), earth_ref.ppm: (214, 228, 255)
Location (201, 3), Difference: 8, earth_gpu.ppm: (207, 221, 247), earth_ref.ppm: (214, 228, 255)
Location (200, 3), Difference: 5, earth_gpu.ppm: (209, 224, 250), earth_ref.ppm: (214, 228, 255)
Location (198, 3), Difference: 4, earth_gpu.ppm: (210, 225, 251), earth_ref.ppm: (214, 228, 255)
Location (199, 3), Difference: 4, earth_gpu.ppm: (210, 225, 251), earth_ref.ppm: (214, 228, 255)
Location (193, 3), Difference: 2, earth_gpu.ppm: (212, 227, 254), earth_ref.ppm: (214, 228, 255)
Location (194, 3), Difference: 2, earth_gpu.ppm: (212, 227, 254), earth_ref.ppm: (214, 228, 255)
Location (195, 3), Difference: 2, earth_gpu.ppm: (212, 227, 254), earth_ref.ppm: (214, 228, 255)
Location (196, 3), Difference: 2, earth_gpu.ppm: (212, 226, 253), earth_ref.ppm: (214, 228, 255)
Location (197, 3), Difference: 2, earth_gpu.ppm: (212, 226, 253), earth_ref.ppm: (214, 228, 255)
Running two_spheres
image width = 400 height = 225
block size = (16, 16) grid size = (25, 15)

chrisjbris added a commit that referenced this pull request Jul 22, 2025
@llvm-ci
Collaborator

llvm-ci commented Jul 22, 2025

LLVM Buildbot has detected a new failure on builder hip-third-party-libs-test running on ext_buildbot_hw_05-hip-docker while building llvm at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/206/builds/3661

Here is the relevant piece of the build log for the reference
Step 4 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/hip-tpl.py --jobs=32' (failure)
...
-- Kokkos Backends: SERIAL;HIP
-- Configuring done
-- Generating done
-- Build files have been written to: /opt/botworker/llvm/llvm-test-suite/TS-build/External/HIP/TestKokkosHIP-prefix/src/TestKokkosHIP-build
[4/8] Performing build step for 'TestKokkosHIP'
[1/3] cd /opt/botworker/llvm/llvm-test-suite/TS-build/External/HIP/TestKokkosHIP-prefix/src/TestKokkosHIP-build && /usr/bin/cmake -DRUN_CHECK_GIT_VERSION=1 -DKOKKOS_SOURCE_DIR=/opt/botworker/llvm/llvm-test-suite/TS-build/External/HIP/TestKokkosHIP-prefix/src/TestKokkosHIP -P /opt/botworker/llvm/llvm-test-suite/TS-build/External/HIP/TestKokkosHIP-prefix/src/TestKokkosHIP/cmake/build_env_info.cmake
[5/8] No install step for 'TestKokkosHIP'
[6/8] No test step for 'TestKokkosHIP'
[7/8] Completed 'TestKokkosHIP'
[8/8] cd /opt/botworker/llvm/llvm-test-suite/TS-build/External/HIP/TestKokkosHIP-prefix/src/TestKokkosHIP-build && /usr/bin/cmake -E env GTEST_FILTER=-hip.atomics:hip.bit_manip_bit_ceil ctest
FAILED: External/HIP/CMakeFiles/test-kokkos /opt/botworker/llvm/llvm-test-suite/TS-build/External/HIP/CMakeFiles/test-kokkos 
cd /opt/botworker/llvm/llvm-test-suite/TS-build/External/HIP/TestKokkosHIP-prefix/src/TestKokkosHIP-build && /usr/bin/cmake -E env GTEST_FILTER=-hip.atomics:hip.bit_manip_bit_ceil ctest
Test project /opt/botworker/llvm/llvm-test-suite/TS-build/External/HIP/TestKokkosHIP-prefix/src/TestKokkosHIP-build
      Start  1: Kokkos_CoreUnitTest_Serial_ViewSupport
 1/54 Test  #1: Kokkos_CoreUnitTest_Serial_ViewSupport .....................   Passed    0.40 sec
      Start  2: Kokkos_CoreUnitTest_HIP_ViewSupport
 2/54 Test  #2: Kokkos_CoreUnitTest_HIP_ViewSupport ........................   Passed    0.39 sec
      Start  3: Kokkos_CoreUnitTest_Serial1
 3/54 Test  #3: Kokkos_CoreUnitTest_Serial1 ................................   Passed   13.62 sec
      Start  4: Kokkos_CoreUnitTest_Serial2
 4/54 Test  #4: Kokkos_CoreUnitTest_Serial2 ................................   Passed   17.47 sec
      Start  5: Kokkos_CoreUnitTest_HIP
 5/54 Test  #5: Kokkos_CoreUnitTest_HIP ....................................   Passed  228.85 sec
      Start  6: Kokkos_CoreUnitTest_HIPInterOpInit
 6/54 Test  #6: Kokkos_CoreUnitTest_HIPInterOpInit .........................   Passed    0.39 sec
      Start  7: Kokkos_CoreUnitTest_HIPInterOpStreams
 7/54 Test  #7: Kokkos_CoreUnitTest_HIPInterOpStreams ......................   Passed    0.41 sec
      Start  8: Kokkos_CoreUnitTest_HIPInterOpGraph
 8/54 Test  #8: Kokkos_CoreUnitTest_HIPInterOpGraph ........................   Passed    0.37 sec
      Start  9: Kokkos_CoreUnitTest_Default
 9/54 Test  #9: Kokkos_CoreUnitTest_Default ................................   Passed    1.09 sec
      Start 10: Kokkos_CoreUnitTest_LegionInitialization
10/54 Test #10: Kokkos_CoreUnitTest_LegionInitialization ...................   Passed    0.38 sec
      Start 11: Kokkos_CoreUnitTest_PushFinalizeHook
11/54 Test #11: Kokkos_CoreUnitTest_PushFinalizeHook .......................   Passed    0.42 sec
      Start 12: Kokkos_CoreUnitTest_ScopeGuard
12/54 Test #12: Kokkos_CoreUnitTest_ScopeGuard .............................   Passed    2.53 sec
      Start 13: Kokkos_CoreUnitTest_Develop
13/54 Test #13: Kokkos_CoreUnitTest_Develop ................................   Passed    0.40 sec
      Start 14: Kokkos_CoreUnitTest_PushFinalizeHookTerminateRegex
14/54 Test #14: Kokkos_CoreUnitTest_PushFinalizeHookTerminateRegex .........   Passed    0.41 sec
      Start 15: Kokkos_CoreUnitTest_PushFinalizeHookTerminateFails
15/54 Test #15: Kokkos_CoreUnitTest_PushFinalizeHookTerminateFails .........   Passed    0.43 sec
      Start 16: Kokkos_CoreUnitTest_KokkosP
16/54 Test #16: Kokkos_CoreUnitTest_KokkosP ................................   Passed    0.43 sec
      Start 17: Kokkos_CoreUnitTest_ToolIndependence
17/54 Test #17: Kokkos_CoreUnitTest_ToolIndependence .......................   Passed    0.02 sec
      Start 18: Kokkos_ProfilingTestLibraryLoadHelp
18/54 Test #18: Kokkos_ProfilingTestLibraryLoadHelp ........................   Passed    0.40 sec
Step 11 (run kokkos test suite) failure: run kokkos test suite (failure)

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 22, 2025
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jul 28, 2025
…vm#149110)

Add to the VOP patterns to recognise when or/xor/and are masking only
the most significant bit of i32/v2i32/i64 and replace with the appropriate source modifier.
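As a rough illustration of why these bitmask operations can stand in for source modifiers, the sketch below checks on the host that masking only the most significant bit of a 32-bit value matches floating-point negate, absolute value, and negated absolute value. The helper names (`toBits`, `fromBits`, `viaXor`, `viaAnd`, `viaOr`, `sameAsModifiers`) are hypothetical and are not taken from the patch. The pairing of xor with neg, and with abs, and or with neg-of-abs is an assumption based on IEEE 754 sign-bit semantics, not a copy of the selection code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a float as its 32-bit pattern and back (memcpy avoids UB).
static uint32_t toBits(float f) { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
static float fromBits(uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }

// Mask covering only the most significant (sign) bit of a 32-bit value.
constexpr uint32_t SignMask = 0x80000000u;

// Hypothetical helpers: the integer operations the commit message describes.
static float viaXor(float x) { return fromBits(toBits(x) ^ SignMask); }  // flips the sign bit
static float viaAnd(float x) { return fromBits(toBits(x) & ~SignMask); } // clears the sign bit
static float viaOr(float x)  { return fromBits(toBits(x) | SignMask); }  // sets the sign bit

// True when the bitmask forms agree bit-for-bit with neg, abs and neg(abs).
// Intended for non-NaN inputs only.
static bool sameAsModifiers(float x) {
  return toBits(viaXor(x)) == toBits(-x) &&
         toBits(viaAnd(x)) == toBits(std::fabs(x)) &&
         toBits(viaOr(x))  == toBits(-std::fabs(x));
}
```

Calling `sameAsModifiers` on a handful of ordinary values (positive, negative, zero) is enough to see the equivalence. NaN inputs are left out here because the sketch compares raw bit patterns.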