Conversation

@LU-JOHN (Contributor) commented Jul 1, 2025

When performing a 64-bit sra of a known-negative value with a shift amount in the range [32, 63], create the hi half with a move of -1.

Alive verification: https://alive2.llvm.org/ce/z/kXd7Ac

Also, preserve the exact flag on the lo-half shift. Alive verification: https://alive2.llvm.org/ce/z/L86tXf.
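The rewrite relies on the identity that, for a negative 64-bit value shifted right arithmetically by s in [32, 63], the hi 32 bits of the result are all ones (sign fill), and the lo 32 bits are the hi half of the input shifted by s - 32. A quick Python sketch of that identity (illustrative only; the function names are mine, not from the patch):

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def ashr64_split(x, s):
    """Compute a 64-bit ashr for s in [32, 63] from the hi half only."""
    hi = x >> 32                        # Python's >> on ints is arithmetic
    lo_result = (hi >> (s - 32)) & MASK32
    hi_result = -1 & MASK32             # known-negative: just a move of -1
    return (hi_result << 32) | lo_result

def ashr64(x, s):
    """Reference 64-bit arithmetic shift right, result as unsigned."""
    return (x >> s) & MASK64

# Exhaustively check the identity for some negative values.
for x in (-6000000000, -7000000000, -1):
    for s in range(32, 64):
        assert ashr64_split(x, s) == ashr64(x, s)
```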

@llvmbot (Member) commented Jul 1, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: None (LU-JOHN)

Changes

When performing a 64-bit sra of a known-negative value with a shift amount in the range [32, 63], create the hi half with a move of -1.

Alive verification: https://alive2.llvm.org/ce/z/kXd7Ac


Full diff: https://github.com/llvm/llvm-project/pull/146569.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp (+8-2)
  • (added) llvm/test/CodeGen/AMDGPU/neg_ashr64_reduce.ll (+102)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index d75c7a178b4a8..a877b28119c16 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -4218,9 +4218,15 @@ SDValue AMDGPUTargetLowering::performSraCombine(SDNode *N,
     SDValue SplitLHS = DAG.getNode(ISD::BITCAST, LHSSL, ConcatType, LHS);
     Hi = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, LHSSL, TargetType, SplitLHS, One);
   }
-  Hi = DAG.getFreeze(Hi);
 
-  SDValue HiShift = DAG.getNode(ISD::SRA, SL, TargetType, Hi, ShiftFullAmt);
+  KnownBits KnownLHS = DAG.computeKnownBits(LHS);
+  SDValue HiShift;
+  if (KnownLHS.isNegative())
+    HiShift = DAG.getAllOnesConstant(SL, TargetType);
+  else {
+    Hi = DAG.getFreeze(Hi);
+    HiShift = DAG.getNode(ISD::SRA, SL, TargetType, Hi, ShiftFullAmt);
+  }
   SDValue NewShift = DAG.getNode(ISD::SRA, SL, TargetType, Hi, ShiftAmt);
 
   SDValue Vec;
diff --git a/llvm/test/CodeGen/AMDGPU/neg_ashr64_reduce.ll b/llvm/test/CodeGen/AMDGPU/neg_ashr64_reduce.ll
new file mode 100644
index 0000000000000..bb291272555b1
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/neg_ashr64_reduce.ll
@@ -0,0 +1,102 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx900 < %s | FileCheck %s
+
+; Test that negative 64-bit values shifted by [32-63] bits have
+; a hi-result created by moving an all-ones constant.
+
+; FIXME: Range metadata is invalidated when i64 types are legalized to v2i32 types.
+; We could call performSraCombine before legalization, but other optimizations only work
+; with 64-bit sra.
+define i64 @scalar_ashr_metadata(ptr %arg0.ptr, ptr %arg1.ptr) {
+; CHECK-LABEL: scalar_ashr_metadata:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    flat_load_dwordx2 v[4:5], v[0:1]
+; CHECK-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    flat_load_dword v4, v[2:3]
+; CHECK-NEXT:    ; kill: killed $vgpr0 killed $vgpr1
+; CHECK-NEXT:    ; kill: killed $vgpr2 killed $vgpr3
+; CHECK-NEXT:    v_ashrrev_i32_e32 v1, 31, v5
+; CHECK-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    v_ashrrev_i32_e32 v0, v4, v5
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  %val = load i64, ptr %arg0.ptr, !range !0, !noundef !{}
+  %shift.amt = load i64, ptr %arg1.ptr, !range !1, !noundef !{}
+  %ashr = ashr i64 %val, %shift.amt
+  ret i64 %ashr
+}
+
+define <2 x i64> @v2_ashr_metadata(ptr %arg0.ptr, ptr %arg1.ptr) {
+; CHECK-LABEL: v2_ashr_metadata:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    flat_load_dwordx4 v[4:7], v[0:1]
+; CHECK-NEXT:    flat_load_dwordx4 v[8:11], v[2:3]
+; CHECK-NEXT:    v_mov_b32_e32 v1, -1
+; CHECK-NEXT:    v_mov_b32_e32 v3, -1
+; CHECK-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    v_ashrrev_i32_e32 v0, v8, v5
+; CHECK-NEXT:    v_ashrrev_i32_e32 v2, v10, v7
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  %val = load <2 x i64>, ptr %arg0.ptr, !range !2, !noundef !{}
+  %shift.amt = load <2 x i64>, ptr %arg1.ptr, !range !3, !noundef !{}
+  %ashr = ashr <2 x i64> %val, %shift.amt
+  ret <2 x i64> %ashr
+}
+
+define <3 x i64> @v3_ashr_metadata(ptr %arg0.ptr, ptr %arg1.ptr) {
+; CHECK-LABEL: v3_ashr_metadata:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    flat_load_dwordx4 v[4:7], v[0:1]
+; CHECK-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    flat_load_dword v4, v[0:1] offset:20
+; CHECK-NEXT:    flat_load_dword v6, v[2:3] offset:16
+; CHECK-NEXT:    flat_load_dwordx4 v[8:11], v[2:3]
+; CHECK-NEXT:    v_mov_b32_e32 v1, -1
+; CHECK-NEXT:    v_mov_b32_e32 v3, -1
+; CHECK-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    v_ashrrev_i32_e32 v4, v6, v4
+; CHECK-NEXT:    v_ashrrev_i32_e32 v0, v8, v5
+; CHECK-NEXT:    v_ashrrev_i32_e32 v2, v10, v7
+; CHECK-NEXT:    v_mov_b32_e32 v5, -1
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  %val = load <3 x i64>, ptr %arg0.ptr, !range !4, !noundef !{}
+  %shift.amt = load <3 x i64>, ptr %arg1.ptr, !range !5, !noundef !{}
+  %ashr = ashr <3 x i64> %val, %shift.amt
+  ret <3 x i64> %ashr
+}
+
+define <4 x i64> @v4_ashr_metadata(ptr %arg0.ptr, ptr %arg1.ptr) {
+; CHECK-LABEL: v4_ashr_metadata:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    flat_load_dwordx4 v[4:7], v[2:3]
+; CHECK-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    flat_load_dwordx4 v[7:10], v[0:1]
+; CHECK-NEXT:    flat_load_dwordx4 v[11:14], v[0:1] offset:16
+; CHECK-NEXT:    flat_load_dwordx4 v[15:18], v[2:3] offset:16
+; CHECK-NEXT:    v_mov_b32_e32 v1, -1
+; CHECK-NEXT:    v_mov_b32_e32 v3, -1
+; CHECK-NEXT:    v_mov_b32_e32 v5, -1
+; CHECK-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    v_mov_b32_e32 v7, -1
+; CHECK-NEXT:    v_ashrrev_i32_e32 v0, v4, v8
+; CHECK-NEXT:    v_ashrrev_i32_e32 v2, v6, v10
+; CHECK-NEXT:    v_ashrrev_i32_e32 v4, v15, v12
+; CHECK-NEXT:    v_ashrrev_i32_e32 v6, v17, v14
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  %val = load <4 x i64>, ptr %arg0.ptr, !range !6, !noundef !{}
+  %shift.amt = load <4 x i64>, ptr %arg1.ptr, !range !7, !noundef !{}
+  %ashr = ashr <4 x i64> %val, %shift.amt
+  ret <4 x i64> %ashr
+}
+
+!0 = !{i64 -6000000000, i64 0}
+!1 = !{i64 32, i64 64}
+!2 = !{i64 -7000000000, i64 -1000}
+!3 = !{i64 38, i64 64}
+!4 = !{i64 -8000000000, i64 -2001}
+!5 = !{i64 38, i64 60}
+!6 = !{i64 -9000000000, i64 -3002}
+!7 = !{i64 38, i64 50}

@@ -0,0 +1,102 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx900 < %s | FileCheck %s
@shiltian (Contributor) commented Jul 1, 2025:

nit: why pal?

@LU-JOHN (Contributor, Author) replied:
This was derived from ashr64_reduce.ll. There's no particular reason why pal was chosen.

Another contributor replied:
Because amdhsa errors out on the shader calling conventions, but it's useful to use them for returning values in SGPRs.

HiShift = DAG.getAllOnesConstant(SL, TargetType);
} else {
Hi = DAG.getFreeze(Hi);
HiShift = DAG.getNode(ISD::SRA, SL, TargetType, Hi, ShiftFullAmt);
A reviewer (Contributor) commented:
Can this preserve flags?

@LU-JOHN (Contributor, Author) replied:
The ashr for the hi-half cannot preserve an exact flag, but the ashr for the lo-half has been updated to preserve the exact flag.
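To see why the hi-half shift cannot carry the exact flag even when the original 64-bit ashr is exact, consider a value whose low 32 bits are zero: the 64-bit shift discards only zeros, but the hi-half shift by 31 that builds the sign word can discard nonzero bits. A hedged Python sketch (names and helper are mine, not from the patch):

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def is_exact_shift(value, amt):
    """A right shift is 'exact' when no nonzero bits are shifted out."""
    return (value & ((1 << amt) - 1)) == 0

x = (-3 << 32) & MASK64       # 0xFFFFFFFD00000000: low 32 bits are zero
assert is_exact_shift(x, 32)  # the original 64-bit ashr by 32 is exact

hi = (x >> 32) & MASK32       # hi half = 0xFFFFFFFD (-3 as i32)
# ...but the hi-half "full" shift by 31 would discard nonzero bits,
# so marking it exact would be unsound:
assert not is_exact_shift(hi, 31)
```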

The reviewer replied:
The previous changes don't seem to preserve the flags, though?

@LU-JOHN (Contributor, Author) replied:
Yes, it was already an issue with the previous commit.

Signed-off-by: John Lu <[email protected]>
@LU-JOHN LU-JOHN requested a review from arsenm July 2, 2025 14:44
@shiltian shiltian merged commit 896d900 into llvm:main Jul 9, 2025
7 checks passed