[AMDGPU] Improved Lowering of abs(i8/i16) and -abs(i8/i16) #165626

linuxrocks123 · 2025-10-29T21:33:36Z

This PR improves the lowering of abs(i16) and -abs(i16) on the AMDGPU target. It is implemented as an early Machine IR-level pass for two reasons: the transformation is profitable only for values in SGPRs, because there is no dedicated abs instruction for VGPRs, and whether a value ends up in a VGPR or an SGPR can be determined only after ISel.

An earlier attempt that overrode expandABS at the DAG level is in the Git history; it was correct but pessimized the generated code.
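
As a rough illustration of why the SGPR sequence in the test (s_sext_i32_i16, s_abs_i32, and s_sub_i32 for the negated form) is a valid way to compute a 16-bit abs, here is a minimal standalone C++ model. The function names are hypothetical and nothing here is LLVM code; it assumes the poison flag of llvm.abs.i16 is false, so INT16_MIN wraps to itself, and it assumes the usual two's-complement narrowing conversions.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics of llvm.abs.i16 with the poison flag set to false:
// the result wraps, so INT16_MIN maps to itself.
int16_t absI16Reference(int16_t X) {
  uint16_t Bits = static_cast<uint16_t>(X);
  if (X < 0)
    Bits = static_cast<uint16_t>(0u - Bits);
  return static_cast<int16_t>(Bits);
}

// Models s_sext_i32_i16 followed by s_abs_i32, keeping the low 16 bits.
int16_t absI16ViaI32(int16_t X) {
  int32_t Wide = X;                      // Sign-extend to 32 bits.
  int32_t Abs = Wide < 0 ? -Wide : Wide; // Cannot overflow for 16-bit inputs.
  return static_cast<int16_t>(static_cast<uint16_t>(Abs));
}

// Models the extra s_sub_i32 dst, 0, src used for -abs.
int16_t negAbsI16ViaI32(int16_t X) {
  int32_t Wide = X;
  int32_t Abs = Wide < 0 ? -Wide : Wide;
  return static_cast<int16_t>(static_cast<uint16_t>(0 - Abs));
}

// Exhaustively compares the widened forms against the 16-bit reference.
void checkAllI16Inputs() {
  for (int32_t V = INT16_MIN; V <= INT16_MAX; ++V) {
    int16_t X = static_cast<int16_t>(V);
    int16_t Ref = absI16Reference(X);
    int16_t NegRef = static_cast<int16_t>(
        static_cast<uint16_t>(0u - static_cast<uint16_t>(Ref)));
    assert(absI16ViaI32(X) == Ref);
    assert(negAbsI16ViaI32(X) == NegRef);
  }
}
```

Calling checkAllI16Inputs() walks every 16-bit input, including INT16_MIN, where the 32-bit abs yields 32768 and the truncation brings it back to INT16_MIN.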

llvmbot · 2025-10-29T21:34:09Z

@llvm/pr-subscribers-backend-amdgpu

Author: Patrick Simmons (linuxrocks123)

Full diff: https://github.com/llvm/llvm-project/pull/165626.diff

4 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPU.h (+11)
(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+3)
(modified) llvm/lib/Target/AMDGPU/CMakeLists.txt (+1)
(added) llvm/test/CodeGen/AMDGPU/s_abs_i16.ll (+22)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index ce2b4a5f6f2e9..43a052b687109 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -39,6 +39,7 @@ FunctionPass *createSIAnnotateControlFlowLegacyPass();
 FunctionPass *createSIFoldOperandsLegacyPass();
 FunctionPass *createSIPeepholeSDWALegacyPass();
 FunctionPass *createSILowerI1CopiesLegacyPass();
+FunctionPass *createSISAbs16FixupLegacyPass();
 FunctionPass *createSIShrinkInstructionsLegacyPass();
 FunctionPass *createSILoadStoreOptimizerLegacyPass();
 FunctionPass *createSIWholeQuadModeLegacyPass();
@@ -93,6 +94,13 @@ class SILowerI1CopiesPass : public PassInfoMixin<SILowerI1CopiesPass> {
                         MachineFunctionAnalysisManager &MFAM);
 };
 
+class SISAbs16FixupPass : public PassInfoMixin<SISAbs16FixupPass> {
+public:
+  SISAbs16FixupPass() = default;
+  PreservedAnalyses run(MachineFunction &MF,
+                        MachineFunctionAnalysisManager &MFAM);
+};
+
 void initializeAMDGPUDAGToDAGISelLegacyPass(PassRegistry &);
 
 void initializeAMDGPUAlwaysInlinePass(PassRegistry&);
@@ -197,6 +205,9 @@ extern char &SILowerWWMCopiesLegacyID;
 void initializeSILowerI1CopiesLegacyPass(PassRegistry &);
 extern char &SILowerI1CopiesLegacyID;
 
+void initializeSISAbs16FixupLegacyPass(PassRegistry &);
+extern char &SISAbs16FixupLegacyID;
+
 void initializeAMDGPUGlobalISelDivergenceLoweringPass(PassRegistry &);
 extern char &AMDGPUGlobalISelDivergenceLoweringID;
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 996b55f42fd0b..90405fed8efdd 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -551,6 +551,7 @@ extern "C" LLVM_ABI LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
   initializeAMDGPUPrepareAGPRAllocLegacyPass(*PR);
   initializeGCNDPPCombineLegacyPass(*PR);
   initializeSILowerI1CopiesLegacyPass(*PR);
+  initializeSISAbs16FixupLegacyPass(*PR);
   initializeAMDGPUGlobalISelDivergenceLoweringPass(*PR);
   initializeAMDGPURegBankSelectPass(*PR);
   initializeAMDGPURegBankLegalizePass(*PR);
@@ -1517,6 +1518,7 @@ bool GCNPassConfig::addInstSelector() {
   AMDGPUPassConfig::addInstSelector();
   addPass(&SIFixSGPRCopiesLegacyID);
   addPass(createSILowerI1CopiesLegacyPass());
+  addPass(createSISAbs16FixupLegacyPass());
   return false;
 }
 
@@ -2209,6 +2211,7 @@ Error AMDGPUCodeGenPassBuilder::addInstSelector(AddMachinePass &addPass) const {
   addPass(AMDGPUISelDAGToDAGPass(TM));
   addPass(SIFixSGPRCopiesPass());
   addPass(SILowerI1CopiesPass());
+  addPass(SISAbs16FixupPass());
   return Error::success();
 }
 
diff --git a/llvm/lib/Target/AMDGPU/CMakeLists.txt b/llvm/lib/Target/AMDGPU/CMakeLists.txt
index a1e0e5293c706..cd9225acdb002 100644
--- a/llvm/lib/Target/AMDGPU/CMakeLists.txt
+++ b/llvm/lib/Target/AMDGPU/CMakeLists.txt
@@ -185,6 +185,7 @@ add_llvm_target(AMDGPUCodeGen
   SIPreEmitPeephole.cpp
   SIProgramInfo.cpp
   SIRegisterInfo.cpp
+  SISAbs16Fixup.cpp
   SIShrinkInstructions.cpp
   SIWholeQuadMode.cpp
 
diff --git a/llvm/test/CodeGen/AMDGPU/s_abs_i16.ll b/llvm/test/CodeGen/AMDGPU/s_abs_i16.ll
new file mode 100644
index 0000000000000..e61abb7173d78
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/s_abs_i16.ll
@@ -0,0 +1,22 @@
+; RUN: llc -mtriple=amdgcn-- -mcpu=gfx900 < %s | FileCheck %s
+
+define amdgpu_ps i16 @abs_i16(i16 inreg %arg) {
+; CHECK-LABEL: abs_i16:
+; CHECK: %bb.0:
+; CHECK-NEXT: s_sext_i32_i16 s0, s0
+; CHECK-NEXT: s_abs_i32 s0, s0
+
+  %res = call i16 @llvm.abs.i16(i16 %arg, i1 false)
+  ret i16 %res
+}
+
+define amdgpu_ps i16 @abs_i16_neg(i16 inreg %arg) {
+; CHECK-LABEL: abs_i16_neg:
+; CHECK: ; %bb.0:
+; CHECK-NEXT: s_sext_i32_i16 s0, s0
+; CHECK-NEXT: s_abs_i32 s0, s0
+; CHECK-NEXT: s_sub_i32 s0, 0, s0
+  %res1 = call i16 @llvm.abs.i16(i16 %arg, i1 false)
+  %res2 = sub i16 0, %res1
+  ret i16 %res2
+}
\ No newline at end of file

jayfoad · 2025-10-30T07:38:58Z

Patch is missing the new file. Also I would really hope we can find a way to do this without adding a dedicated pass.

github-actions · 2025-10-30T07:46:10Z

✅ With the latest revision this PR passed the C/C++ code formatter.

linuxrocks123 · 2025-10-30T07:59:51Z

@jayfoad thanks, I added the new file.

To avoid adding a new pass, maybe we could combine this and SILowerI1Copies into a pass containing a collection of short optimizations, similar to what AMDGPUCodegenPrepare is. I thought about doing that but went with the default option of touching as little existing and unrelated code as possible.

By the way, this isn't DAG-level because the transformation pessimizes 16-bit abs on VGPRs if it runs on them, and we don't know what the register classes are in the DAG yet.
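
To make the VGPR concern concrete, the sketch below compares a 16-bit-only expansion with the sign-extend-first form. It assumes that, without a dedicated abs instruction, the vector form is a negate followed by a signed max; that instruction choice is an assumption for illustration and is not taken from the patch. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Wrapping 16-bit negate, modelling a 16-bit subtract from zero.
int16_t wrapNegI16(int16_t X) {
  return static_cast<int16_t>(
      static_cast<uint16_t>(0u - static_cast<uint16_t>(X)));
}

// Two steps when everything stays in 16 bits: negate, then signed max.
int16_t absI16Narrow(int16_t X) { return std::max(X, wrapNegI16(X)); }

// Three steps when the value is widened first: sign-extend, negate,
// signed max. The sign extension is pure overhead in this form.
int16_t absI16Widened(int16_t X) {
  int32_t Wide = X;
  int32_t Neg = 0 - Wide;
  int32_t Max = std::max(Wide, Neg);
  return static_cast<int16_t>(static_cast<uint16_t>(Max));
}

// Both forms agree on every input; only the step count differs.
void checkNarrowMatchesWidened() {
  for (int32_t V = INT16_MIN; V <= INT16_MAX; ++V) {
    int16_t X = static_cast<int16_t>(V);
    assert(absI16Narrow(X) == absI16Widened(X));
  }
}
```

Under that assumption, widening only pays off when the 32-bit form collapses into a single abs instruction, which is the scalar case.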

jmmartinez

It would help to see the improvement motivating this patch more clearly.

Could you squash the commits and split them in two: one that pre-commits the tests, and one that fixes the issue?

jmmartinez · 2025-10-30T09:19:19Z

llvm/test/CodeGen/AMDGPU/s_abs_i16.ll

@@ -0,0 +1,22 @@
+; RUN: llc -mtriple=amdgcn-- -mcpu=gfx900 < %s | FileCheck %s


Can you run update_llc_test_checks.py over this file?

Please add a GISel test line too.

linuxrocks123 · 2025-10-30T20:13:54Z

Hi @jmmartinez, thanks for reviewing this PR. Normally I would not have a problem squashing the commits as you suggested, but, in this case, the commit history contains two failed implementation attempts that I would like to preserve during the review process in case a method of fixing one of them is discovered. Would it be acceptable if I posted the master/branch assemblies of the s_cmp_0.ll testcase instead to facilitate seeing the improvement?

jmmartinez · 2025-10-31T08:52:08Z

> Hi @jmmartinez, thanks for reviewing this PR. Normally I would not have a problem squashing the commits as you suggested, but, in this case, the commit history contains two failed implementation attempts that I would like to preserve during the review process in case a method of fixing one of them is discovered. Would it be acceptable if I posted the master/branch assemblies of the s_cmp_0.ll testcase instead to facilitate seeing the improvement?

No problem in keeping the previous cases, but at least seeing the difference in s_abs_16 would be nice (and knowing what happens for gisel).

linuxrocks123 · 2025-10-31T19:20:11Z

@jmmartinez interestingly, the problem doesn't exist with global isel enabled. The main branch generates the optimal instruction selection without this PR.

If we are planning to switch to global isel soon, we may not need this.

linuxrocks123 · 2025-10-31T19:21:42Z

@jmmartinez, for non-global-isel, here are the differing testcase files:

master.s.txt
branch.s.txt

linuxrocks123 · 2025-10-31T21:54:42Z

@jmmartinez @jayfoad I think this is everything. If we're ready to merge this as a new pass, I'll update the llc-pipeline tests to pass. If we'd rather combine this with an existing early MIR pass, such as SILowerI1Copies, please let me know, and I will do that instead.

arsenm

This doesn't really fit the usual strategy for dealing with this sort of issue (it's sort of backwards). The usual strategy is closer to treating the operation as legal and dealing with it later if it isn't.

There are a few options. What I suggest is making ISD::ABS custom for i16. In the custom lowering, check whether the input is divergent and, if it is, return so that the default expansion is used.

That should get the right result most of the time. There's a possible edge case where SIFixVGPRCopies will need to deal with s_abs_i16 with SGPR inputs.
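
The decision described above can be sketched as a small standalone model. All types and names here are hypothetical stand-ins rather than LLVM classes; in the real backend the divergence bit would come from the DAG node, not from a hand-written struct.

```cpp
#include <cassert>

// Hypothetical stand-ins; none of these are LLVM types.
enum class AbsLowering { DefaultExpansion, SextAbs32Trunc };

struct ToyAbsNode {
  // True when the input may differ between lanes, which means the value
  // is expected to end up in a VGPR rather than an SGPR.
  bool InputIsDivergent;
};

// Chooses how to lower a 16-bit abs in this toy model.
AbsLowering chooseAbsI16Lowering(const ToyAbsNode &N) {
  // Divergent input: keep the default expansion, since widening would
  // only add work on the vector side.
  if (N.InputIsDivergent)
    return AbsLowering::DefaultExpansion;
  // Uniform input: widen so the scalar 32-bit abs can be selected.
  return AbsLowering::SextAbs32Trunc;
}
```

The point of the design is that the choice is made per node at lowering time, so no separate post-ISel pass is needed for the common case.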

arsenm · 2025-11-01T03:44:58Z

llvm/test/CodeGen/AMDGPU/s_abs_i16.ll

@@ -0,0 +1,26 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -mtriple=amdgcn-- -mcpu=gfx900 < %s | FileCheck %s


I would hope we wouldn't need a new test for this. It looks like the abs tests could use some gardening. test/CodeGen/AMDGPU/GlobalISel/llvm.abs.ll looks reasonably comprehensive; can you start by precommitting moving that up to the main directory, and adding dag run lines? It then should gain the negated abs cases

(I also noticed the test is minorly broken, fixed by #165965)

llvm/lib/Target/AMDGPU/SISAbs16Fixup.cpp

linuxrocks123 · 2025-11-01T05:28:01Z

> This doesn't really fit the usual strategy for dealing with this sort of issue (it's sort of backwards). The usual strategy is closer to treating the operation as legal and dealing with it later if it isn't.
>
> There are a few options. What I suggest is making ISD::ABS custom for i16. In the custom lowering, check whether the input is divergent and, if it is, return so that the default expansion is used.
>
> That should get the right result most of the time. There's a possible edge case where SIFixVGPRCopies will need to deal with s_abs_i16 with SGPR inputs.

@arsenm have you looked at b83ae56? I think that earlier failed attempt is close to what you were suggesting. Perhaps that earlier method could work better if I checked for whether the input is divergent as you suggested, so how would I go about checking whether the input is divergent?

linuxrocks123 · 2025-11-01T05:29:53Z

@arsenm also am I making the lowering "custom for i16" properly in that commit, or is there another way? It bugged me that I had to make that function virtual to override it since I thought, if this was the right way to go, some other target probably would have made it virtual by now.

arsenm · 2025-11-01T06:00:04Z

> so how would I go about checking whether the input is divergent?

Like this

> @arsenm also am I making the lowering "custom for i16" properly in that commit, or is there another way? It bugged me that I had to make that function virtual to override it since I thought, if this was the right way to go, some other target probably would have made it virtual by now.

No, you shouldn't need to make anything virtual. You should call setOperationAction(ISD::ABS, MVT::i16, Custom) under Subtarget->has16BitInsts() in the SITargetLowering constructor. In the handling in LowerOperation, you can defer to the default expansion by returning either SDValue() or the original SDValue. One of them means "treat as directly legal" and the other means "use the default expansion", but I can never remember which is which.
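
For readers trying to keep the two return conventions apart, here is a toy model of how a custom-lowering result could be classified. It assumes the legalizer treats an empty result as "fall through to the default expansion" and an unchanged node as "already legal"; that mapping reflects my reading of the SelectionDAG legalizer and should be checked against the source. The names are hypothetical, and std::optional<int> merely stands in for an SDValue.

```cpp
#include <cassert>
#include <optional>

// Possible outcomes after the target's custom hook has run.
enum class LegalizeOutcome { KeptAsLegal, ReplacedWithCustom, DefaultExpansion };

// Toy value handle: an id stands in for a DAG node, and std::nullopt
// models an empty SDValue().
using ToyValue = std::optional<int>;

LegalizeOutcome classifyCustomResult(int OriginalId, ToyValue Returned) {
  // Empty result: the hook declined, so the generic expansion runs.
  if (!Returned)
    return LegalizeOutcome::DefaultExpansion;
  // Same node handed back: the operation is accepted as it is.
  if (*Returned == OriginalId)
    return LegalizeOutcome::KeptAsLegal;
  // Anything else replaces the original node.
  return LegalizeOutcome::ReplacedWithCustom;
}
```

Under this model, a lowering that wants the default expansion for divergent inputs would return the empty value in that case.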

linuxrocks123 · 2025-11-03T20:26:06Z

@arsenm why SITargetLowering instead of AMDGPUTargetLowering?

arsenm · 2025-11-03T21:05:09Z

> @arsenm why SITargetLowering instead of AMDGPUTargetLowering?

R600 doesn't need this; AMDGPUTargetLowering is generally for shared code.

jayfoad · 2025-11-04T10:03:38Z

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

+  assert(Op.getValueType() == MVT::i16 &&
+         "Tried to select abs i16 lowering with non-i16 type.");
+
+  // divergent means will not end up using SGPRs


Nit: comments should generally be full sentences starting with a capital and ending with a period, here and elsewhere. See coding standards.

jayfoad · 2025-11-04T10:05:59Z

Do you have examples showing that the codegen is actually better with this patch than it was before?

linuxrocks123 · 2025-11-04T16:51:00Z

@jayfoad if you mean did I search real code for examples, no. If you mean did I construct examples, yes. Running the testcase with main versus branch shows the improvement.

jayfoad · 2025-11-04T17:35:01Z

> Running the testcase with main versus branch shows the improvement.

In that situation it's helpful to push two commits to the same PR, the first one just adding the test and showing the old codegen, and the second one adding the real fix and showing the improved codegen. So that all your reviewers don't have to build two compilers themselves just to see the effect of your patch.

linuxrocks123 · 2025-11-04T21:21:00Z

@jayfoad thanks, I'll keep that in mind for the future.

LU-JOHN · 2025-11-05T15:29:04Z

This might be a separate PR, but can you check if abs(i8) is handled well.

linuxrocks123 · 2025-11-05T17:58:45Z

@LU-JOHN thanks I will check that and make another PR if needed.

linuxrocks123 · 2025-11-05T17:58:56Z

@arsenm LGTM?

linuxrocks123 · 2025-11-05T22:17:02Z

@arsenm @LU-JOHN per @LU-JOHN's suggestion I updated this PR to handle i8 absolute values as well. Unfortunately, this required a return to the approach of overriding expandABS.

Currently, I am overriding the function in AMDGPUTargetLowering. It may be more appropriate to override in SITargetLowering instead. Please let me know.

The reason is that MVT::i8 is not a valid type on our architecture, so setting a custom operation action no longer works. If either of you has a better idea than overriding expandABS, please let me know. If not, I believe this is ready for merge, possibly modulo the class in which to override expandABS.

jayfoad · 2025-11-06T10:24:33Z

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

+  if (!IsNegative)
+    return TruncResult;
+
+  return DAG.getNode(ISD::SUB, DL, N->getValueType(0),


Use getNegative

jayfoad · 2025-11-06T10:24:43Z

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

 #include "llvm/IR/IntrinsicsR600.h"
 #include "llvm/IR/MDBuilder.h"
 #include "llvm/Support/CommandLine.h"
+#include "llvm/Support/Compiler.h"


Remove this.

linuxrocks123

I removed it. Note that clang-format or clang-tidy (or whichever tool it was) added it, not me.

RKSimon

I'm really not keen on overriding expansions like this. Have you tried adding a local combineABS to just extend anything smaller than i32?

linuxrocks123 · 2025-11-06T20:47:24Z

@RKSimon you can examine my previous attempt using setOperationAction at f02cfeb. This attempt failed because MVT::i8 is not a legal type for our target, and isOperationLegalOrCustom will only return true when the operation is legal/custom and the type is legal. See TargetLowering.h:1364.

The target-dependent PerformDAGCombine hook would likewise be too late since the target-independent combines run first. See DAGCombiner.cpp:2085.
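
The legality check described above can be modelled in a few lines. This is a toy sketch with hypothetical types, not the LLVM implementation; it only assumes what the comment states, namely that the query succeeds when the action is legal or custom and the type itself is legal. Treating i16 and i32 as legal and i8 as illegal is an assumption for illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>

// Hypothetical stand-ins for value types, opcodes, and legalize actions.
enum class ToyVT { i8, i16, i32 };
enum class ToyOp { Abs };
enum class ToyAction { Legal, Custom, Expand };

struct ToyLowering {
  std::set<ToyVT> LegalTypes;
  std::map<std::pair<ToyOp, ToyVT>, ToyAction> Actions;

  bool isTypeLegal(ToyVT VT) const { return LegalTypes.count(VT) != 0; }

  // Unregistered (op, type) pairs default to Legal in this model.
  ToyAction getOperationAction(ToyOp Op, ToyVT VT) const {
    auto It = Actions.find({Op, VT});
    return It == Actions.end() ? ToyAction::Legal : It->second;
  }

  // The action alone is not enough: the type must be legal as well.
  bool isOperationLegalOrCustom(ToyOp Op, ToyVT VT) const {
    if (!isTypeLegal(VT))
      return false;
    ToyAction A = getOperationAction(Op, VT);
    return A == ToyAction::Legal || A == ToyAction::Custom;
  }
};

// Builds a toy target where abs is marked Custom for both i8 and i16.
ToyLowering makeToyTarget() {
  ToyLowering T;
  T.LegalTypes = {ToyVT::i16, ToyVT::i32};
  T.Actions[{ToyOp::Abs, ToyVT::i8}] = ToyAction::Custom;
  T.Actions[{ToyOp::Abs, ToyVT::i16}] = ToyAction::Custom;
  return T;
}
```

With makeToyTarget(), the query succeeds for i16 but fails for i8 even though both are marked Custom, which is the behavior the comment describes.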

arsenm · 2025-11-06T23:44:39Z

> @arsenm @LU-JOHN per @LU-JOHN's suggestion I updated this PR to handle i8 absolute values as well. Unfortunately, this required a return to the approach of overriding expandABS.

Don't expand the scope of the patch to cover i8; leave that for later. That shouldn't require overriding anything either: you could handle i8 the same way and mix in the promotion.

> The reason is that MVT::i8 is not a valid type on our architecture, so setting a custom operation action no longer works.

It does work; it just goes through ReplaceNodeResults instead of LowerOperation.

arsenm · 2025-11-06T23:49:40Z

> The reason is that MVT::i8 is not a valid type on our architecture, so setting a custom operation action no longer works.

It does work; it just goes through ReplaceNodeResults instead of LowerOperation.

It also doubly complicates the vector-of-i8 case, but I'd expect you could just rely on i8 being legalized to i16 and not have to worry about either.
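
One detail worth keeping in mind when mixing promotion into an abs lowering is that the narrow value has to be sign-extended before the wide abs. The standalone sketch below contrasts sign extension with zero extension for i8; the function names are hypothetical, it is not LLVM code, and it assumes the usual two's-complement narrowing conversions.

```cpp
#include <cassert>
#include <cstdint>

// Wrapping 8-bit abs used as the reference: INT8_MIN maps to itself.
int8_t absI8Reference(int8_t X) {
  uint8_t Bits = static_cast<uint8_t>(X);
  if (X < 0)
    Bits = static_cast<uint8_t>(0u - Bits);
  return static_cast<int8_t>(Bits);
}

// Promotion with sign extension: the wide abs sees the true sign.
int8_t absI8SignExtended(int8_t X) {
  int32_t Wide = X;
  int32_t Abs = Wide < 0 ? -Wide : Wide;
  return static_cast<int8_t>(static_cast<uint8_t>(Abs));
}

// Promotion with zero extension: the wide value is never negative, so
// the wide abs does nothing and negative inputs come back unchanged.
int8_t absI8ZeroExtended(int8_t X) {
  int32_t Wide = static_cast<uint8_t>(X);
  int32_t Abs = Wide < 0 ? -Wide : Wide;
  return static_cast<int8_t>(static_cast<uint8_t>(Abs));
}

// The sign-extended form matches the reference for every 8-bit input.
void checkSignExtendedPromotion() {
  for (int32_t V = INT8_MIN; V <= INT8_MAX; ++V) {
    int8_t X = static_cast<int8_t>(V);
    assert(absI8SignExtended(X) == absI8Reference(X));
  }
}
```

For example, -5 zero-extends to 251, the wide abs leaves it at 251, and truncation yields -5 again, so only the sign-extended form is a correct promotion.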

linuxrocks123 · 2025-11-07T00:23:45Z

@arsenm @RKSimon I tried it prior to reverting to the expandABS approach. It did not work. Would you like to see a separate experimental PR with the failing code?

I don't have a strong opinion on whether this PR should handle i16 and i8 or simply i16. However, they should be handled in the same manner, so we need to proceed in a way that is viable for both types.

linuxrocks123 · 2025-11-07T20:53:15Z

@arsenm @RKSimon I'm trying with the ReplaceNodeResults hook now.

linuxrocks123 · 2025-11-08T01:01:56Z

@arsenm @RKSimon please see #167064 for a failed attempt at using the suggested alternatives to solve this issue.

llvmbot added the backend:AMDGPU label Oct 29, 2025

jmmartinez reviewed Oct 30, 2025

LU-JOHN changed the title ~~Improved AMDGPU Lowering of abs(i16) and -abs(i16)~~ [AMDGPU] Improved Lowering of abs(i16) and -abs(i16) Oct 30, 2025

arsenm reviewed Nov 1, 2025

jayfoad reviewed Nov 4, 2025

linuxrocks123 force-pushed the swdev-562122 branch from 81223ab to 55c3456 Compare November 4, 2025 22:16

jayfoad reviewed Nov 6, 2025

jayfoad requested a review from RKSimon November 6, 2025 10:25

linuxrocks123 force-pushed the swdev-562122 branch from fae5af6 to afd852d Compare November 6, 2025 16:35

linuxrocks123 and others added 20 commits November 6, 2025 13:14

eb489a6 This doesn't work.
3fbeb0d Finally something that works
6f630bd This doesn't work.
1136040 Revert to master
02e54ce Machine-Level Implementation
eafd0ce Add new file
d74b1f2 Run update_llc_test_checks.py
f02cfeb Attempt #4, with DAG again
f92fac1 For real?
43b7fa2 Add testcase
348729a Delete new testcase
2f2affd Fix testcase
d0a4fa4 Update testcase
259743d i8
dbb2981 Update i8 testcase
171da1f Review changes
13755b4 Delete testcase
0d9aa11 Move file
2a1c2cc Restore testcase
4a09913 clang-format

linuxrocks123 force-pushed the swdev-562122 branch from afd852d to 4a09913 Compare November 6, 2025 18:15

RKSimon reviewed Nov 6, 2025

linuxrocks123 mentioned this pull request Nov 8, 2025

[AMDGPU] [DO NOT MERGE] Nonsuccessful Attempt At Using SelectionDAG Hooks for abs i8/i16 #167064

Draft
