[DAG] Fold trunc(avg(x,y)) for avgceil/floor u/s nodes if they have sufficient leading zero/sign bits #152273

houngkoungting · 2025-08-06T09:00:24Z

avgceil version: https://alive2.llvm.org/ce/z/2CKrRh
Fixes #147773. After several iterations, I believe this version is correct and complete.

@RKSimon

llvmbot · 2025-08-06T09:00:54Z

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-aarch64

Author: 黃國庭 (houngkoungting)

Changes

avgceil version: https://alive2.llvm.org/ce/z/2CKrRh
Fixes #147773. After several iterations, I believe this version is correct and complete.

@RKSimon

Full diff: https://github.com/llvm/llvm-project/pull/152273.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+45)
(added) llvm/test/CodeGen/AArch64/trunc-avg-fold.ll (+43)

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index d70e96938ed9a..9ff256f8090ba 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -16294,6 +16294,51 @@ SDValue DAGCombiner::visitTRUNCATE(SDNode *N) {
   // because targets may prefer a wider type during later combines and invert
   // this transform.
   switch (N0.getOpcode()) {
+  case ISD::AVGCEILU:
+  case ISD::AVGFLOORU:
+    if (!LegalOperations && N0.hasOneUse() &&
+        TLI.isOperationLegal(N0.getOpcode(), VT)) {
+      SDValue X = N0.getOperand(0);
+      SDValue Y = N0.getOperand(1);
+
+      KnownBits KnownX = DAG.computeKnownBits(X);
+      KnownBits KnownY = DAG.computeKnownBits(Y);
+
+      unsigned SrcBits = X.getScalarValueSizeInBits();
+      unsigned DstBits = VT.getScalarSizeInBits();
+      unsigned NeededLeadingZeros = SrcBits - DstBits + 1;
+
+      if (KnownX.countMinLeadingZeros() >= NeededLeadingZeros &&
+          KnownY.countMinLeadingZeros() >= NeededLeadingZeros) {
+        SDValue Tx = DAG.getNode(ISD::TRUNCATE, DL, VT, X);
+        SDValue Ty = DAG.getNode(ISD::TRUNCATE, DL, VT, Y);
+        return DAG.getNode(N0.getOpcode(), DL, VT, Tx, Ty);
+      }
+    }
+    break;
+
+  case ISD::AVGCEILS:
+  case ISD::AVGFLOORS:
+    if (!LegalOperations && N0.hasOneUse() &&
+        TLI.isOperationLegal(N0.getOpcode(), VT)) {
+      SDValue X = N0.getOperand(0);
+      SDValue Y = N0.getOperand(1);
+
+      unsigned SignBitsX = DAG.ComputeNumSignBits(X);
+      unsigned SignBitsY = DAG.ComputeNumSignBits(Y);
+
+      unsigned SrcBits = X.getScalarValueSizeInBits();
+      unsigned DstBits = VT.getScalarSizeInBits();
+      unsigned NeededSignBits = SrcBits - DstBits + 1;
+
+      if (SignBitsX >= NeededSignBits && SignBitsY >= NeededSignBits) {
+        SDValue Tx = DAG.getNode(ISD::TRUNCATE, DL, VT, X);
+        SDValue Ty = DAG.getNode(ISD::TRUNCATE, DL, VT, Y);
+        return DAG.getNode(N0.getOpcode(), DL, VT, Tx, Ty);
+      }
+    }
+    break;
+
   case ISD::ADD:
   case ISD::SUB:
   case ISD::MUL:
diff --git a/llvm/test/CodeGen/AArch64/trunc-avg-fold.ll b/llvm/test/CodeGen/AArch64/trunc-avg-fold.ll
new file mode 100644
index 0000000000000..175f54d6f9c05
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/trunc-avg-fold.ll
@@ -0,0 +1,43 @@
+; RUN: llc -mtriple=aarch64-- -O2 -mattr=+neon < %s | FileCheck %s
+
+; CHECK-LABEL: test_avgceil_u
+; CHECK: uhadd v0.8b, v0.8b, v1.8b
+define <8 x i8> @test_avgceil_u(<8 x i16> %a, <8 x i16> %b) {
+  %ta = trunc <8 x i16> %a to <8 x i8>
+  %tb = trunc <8 x i16> %b to <8 x i8>
+  %res = call <8 x i8> @llvm.aarch64.neon.uhadd.v8i8(<8 x i8> %ta, <8 x i8> %tb)
+  ret <8 x i8> %res
+}
+
+; CHECK-LABEL: test_avgceil_s
+; CHECK: shadd v0.8b, v0.8b, v1.8b
+define <8 x i8> @test_avgceil_s(<8 x i16> %a, <8 x i16> %b) {
+  %ta = trunc <8 x i16> %a to <8 x i8>
+  %tb = trunc <8 x i16> %b to <8 x i8>
+  %res = call <8 x i8> @llvm.aarch64.neon.shadd.v8i8(<8 x i8> %ta, <8 x i8> %tb)
+  ret <8 x i8> %res
+}
+
+; CHECK-LABEL: test_avgfloor_u
+; CHECK: urhadd v0.8b, v0.8b, v1.8b
+define <8 x i8> @test_avgfloor_u(<8 x i16> %a, <8 x i16> %b) {
+  %ta = trunc <8 x i16> %a to <8 x i8>
+  %tb = trunc <8 x i16> %b to <8 x i8>
+  %res = call <8 x i8> @llvm.aarch64.neon.urhadd.v8i8(<8 x i8> %ta, <8 x i8> %tb)
+  ret <8 x i8> %res
+}
+
+; CHECK-LABEL: test_avgfloor_s
+; CHECK: srhadd v0.8b, v0.8b, v1.8b
+define <8 x i8> @test_avgfloor_s(<8 x i16> %a, <8 x i16> %b) {
+  %ta = trunc <8 x i16> %a to <8 x i8>
+  %tb = trunc <8 x i16> %b to <8 x i8>
+  %res = call <8 x i8> @llvm.aarch64.neon.srhadd.v8i8(<8 x i8> %ta, <8 x i8> %tb)
+  ret <8 x i8> %res
+}
+
+declare <8 x i8> @llvm.aarch64.neon.uhadd.v8i8(<8 x i8>, <8 x i8>)
+declare <8 x i8> @llvm.aarch64.neon.shadd.v8i8(<8 x i8>, <8 x i8>)
+declare <8 x i8> @llvm.aarch64.neon.urhadd.v8i8(<8 x i8>, <8 x i8>)
+declare <8 x i8> @llvm.aarch64.neon.srhadd.v8i8(<8 x i8>, <8 x i8>)
+

RKSimon · 2025-08-06T13:36:48Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

+
+      unsigned SrcBits = X.getScalarValueSizeInBits();
+      unsigned DstBits = VT.getScalarSizeInBits();
+      unsigned NeededLeadingZeros = SrcBits - DstBits + 1;


Shouldn't this be NeededLeadingZeros = SrcBits - DstBits; ? (NeededSignBits is correct, though you could use ComputeMaxSignificantBits instead if you wish.)

Sorry, I think you misunderstood: you need to use computeKnownBits.countMinLeadingZeros() >= (SrcBits - DstBits)

llvm/test/CodeGen/AArch64/trunc-avg-fold.ll

github-actions · 2025-08-07T16:08:03Z

✅ With the latest revision this PR passed the undef deprecator.

houngkoungting · 2025-08-07T16:21:26Z

I will fix it tomorrow

RKSimon · 2025-08-07T16:33:20Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

+
+      unsigned SrcBits = X.getScalarValueSizeInBits();
+      unsigned DstBits = VT.getScalarSizeInBits();
+      unsigned NeededLeadingZeros = SrcBits - DstBits + 1;


Sorry, I think you misunderstood: you need to use computeKnownBits.countMinLeadingZeros() >= (SrcBits - DstBits)

RKSimon · 2025-08-08T08:27:47Z

llvm/test/CodeGen/AArch64/trunc-avg-fold.ll

+  %mask = insertelement <8 x i16> poison, i16 255, i32 0
+  %mask.splat = shufflevector <8 x i16> %mask, <8 x i16> poison, <8 x i32> zeroinitializer
+  %ta16 = and <8 x i16> %a, %mask.splat
+  %tb16 = and <8 x i16> %b, %mask.splat


Why not use splat (i16 255)? It was added to avoid the messy shufflevector(insertelement) pattern.

houngkoungting · 2025-08-12T14:01:53Z

Hi @RKSimon, did I get this right?

RKSimon · 2025-08-12T14:44:16Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

+        return DAG.getNode(N0.getOpcode(), DL, VT, Tx, Ty);
+      }
+    }
+    break;


You should be able to reuse the ISD::ABD code later in the switch statement now - it has near-identical logic

llvm/test/CodeGen/AArch64/trunc-avg-fold.ll

houngkoungting · 2025-08-16T16:13:13Z

Hi @RKSimon, I updated the test cases first; I'll modify the DAG code tomorrow.

jayfoad · 2025-08-18T08:36:34Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

+      if (KnownX.countMinLeadingZeros() >= (SrcBits - DstBits) &&
+          KnownY.countMinLeadingZeros() >= (SrcBits - DstBits)) {


Suggested change

-      if (KnownX.countMinLeadingZeros() >= (SrcBits - DstBits) &&
-          KnownY.countMinLeadingZeros() >= (SrcBits - DstBits)) {
+      if (KnownX.countMaxActiveBits() <= DstBits &&
+          KnownY.countMaxActiveBits() <= DstBits) {

Then you don't need SrcBits.

You could even do this:

APInt UpperBits = APInt::getHighBitsSet(SrcBits, SrcBits - DstBits);
if (DAG.MaskedValueIsZero(X, UpperBits) && DAG.MaskedValueIsZero(Y, UpperBits)) {

Or APInt::getBitsSetFrom(SrcBits, DstBits). (Sometimes I think we have too many different helper functions!)

jayfoad · 2025-08-18T08:40:22Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

+      KnownBits KnownX = DAG.computeKnownBits(X);
+      KnownBits KnownY = DAG.computeKnownBits(Y);


computeKnownBits can be expensive. You should rearrange this code so that you only call computeKnownBits(Y) if the test on the result of computeKnownBits(X) succeeds.

RKSimon

A couple of suggestions to reduce value tracking costs.

RKSimon · 2025-08-18T09:47:29Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

+      unsigned DstBits = VT.getScalarSizeInBits();
+      unsigned NeededSignBits = SrcBits - DstBits + 1;
+
+      if (SignBitsX >= NeededSignBits && SignBitsY >= NeededSignBits) {


if (DAG.ComputeNumSignBits(X) >= NeededSignBits && DAG.ComputeNumSignBits(Y) >= NeededSignBits) {

RKSimon · 2025-08-18T09:49:08Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

+      if (KnownX.countMinLeadingZeros() >= (SrcBits - DstBits) &&
+          KnownY.countMinLeadingZeros() >= (SrcBits - DstBits)) {


You could even do this:

APInt UpperBits = APInt::getHighBitsSet(SrcBits, SrcBits - DstBits);
if (DAG.MaskedValueIsZero(X, UpperBits) && DAG.MaskedValueIsZero(Y, UpperBits)) {

RKSimon

LGTM - cheers

houngkoungting · 2025-08-18T14:26:10Z

@RKSimon @jayfoad Thank you both for your review

[DAG] Fold trunc(avg(x,y)) for avgceil/floor u/s nodes if they have s…

80e303c

…ufficient leading zero/sign bits-1

llvmbot added the backend:AArch64 and llvm:SelectionDAG labels Aug 6, 2025

houngkoungting changed the title ~~[DAG] Fold trunc(avg(x,y)) for avgceil/floor u/s nodes if they have s…~~ [DAG] Fold trunc(avg(x,y)) for avgceil/floor u/s nodes if they have sufficient leading zero/sign bits Aug 6, 2025

RKSimon reviewed Aug 6, 2025

RKSimon requested review from davemgreen and jayfoad August 6, 2025 13:55

RKSimon mentioned this pull request Aug 7, 2025

[DAG] Fold trunc(abdu(x,y)) and trunc(abds(x,y)) if they have sufficient leading zero/sign bits #151471

Merged

[DAG] Fold trunc(avg(x,y)) for avgceil/floor u/s nodes if they have s…

24287f7

…ufficient leading zero/sign bits -2

RKSimon requested changes Aug 7, 2025

houngkoungting added 4 commits August 8, 2025 10:51

[DAG] Fold trunc(avg(x,y)) for avgceil/floor u/s nodes if they have s…

c8cc2a9

…ufficient leading zero/sign bits -3

[DAG] Fold trunc(avg(x,y)) for avgceil/floor u/s nodes if they have s…

1115256

…ufficient leading zero/sign bits-4

[DAG] Fold trunc(avg(x,y)) for avgceil/floor u/s nodes if they have s…

08138a2

…ufficient leading zero/sign bits-5

[DAG] Fold trunc(avg(x,y)) for avgceil/floor u/s nodes if they have s…

728b37d

…ufficient leading zero/sign bits-6

RKSimon reviewed Aug 8, 2025

[DAG] Fold trunc(avg(x,y)) for avgceil/floor u/s nodes if they have s…

44609a3

…ufficient leading zero/sign bits-7

RKSimon reviewed Aug 12, 2025

houngkoungting added 2 commits August 17, 2025 00:09

[DAG] Fold trunc(avg(x,y)) for avgceil/floor u/s nodes if they have s…

2d268fc

…ufficient leading zero/sign bits-8

[DAG] Fold trunc(avg(x,y)) for avgceil/floor u/s nodes if they have s…

32041fb

…ufficient leading zero/sign bits-9

Fold trunc(avg(x,y)) for avgceil/floor u/s nodes if they have suffici…

4e1af14

…ent leading zero/sign bits-10

jayfoad reviewed Aug 18, 2025

RKSimon reviewed Aug 18, 2025

[DAG] Fold trunc(avg(x,y)) for avgceil/floor u/s nodes if they have s…

c4ea7bd

…ufficient leading zero/sign bits-11

RKSimon approved these changes Aug 18, 2025

Merge branch 'main' into main

6f84361

RKSimon merged commit 0773854 into llvm:main Aug 18, 2025
8 of 9 checks passed
