-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[LLVM][CodeGen][SVE] ASRD cannot represent sdiv-by-one. #162708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We lower signed divides by a power-of-two to ASRD. However, ASRD's immediate is log2(shift_amount) in the range 1 to elt-bitwidth, which means it cannot represent sdiv-by-one.
@llvm/pr-subscribers-backend-aarch64 Author: Paul Walker (paulwalker-arm) ChangesWe lower signed divides by a power-of-two to ASRD. However, ASRD's immediate is log2(shift_amount) in the range 1 to elt-bitwidth, which means it cannot represent sdiv-by-one. We shouldn't need to handle the sdiv-by-one case during lowering because it's trivially foldable. I've created #162706 to fix this, which is why this PR doesn't attempt to do anything smart and simply lowers the divide like any other. Fixes #162616 Full diff: https://github.com/llvm/llvm-project/pull/162708.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index dc8e7c84f5e2c..ad7408e37f6bb 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -16248,7 +16248,9 @@ SDValue AArch64TargetLowering::LowerDIV(SDValue Op, SelectionDAG &DAG) const {
bool Negated;
uint64_t SplatVal;
- if (Signed && isPow2Splat(Op.getOperand(1), SplatVal, Negated)) {
+ // NOTE: SRAD cannot be used to represent sdiv-by-one.
+ if (Signed && isPow2Splat(Op.getOperand(1), SplatVal, Negated) &&
+ SplatVal > 1) {
SDValue Pg = getPredicateForScalableVector(DAG, DL, VT);
SDValue Res =
DAG.getNode(AArch64ISD::SRAD_MERGE_OP1, DL, VT, Pg, Op->getOperand(0),
@@ -30033,7 +30035,9 @@ SDValue AArch64TargetLowering::LowerFixedLengthVectorIntDivideToSVE(
bool Negated;
uint64_t SplatVal;
- if (Signed && isPow2Splat(Op.getOperand(1), SplatVal, Negated)) {
+ // NOTE: SRAD cannot be used to represent sdiv-by-one.
+ if (Signed && isPow2Splat(Op.getOperand(1), SplatVal, Negated) &&
+ SplatVal > 1) {
EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);
SDValue Op1 = convertToScalableVector(DAG, ContainerVT, Op.getOperand(0));
SDValue Op2 = DAG.getTargetConstant(Log2_64(SplatVal), DL, MVT::i32);
diff --git a/llvm/test/CodeGen/AArch64/sve-asrd.ll b/llvm/test/CodeGen/AArch64/sve-asrd.ll
new file mode 100644
index 0000000000000..66db1a5dc1dbf
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sve-asrd.ll
@@ -0,0 +1,53 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -mattr=+sve -combiner-disabled < %s | FileCheck %s
+
+target triple = "aarch64-unknown-linux-gnu"
+
+; Ensure we don't try to represent sdiv-by-one using ARSD.
+define <16 x i16> @sdiv_by_one_v16i16(<16 x i16> %a) vscale_range(2,2) {
+; CHECK-LABEL: sdiv_by_one_v16i16:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.h
+; CHECK-NEXT: adrp x8, .LCPI0_0
+; CHECK-NEXT: add x8, x8, :lo12:.LCPI0_0
+; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
+; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
+; CHECK-NEXT: ld1h { z2.h }, p0/z, [x8]
+; CHECK-NEXT: sunpklo z0.s, z0.h
+; CHECK-NEXT: sunpklo z1.s, z1.h
+; CHECK-NEXT: ptrue p0.s
+; CHECK-NEXT: sunpklo z3.s, z2.h
+; CHECK-NEXT: ext z2.b, z2.b, z2.b, #16
+; CHECK-NEXT: sunpklo z2.s, z2.h
+; CHECK-NEXT: sdiv z0.s, p0/m, z0.s, z3.s
+; CHECK-NEXT: sdiv z1.s, p0/m, z1.s, z2.s
+; CHECK-NEXT: ptrue p0.h, vl8
+; CHECK-NEXT: uzp1 z0.h, z0.h, z0.h
+; CHECK-NEXT: uzp1 z1.h, z1.h, z1.h
+; CHECK-NEXT: splice z0.h, p0, z0.h, z1.h
+; CHECK-NEXT: movprfx z1, z0
+; CHECK-NEXT: ext z1.b, z1.b, z0.b, #16
+; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
+; CHECK-NEXT: // kill: def $q1 killed $q1 killed $z1
+; CHECK-NEXT: ret
+ %res = sdiv <16 x i16> %a, splat(i16 1)
+ ret <16 x i16> %res
+}
+
+; Ensure we don't try to represent sdiv-by-one using ARSD.
+define <vscale x 8 x i16> @sdiv_by_one_nxv8i16(<vscale x 8 x i16> %a) {
+; CHECK-LABEL: sdiv_by_one_nxv8i16:
+; CHECK: // %bb.0:
+; CHECK-NEXT: mov z1.h, #1 // =0x1
+; CHECK-NEXT: sunpkhi z2.s, z0.h
+; CHECK-NEXT: sunpklo z0.s, z0.h
+; CHECK-NEXT: ptrue p0.s
+; CHECK-NEXT: sunpkhi z3.s, z1.h
+; CHECK-NEXT: sunpklo z1.s, z1.h
+; CHECK-NEXT: sdiv z2.s, p0/m, z2.s, z3.s
+; CHECK-NEXT: sdiv z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: uzp1 z0.h, z0.h, z2.h
+; CHECK-NEXT: ret
+ %res = sdiv <vscale x 8 x i16> %a, splat(i16 1)
+ ret <vscale x 8 x i16> %res
+}
|
bool Negated; | ||
uint64_t SplatVal; | ||
if (Signed && isPow2Splat(Op.getOperand(1), SplatVal, Negated)) { | ||
// NOTE: SRAD cannot be used to represent sdiv-by-one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to follow up with an NFC patch to rename the SRAD node to match the instruction it represents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the quick fix!
Thanks for the fix, lgtm too. |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/65/builds/23836 Here is the relevant piece of the build log for the reference
|
We lower signed divides by a power-of-two to ASRD. However, ASRD's immediate is log2(shift_amount) in the range 1 to elt-bitwidth, which means it cannot represent sdiv-by-one.
We shouldn't need to handle the sdiv-by-one case during lowering because it's trivially foldable. I've created #162706 to fix this, which is why this PR doesn't attempt to do anything smart and simply lowers the divide like any other.
Fixes #162616