Skip to content

Conversation

@tschuett
Copy link

@tschuett tschuett commented Nov 4, 2024

add patterns to unpredicate instructions

FNEG_ZPmZ and FABS_ZPmZ are merging predicated instructions.

See https://discourse.llvm.org/t/pat-s-with-destinations/82918

@llvmbot
Copy link
Member

llvmbot commented Nov 4, 2024

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-aarch64

Author: Thorsten Schütt (tschuett)

Changes

add patterns to unprediate instructions

FNEG_ZPmZ and FABS_ZPmZ are merging predicated instructions.

See https://discourse.llvm.org/t/pat-s-with-destinations/82918


Full diff: https://github.com/llvm/llvm-project/pull/114784.diff

5 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64.td (+7)
  • (added) llvm/lib/Target/AArch64/AArch64GlobalISelPatterns.td (+36)
  • (modified) llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td (+1-1)
  • (modified) llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp (+1)
  • (added) llvm/test/CodeGen/AArch64/sve-float.ll (+71)
diff --git a/llvm/lib/Target/AArch64/AArch64.td b/llvm/lib/Target/AArch64/AArch64.td
index 6854cccaafa1d7..9bbd2dd0721130 100644
--- a/llvm/lib/Target/AArch64/AArch64.td
+++ b/llvm/lib/Target/AArch64/AArch64.td
@@ -179,3 +179,10 @@ def AArch64 : Target {
 //===----------------------------------------------------------------------===//
 
 include "AArch64PfmCounters.td"
+
+
+//===----------------------------------------------------------------------===//
+// GlobalISel patterns
+//===----------------------------------------------------------------------===//
+
+include "AArch64GlobalISelPatterns.td"
diff --git a/llvm/lib/Target/AArch64/AArch64GlobalISelPatterns.td b/llvm/lib/Target/AArch64/AArch64GlobalISelPatterns.td
new file mode 100644
index 00000000000000..f4424ed3499ab6
--- /dev/null
+++ b/llvm/lib/Target/AArch64/AArch64GlobalISelPatterns.td
@@ -0,0 +1,36 @@
+//===-- AArch64GlobalISelPatterns.td - GlobalISel patterns -*- tablegen -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Selection and combine patterns for GlobalISel.
+//
+//===----------------------------------------------------------------------===//
+
+
+//unpredicate patterns
+
+
+
+// fneg
+def : Pat<(nxv2f64 (fneg nxv2f64:$src)),
+          (FNEG_ZPmZ_D (IMPLICIT_DEF), (PTRUE_D 31), ZPR:$src)>;
+
+def : Pat<(nxv4f32 (fneg nxv4f32:$src)),
+          (FNEG_ZPmZ_S (IMPLICIT_DEF), (PTRUE_S 31), ZPR:$src)>;
+
+def : Pat<(nxv8f16 (fneg nxv8f16:$src)),
+          (FNEG_ZPmZ_H (IMPLICIT_DEF), (PTRUE_H 31), ZPR:$src)>;
+
+// fabs
+def : Pat<(nxv2f64 (fabs nxv2f64:$src)),
+          (FABS_ZPmZ_D (IMPLICIT_DEF), (PTRUE_D 31), ZPR:$src)>;
+
+def : Pat<(nxv4f32 (fabs nxv4f32:$src)),
+          (FABS_ZPmZ_S (IMPLICIT_DEF), (PTRUE_S 31), ZPR:$src)>;
+
+def : Pat<(nxv8f16 (fabs nxv8f16:$src)),
+          (FABS_ZPmZ_H (IMPLICIT_DEF), (PTRUE_H 31), ZPR:$src)>;
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 6fdcaec86340ce..1871c536b003be 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -664,7 +664,7 @@ let Predicates = [HasSVEorSME] in {
   defm FABS_ZPmZ : sve_int_un_pred_arit_bitwise_fp<0b100, "fabs", AArch64fabs_mt>;
   defm FNEG_ZPmZ : sve_int_un_pred_arit_bitwise_fp<0b101, "fneg", AArch64fneg_mt>;
 
-  foreach VT = [nxv2bf16, nxv4bf16, nxv8bf16] in {
+  foreach VT = [nxv2bf16, nxv4bf16] in {
     // No dedicated instruction, so just clear the sign bit.
     def : Pat<(VT (fabs VT:$op)),
               (AND_ZI $op, (i64 (logical_imm64_XFORM(i64 0x7fff7fff7fff7fff))))>;
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index f162d1c2973cbc..a6743404520774 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -279,6 +279,7 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
   getActionDefinitionsBuilder({G_FABS, G_FNEG})
       .legalFor({s32, s64, v2s32, v4s32, v2s64})
       .legalFor(HasFP16, {s16, v4s16, v8s16})
+      .legalFor(HasSVE, {nxv2s64, nxv4s32, nxv8s16})
       .scalarizeIf(scalarOrEltWiderThan(0, 64), 0)
       .lowerIf(scalarOrEltWiderThan(0, 64))
       .clampNumElements(0, v4s16, v8s16)
diff --git a/llvm/test/CodeGen/AArch64/sve-float.ll b/llvm/test/CodeGen/AArch64/sve-float.ll
new file mode 100644
index 00000000000000..186d4e39226dda
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sve-float.ll
@@ -0,0 +1,71 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
+; RUN: llc < %s -mtriple aarch64 -mattr=+sve  | FileCheck %s
+; RUN: llc < %s -mtriple aarch64 -mattr=+sve -global-isel -aarch64-enable-gisel-sve=1 | FileCheck %s
+
+;; fneg
+define <vscale x 2 x double> @fnegnxv2double(<vscale x 2 x double> %a) {
+; CHECK-LABEL: fnegnxv2double:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    fneg z0.d, p0/m, z0.d
+; CHECK-NEXT:    ret
+entry:
+  %c = fneg <vscale x 2 x double> %a
+  ret <vscale x 2 x double> %c
+}
+
+define <vscale x 4 x float> @fnegnxv4float(<vscale x 4 x float> %a) {
+; CHECK-LABEL: fnegnxv4float:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    fneg z0.s, p0/m, z0.s
+; CHECK-NEXT:    ret
+entry:
+  %c = fneg <vscale x 4 x float> %a
+  ret <vscale x 4 x float> %c
+}
+
+define <vscale x 8 x half> @fnegnxv8half(<vscale x 8 x half> %a) {
+; CHECK-LABEL: fnegnxv8half:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ptrue p0.h
+; CHECK-NEXT:    fneg z0.h, p0/m, z0.h
+; CHECK-NEXT:    ret
+entry:
+  %c = fneg <vscale x 8 x half> %a
+  ret <vscale x 8 x half> %c
+}
+
+;; fabs
+define <vscale x 2 x double> @fabsnxv2double(<vscale x 2 x double> %a) {
+; CHECK-LABEL: fabsnxv2double:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    fabs z0.d, p0/m, z0.d
+; CHECK-NEXT:    ret
+entry:
+  %c = tail call <vscale x 2 x double> @llvm.fabs.nxv2f64(<vscale x 2 x double> %a)
+  ret <vscale x 2 x double> %c
+}
+
+define <vscale x 4 x float> @fabsnxv4float(<vscale x 4 x float> %a) {
+; CHECK-LABEL: fabsnxv4float:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    fabs z0.s, p0/m, z0.s
+; CHECK-NEXT:    ret
+entry:
+  %c = tail call <vscale x 4 x float> @llvm.fabs.nxv4f32(<vscale x 4 x float> %a)
+  ret <vscale x 4 x float> %c
+}
+
+define <vscale x 8 x half> @fabsnxv8half(<vscale x 8 x half> %a) {
+; CHECK-LABEL: fabsnxv8half:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ptrue p0.h
+; CHECK-NEXT:    fabs z0.h, p0/m, z0.h
+; CHECK-NEXT:    ret
+entry:
+   %c = tail call <vscale x 8 x half> @llvm.fabs.nxv8f16(<vscale x 8 x half> %a)
+  ret <vscale x 8 x half> %c
+}


// fneg
def : Pat<(nxv2f64 (fneg nxv2f64:$src)),
(FNEG_ZPmZ_D (IMPLICIT_DEF), (PTRUE_D 31), ZPR:$src)>;
Copy link
Collaborator

@paulwalker-arm paulwalker-arm Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of IMPLICIT_DEF here can introduce false data-dependcies that will negatively impact performance. We go to some trouble to avoid this in isel (see https://reviews.llvm.org/D105889) whereby when the "real" input is not reused we emit a movprfx to break the dependency.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ag movprfx llvm/lib/Target/AArch64/*.td movprfx seems to be seldomly used in patterns. Could you give a pointer?

Copy link
Author

@tschuett tschuett Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have access to the destination register neither in nor out in Pats. It might be different in C++.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe SelectionDAG uses FNEG_ZPmZ_S_UNDEF when all lanes are active?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found #116259, but that looks like SVE2.2.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0|(1ULL<<MCID::Pseudo), 0x0ULL }, // Inst #499 = FABS_ZPmZ_H_UNDEF
pseudo instruction?
from lib/Target/AArch64/AArch64GenInstrInfo.inc

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
; RUN: llc -global-isel -mtriple aarch64 -mattr=+sve -global-isel-abort=2 -stop-after=instruction-select -aarch64-enable-gisel-sve=1 %s -o - | FileCheck %s

define <vscale x 8 x half> @fabsnxv8half(<vscale x 8 x half> %a) {
  ; CHECK-LABEL: name: fabsnxv8half
  ; CHECK: bb.1.entry:
  ; CHECK-NEXT:   liveins: $z0
  ; CHECK-NEXT: {{  $}}
  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:zpr = COPY $z0
  ; CHECK-NEXT:   [[PTRUE_H:%[0-9]+]]:ppr_3b = PTRUE_H 31, implicit $vg
  ; CHECK-NEXT:   [[DEF:%[0-9]+]]:zpr = IMPLICIT_DEF
  ; CHECK-NEXT:   [[FABS_ZPmZ_H:%[0-9]+]]:zpr = FABS_ZPmZ_H [[DEF]], [[PTRUE_H]], [[COPY]]
  ; CHECK-NEXT:   $z0 = COPY [[FABS_ZPmZ_H]]
  ; CHECK-NEXT:   RET_ReallyLR implicit $z0
entry:
   %c = tail call <vscale x 8 x half> @llvm.fabs.nxv8f16(<vscale x 8 x half> %a)
  ret <vscale x 8 x half> %c
}
;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
; CHECK: {{.*}}

Almost the difference is the undef. It is not in the td pattern.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0|(1ULL<<MCID::Pseudo), 0x0ULL }, // Inst #499 = FABS_ZPmZ_H_UNDEF pseudo instruction? from lib/Target/AArch64/AArch64GenInstrInfo.inc

Yes it is pseudo instruction expanded by the code added by https://reviews.llvm.org/D105889 that was shared earlier in this conversation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
; RUN: llc -global-isel -mtriple aarch64 -mattr=+sve -global-isel-abort=2 -stop-after=instruction-select -aarch64-enable-gisel-sve=1 %s -o - | FileCheck %s

define <vscale x 8 x half> @fabsnxv8half(<vscale x 8 x half> %a) {
  ; CHECK-LABEL: name: fabsnxv8half
  ; CHECK: bb.1.entry:
  ; CHECK-NEXT:   liveins: $z0
  ; CHECK-NEXT: {{  $}}
  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:zpr = COPY $z0
  ; CHECK-NEXT:   [[PTRUE_H:%[0-9]+]]:ppr_3b = PTRUE_H 31, implicit $vg
  ; CHECK-NEXT:   [[DEF:%[0-9]+]]:zpr = IMPLICIT_DEF
  ; CHECK-NEXT:   [[FABS_ZPmZ_H_UNDEF:%[0-9]+]]:zpr = FABS_ZPmZ_H_UNDEF [[DEF]], [[PTRUE_H]], [[COPY]]
  ; CHECK-NEXT:   $z0 = COPY [[FABS_ZPmZ_H_UNDEF]]
  ; CHECK-NEXT:   RET_ReallyLR implicit $z0
entry:
   %c = tail call <vscale x 8 x half> @llvm.fabs.nxv8f16(<vscale x 8 x half> %a)
  ret <vscale x 8 x half> %c
}
;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
; CHECK: {{.*}}

I can get the _UNDEF, but I don't get the killed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not concerned about the kill flag.

@@ -0,0 +1,36 @@
//===-- AArch64GlobalISelPatterns.td - GlobalISel patterns -*- tablegen -*-===//
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know that much about GlobalISel, but I thought we could reuse a lot of the existing patterns. Do we really need to duplicate all this functionality?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't' want to duplicate patterns. We re-use a lot of the existing ones. There are just some patterns missing, e.g., fneg and fabs.

Copy link
Author

@tschuett tschuett Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patterns like ,e.g, extract_subvector work great for us. We have issues with predicated instructions. I want to add patterns that unpredicate instructions.

%c = fneg <vscale x 2 x double> %a
ret <vscale x 2 x double> %c
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test 3 x case?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertion failed: ((TypeSize::ScalarTy)SrcOps.size() * SrcOps[0].getLLTTy(*getMRI()).getSizeInBits() == DstOps[0].getLLTTy(*getMRI()).getSizeInBits() && "input vectors do not exactly cover the output vector register"), function buildInstr, file MachineIRBuilder.cpp, line 1419.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM ERROR: Unable to widen vector store
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: llc -mtriple aarch64 -mattr=+sve -global-isel=0

  1. Running pass 'Function Pass Manager' on module ''.
  2. Running pass 'AArch64 Instruction Selection' on function '@fnegnxv3double'
    Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var LLVM_SYMBOLIZER_PATH to point to it):
    0 llc 0x000000010c171388 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 40
    1 llc 0x000000010c16eadb llvm::sys::RunSignalHandlers() + 235
    2 llc 0x000000010c171b56 SignalHandler(int) + 278
    3 libsystem_platform.dylib 0x00007ff8161aee9d _sigtramp + 29
    4 libsystem_platform.dylib 000000000000000000 _sigtramp + 18446603370210070912
    5 libsystem_c.dylib 0x00007ff81609ab19 abort + 126
    6 llc 0x000000010c0b06aa llvm::report_fatal_error(llvm::Twine const&, bool) + 458
    7 llc 0x000000010c0b04d9 llvm::report_fatal_error(char const*, bool) + 41
    8 llc 0x000000010be82561 llvm::DAGTypeLegalizer::WidenVecOp_STORE(llvm::SDNode*) + 6465
    9 llc 0x000000010be7e159 llvm::DAGTypeLegalizer::WidenVectorOperand(llvm::SDNode*, unsigned int) + 3145
    10 llc 0x000000010be1be87 llvm::SelectionDAG::LegalizeTypes() + 4039

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No! There are no illegal vscale types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? It's just more bugs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can always loop. In any case, assert in the compiler is not an OK failure mode

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DAG error message could be improved. Tell me there were scalable vectors. The GlobalISel assert is not helpful at all.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The <vscale x 3 x double> case is between tricky and might be supported in the future.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot scalarize or widen scalable vectors. The vscale is unknown at compile time.

We can widen scalable vector arithmetic just fine. It's loads/store that are broken because we can't access more memory. It would be kind of messy, but it should be possible to generate a mask to block off the extra elements so a scalable load/store could be widened. Just need a step_vector, a splat of (vscale * original_min_elts), and a compare I think.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

G_STEP_VECTOR is still missing. Can this be target independent?

@tschuett
Copy link
Author

tschuett commented Nov 4, 2024

I looked at the GlobalISel assert again:

 llc                      0x000000010dcbe76e llvm::MachineIRBuilder::buildInstr(unsigned int, llvm::ArrayRef<llvm::DstOp>, llvm::ArrayRef<llvm::SrcOp>, std::__1::optional<unsigned int>) + 7086
8  llc                      0x000000010dbae9fc llvm::CSEMIRBuilder::buildInstr(unsigned int, llvm::ArrayRef<llvm::DstOp>, llvm::ArrayRef<llvm::SrcOp>, std::__1::optional<unsigned int>) + 3324
9  llc                      0x000000010dcb5e6a llvm::MachineIRBuilder::buildMergeLikeInstr(llvm::DstOp const&, llvm::ArrayRef<llvm::Register>) + 282
10 llc                      0x000000010dbb9bef mergeVectorRegsToResultRegs(llvm::MachineIRBuilder&, llvm::ArrayRef<llvm::Register>, llvm::ArrayRef<llvm::Register>) + 383
11 llc                      0x000000010dbb5dea llvm::CallLowering::handleAssignments(llvm::CallLowering::ValueHandler&, llvm::SmallVectorImpl<llvm::CallLowering::ArgInfo>&, llvm::CCState&, llvm::SmallVectorImpl<llvm::CCValAssign>&, llvm::MachineIRBuilder&, llvm::ArrayRef<llvm::Register>) const + 9722
12 llc                      0x000000010b1a071f llvm::AArch64CallLowering::lowerFormalArguments(llvm::MachineIRBuilder&, llvm::Function const&, llvm::ArrayRef<llvm::ArrayRef<llvm::Register>>, llvm::FunctionLoweringInfo&) const + 1823
13 llc                      0x000000010dc33012 llvm::IRTranslator::runOnMachineFunction(llvm::MachineFunction&) + 4242
14 llc                      0x000000010c5a538e llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 782

It comes out of lowerFormalArguments and mergeVectorRegsToResultRegs. And it was G_CONCAT_VECTORS.

@aemerson
Copy link
Contributor

aemerson commented Nov 5, 2024

@tschuett What's your overall strategy and plan for implementing support for SVE? This is a big task and if you have a year or more of time to dedicate to this, I highly recommend you first take some time to understand the existing implementation before you write a single line of code.

Sander & Paul have been working on scalable vector support for 10 years (wow, time flies!) so follow their advice. I think for scalable vectors/SVE specifically, since we have so much engineering effort invested into this we should be trying to re-use as much as possible and only diverge when absolutely necessary.

@tschuett
Copy link
Author

tschuett commented Nov 5, 2024

There are no SVE patterns for fneg today. Either there is no demand for them or C++ is more efficient.

For GlobalISel, we have

  • fneg pattern with implicit_def
  • fneg pattern with movprfx
  • C++

fneg is just an example. There are many predicated (merging and/or zeroing) instructions without good pattern coverage.

Edit: There are [s,u][max,mix] instructions but no patterns for SVE.
And there is really low hanging fruit: https://github.com/llvm/llvm-project/pull/114664/files

#114644

@aemerson
Copy link
Contributor

aemerson commented Nov 5, 2024

There are no SVE patterns for fneg today. Either there is no demand for them or C++ is more efficient.

For GlobalISel, we have

  • fneg pattern with implicit_def
  • fneg pattern with movprfx
  • C++

fneg is just an example. There are many predicated (merging and/or zeroing) instructions without good pattern coverage.

Edit: There are [s,u][max,mix] instructions but no patterns for SVE. And there is really low hanging fruit: https://github.com/llvm/llvm-project/pull/114664/files

#114644

Ok so before we go ahead and diverge implementations between the two, let's first understand why SelectionDAG chose to go a different route and what the trade-offs are. If we have half-baked implementations without a cohesive idea about what the overall design should be, then that's worse than not having anything at all. E.g. let's think about how to incorporate the data-dependency breaking part that Paul mentioned before.

@aemerson
Copy link
Contributor

aemerson commented Nov 5, 2024

Forgot to mention, are the patterns you're adding beneficial for SelectionDAG too? If not, why? Those are the kind of questions I expect will help us to understand the codebase better.

@tschuett
Copy link
Author

tschuett commented Nov 5, 2024

The SDAG is C++ first. They don't need fneg patterns. They select in C++. AFAIK, there is no mechanism to disable/enable patterns for different backends. As there is no such mechanism, they are probably more annoying than helpful.

@davemgreen
Copy link
Collaborator

In SDAG any sve nodes which require predication will be lowered to target-specific predicated ops, in this case:

  case ISD::FNEG:
    return LowerToPredicatedOp(Op, DAG, AArch64ISD::FNEG_MERGE_PASSTHRU);

(This has the obvious disadvantage that we lose any target independent optimizations that could happen after we convert them, but that maybe isn't as important for SVE).

Those nodes then have tablegen patterns through

def AArch64fneg_mt   : SDNode<"AArch64ISD::FNEG_MERGE_PASSTHRU", SDT_AArch64Arith>;

The most straight-forward route for GISel would be to mirror that, and mark AArch64fneg_mt as GINodeEquiv with the new G_ nodes that get created.

@tschuett
Copy link
Author

tschuett commented Nov 5, 2024

End we up again at:
defm FNEG_ZPmZ : sve_int_un_pred_arit_bitwise_fp<0b101, "fneg", AArch64fneg_mt>;

@arsenm
Copy link
Contributor

arsenm commented Nov 5, 2024

If you have C++ matching on the DAG path, it will hide the tablegen generated patterns in globalisel. That said, if you can write a pattern there's probably no reason to keep the manual DAG selection around.

@tschuett
Copy link
Author

tschuett commented Nov 5, 2024

SDValue AArch64TargetLowering::LowerToPredicatedOp(SDValue Op,
                                                   SelectionDAG &DAG,
                                                   unsigned NewOp) const {

...
  if (isMergePassthruOpcode(NewOp))
      Operands.push_back(DAG.getUNDEF(ContainerVT));


}

Apparently the DAG also uses Undef for merge passthrough.

@madhur13490
Copy link
Contributor

@tschuett What's your overall strategy and plan for implementing support for SVE? This is a big task and if you have a year or more of time to dedicate to this, I highly recommend you first take some time to understand the existing implementation before you write a single line of code.

Sander & Paul have been working on scalable vector support for 10 years (wow, time flies!) so follow their advice. I think for scalable vectors/SVE specifically, since we have so much engineering effort invested into this we should be trying to re-use as much as possible and only diverge when absolutely necessary.

+1.

I think we should have a RFC for this. The plan for SVE implementation should be spelled out at some point. How are we approaching this? What pieces can be reused? I understand that we have some patches in the community (including some from my team) but we should take a moment and write down at least a rough plan

@tschuett
Copy link
Author

tschuett commented Nov 6, 2024

What's your overall strategy and plan for implementing support for SVE? That is not my strategy. It is a community effort.

@tschuett
Copy link
Author

tschuett commented Nov 6, 2024

https://github.com/llvm/llvm-project/pull/114664/files is probably the best example for low hanging fruit. This PR shows there are a lot instructions in SVE that are predicated and zeroing and/or merging. We cannot use them directly. Here is an example for not predicated instructions: #110561.

For the predicated instructions are no patterns available. How are we going to handle them? Pure patterns or a mix of C++ and patterns?

@aemerson
Copy link
Contributor

aemerson commented Nov 7, 2024

I spoke to @paulwalker-arm offline and when he has some time he agreed to write some rationale for why they chose the design they did. That should help not just those implementing GISel SVE support but the reviewers too.

@tschuett
Copy link
Author

tschuett commented Nov 7, 2024

@topperc how is RISC-V handling this with GlobalISel? I bet you also have vectorized predicated instructions.

@topperc
Copy link
Collaborator

topperc commented Nov 7, 2024

@topperc how is RISC-V handling this with GlobalISel? I bet you also have vectorized predicated instructions.

We probably hit the same assertion. The RISC-V vector support in GlobalISel has largely been done by a student and is very incomplete

@tschuett
Copy link
Author

tschuett commented Nov 7, 2024

We (the IRTranslator) blindly translate fneg from LLVM-IR into G_FNEG. We state def : GINodeEquiv<G_FNEG, fneg>; . I believe there is not much space than adding fneg patterns.

@topperc
Copy link
Collaborator

topperc commented Nov 7, 2024

We (the IRTranslator) blindly translate fneg from LLVM-IR into G_FNEG. We state def : GINodeEquiv<G_FNEG, fneg>; . I believe there is not much space than adding fneg patterns.

SelectionDAG also creates ISD::FNEG in SelectionDAGBuilder, but AArch64 explicitly Custom lowers ISD::FNEG for scalable vectors using a function called LowerToPredicatedOp. Should GISel custom legalize G_FNEG?

@tschuett
Copy link
Author

tschuett commented Nov 7, 2024

This PR is somewhere between proposal and template. The market share of fneg and fabs won't be that high. The cost of lowering fneg in the DAG for AArch64 with C++ is so low that pattens would introduce overhead or more work. For GlobalISel the situation is the opposite.

@tschuett
Copy link
Author

tschuett commented Nov 7, 2024

My favourite weapon: ag fneg llvm/lib/Target/AArch64/*.td. There are scalar and Neon patterns, but no SVE patterns.

No. This PR has .legalFor(HasSVE, {nxv2s64, nxv4s32, nxv8s16}) plus patterns.

@topperc
Copy link
Collaborator

topperc commented Nov 7, 2024

My favourite weapon: ag fneg llvm/lib/Target/AArch64/*.td. There are scalar and Neon patterns, but no SVE patterns.

No. This PR has .legalFor(HasSVE, {nxv2s64, nxv4s32, nxv8s16}) plus patterns.

Do you plan to support fixed vectors using SVE like SelectionDAG does? As I understand it, the SDAG code is written the way it is to share the same isel patterns for scalable vector and fixed vectors.

@tschuett
Copy link
Author

tschuett commented Nov 7, 2024

The joke is: not this week. Getting support for scalable vectors into AArch64 will be a ton of work.

@tschuett
Copy link
Author

tschuett commented Nov 8, 2024

Can we have a dependency breaker pass behind the instruction selector that identifies fneg and fabs patterns, kills the implicit defs, and sneaks in movprfxs?

@paulwalker-arm
Copy link
Collaborator

When designing the isel for SVE we had three main objectives:

  1. Have patterns that select to a single instruction. This encourages a more complete DAG for better pre-selection optimisations.
  2. Try not to have multiple patterns that resolve to the same instruction, although PatFrags are allowed. This encourages uniformity of the DAG to maximise the applicability of target specific combines and promotes reuse within the tablegen files.
  3. Try to preserve information relating to which lanes are defined. This allows us to make good use of SVE's various immediate and reversed instruction forms as well as enabling movprfx'ing for cases where the register allocation is not ideal. We also use this so as not to introduce side effects for things like floating point operations.

Very early on it became apparent the common ISD nodes were not fit for purpose in achieving these goals. If our use case was to only code generate stock LLVM IR then we might have gotten away with it. However, once you factor in all the intrinsics required to implement the ACLE along with potentially supporting fixed length vector types from 2048-bit down to 128-bit you soon realise the need for a matching set of predicated ISD nodes. This is essentially what we have implemented and whilst we might not always achieve the above objectives, on the whole we're pretty close and I certainly don't want global-isel preventing us from closing any gaps.

With that said, I'm completely ignorant when it comes to global-isel and so only offer the following guidance:

  1. Make sure there's significant value in enabling global-isel for SVE before embarking on what's likely to be a long mission.
  2. Functional global-isel for SVE is of no value if the code quality is not good. I don't recommend a piecemeal implementation that assumes SVE is just NEON but using scalable vector types.
  3. Predication is key to efficient SVE code generation and any representation that doesn't support it is likely to be flawed.

@tschuett
Copy link
Author

Thank you for your write-up!

fneg from LLVM-IR is roughly equivalent to G_FNEG. They can be scalar or vectorized, but they are not predicated.
G_FNEG may become legal for SVE after this PR.

We state def : GINodeEquiv<G_FNEG, fneg>; We use this mechanism to scrape patterns from tablegen. It works for NEON. For SVE there are no patterns. You know better than me. FNEG_ZPmZ_D is predicated and merging. If we want to continue with the pattern approach, we need SVE patterns for fneg. As neither fneg nor G_FNEG are predicated, we have to stick ptrues on FNEG_ZPmZ_D. It leaves not much space for optimizing predicates. We might add more combines into tablegen for optimizations.

@tschuett
Copy link
Author

Ping.

@tschuett
Copy link
Author

tschuett commented Dec 2, 2024

Ping. Any progress?

@paulwalker-arm
Copy link
Collaborator

paulwalker-arm commented Dec 3, 2024

Adding patterns with the only purpose of having something for global-isel to scrape does not seem sensible to me. I think a proper design for how global-isel can represent predicated instructions is required. Being honest, the same is true for normal isel but at least for that we've been able to implement something that's albeit target specific.

@tschuett
Copy link
Author

tschuett commented Dec 4, 2024

In my build the GlobalISel instruction selector is at lib/Target/AArch64/AArch64GenGlobalISel.inc. We already sneak quite a bit of SVE patterns out of tablegen.

I think there are two orthogonal issues to discuss:

  • How to select predicated SVE/RISCV-V instructions in general?
  • How to select predicated SVE/RISCV-V instructions for GlobalISel?

The latter will a have hard tendency for tablegen patterns.

@tschuett
Copy link
Author

tschuett commented Dec 6, 2024

Adding patterns with the only purpose of having something for global-isel to scrape does not seem sensible to me.

That ship has sailed long ago:
llvm/lib/Target/AArch64/AArch64InstrGISel.td

// Pattern is used for GlobalISel
multiclass SIMDAcrossLaneLongPairIntrinsicGISel<string Opc, SDPatternOperator addlp> {
  // Patterns for addv(addlp(x)) ==> addlv
  def : Pat<(i16 (vecreduce_add (v4i16 (addlp (v8i8 V64:$Rn))))),
            (!cast<Instruction>(Opc#"v8i8v") V64:$Rn)>;
  def : Pat<(i16 (vecreduce_add (v8i16 (addlp (v16i8 V128:$Rn))))),
            (!cast<Instruction>(Opc#"v16i8v") V128:$Rn)>;
  def : Pat<(i32 (vecreduce_add (v4i32 (addlp (v8i16 V128:$Rn))))),
            (!cast<Instruction>(Opc#"v8i16v") V128:$Rn)>;

  // Patterns for addp(addlp(x))) ==> addlv
  def : Pat<(i32 (vecreduce_add (v2i32 (addlp (v4i16 V64:$Rn))))),
            (!cast<Instruction>(Opc#"v4i16v") V64:$Rn)>;
  def : Pat<(i64 (vecreduce_add (v2i64 (addlp (v4i32 V128:$Rn))))),
            (!cast<Instruction>(Opc#"v4i32v") V128:$Rn)>;
}

defm : SIMDAcrossLaneLongPairIntrinsicGISel<"UADDLV", AArch64uaddlp>;
defm : SIMDAcrossLaneLongPairIntrinsicGISel<"SADDLV", AArch64saddlp>;
// The following SetCC patterns are used for GlobalISel only

It is the opposite. We should finally start adding SVE patterns with GlobalISel as primary user.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this inflate the size of AArch64GenDAGISel.inc by including patterns that SelectionDAG doesn't need?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> grep GenericInstruction llvm/include/llvm/Target/GenericOpcodes.td | wc -l
219

Yes, it will include patterns that SelectionDAG doesn't need/use.
There are 219 named opcodes. I have no numbers, but it will only be a small subset of them. There will be no G_FSINCOS patterns.

Thorsten Schütt added 3 commits December 7, 2024 08:05
@tschuett
Copy link
Author

First month ping.

include "AArch64GlobalISelPatterns.td"

// We want to first hit the instruction patterns.
foreach VT = [nxv2bf16, nxv4bf16, nxv8bf16] in {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having this in the top level .td file seems like bad file structure.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah maybe. It is actually a cheap trick. It was interfering with the new fabs and fneg patterns. We have now patterns and can use the instead of and and xor. If this is the second to last issue, I am more than happy to move them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upstream they are in AArch64SVEInstrInfo.td. I just need to find a place for the new patterns to be before the old patterns.

foreach VT = [nxv2bf16, nxv4bf16, nxv8bf16] in {

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put the GlobalIsel include at the bottom of the file to minimize it's priority and don't bother others. It depends on AArch64SVEInstrInfo.td for the definition of the fabs and fneg instructions. At the same time, the and and xor patterns for fabs and fneg are defined in AArch64SVEInstrInfo.td. They have higher priority than my patterns.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why these bf16 patterns are interfering with your new patterns that don't mention bf16.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved them to the original location, but it fails with:
Assertion failed: (I.getOpcode() == TargetOpcode::G_CONSTANT && OpIdx == -1 && "Expected G_CONSTANT"), function renderLogicalImm64, file AArch64InstructionSelector.cpp, line 7914.

@aemerson
Copy link
Contributor

+1.

I think we should have a RFC for this. The plan for SVE implementation should be spelled out at some point. How are we approaching this? What pieces can be reused? I understand that we have some patches in the community (including some from my team) but we should take a moment and write down at least a rough plan

I think a proper design for how global-isel can represent predicated instructions is required.

This PR is sort of jumping the gun at the moment, and we'll never have a clear project that other people or teams can join the effort without a better thought out plan. @madhur13490 asked for an RFC of a design, of which a PR can be used as an example proof-of-concept. I think multiple people on this PR (not just me) are recommending we take a step back and do design work first. Otherwise just merging in code is a liability.

@tschuett
Copy link
Author

I have no issues with you publishing an RFC on discourse. We are all contributors first.

@aemerson
Copy link
Contributor

I have no issues with you publishing an RFC on discourse. We are all contributors first.

I think there might be misunderstanding about how we work in LLVM? If you want to continue to work on SVE support please listen to the feedback of the reviewers here. Since I'm not the one trying to push this and I have other things to do at the moment I won't be writing that RFC.

@aemerson
Copy link
Contributor

Let me elaborate further. For example, you wrote 7 hours ago:

First month ping.

... and since this is your PR that you want review on, it's incumbent on you to do that work. It's one of the challenges of working with open source software in that much of the work is about building consensus, evaluating engineering trade-offs and exploring not only the "what" or "how" but the "why". On the flip side, it's also a valuable exercise in finding problematic approaches before we invest potentially 100s/1000s of person-hours of effort in a project.

Many things in LLVM follow the same pattern. When we introduced scalable types all those years ago, Paul, myself and the rest of the SVE team went through a bunch of discussions internally at ARM to find what the right approach could be. We did a lot of work to have a prototype implementation, but even then once we present to the community it took a long time to actually build consensus about how we should approach the problem. In the mean time the design changed and it was on us to actually do that investigation work since we were the ones trying to add support in LLVM. We didn't just pawn it off to the "community" and send in patches.

SVE in GlobalISel is a much smaller project than general SVE, but if the approach is not yet clear it still needs someone to do that work so that the reviewers can understand the why as well.

@tschuett
Copy link
Author

You don't have to tell us that you have other things to do. We all know that you never contribute. Putting unlimited burden on contributors is a very special kind of review style.

@topperc
Copy link
Collaborator

topperc commented Dec 13, 2024

You don't have to tell us that you have other things to do. We all know that you never contribute. Putting unlimited burden on contributors is a very special kind of review style.

This comment is inappropriate.

You have been asked to consider the big picture of how SVE will work in GISel. SelectionDAG supports both fixed and scalable vectors using SVE registers with the same isel patterns. This is done by converting them to a common interface in C++ code. Do you think that GISel should do something different to support fixed vectors? If you want to do something different, I think it is your responsibility to make a proposal. We shouldn't just ignore fixed vectors for expediency.

@tschuett
Copy link
Author

tschuett commented Dec 13, 2024

Unfortunately, it isn't. I can't see any effidence that @madhur13490 assigned the rfc or a plan for the documentation to any person or a community effort. However, @aemerson forces me to do the work, which isn't appropriate.

Fixed vectors. In the patterns above there is PTRUE_D 31 on the predicate register, where 31 means ALL. All vector lanes are active. For fixed vectors, the dag uses ptrue with less active lanes. Selection might work in C++ by using ptrue with less active lanes. For the moment, I have no idea how to do that with patterns.

From the documentation of ptrue:

VL1 when pattern = 00001
VL2 when pattern = 00010
VL3 when pattern = 00011
VL4 when pattern = 00100
VL5 when pattern = 00101
VL6 when pattern = 00110

It supports various active subvectors.

@topperc
Copy link
Collaborator

topperc commented Dec 13, 2024

Fixed vectors. In the patterns above there is PTRUE_D 31 on the predicate register, where 31 means ALL. All vector lanes are active. For fixed vectors, the dag uses ptrue with less active lanes. Selection might work in C++ by using ptrue with less active lanes. For the moment, I have no idea how to do that with patterns.

Why does it need to be done with patterns? Why can't we write C++ for GISel? AArch64 has a 8000 lines of C++ in AArch64InstructionSelector.cpp a lot of it to do custom selection. There's also the AArch64PostLegalizerLowering.cpp pass that puts things in a form for isel.

@aemerson
Copy link
Contributor

You don't have to tell us that you have other things to do. We all know that you never contribute. Putting unlimited burden on contributors is a very special kind of review style.

I don't need to write the laundry list of things I've done for this project, and I certainly won't tolerate continually being insulted by you. I've given you every chance to continue working together professionally and not once have you even offered an apology. This behavior is not welcome in the LLVM community.

@tschuett
Copy link
Author

Fixed vectors. In the patterns above there is PTRUE_D 31 on the predicate register, where 31 means ALL. All vector lanes are active. For fixed vectors, the dag uses ptrue with less active lanes. Selection might work in C++ by using ptrue with less active lanes. For the moment, I have no idea how to do that with patterns.

Why does it need to be done with patterns? Why can't we write C++ for GISel? AArch64 has a 8000 lines of C++ in AArch64InstructionSelector.cpp a lot of it to do custom selection. There's also the AArch64PostLegalizerLowering.cpp pass that puts things in a form for isel.

I bet the fixed length vectors are intrinsics. Either they are selected automatic already or we have to do the work in C++ in the selector.

@topperc
Copy link
Collaborator

topperc commented Dec 13, 2024

I bet the fixed length vectors are intrinsics. Either they are selected automatic already or we have to do the work in C++ in the selector.

They are not intrinsics. They are instruction like add <8 x i32> %x, %y that are wider than Neon types. They require -aarch64-sve-vector-bits-min=256 or a vscale_range attribute that indicates SVE vectors are at least 256. https://godbolt.org/z/q1G4vnj16

@tschuett
Copy link
Author

But then we could legalize G_ADD for wide fixed length vectors and add/generate patterns or do the work in C++.

@topperc
Copy link
Collaborator

topperc commented Dec 13, 2024

But then we could legalize G_ADD for wide fixed length vectors and add/generate patterns or do the work in C++.

Yes there are several options. Which option is what needs to be evaluated and agreed upon amongst the AArch64 stakeholders. Moving forward on SVE without an overall plan risks needing to throw away a bunch of work down the road.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants