-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[GlobalISel][AArch64] Legalize G_FABS and G_FNEG for SVE #114784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-aarch64 Author: Thorsten Schütt (tschuett) Changesadd patterns to unprediate instructions FNEG_ZPmZ and FABS_ZPmZ are merging predicated instructions. See https://discourse.llvm.org/t/pat-s-with-destinations/82918 Full diff: https://github.com/llvm/llvm-project/pull/114784.diff 5 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64.td b/llvm/lib/Target/AArch64/AArch64.td
index 6854cccaafa1d7..9bbd2dd0721130 100644
--- a/llvm/lib/Target/AArch64/AArch64.td
+++ b/llvm/lib/Target/AArch64/AArch64.td
@@ -179,3 +179,10 @@ def AArch64 : Target {
//===----------------------------------------------------------------------===//
include "AArch64PfmCounters.td"
+
+
+//===----------------------------------------------------------------------===//
+// GlobalISel patterns
+//===----------------------------------------------------------------------===//
+
+include "AArch64GlobalISelPatterns.td"
diff --git a/llvm/lib/Target/AArch64/AArch64GlobalISelPatterns.td b/llvm/lib/Target/AArch64/AArch64GlobalISelPatterns.td
new file mode 100644
index 00000000000000..f4424ed3499ab6
--- /dev/null
+++ b/llvm/lib/Target/AArch64/AArch64GlobalISelPatterns.td
@@ -0,0 +1,36 @@
+//===-- AArch64GlobalISelPatterns.td - GlobalISel patterns -*- tablegen -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Selection and combine patterns for GlobalISel.
+//
+//===----------------------------------------------------------------------===//
+
+
+//unpredicate patterns
+
+
+
+// fneg
+def : Pat<(nxv2f64 (fneg nxv2f64:$src)),
+ (FNEG_ZPmZ_D (IMPLICIT_DEF), (PTRUE_D 31), ZPR:$src)>;
+
+def : Pat<(nxv4f32 (fneg nxv4f32:$src)),
+ (FNEG_ZPmZ_S (IMPLICIT_DEF), (PTRUE_S 31), ZPR:$src)>;
+
+def : Pat<(nxv8f16 (fneg nxv8f16:$src)),
+ (FNEG_ZPmZ_H (IMPLICIT_DEF), (PTRUE_H 31), ZPR:$src)>;
+
+// fabs
+def : Pat<(nxv2f64 (fabs nxv2f64:$src)),
+ (FABS_ZPmZ_D (IMPLICIT_DEF), (PTRUE_D 31), ZPR:$src)>;
+
+def : Pat<(nxv4f32 (fabs nxv4f32:$src)),
+ (FABS_ZPmZ_S (IMPLICIT_DEF), (PTRUE_S 31), ZPR:$src)>;
+
+def : Pat<(nxv8f16 (fabs nxv8f16:$src)),
+ (FABS_ZPmZ_H (IMPLICIT_DEF), (PTRUE_H 31), ZPR:$src)>;
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 6fdcaec86340ce..1871c536b003be 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -664,7 +664,7 @@ let Predicates = [HasSVEorSME] in {
defm FABS_ZPmZ : sve_int_un_pred_arit_bitwise_fp<0b100, "fabs", AArch64fabs_mt>;
defm FNEG_ZPmZ : sve_int_un_pred_arit_bitwise_fp<0b101, "fneg", AArch64fneg_mt>;
- foreach VT = [nxv2bf16, nxv4bf16, nxv8bf16] in {
+ foreach VT = [nxv2bf16, nxv4bf16] in {
// No dedicated instruction, so just clear the sign bit.
def : Pat<(VT (fabs VT:$op)),
(AND_ZI $op, (i64 (logical_imm64_XFORM(i64 0x7fff7fff7fff7fff))))>;
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index f162d1c2973cbc..a6743404520774 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -279,6 +279,7 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
getActionDefinitionsBuilder({G_FABS, G_FNEG})
.legalFor({s32, s64, v2s32, v4s32, v2s64})
.legalFor(HasFP16, {s16, v4s16, v8s16})
+ .legalFor(HasSVE, {nxv2s64, nxv4s32, nxv8s16})
.scalarizeIf(scalarOrEltWiderThan(0, 64), 0)
.lowerIf(scalarOrEltWiderThan(0, 64))
.clampNumElements(0, v4s16, v8s16)
diff --git a/llvm/test/CodeGen/AArch64/sve-float.ll b/llvm/test/CodeGen/AArch64/sve-float.ll
new file mode 100644
index 00000000000000..186d4e39226dda
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sve-float.ll
@@ -0,0 +1,71 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
+; RUN: llc < %s -mtriple aarch64 -mattr=+sve | FileCheck %s
+; RUN: llc < %s -mtriple aarch64 -mattr=+sve -global-isel -aarch64-enable-gisel-sve=1 | FileCheck %s
+
+;; fneg
+define <vscale x 2 x double> @fnegnxv2double(<vscale x 2 x double> %a) {
+; CHECK-LABEL: fnegnxv2double:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: ptrue p0.d
+; CHECK-NEXT: fneg z0.d, p0/m, z0.d
+; CHECK-NEXT: ret
+entry:
+ %c = fneg <vscale x 2 x double> %a
+ ret <vscale x 2 x double> %c
+}
+
+define <vscale x 4 x float> @fnegnxv4float(<vscale x 4 x float> %a) {
+; CHECK-LABEL: fnegnxv4float:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: ptrue p0.s
+; CHECK-NEXT: fneg z0.s, p0/m, z0.s
+; CHECK-NEXT: ret
+entry:
+ %c = fneg <vscale x 4 x float> %a
+ ret <vscale x 4 x float> %c
+}
+
+define <vscale x 8 x half> @fnegnxv8half(<vscale x 8 x half> %a) {
+; CHECK-LABEL: fnegnxv8half:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: ptrue p0.h
+; CHECK-NEXT: fneg z0.h, p0/m, z0.h
+; CHECK-NEXT: ret
+entry:
+ %c = fneg <vscale x 8 x half> %a
+ ret <vscale x 8 x half> %c
+}
+
+;; fabs
+define <vscale x 2 x double> @fabsnxv2double(<vscale x 2 x double> %a) {
+; CHECK-LABEL: fabsnxv2double:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: ptrue p0.d
+; CHECK-NEXT: fabs z0.d, p0/m, z0.d
+; CHECK-NEXT: ret
+entry:
+ %c = tail call <vscale x 2 x double> @llvm.fabs.nxv2f64(<vscale x 2 x double> %a)
+ ret <vscale x 2 x double> %c
+}
+
+define <vscale x 4 x float> @fabsnxv4float(<vscale x 4 x float> %a) {
+; CHECK-LABEL: fabsnxv4float:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: ptrue p0.s
+; CHECK-NEXT: fabs z0.s, p0/m, z0.s
+; CHECK-NEXT: ret
+entry:
+ %c = tail call <vscale x 4 x float> @llvm.fabs.nxv4f32(<vscale x 4 x float> %a)
+ ret <vscale x 4 x float> %c
+}
+
+define <vscale x 8 x half> @fabsnxv8half(<vscale x 8 x half> %a) {
+; CHECK-LABEL: fabsnxv8half:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: ptrue p0.h
+; CHECK-NEXT: fabs z0.h, p0/m, z0.h
+; CHECK-NEXT: ret
+entry:
+ %c = tail call <vscale x 8 x half> @llvm.fabs.nxv8f16(<vscale x 8 x half> %a)
+ ret <vscale x 8 x half> %c
+}
|
|
|
||
| // fneg | ||
| def : Pat<(nxv2f64 (fneg nxv2f64:$src)), | ||
| (FNEG_ZPmZ_D (IMPLICIT_DEF), (PTRUE_D 31), ZPR:$src)>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of IMPLICIT_DEF here can introduce false data-dependcies that will negatively impact performance. We go to some trouble to avoid this in isel (see https://reviews.llvm.org/D105889) whereby when the "real" input is not reused we emit a movprfx to break the dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ag movprfx llvm/lib/Target/AArch64/*.td movprfx seems to be seldomly used in patterns. Could you give a pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have access to the destination register neither in nor out in Pats. It might be different in C++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe SelectionDAG uses FNEG_ZPmZ_S_UNDEF when all lanes are active?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found #116259, but that looks like SVE2.2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0|(1ULL<<MCID::Pseudo), 0x0ULL }, // Inst #499 = FABS_ZPmZ_H_UNDEF
pseudo instruction?
from lib/Target/AArch64/AArch64GenInstrInfo.inc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
; RUN: llc -global-isel -mtriple aarch64 -mattr=+sve -global-isel-abort=2 -stop-after=instruction-select -aarch64-enable-gisel-sve=1 %s -o - | FileCheck %s
define <vscale x 8 x half> @fabsnxv8half(<vscale x 8 x half> %a) {
; CHECK-LABEL: name: fabsnxv8half
; CHECK: bb.1.entry:
; CHECK-NEXT: liveins: $z0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:zpr = COPY $z0
; CHECK-NEXT: [[PTRUE_H:%[0-9]+]]:ppr_3b = PTRUE_H 31, implicit $vg
; CHECK-NEXT: [[DEF:%[0-9]+]]:zpr = IMPLICIT_DEF
; CHECK-NEXT: [[FABS_ZPmZ_H:%[0-9]+]]:zpr = FABS_ZPmZ_H [[DEF]], [[PTRUE_H]], [[COPY]]
; CHECK-NEXT: $z0 = COPY [[FABS_ZPmZ_H]]
; CHECK-NEXT: RET_ReallyLR implicit $z0
entry:
%c = tail call <vscale x 8 x half> @llvm.fabs.nxv8f16(<vscale x 8 x half> %a)
ret <vscale x 8 x half> %c
}
;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
; CHECK: {{.*}}
Almost the difference is the undef. It is not in the td pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0|(1ULL<<MCID::Pseudo), 0x0ULL }, // Inst #499 = FABS_ZPmZ_H_UNDEFpseudo instruction? from lib/Target/AArch64/AArch64GenInstrInfo.inc
Yes it is pseudo instruction expanded by the code added by https://reviews.llvm.org/D105889 that was shared earlier in this conversation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
; RUN: llc -global-isel -mtriple aarch64 -mattr=+sve -global-isel-abort=2 -stop-after=instruction-select -aarch64-enable-gisel-sve=1 %s -o - | FileCheck %s
define <vscale x 8 x half> @fabsnxv8half(<vscale x 8 x half> %a) {
; CHECK-LABEL: name: fabsnxv8half
; CHECK: bb.1.entry:
; CHECK-NEXT: liveins: $z0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:zpr = COPY $z0
; CHECK-NEXT: [[PTRUE_H:%[0-9]+]]:ppr_3b = PTRUE_H 31, implicit $vg
; CHECK-NEXT: [[DEF:%[0-9]+]]:zpr = IMPLICIT_DEF
; CHECK-NEXT: [[FABS_ZPmZ_H_UNDEF:%[0-9]+]]:zpr = FABS_ZPmZ_H_UNDEF [[DEF]], [[PTRUE_H]], [[COPY]]
; CHECK-NEXT: $z0 = COPY [[FABS_ZPmZ_H_UNDEF]]
; CHECK-NEXT: RET_ReallyLR implicit $z0
entry:
%c = tail call <vscale x 8 x half> @llvm.fabs.nxv8f16(<vscale x 8 x half> %a)
ret <vscale x 8 x half> %c
}
;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
; CHECK: {{.*}}
I can get the _UNDEF, but I don't get the killed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not concerned about the kill flag.
| @@ -0,0 +1,36 @@ | |||
| //===-- AArch64GlobalISelPatterns.td - GlobalISel patterns -*- tablegen -*-===// | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know that much about GlobalISel, but I thought we could reuse a lot of the existing patterns. Do we really need to duplicate all this functionality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't' want to duplicate patterns. We re-use a lot of the existing ones. There are just some patterns missing, e.g., fneg and fabs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Patterns like ,e.g, extract_subvector work great for us. We have issues with predicated instructions. I want to add patterns that unpredicate instructions.
| %c = fneg <vscale x 2 x double> %a | ||
| ret <vscale x 2 x double> %c | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test 3 x case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assertion failed: ((TypeSize::ScalarTy)SrcOps.size() * SrcOps[0].getLLTTy(*getMRI()).getSizeInBits() == DstOps[0].getLLTTy(*getMRI()).getSizeInBits() && "input vectors do not exactly cover the output vector register"), function buildInstr, file MachineIRBuilder.cpp, line 1419.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLVM ERROR: Unable to widen vector store
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: llc -mtriple aarch64 -mattr=+sve -global-isel=0
- Running pass 'Function Pass Manager' on module ''.
- Running pass 'AArch64 Instruction Selection' on function '@fnegnxv3double'
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment varLLVM_SYMBOLIZER_PATHto point to it):
0 llc 0x000000010c171388 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 40
1 llc 0x000000010c16eadb llvm::sys::RunSignalHandlers() + 235
2 llc 0x000000010c171b56 SignalHandler(int) + 278
3 libsystem_platform.dylib 0x00007ff8161aee9d _sigtramp + 29
4 libsystem_platform.dylib 000000000000000000 _sigtramp + 18446603370210070912
5 libsystem_c.dylib 0x00007ff81609ab19 abort + 126
6 llc 0x000000010c0b06aa llvm::report_fatal_error(llvm::Twine const&, bool) + 458
7 llc 0x000000010c0b04d9 llvm::report_fatal_error(char const*, bool) + 41
8 llc 0x000000010be82561 llvm::DAGTypeLegalizer::WidenVecOp_STORE(llvm::SDNode*) + 6465
9 llc 0x000000010be7e159 llvm::DAGTypeLegalizer::WidenVectorOperand(llvm::SDNode*, unsigned int) + 3145
10 llc 0x000000010be1be87 llvm::SelectionDAG::LegalizeTypes() + 4039
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No! There are no illegal vscale types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
? It's just more bugs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can always loop. In any case, assert in the compiler is not an OK failure mode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DAG error message could be improved. Tell me there were scalable vectors. The GlobalISel assert is not helpful at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The <vscale x 3 x double> case is between tricky and might be supported in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot scalarize or widen scalable vectors. The vscale is unknown at compile time.
We can widen scalable vector arithmetic just fine. It's loads/store that are broken because we can't access more memory. It would be kind of messy, but it should be possible to generate a mask to block off the extra elements so a scalable load/store could be widened. Just need a step_vector, a splat of (vscale * original_min_elts), and a compare I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
G_STEP_VECTOR is still missing. Can this be target independent?
|
I looked at the GlobalISel assert again: It comes out of lowerFormalArguments and mergeVectorRegsToResultRegs. And it was G_CONCAT_VECTORS. |
|
@tschuett What's your overall strategy and plan for implementing support for SVE? This is a big task and if you have a year or more of time to dedicate to this, I highly recommend you first take some time to understand the existing implementation before you write a single line of code. Sander & Paul have been working on scalable vector support for 10 years (wow, time flies!) so follow their advice. I think for scalable vectors/SVE specifically, since we have so much engineering effort invested into this we should be trying to re-use as much as possible and only diverge when absolutely necessary. |
|
There are no SVE patterns for fneg today. Either there is no demand for them or C++ is more efficient. For GlobalISel, we have
fneg is just an example. There are many predicated (merging and/or zeroing) instructions without good pattern coverage. Edit: There are [s,u][max,mix] instructions but no patterns for SVE. |
Ok so before we go ahead and diverge implementations between the two, let's first understand why SelectionDAG chose to go a different route and what the trade-offs are. If we have half-baked implementations without a cohesive idea about what the overall design should be, then that's worse than not having anything at all. E.g. let's think about how to incorporate the data-dependency breaking part that Paul mentioned before. |
|
Forgot to mention, are the patterns you're adding beneficial for SelectionDAG too? If not, why? Those are the kind of questions I expect will help us to understand the codebase better. |
|
The SDAG is C++ first. They don't need fneg patterns. They select in C++. AFAIK, there is no mechanism to disable/enable patterns for different backends. As there is no such mechanism, they are probably more annoying than helpful. |
|
In SDAG any sve nodes which require predication will be lowered to target-specific predicated ops, in this case: (This has the obvious disadvantage that we lose any target independent optimizations that could happen after we convert them, but that maybe isn't as important for SVE). Those nodes then have tablegen patterns through The most straight-forward route for GISel would be to mirror that, and mark AArch64fneg_mt as GINodeEquiv with the new G_ nodes that get created. |
|
End we up again at: |
|
If you have C++ matching on the DAG path, it will hide the tablegen generated patterns in globalisel. That said, if you can write a pattern there's probably no reason to keep the manual DAG selection around. |
Apparently the DAG also uses Undef for merge passthrough. |
+1. I think we should have a RFC for this. The plan for SVE implementation should be spelled out at some point. How are we approaching this? What pieces can be reused? I understand that we have some patches in the community (including some from my team) but we should take a moment and write down at least a rough plan |
|
|
|
https://github.com/llvm/llvm-project/pull/114664/files is probably the best example for low hanging fruit. This PR shows there are a lot instructions in SVE that are predicated and zeroing and/or merging. We cannot use them directly. Here is an example for not predicated instructions: #110561. For the predicated instructions are no patterns available. How are we going to handle them? Pure patterns or a mix of C++ and patterns? |
|
I spoke to @paulwalker-arm offline and when he has some time he agreed to write some rationale for why they chose the design they did. That should help not just those implementing GISel SVE support but the reviewers too. |
|
@topperc how is RISC-V handling this with GlobalISel? I bet you also have vectorized predicated instructions. |
We probably hit the same assertion. The RISC-V vector support in GlobalISel has largely been done by a student and is very incomplete |
|
We (the IRTranslator) blindly translate |
SelectionDAG also creates ISD::FNEG in SelectionDAGBuilder, but AArch64 explicitly Custom lowers ISD::FNEG for scalable vectors using a function called |
|
This PR is somewhere between proposal and template. The market share of fneg and fabs won't be that high. The cost of lowering fneg in the DAG for AArch64 with C++ is so low that pattens would introduce overhead or more work. For GlobalISel the situation is the opposite. |
|
My favourite weapon: No. This PR has |
Do you plan to support fixed vectors using SVE like SelectionDAG does? As I understand it, the SDAG code is written the way it is to share the same isel patterns for scalable vector and fixed vectors. |
|
The joke is: not this week. Getting support for scalable vectors into AArch64 will be a ton of work. |
|
Can we have a dependency breaker pass behind the instruction selector that identifies fneg and fabs patterns, kills the implicit defs, and sneaks in movprfxs? |
|
When designing the isel for SVE we had three main objectives:
Very early on it became apparent the common ISD nodes were not fit for purpose in achieving these goals. If our use case was to only code generate stock LLVM IR then we might have gotten away with it. However, once you factor in all the intrinsics required to implement the ACLE along with potentially supporting fixed length vector types from 2048-bit down to 128-bit you soon realise the need for a matching set of predicated ISD nodes. This is essentially what we have implemented and whilst we might not always achieve the above objectives, on the whole we're pretty close and I certainly don't want global-isel preventing us from closing any gaps. With that said, I'm completely ignorant when it comes to global-isel and so only offer the following guidance:
|
|
Thank you for your write-up! fneg from LLVM-IR is roughly equivalent to G_FNEG. They can be scalar or vectorized, but they are not predicated. We state def : GINodeEquiv<G_FNEG, fneg>; We use this mechanism to scrape patterns from tablegen. It works for NEON. For SVE there are no patterns. You know better than me. FNEG_ZPmZ_D is predicated and merging. If we want to continue with the pattern approach, we need SVE patterns for fneg. As neither fneg nor G_FNEG are predicated, we have to stick ptrues on FNEG_ZPmZ_D. It leaves not much space for optimizing predicates. We might add more combines into tablegen for optimizations. |
|
Ping. |
|
Ping. Any progress? |
|
Adding patterns with the only purpose of having something for global-isel to scrape does not seem sensible to me. I think a proper design for how global-isel can represent predicated instructions is required. Being honest, the same is true for normal isel but at least for that we've been able to implement something that's albeit target specific. |
|
In my build the GlobalISel instruction selector is at lib/Target/AArch64/AArch64GenGlobalISel.inc. We already sneak quite a bit of SVE patterns out of tablegen. I think there are two orthogonal issues to discuss:
The latter will a have hard tendency for tablegen patterns. |
That ship has sailed long ago: It is the opposite. We should finally start adding SVE patterns with GlobalISel as primary user. |
llvm/lib/Target/AArch64/AArch64.td
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this inflate the size of AArch64GenDAGISel.inc by including patterns that SelectionDAG doesn't need?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
> grep GenericInstruction llvm/include/llvm/Target/GenericOpcodes.td | wc -l
219
Yes, it will include patterns that SelectionDAG doesn't need/use.
There are 219 named opcodes. I have no numbers, but it will only be a small subset of them. There will be no G_FSINCOS patterns.
add patterns to unprediate instructions FNEG_ZPmZ and FABS_ZPmZ are merging predicated instructions. See https://discourse.llvm.org/t/pat-s-with-destinations/82918
|
First month ping. |
| include "AArch64GlobalISelPatterns.td" | ||
|
|
||
| // We want to first hit the instruction patterns. | ||
| foreach VT = [nxv2bf16, nxv4bf16, nxv8bf16] in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having this in the top level .td file seems like bad file structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah maybe. It is actually a cheap trick. It was interfering with the new fabs and fneg patterns. We have now patterns and can use the instead of and and xor. If this is the second to last issue, I am more than happy to move them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upstream they are in AArch64SVEInstrInfo.td. I just need to find a place for the new patterns to be before the old patterns.
| foreach VT = [nxv2bf16, nxv4bf16, nxv8bf16] in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put the GlobalIsel include at the bottom of the file to minimize it's priority and don't bother others. It depends on AArch64SVEInstrInfo.td for the definition of the fabs and fneg instructions. At the same time, the and and xor patterns for fabs and fneg are defined in AArch64SVEInstrInfo.td. They have higher priority than my patterns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why these bf16 patterns are interfering with your new patterns that don't mention bf16.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved them to the original location, but it fails with:
Assertion failed: (I.getOpcode() == TargetOpcode::G_CONSTANT && OpIdx == -1 && "Expected G_CONSTANT"), function renderLogicalImm64, file AArch64InstructionSelector.cpp, line 7914.
This PR is sort of jumping the gun at the moment, and we'll never have a clear project that other people or teams can join the effort without a better thought out plan. @madhur13490 asked for an RFC of a design, of which a PR can be used as an example proof-of-concept. I think multiple people on this PR (not just me) are recommending we take a step back and do design work first. Otherwise just merging in code is a liability. |
|
I have no issues with you publishing an RFC on discourse. We are all contributors first. |
I think there might be misunderstanding about how we work in LLVM? If you want to continue to work on SVE support please listen to the feedback of the reviewers here. Since I'm not the one trying to push this and I have other things to do at the moment I won't be writing that RFC. |
|
Let me elaborate further. For example, you wrote 7 hours ago:
... and since this is your PR that you want review on, it's incumbent on you to do that work. It's one of the challenges of working with open source software in that much of the work is about building consensus, evaluating engineering trade-offs and exploring not only the "what" or "how" but the "why". On the flip side, it's also a valuable exercise in finding problematic approaches before we invest potentially 100s/1000s of person-hours of effort in a project. Many things in LLVM follow the same pattern. When we introduced scalable types all those years ago, Paul, myself and the rest of the SVE team went through a bunch of discussions internally at ARM to find what the right approach could be. We did a lot of work to have a prototype implementation, but even then once we present to the community it took a long time to actually build consensus about how we should approach the problem. In the mean time the design changed and it was on us to actually do that investigation work since we were the ones trying to add support in LLVM. We didn't just pawn it off to the "community" and send in patches. SVE in GlobalISel is a much smaller project than general SVE, but if the approach is not yet clear it still needs someone to do that work so that the reviewers can understand the why as well. |
|
You don't have to tell us that you have other things to do. We all know that you never contribute. Putting unlimited burden on contributors is a very special kind of review style. |
This comment is inappropriate. You have been asked to consider the big picture of how SVE will work in GISel. SelectionDAG supports both fixed and scalable vectors using SVE registers with the same isel patterns. This is done by converting them to a common interface in C++ code. Do you think that GISel should do something different to support fixed vectors? If you want to do something different, I think it is your responsibility to make a proposal. We shouldn't just ignore fixed vectors for expediency. |
|
Unfortunately, it isn't. I can't see any effidence that @madhur13490 assigned the rfc or a plan for the documentation to any person or a community effort. However, @aemerson forces me to do the work, which isn't appropriate. Fixed vectors. In the patterns above there is From the documentation of ptrue: It supports various active subvectors. |
Why does it need to be done with patterns? Why can't we write C++ for GISel? AArch64 has a 8000 lines of C++ in AArch64InstructionSelector.cpp a lot of it to do custom selection. There's also the AArch64PostLegalizerLowering.cpp pass that puts things in a form for isel. |
I don't need to write the laundry list of things I've done for this project, and I certainly won't tolerate continually being insulted by you. I've given you every chance to continue working together professionally and not once have you even offered an apology. This behavior is not welcome in the LLVM community. |
I bet the fixed length vectors are intrinsics. Either they are selected automatic already or we have to do the work in C++ in the selector. |
They are not intrinsics. They are instruction like |
|
But then we could legalize G_ADD for wide fixed length vectors and add/generate patterns or do the work in C++. |
Yes there are several options. Which option is what needs to be evaluated and agreed upon amongst the AArch64 stakeholders. Moving forward on SVE without an overall plan risks needing to throw away a bunch of work down the road. |
add patterns to unpredicate instructions
FNEG_ZPmZ and FABS_ZPmZ are merging predicated instructions.
See https://discourse.llvm.org/t/pat-s-with-destinations/82918