18 changes: 18 additions & 0 deletions llvm/lib/Target/AArch64/AArch64.td
@@ -179,3 +179,21 @@ def AArch64 : Target {
//===----------------------------------------------------------------------===//

include "AArch64PfmCounters.td"


//===----------------------------------------------------------------------===//
// GlobalISel patterns
//===----------------------------------------------------------------------===//

include "AArch64GlobalISelPatterns.td"

Collaborator:
Does this inflate the size of AArch64GenDAGISel.inc by including patterns that SelectionDAG doesn't need?

Author:
> grep GenericInstruction llvm/include/llvm/Target/GenericOpcodes.td | wc -l
219

Yes, it will include patterns that SelectionDAG doesn't need or use.
There are 219 named generic opcodes. I don't have exact numbers, but only a small subset of them will be affected; there will be no G_FSINCOS patterns, for example.


// We want to hit the instruction patterns first.
foreach VT = [nxv2bf16, nxv4bf16, nxv8bf16] in {

Collaborator:
Having this in the top level .td file seems like bad file structure.

Author:
Yeah, maybe. It is actually a cheap trick: the bf16 patterns were interfering with the new fabs and fneg patterns. We now have proper patterns and can use them instead of the and and xor ones. If this is the second-to-last issue, I am more than happy to move them.

Author:
Upstream they are in AArch64SVEInstrInfo.td. I just need to find a place where the new patterns come before the old ones.

foreach VT = [nxv2bf16, nxv4bf16, nxv8bf16] in {

Author:
I put the GlobalISel include at the bottom of the file to minimize its priority and not bother others. It depends on AArch64SVEInstrInfo.td for the definitions of the fabs and fneg instructions. At the same time, the and and xor patterns for fabs and fneg are defined in AArch64SVEInstrInfo.td, and they have higher priority than my patterns.
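If the include order turns out to be too fragile, an alternative might be to raise the priority of the new patterns with AddedComplexity instead of relying on file position. A minimal, untested sketch, assuming the GlobalISel pattern importer weighs AddedComplexity the same way DAGISel does:

// Sketch only: a higher AddedComplexity should let these patterns outrank the
// and/xor-based bf16 patterns regardless of where this file is included.
let AddedComplexity = 5 in {
  def : Pat<(nxv2f64 (fneg nxv2f64:$src)),
            (FNEG_ZPmZ_D (IMPLICIT_DEF), (PTRUE_D 31), ZPR:$src)>;
}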

Collaborator:
I don't understand why these bf16 patterns are interfering with your new patterns that don't mention bf16.

Author:
I moved them to the original location, but it fails with:
Assertion failed: (I.getOpcode() == TargetOpcode::G_CONSTANT && OpIdx == -1 && "Expected G_CONSTANT"), function renderLogicalImm64, file AArch64InstructionSelector.cpp, line 7914.

  // No dedicated instruction, so just clear the sign bit.
  def : Pat<(VT (fabs VT:$op)),
            (AND_ZI $op, (i64 (logical_imm64_XFORM(i64 0x7fff7fff7fff7fff))))>;
  // No dedicated instruction, so just invert the sign bit.
  def : Pat<(VT (fneg VT:$op)),
            (EOR_ZI $op, (i64 (logical_imm64_XFORM(i64 0x8000800080008000))))>;
}

36 changes: 36 additions & 0 deletions llvm/lib/Target/AArch64/AArch64GlobalISelPatterns.td
@@ -0,0 +1,36 @@
//===-- AArch64GlobalISelPatterns.td - GlobalISel patterns -*- tablegen -*-===//

Collaborator:
I don't know that much about GlobalISel, but I thought we could reuse a lot of the existing patterns. Do we really need to duplicate all this functionality?

Author:
I don't want to duplicate patterns. We re-use a lot of the existing ones. There are just some patterns missing, e.g., fneg and fabs.

Author (@tschuett, Nov 4, 2024):
Patterns like extract_subvector, for example, work great for us. We have issues with predicated instructions; I want to add patterns that select the predicated instructions with an all-active predicate.

//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// Selection and combine patterns for GlobalISel.
//
//===----------------------------------------------------------------------===//


// Patterns that select predicated instructions with an all-active predicate.



// fneg
def : Pat<(nxv2f64 (fneg nxv2f64:$src)),
          (FNEG_ZPmZ_D (IMPLICIT_DEF), (PTRUE_D 31), ZPR:$src)>;

Collaborator (@paulwalker-arm, Nov 4, 2024):
The use of IMPLICIT_DEF here can introduce false data dependencies that will negatively impact performance. We go to some trouble to avoid this in isel (see https://reviews.llvm.org/D105889), whereby when the "real" input is not reused we emit a movprfx to break the dependency.

Author:
> ag movprfx llvm/lib/Target/AArch64/*.td

movprfx seems to be seldom used in patterns. Could you give a pointer?

Author (@tschuett, Nov 4, 2024):
We don't have access to the destination register, neither in nor out, in Pats. It might be different in C++.

Collaborator:
I think maybe SelectionDAG uses FNEG_ZPmZ_S_UNDEF when all lanes are active?

Author:
I found #116259, but that looks like SVE2.2.

Author:
0|(1ULL<<MCID::Pseudo), 0x0ULL }, // Inst #499 = FABS_ZPmZ_H_UNDEF
Is this a pseudo instruction? (from lib/Target/AArch64/AArch64GenInstrInfo.inc)

Author:
; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
; RUN: llc -global-isel -mtriple aarch64 -mattr=+sve -global-isel-abort=2 -stop-after=instruction-select -aarch64-enable-gisel-sve=1 %s -o - | FileCheck %s

define <vscale x 8 x half> @fabsnxv8half(<vscale x 8 x half> %a) {
  ; CHECK-LABEL: name: fabsnxv8half
  ; CHECK: bb.1.entry:
  ; CHECK-NEXT:   liveins: $z0
  ; CHECK-NEXT: {{  $}}
  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:zpr = COPY $z0
  ; CHECK-NEXT:   [[PTRUE_H:%[0-9]+]]:ppr_3b = PTRUE_H 31, implicit $vg
  ; CHECK-NEXT:   [[DEF:%[0-9]+]]:zpr = IMPLICIT_DEF
  ; CHECK-NEXT:   [[FABS_ZPmZ_H:%[0-9]+]]:zpr = FABS_ZPmZ_H [[DEF]], [[PTRUE_H]], [[COPY]]
  ; CHECK-NEXT:   $z0 = COPY [[FABS_ZPmZ_H]]
  ; CHECK-NEXT:   RET_ReallyLR implicit $z0
entry:
   %c = tail call <vscale x 8 x half> @llvm.fabs.nxv8f16(<vscale x 8 x half> %a)
  ret <vscale x 8 x half> %c
}
;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
; CHECK: {{.*}}

The difference is basically the _UNDEF, which is not in the td pattern.

Collaborator:
> 0|(1ULL<<MCID::Pseudo), 0x0ULL }, // Inst #499 = FABS_ZPmZ_H_UNDEF pseudo instruction? from lib/Target/AArch64/AArch64GenInstrInfo.inc

Yes, it is a pseudo instruction, expanded by the code added in https://reviews.llvm.org/D105889 that was shared earlier in this conversation.

Author:
; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
; RUN: llc -global-isel -mtriple aarch64 -mattr=+sve -global-isel-abort=2 -stop-after=instruction-select -aarch64-enable-gisel-sve=1 %s -o - | FileCheck %s

define <vscale x 8 x half> @fabsnxv8half(<vscale x 8 x half> %a) {
  ; CHECK-LABEL: name: fabsnxv8half
  ; CHECK: bb.1.entry:
  ; CHECK-NEXT:   liveins: $z0
  ; CHECK-NEXT: {{  $}}
  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:zpr = COPY $z0
  ; CHECK-NEXT:   [[PTRUE_H:%[0-9]+]]:ppr_3b = PTRUE_H 31, implicit $vg
  ; CHECK-NEXT:   [[DEF:%[0-9]+]]:zpr = IMPLICIT_DEF
  ; CHECK-NEXT:   [[FABS_ZPmZ_H_UNDEF:%[0-9]+]]:zpr = FABS_ZPmZ_H_UNDEF [[DEF]], [[PTRUE_H]], [[COPY]]
  ; CHECK-NEXT:   $z0 = COPY [[FABS_ZPmZ_H_UNDEF]]
  ; CHECK-NEXT:   RET_ReallyLR implicit $z0
entry:
   %c = tail call <vscale x 8 x half> @llvm.fabs.nxv8f16(<vscale x 8 x half> %a)
  ret <vscale x 8 x half> %c
}
;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
; CHECK: {{.*}}

I can get the _UNDEF, but I don't get the killed flag.
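For reference, a minimal sketch of what the td pattern could look like if it selected the pseudo directly, assuming the _UNDEF variant takes the same operands as in the MIR above (the in-tree code may instead route this through shared pattern classes):

// Sketch only: selecting the _UNDEF pseudo marks the inactive lanes as
// undefined, so the expansion added in D105889 can emit a movprfx only when
// it is actually needed.
def : Pat<(nxv8f16 (fabs nxv8f16:$src)),
          (FABS_ZPmZ_H_UNDEF (IMPLICIT_DEF), (PTRUE_H 31), ZPR:$src)>;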

Collaborator:
I'm not concerned about the kill flag.


def : Pat<(nxv4f32 (fneg nxv4f32:$src)),
          (FNEG_ZPmZ_S (IMPLICIT_DEF), (PTRUE_S 31), ZPR:$src)>;

def : Pat<(nxv8f16 (fneg nxv8f16:$src)),
          (FNEG_ZPmZ_H (IMPLICIT_DEF), (PTRUE_H 31), ZPR:$src)>;

// fabs
def : Pat<(nxv2f64 (fabs nxv2f64:$src)),
          (FABS_ZPmZ_D (IMPLICIT_DEF), (PTRUE_D 31), ZPR:$src)>;

def : Pat<(nxv4f32 (fabs nxv4f32:$src)),
          (FABS_ZPmZ_S (IMPLICIT_DEF), (PTRUE_S 31), ZPR:$src)>;

def : Pat<(nxv8f16 (fabs nxv8f16:$src)),
          (FABS_ZPmZ_H (IMPLICIT_DEF), (PTRUE_H 31), ZPR:$src)>;
9 changes: 0 additions & 9 deletions llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -683,15 +683,6 @@ let Predicates = [HasSVEorSME] in {
defm NEG_ZPmZ : sve_int_un_pred_arit_bhsd<AArch64neg_mt>;
}

foreach VT = [nxv2bf16, nxv4bf16, nxv8bf16] in {
  // No dedicated instruction, so just clear the sign bit.
  def : Pat<(VT (fabs VT:$op)),
            (AND_ZI $op, (i64 (logical_imm64_XFORM(i64 0x7fff7fff7fff7fff))))>;
  // No dedicated instruction, so just invert the sign bit.
  def : Pat<(VT (fneg VT:$op)),
            (EOR_ZI $op, (i64 (logical_imm64_XFORM(i64 0x8000800080008000))))>;
}

// zext(cmpeq(x, splat(0))) -> cnot(x)
def : Pat<(nxv16i8 (zext (nxv16i1 (AArch64setcc_z (nxv16i1 (SVEAllActive):$Pg), nxv16i8:$Op2, (SVEDup0), SETEQ)))),
(CNOT_ZPmZ_B $Op2, $Pg, $Op2)>;
1 change: 1 addition & 0 deletions llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -279,6 +279,7 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
getActionDefinitionsBuilder({G_FABS, G_FNEG})
.legalFor({s32, s64, v2s32, v4s32, v2s64})
.legalFor(HasFP16, {s16, v4s16, v8s16})
.legalFor(HasSVE, {nxv2s64, nxv4s32, nxv8s16})
.scalarizeIf(scalarOrEltWiderThan(0, 64), 0)
.lowerIf(scalarOrEltWiderThan(0, 64))
.clampNumElements(0, v4s16, v8s16)
71 changes: 71 additions & 0 deletions llvm/test/CodeGen/AArch64/sve-float.ll
@@ -0,0 +1,71 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc < %s -mtriple aarch64 -mattr=+sve -global-isel=0 | FileCheck %s
; RUN: llc < %s -mtriple aarch64 -mattr=+sve -global-isel -aarch64-enable-gisel-sve=1 | FileCheck %s

;; fneg
define <vscale x 2 x double> @fnegnxv2double(<vscale x 2 x double> %a) {
; CHECK-LABEL: fnegnxv2double:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: fneg z0.d, p0/m, z0.d
; CHECK-NEXT: ret
entry:
%c = fneg <vscale x 2 x double> %a
ret <vscale x 2 x double> %c
}

Contributor:
Test 3 x case?

Author:
Assertion failed: ((TypeSize::ScalarTy)SrcOps.size() * SrcOps[0].getLLTTy(*getMRI()).getSizeInBits() == DstOps[0].getLLTTy(*getMRI()).getSizeInBits() && "input vectors do not exactly cover the output vector register"), function buildInstr, file MachineIRBuilder.cpp, line 1419.

Author:
LLVM ERROR: Unable to widen vector store
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: llc -mtriple aarch64 -mattr=+sve -global-isel=0

  1. Running pass 'Function Pass Manager' on module ''.
  2. Running pass 'AArch64 Instruction Selection' on function '@fnegnxv3double'
    Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var LLVM_SYMBOLIZER_PATH to point to it):
    0 llc 0x000000010c171388 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 40
    1 llc 0x000000010c16eadb llvm::sys::RunSignalHandlers() + 235
    2 llc 0x000000010c171b56 SignalHandler(int) + 278
    3 libsystem_platform.dylib 0x00007ff8161aee9d _sigtramp + 29
    4 libsystem_platform.dylib 000000000000000000 _sigtramp + 18446603370210070912
    5 libsystem_c.dylib 0x00007ff81609ab19 abort + 126
    6 llc 0x000000010c0b06aa llvm::report_fatal_error(llvm::Twine const&, bool) + 458
    7 llc 0x000000010c0b04d9 llvm::report_fatal_error(char const*, bool) + 41
    8 llc 0x000000010be82561 llvm::DAGTypeLegalizer::WidenVecOp_STORE(llvm::SDNode*) + 6465
    9 llc 0x000000010be7e159 llvm::DAGTypeLegalizer::WidenVectorOperand(llvm::SDNode*, unsigned int) + 3145
    10 llc 0x000000010be1be87 llvm::SelectionDAG::LegalizeTypes() + 4039

Author:
No! There are no illegal vscale types.

Contributor:
? It's just more bugs

Contributor:
We can always loop. In any case, an assert in the compiler is not an OK failure mode.

Author:
The DAG error message could be improved; it should tell me that scalable vectors were involved. The GlobalISel assert is not helpful at all.

Author:
The <vscale x 3 x double> case is somewhere between tricky and "might be supported in the future".

Collaborator:
> We cannot scalarize or widen scalable vectors. The vscale is unknown at compile time.

We can widen scalable vector arithmetic just fine. It's loads/stores that are broken, because we can't access more memory. It would be kind of messy, but it should be possible to generate a mask to block off the extra elements so a scalable load/store could be widened. You just need a step_vector, a splat of (vscale * original_min_elts), and a compare, I think.
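A rough LLVM IR sketch of that idea for a hypothetical <vscale x 3 x double> store widened to <vscale x 4 x double> (names are illustrative, and a legalizer would build the equivalent generic operations rather than IR):

; Sketch: mask off the widened tail so the store only touches the original
; 3 x vscale elements (step_vector < splat(vscale * original_min_elts)).
define void @store_widened_nxv3f64(<vscale x 4 x double> %widened, ptr %p) {
  %step = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
  %vscale = call i64 @llvm.vscale.i64()
  %limit = mul i64 %vscale, 3
  %limit.ins = insertelement <vscale x 4 x i64> poison, i64 %limit, i64 0
  %limit.splat = shufflevector <vscale x 4 x i64> %limit.ins, <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
  %mask = icmp ult <vscale x 4 x i64> %step, %limit.splat
  call void @llvm.masked.store.nxv4f64.p0(<vscale x 4 x double> %widened, ptr %p, i32 8, <vscale x 4 x i1> %mask)
  ret void
}

declare <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
declare i64 @llvm.vscale.i64()
declare void @llvm.masked.store.nxv4f64.p0(<vscale x 4 x double>, ptr, i32 immarg, <vscale x 4 x i1>)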

Author:
G_STEP_VECTOR is still missing. Can this be target independent?

define <vscale x 4 x float> @fnegnxv4float(<vscale x 4 x float> %a) {
; CHECK-LABEL: fnegnxv4float:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ptrue p0.s
; CHECK-NEXT: fneg z0.s, p0/m, z0.s
; CHECK-NEXT: ret
entry:
%c = fneg <vscale x 4 x float> %a
ret <vscale x 4 x float> %c
}

define <vscale x 8 x half> @fnegnxv8half(<vscale x 8 x half> %a) {
; CHECK-LABEL: fnegnxv8half:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ptrue p0.h
; CHECK-NEXT: fneg z0.h, p0/m, z0.h
; CHECK-NEXT: ret
entry:
%c = fneg <vscale x 8 x half> %a
ret <vscale x 8 x half> %c
}

;; fabs
define <vscale x 2 x double> @fabsnxv2double(<vscale x 2 x double> %a) {
; CHECK-LABEL: fabsnxv2double:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: fabs z0.d, p0/m, z0.d
; CHECK-NEXT: ret
entry:
%c = tail call <vscale x 2 x double> @llvm.fabs.nxv2f64(<vscale x 2 x double> %a)
ret <vscale x 2 x double> %c
}

define <vscale x 4 x float> @fabsnxv4float(<vscale x 4 x float> %a) {
; CHECK-LABEL: fabsnxv4float:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ptrue p0.s
; CHECK-NEXT: fabs z0.s, p0/m, z0.s
; CHECK-NEXT: ret
entry:
%c = tail call <vscale x 4 x float> @llvm.fabs.nxv4f32(<vscale x 4 x float> %a)
ret <vscale x 4 x float> %c
}

define <vscale x 8 x half> @fabsnxv8half(<vscale x 8 x half> %a) {
; CHECK-LABEL: fabsnxv8half:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ptrue p0.h
; CHECK-NEXT: fabs z0.h, p0/m, z0.h
; CHECK-NEXT: ret
entry:
%c = tail call <vscale x 8 x half> @llvm.fabs.nxv8f16(<vscale x 8 x half> %a)
ret <vscale x 8 x half> %c
}