[GlobalISel][AArch64] Legalize G_FABS and G_FNEG for SVE #114784
base: main
Changes from 3 commits
```
@@ -179,3 +179,21 @@ def AArch64 : Target {
//===----------------------------------------------------------------------===//

include "AArch64PfmCounters.td"


//===----------------------------------------------------------------------===//
// GlobalISel patterns
//===----------------------------------------------------------------------===//

include "AArch64GlobalISelPatterns.td"

// We want to first hit the instruction patterns.
foreach VT = [nxv2bf16, nxv4bf16, nxv8bf16] in {
```
Collaborator
Having this in the top-level .td file seems like bad file structure.
Author
Yeah, maybe. It is actually a cheap trick: it was interfering with the new fabs and fneg patterns. We now have patterns and can use them instead of AND and XOR. If this is the second-to-last issue, I am more than happy to move them.
Author
Upstream, they are in AArch64SVEInstrInfo.td. I just need to find a place where the new patterns come before the old ones.
Author
I put the GlobalISel include at the bottom of the file to minimize its priority and avoid getting in the way of other patterns. It depends on AArch64SVEInstrInfo.td for the definitions of the fabs and fneg instructions. At the same time, the AND and XOR patterns for fabs and fneg are defined in AArch64SVEInstrInfo.td, and they have higher priority than my patterns.
Collaborator
I don't understand why these bf16 patterns are interfering with your new patterns, which don't mention bf16.
Author
I moved them to the original location, but it fails with:
```
  // No dedicated instruction, so just clear the sign bit.
  def : Pat<(VT (fabs VT:$op)),
            (AND_ZI $op, (i64 (logical_imm64_XFORM(i64 0x7fff7fff7fff7fff))))>;
  // No dedicated instruction, so just invert the sign bit.
  def : Pat<(VT (fneg VT:$op)),
            (EOR_ZI $op, (i64 (logical_imm64_XFORM(i64 0x8000800080008000))))>;
}
```
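The bf16 patterns rely on the sign being the top bit of each 16-bit lane: ANDing with 0x7fff clears it (fabs) and XORing with 0x8000 flips it (fneg), and the 64-bit immediates simply repeat those lane masks four times. The C++ sketch below emulates one 64-bit chunk on the host to make this concrete. The helper names (`fabs_lanes`, `fneg_lanes`, `to_bf16`, `from_bf16`, `lane`) are hypothetical, and the sketch models only the bit manipulation, not the SVE instructions or the `logical_imm64_XFORM` immediate encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Lane masks taken from the patterns: each 16-bit lane of the 64-bit
// immediate is 0x7fff (everything but the sign) or 0x8000 (only the sign).
constexpr std::uint64_t kAbsMask = 0x7fff7fff7fff7fffULL;
constexpr std::uint64_t kNegMask = 0x8000800080008000ULL;

// fabs: clear the sign bit of each of the four packed bf16 lanes.
std::uint64_t fabs_lanes(std::uint64_t packed) { return packed & kAbsMask; }

// fneg: invert the sign bit of each of the four packed bf16 lanes.
std::uint64_t fneg_lanes(std::uint64_t packed) { return packed ^ kNegMask; }

// bf16 is the upper half of an IEEE binary32 value. Truncation is exact
// for simple values such as +-1.0 and +-2.0.
std::uint16_t to_bf16(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return static_cast<std::uint16_t>(bits >> 16);
}

// Widen a bf16 bit pattern back to a float by zero-filling the low half.
float from_bf16(std::uint16_t h) {
  std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Read lane i (0 = least significant 16 bits) of a packed chunk as a float.
float lane(std::uint64_t packed, int i) {
  return from_bf16(static_cast<std::uint16_t>(packed >> (16 * i)));
}
```

For example, `fabs_lanes(0xC000BF8040003F80)` turns the lanes (1.0, 2.0, -1.0, -2.0), listed from lane 0 upward, into (1.0, 2.0, 1.0, 2.0), and `fneg_lanes` on the same input gives (-1.0, -2.0, 1.0, 2.0).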
```
@@ -0,0 +1,36 @@
//===-- AArch64GlobalISelPatterns.td - GlobalISel patterns -*- tablegen -*-===//
```
Collaborator
I don't know that much about GlobalISel, but I thought we could reuse a lot of the existing patterns. Do we really need to duplicate all this functionality?
Author
I don't want to duplicate patterns. We reuse a lot of the existing ones; there are just some patterns missing, e.g., fneg and fabs.
Author
Patterns such as extract_subvector work great for us. We have issues with predicated instructions, so I want to add patterns that unpredicate them.
```
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// Selection and combine patterns for GlobalISel.
//
//===----------------------------------------------------------------------===//

//unpredicate patterns

// fneg
def : Pat<(nxv2f64 (fneg nxv2f64:$src)),
          (FNEG_ZPmZ_D (IMPLICIT_DEF), (PTRUE_D 31), ZPR:$src)>;

def : Pat<(nxv4f32 (fneg nxv4f32:$src)),
          (FNEG_ZPmZ_S (IMPLICIT_DEF), (PTRUE_S 31), ZPR:$src)>;

def : Pat<(nxv8f16 (fneg nxv8f16:$src)),
          (FNEG_ZPmZ_H (IMPLICIT_DEF), (PTRUE_H 31), ZPR:$src)>;

// fabs
def : Pat<(nxv2f64 (fabs nxv2f64:$src)),
          (FABS_ZPmZ_D (IMPLICIT_DEF), (PTRUE_D 31), ZPR:$src)>;

def : Pat<(nxv4f32 (fabs nxv4f32:$src)),
          (FABS_ZPmZ_S (IMPLICIT_DEF), (PTRUE_S 31), ZPR:$src)>;

def : Pat<(nxv8f16 (fabs nxv8f16:$src)),
          (FABS_ZPmZ_H (IMPLICIT_DEF), (PTRUE_H 31), ZPR:$src)>;
```
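The unpredicate patterns select the merging (`_ZPmZ`, printed as `p0/m`) form of FNEG and FABS with an all-true governing predicate (`PTRUE_x 31`, where pattern 31 is the SVE `all` pattern) and an `IMPLICIT_DEF` passthru. The host-side C++ model below illustrates why this is sound: with every lane active, no lane ever takes the passthru value, so leaving it undefined cannot change the result. `predicated_merge` and `ptrue_all` are hypothetical names; this is a model of merging predication, not LLVM or SVE code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Model of a merging-predicated unary op: active lanes receive op(src),
// inactive lanes keep the corresponding passthru lane.
template <typename T, std::size_t N, typename Op>
std::array<T, N> predicated_merge(const std::array<T, N>& passthru,
                                  const std::array<bool, N>& pred,
                                  const std::array<T, N>& src, Op op) {
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = pred[i] ? op(src[i]) : passthru[i];
  return out;
}

// Model of an all-true predicate: every lane is active, so the passthru
// is never read and its contents do not matter.
template <std::size_t N>
std::array<bool, N> ptrue_all() {
  std::array<bool, N> p{};
  p.fill(true);
  return p;
}
```

With `ptrue_all`, any passthru (here, arbitrary 9s standing in for `IMPLICIT_DEF`) yields the plain negation of the source; only a partial predicate would expose passthru lanes.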
```
@@ -0,0 +1,71 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc < %s -mtriple aarch64 -mattr=+sve -global-isel=0 | FileCheck %s
; RUN: llc < %s -mtriple aarch64 -mattr=+sve -global-isel -aarch64-enable-gisel-sve=1 | FileCheck %s

;; fneg
define <vscale x 2 x double> @fnegnxv2double(<vscale x 2 x double> %a) {
; CHECK-LABEL: fnegnxv2double:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: fneg z0.d, p0/m, z0.d
; CHECK-NEXT: ret
entry:
  %c = fneg <vscale x 2 x double> %a
  ret <vscale x 2 x double> %c
}
```
Contributor
Test a `<vscale x 3 x ...>` case?
Author
`Assertion failed: ((TypeSize::ScalarTy)SrcOps.size() * SrcOps[0].getLLTTy(*getMRI()).getSizeInBits() == DstOps[0].getLLTTy(*getMRI()).getSizeInBits() && "input vectors do not exactly cover the output vector register"), function buildInstr, file MachineIRBuilder.cpp, line 1419.`
Author
`LLVM ERROR: Unable to widen vector store`
Author
No! There are no illegal vscale types.
Contributor
? It's just more bugs.
Contributor
Can always loop. In any case, an assert in the compiler is not an OK failure mode.
Author
The DAG error message could be improved; it could tell me that there were scalable vectors. The GlobalISel assert is not helpful at all.
Author
The `<vscale x 3 x double>` case is tricky and might be supported in the future.
Collaborator
We can widen scalable vector arithmetic just fine. It's loads and stores that are broken, because we can't access more memory. It would be kind of messy, but it should be possible to generate a mask to block off the extra elements so that a scalable load or store could be widened. I think you would just need a step_vector, a splat of (vscale * original_min_elts), and a compare.
Author
G_STEP_VECTOR is still missing. Can this be target independent?
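The masking idea from the last review comment can be modelled on the host as follows. The sketch assumes a widened access of `vscale * wide_min_elts` lanes over memory that only holds `vscale * orig_min_elts` elements; comparing a step vector (lane indices 0, 1, 2, ...) unsigned-less-than a splat of `vscale * orig_min_elts` yields a mask that disables the extra lanes, so a masked store never writes past the original extent. The function names are hypothetical, and this is a scalar C++ model of the idea, not GlobalISel code (which, per the thread, would first need G_STEP_VECTOR).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// mask[i] = step_vector[i] < splat(vscale * orig_min_elts), computed over
// the vscale * wide_min_elts lanes of the widened vector.
std::vector<bool> widened_access_mask(unsigned vscale, unsigned orig_min_elts,
                                      unsigned wide_min_elts) {
  const std::uint64_t lanes = std::uint64_t(vscale) * wide_min_elts;
  const std::uint64_t limit = std::uint64_t(vscale) * orig_min_elts;
  std::vector<bool> mask(lanes);
  for (std::uint64_t i = 0; i < lanes; ++i)
    mask[i] = i < limit;  // lane index from the step vector vs. the splat
  return mask;
}

// Masked store model: only enabled lanes write memory. at() throws if a
// lane outside the original extent were ever enabled.
template <typename T>
std::vector<T> masked_store(std::vector<T> mem, const std::vector<T>& vec,
                            const std::vector<bool>& mask) {
  for (std::size_t i = 0; i < vec.size(); ++i)
    if (mask[i]) mem.at(i) = vec[i];
  return mem;
}
```

For instance, with vscale = 2, a `<vscale x 3 x ...>` store widened to a minimum of 4 elements has 8 lanes, of which only the first 6 are enabled, so the 6-element destination is written exactly.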
```
define <vscale x 4 x float> @fnegnxv4float(<vscale x 4 x float> %a) {
; CHECK-LABEL: fnegnxv4float:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ptrue p0.s
; CHECK-NEXT: fneg z0.s, p0/m, z0.s
; CHECK-NEXT: ret
entry:
  %c = fneg <vscale x 4 x float> %a
  ret <vscale x 4 x float> %c
}

define <vscale x 8 x half> @fnegnxv8half(<vscale x 8 x half> %a) {
; CHECK-LABEL: fnegnxv8half:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ptrue p0.h
; CHECK-NEXT: fneg z0.h, p0/m, z0.h
; CHECK-NEXT: ret
entry:
  %c = fneg <vscale x 8 x half> %a
  ret <vscale x 8 x half> %c
}

;; fabs
define <vscale x 2 x double> @fabsnxv2double(<vscale x 2 x double> %a) {
; CHECK-LABEL: fabsnxv2double:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: fabs z0.d, p0/m, z0.d
; CHECK-NEXT: ret
entry:
  %c = tail call <vscale x 2 x double> @llvm.fabs.nxv2f64(<vscale x 2 x double> %a)
  ret <vscale x 2 x double> %c
}

define <vscale x 4 x float> @fabsnxv4float(<vscale x 4 x float> %a) {
; CHECK-LABEL: fabsnxv4float:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ptrue p0.s
; CHECK-NEXT: fabs z0.s, p0/m, z0.s
; CHECK-NEXT: ret
entry:
  %c = tail call <vscale x 4 x float> @llvm.fabs.nxv4f32(<vscale x 4 x float> %a)
  ret <vscale x 4 x float> %c
}

define <vscale x 8 x half> @fabsnxv8half(<vscale x 8 x half> %a) {
; CHECK-LABEL: fabsnxv8half:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ptrue p0.h
; CHECK-NEXT: fabs z0.h, p0/m, z0.h
; CHECK-NEXT: ret
entry:
  %c = tail call <vscale x 8 x half> @llvm.fabs.nxv8f16(<vscale x 8 x half> %a)
  ret <vscale x 8 x half> %c
}
```
Does this inflate the size of AArch64GenDAGISel.inc by including patterns that SelectionDAG doesn't need?

Yes, it will include patterns that SelectionDAG doesn't need or use. There are 219 named opcodes. I have no numbers, but it will only be a small subset of them. There will be no G_FSINCOS patterns.