[LoongArch] Custom legalize vector_shuffle to xvinsve0.{w/d} when possible
#160857
Closed: zhaoqi5 wants to merge 220 commits into users/zhaoqi5/tests-for-xvinsve0 from users/zhaoqi5/legalize-xvinsve0
Conversation
Factored out from #151275. Remove all UnsafeFPMath uses except the ABI-tags-related part.
Windows paths have different slashes, but I don't think we care about the exact paths there anyway so I've just checked for the final filename. Fixes #160652
An inline asm constraint "Jr", in AArch32, means that if the input value
is a compile-time constant in the range -4095 to +4095, then it can be
inserted into the assembly language as an immediate operand, and
otherwise it will be placed in a register.
The comment in the Arm backend said "It is not clear what this
constraint is intended for". I believe the answer is that that range of
immediate values is the one you can use in an LDR or STR instruction.
So it's suitable for cases like this:
asm("str %0,[%1,%2]" : : "r"(data), "r"(base), "Jr"(offset) : "memory");
in the same way that the "Ir" constraint is suitable for the immediate
in a data-processing instruction such as ADD or EOR.
We were looking for any mention of the feature name in cpuinfo, which could have matched anything, including features with common prefixes like sme, sme2 and smefa64. Luckily this was not a problem, but I'm changing this to find the Features line and split the features into a list, so that we only look for exact matches. Here's the information for one core as an example:

```
processor       : 7
BogoMIPS        : 200.00
Features        : fp asimd evtstrm crc32 atomics fphp asimdhp cpuid <...>
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd0f
CPU revision    : 0
```

(And to avoid any doubt: this is from a CPU simulated in Arm's FVP, not real hardware.) Note that the layout of the label, colon and values is sometimes aligned but not always, so I trim whitespace a few times to normalise that. This block repeats once for each core, so we only need to find one Features line.
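A minimal C++ sketch of that exact-match logic (illustrative only; the real change lives in the test infrastructure, and the function name here is made up):

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Find the "Features" line in a cpuinfo-style file and require a
// whole-token match, so "sme" does not match "sme2" or "smefa64".
static bool cpuinfoHasFeature(const std::string &Path,
                              const std::string &Feature) {
  std::ifstream In(Path);
  std::string Line;
  while (std::getline(In, Line)) {
    std::string::size_type Colon = Line.find(':');
    if (Colon == std::string::npos)
      continue;
    // Trim trailing whitespace; label/colon/value alignment varies.
    std::string Label = Line.substr(0, Colon);
    Label.erase(Label.find_last_not_of(" \t") + 1);
    if (Label != "Features")
      continue;
    // Split the value into whitespace-separated tokens.
    std::istringstream Tokens(Line.substr(Colon + 1));
    std::string Tok;
    while (Tokens >> Tok)
      if (Tok == Feature)
        return true;
    return false; // Every core repeats the same line; one is enough.
  }
  return false;
}

int main() { return cpuinfoHasFeature("/proc/cpuinfo", "sme") ? 0 : 1; }
```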
#160823)
- Fix #156591 (comment)
- As per https://cdrdv2.intel.com/v1/dl/getContent/671200, the default rounding mode is **round to nearest**.
Split out from #151300 to isolate TargetTransformInfo cost modelling for fault-only-first loads from VPlan implementation details. This change adds costing support for vp.load.ff independently of the VPlan work. For now, model a vp.load.ff as cost-equivalent to a vp.load.
This fixes the ifdefs added in e9e166e; we need to include int_lib.h before we can expect these defines to be set. Also remove the XFAILs for aarch64 windows: as this test is now a no-op on platforms that lack CRT_HAS_128BIT or CRT_HAS_F128 (aarch64 windows lacks the latter), it no longer fails.
…remat (#159110) Currently, something like:

```
$eax = MOV32ri -11, implicit-def $rax
%al = COPY $eax
```

can be rematerialized as:

```
dead $eax = MOV32ri -11, implicit-def $rax
```

which marks the full $rax as used, not just $al. With this change, it is rematerialized as:

```
dead $eax = MOV32ri -11, implicit-def dead $rax, implicit-def $al
```

to indicate that only $al is used. Note: this issue is latent right now, but it is exposed once #134408 is applied, as that results in the register pressure being incorrectly calculated (unless this patch is applied too). I think this change is in line with past fixes in this area, notably 059cead and 69cd121.
…if successor is loop header (#154063) This addresses a performance issue for our downstream GPU target, which sets requiresStructuredCFG to true. The issue is that the EarlyMachineLICM pass does not hoist loop invariants because a critical edge is not split. The critical edge's destination is a loop header, and splitting the critical edge will not break structured CFG. Add an nvptx test to demonstrate the issue, since that target also requires structured CFG. --------- Co-authored-by: Matt Arsenault <[email protected]>
In the im2col decomposition, propagate the filter tensor encoding (if specified) through the tensor.collapse_shape op, so that it can be used by the consuming linalg.generic matmul op. Signed-off-by: Fabrizio Indirli <[email protected]>
Additional CSE opportunities are exposed after converting to concrete recipes/dissolving regions and materializing various expressions. Run CSE later, to capitalize on some of the late opportunities. PR: #160572
…freeze(x),freeze(y)) (#160835)
…s(freeze(x),freeze(y)) (#160837)
…ilvar(freeze(x),freeze(y)) (#160836)
This flag enables the compiler to generate most of the debug information in a separate file, which can be useful for executable size and link times. Clang already supports this flag. I have tried to follow the logic of the clang implementation where possible. Some functions were moved to where they could be used by both clang and flang. The `addOtherOptions` function was renamed to `addDebugOptions` to better reflect its purpose. Clang also sets the `splitDebugFilename` field of the `DICompileUnit` in the IR when this option is present. That part is currently missing from this patch and will come in a follow-up PR.
…160021) This patch makes the following updates to the `QualGroup` documentation:
1. Move to Reference section: relocated the Qualification Working Group (QualGroup) docs from the main index into the Reference section for better organization and consistency.
2. Add link in GettingInvolved: inserted a proper link to the QualGroup documentation in the GettingInvolved sync-ups table, improving discoverability for newcomers.
3. Align structure with Security Group: revised the documentation layout to follow the same structure pattern as the Security Group docs, ensuring consistency across LLVM working group references.
… `_LIBCPP_VERSION` (#160627) And add some guaranteed cases (namely, for `expected`, `optional`, and `variant`) to `is_implicit_lifetime.pass.cpp`. It's somewhat unfortunate that `pair` and `tuple` are not guaranteed to propagate triviality of copy/move constructors, and that MSVC STL fails to do so due to ABI compatibility; this affects the implicit-lifetime property.
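As a small illustration of what is being guaranteed (a sketch assuming a C++23 standard library that ships the trait; this is not the actual test file):

```cpp
#include <optional>
#include <type_traits>

int main() {
#ifdef __cpp_lib_is_implicit_lifetime
  // std::optional<int> has a trivial destructor and a trivial copy
  // constructor, which makes it an implicit-lifetime type; the patch
  // adds cases like this as tested guarantees.
  static_assert(std::is_implicit_lifetime_v<std::optional<int>>);
#endif
}
```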
…oisonForTargetNode - add X86ISD::PSHUFB handling (#160842) X86ISD::PSHUFB shuffles can't create undef/poison itself, allowing us to fold freeze(pshufb(x,y)) -> pshufb(freeze(x),freeze(y))
On targets where f32 maximumnum is legal, but maximumnum on vectors of smaller types is not legal (e.g. v2f16), try unrolling the vector first as part of the expansion. Only fall back to expanding the full maximumnum computation into compares + selects if maximumnum on the scalar element type cannot be supported.
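For context, the scalar fallback mentioned above boils down to compare+select logic with IEEE-754 maximumNumber semantics; a minimal C++ sketch (not the legalizer code) might look like:

```cpp
#include <cassert>
#include <cmath>

// maximumnum semantics: a NaN input is treated as missing data, and
// +0.0 is ordered above -0.0; otherwise return the larger value.
static float maximumnumF32(float A, float B) {
  if (std::isnan(A))
    return B;
  if (std::isnan(B))
    return A;
  if (A == 0.0f && B == 0.0f)
    return std::signbit(A) ? B : A;
  return A > B ? A : B;
}

int main() {
  assert(maximumnumF32(NAN, 2.0f) == 2.0f);
  assert(!std::signbit(maximumnumF32(-0.0f, +0.0f)));
}
```

Unrolling applies this per element, which the patch prefers over expanding the full vector computation into vector compares and selects.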
Program itself is unused in that file, so just include the needed headers.
… clang (#160605) When cross-compiling the LLVM project as a whole (from llvm/), if it cannot find presupplied tools it will create a native build environment to build the tools it needs. However, when doing a standalone build of clang (that is, from clang/ and linking against an existing libLLVM) this doesn't work. Instead a _target_ binary is built, which predictably then fails.

The conventional workaround for this is to build the native tools in a separate native compile phase and pass the paths to the cross build; for example, see OpenEmbedded[1] or Nix[2]. But we can do better!

The first problem is that LLVM_USE_HOST_TOOLS is only set in the llvm/ CMakeLists.txt, so setup_host_tool() will never consider building a native binary. This can be solved by setting LLVM_USE_HOST_TOOLS based on CMAKE_CROSSCOMPILING in clang/CMakeLists.txt in the standalone case. Now setup_host_tool() will try to build a native tool, but it needs build_native_tool() from CrossCompile.cmake, so that also needs to be included. Finally, the native binary then fails because there's no provider for the dependency "CONFIGURE_Clang_NATIVE", so use llvm_create_cross_target to create the native environment. These few lines mirror what the lldb CMakeLists.txt does in the standalone case, so there is prior art for this.

[1] https://git.openembedded.org/openembedded-core/tree/meta/recipes-devtools/clang/clang_git.bb?id=e18d697e92b55e57124e80234369d46575226386#n212
[2] https://github.com/NixOS/nixpkgs/blob/3354d448f2a26117a74638957b0131ce3da9c8c4/pkgs/development/compilers/llvm/common/tblgen.nix#L54
…oisonForTargetNode - add X86ISD::VPERMV handling (#160845) X86ISD::VPERMV shuffles can't create undef/poison itself, allowing us to fold freeze(vpermps(x,y)) -> vpermps(freeze(x),freeze(y))
llvm.convert.to.fp16 and llvm.convert.from.fp16 are deprecated and no longer used, so they do not need to be tested any more.
Mostly mechanical changes to add the missing field.
…ertps(freeze(x),freeze(y),i) (#160852)
…oisonForTargetNode - add X86ISD::VPERMILPV handling (#160849) X86ISD::VPERMILPV shuffles can't create undef/poison itself, allowing us to fold freeze(vpermilps(x,y)) -> vpermilps(freeze(x),freeze(y))
@llvm/pr-subscribers-backend-loongarch

Author: ZhaoQi (zhaoqi5)

Changes: Patch is 39.04 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/160857.diff

4 Files Affected:
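Before the diff, here is a standalone sketch of the mask check the patch introduces (it mirrors the `checkReplaceOne` helper visible below; illustrative, not the patch code itself):

```cpp
#include <cassert>
#include <vector>

// Returns the index of the single lane of the identity mask
// {Base+0, ..., Base+N-1} that is replaced by 'Replaced' (undef lanes,
// encoded as -1, are ignored), or -1 if the mask has any other shape.
static int checkReplaceOne(const std::vector<int> &Mask, int Base,
                           int Replaced) {
  int Idx = -1;
  for (int I = 0, E = (int)Mask.size(); I != E; ++I) {
    if (Mask[I] == Base + I || Mask[I] == -1)
      continue;
    if (Mask[I] != Replaced || Idx != -1)
      return -1; // Wrong value, or more than one lane replaced.
    Idx = I;
  }
  return Idx;
}

int main() {
  // v8i32: lane 2 of V1 replaced by lane 0 of V2 -> xvinsve0.w $xd,$xj,2.
  assert(checkReplaceOne({0, 1, 8, 3, 4, 5, 6, 7}, 0, 8) == 2);
  // Two replaced lanes cannot be expressed by a single xvinsve0.
  assert(checkReplaceOne({8, 1, 8, 3, 4, 5, 6, 7}, 0, 8) == -1);
}
```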
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
index 5d4a8fd080202..194f42995d55a 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
@@ -2317,6 +2317,54 @@ static SDValue lowerVECTOR_SHUFFLE_XVPICKOD(const SDLoc &DL, ArrayRef<int> Mask,
return DAG.getNode(LoongArchISD::VPICKOD, DL, VT, V2, V1);
}
+// Check if exactly one element of the Mask is replaced by 'Replaced', while
+// all other elements are either 'Base + i' or undef (-1). On success, return
+// the index of the replaced element. Otherwise, just return -1.
+static int checkReplaceOne(ArrayRef<int> Mask, int Base, int Replaced) {
+ int MaskSize = Mask.size();
+ int Idx = -1;
+ for (int i = 0; i < MaskSize; ++i) {
+ if (Mask[i] == Base + i || Mask[i] == -1)
+ continue;
+ if (Mask[i] != Replaced)
+ return -1;
+ if (Idx == -1)
+ Idx = i;
+ else
+ return -1;
+ }
+ return Idx;
+}
+
+/// Lower VECTOR_SHUFFLE into XVINSVE0 (if possible).
+static SDValue
+lowerVECTOR_SHUFFLE_XVINSVE0(const SDLoc &DL, ArrayRef<int> Mask, MVT VT,
+ SDValue V1, SDValue V2, SelectionDAG &DAG,
+ const LoongArchSubtarget &Subtarget) {
+ // LoongArch LASX only supports xvinsve0.{w/d}.
+ if (VT != MVT::v8i32 && VT != MVT::v8f32 && VT != MVT::v4i64 &&
+ VT != MVT::v4f64)
+ return SDValue();
+
+ MVT GRLenVT = Subtarget.getGRLenVT();
+ int MaskSize = Mask.size();
+ assert(MaskSize == (int)VT.getVectorNumElements() && "Unexpected mask size");
+
+ // Case 1: the lowest element of V2 replaces one element in V1.
+ int Idx = checkReplaceOne(Mask, 0, MaskSize);
+ if (Idx != -1)
+ return DAG.getNode(LoongArchISD::XVINSVE0, DL, VT, V1, V2,
+ DAG.getConstant(Idx, DL, GRLenVT));
+
+ // Case 2: the lowest element of V1 replaces one element in V2.
+ Idx = checkReplaceOne(Mask, MaskSize, 0);
+ if (Idx != -1)
+ return DAG.getNode(LoongArchISD::XVINSVE0, DL, VT, V2, V1,
+ DAG.getConstant(Idx, DL, GRLenVT));
+
+ return SDValue();
+}
+
/// Lower VECTOR_SHUFFLE into XVSHUF (if possible).
static SDValue lowerVECTOR_SHUFFLE_XVSHUF(const SDLoc &DL, ArrayRef<int> Mask,
MVT VT, SDValue V1, SDValue V2,
@@ -2593,6 +2641,9 @@ static SDValue lower256BitShuffle(const SDLoc &DL, ArrayRef<int> Mask, MVT VT,
if ((Result = lowerVECTOR_SHUFFLEAsShift(DL, Mask, VT, V1, V2, DAG, Subtarget,
Zeroable)))
return Result;
+ if ((Result =
+ lowerVECTOR_SHUFFLE_XVINSVE0(DL, Mask, VT, V1, V2, DAG, Subtarget)))
+ return Result;
if ((Result = lowerVECTOR_SHUFFLEAsByteRotate(DL, Mask, VT, V1, V2, DAG,
Subtarget)))
return Result;
@@ -7450,6 +7501,7 @@ const char *LoongArchTargetLowering::getTargetNodeName(unsigned Opcode) const {
NODE_NAME_CASE(XVPERM)
NODE_NAME_CASE(XVREPLVE0)
NODE_NAME_CASE(XVREPLVE0Q)
+ NODE_NAME_CASE(XVINSVE0)
NODE_NAME_CASE(VPICK_SEXT_ELT)
NODE_NAME_CASE(VPICK_ZEXT_ELT)
NODE_NAME_CASE(VREPLVE)
diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.h b/llvm/lib/Target/LoongArch/LoongArchISelLowering.h
index b2fccf59169ff..3e7ea5ebba79e 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.h
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.h
@@ -151,6 +151,7 @@ enum NodeType : unsigned {
XVPERM,
XVREPLVE0,
XVREPLVE0Q,
+ XVINSVE0,
// Extended vector element extraction
VPICK_SEXT_ELT,
diff --git a/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
index adfe990ba1234..dfcbfff2a9a72 100644
--- a/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
@@ -20,6 +20,7 @@ def loongarch_xvpermi: SDNode<"LoongArchISD::XVPERMI", SDT_LoongArchV1RUimm>;
def loongarch_xvperm: SDNode<"LoongArchISD::XVPERM", SDT_LoongArchXVPERM>;
def loongarch_xvreplve0: SDNode<"LoongArchISD::XVREPLVE0", SDT_LoongArchXVREPLVE0>;
def loongarch_xvreplve0q: SDNode<"LoongArchISD::XVREPLVE0Q", SDT_LoongArchXVREPLVE0>;
+def loongarch_xvinsve0 : SDNode<"LoongArchISD::XVINSVE0", SDT_LoongArchV2RUimm>;
def loongarch_xvmskltz: SDNode<"LoongArchISD::XVMSKLTZ", SDT_LoongArchVMSKCOND>;
def loongarch_xvmskgez: SDNode<"LoongArchISD::XVMSKGEZ", SDT_LoongArchVMSKCOND>;
def loongarch_xvmskeqz: SDNode<"LoongArchISD::XVMSKEQZ", SDT_LoongArchVMSKCOND>;
@@ -1708,6 +1709,14 @@ def : Pat<(vector_insert v4f64:$xd, (f64(bitconvert i64:$rj)), uimm2:$imm),
(XVINSGR2VR_D v4f64:$xd, GPR:$rj, uimm2:$imm)>;
// XVINSVE0_{W/D}
+def : Pat<(loongarch_xvinsve0 v8i32:$xd, v8i32:$xj, uimm3:$imm),
+ (XVINSVE0_W v8i32:$xd, v8i32:$xj, uimm3:$imm)>;
+def : Pat<(loongarch_xvinsve0 v4i64:$xd, v4i64:$xj, uimm2:$imm),
+ (XVINSVE0_D v4i64:$xd, v4i64:$xj, uimm2:$imm)>;
+def : Pat<(loongarch_xvinsve0 v8f32:$xd, v8f32:$xj, uimm3:$imm),
+ (XVINSVE0_W v8f32:$xd, v8f32:$xj, uimm3:$imm)>;
+def : Pat<(loongarch_xvinsve0 v4f64:$xd, v4f64:$xj, uimm2:$imm),
+ (XVINSVE0_D v4f64:$xd, v4f64:$xj, uimm2:$imm)>;
def : Pat<(vector_insert v8f32:$xd, FPR32:$fj, uimm3:$imm),
(XVINSVE0_W v8f32:$xd, (SUBREG_TO_REG(i64 0), FPR32:$fj, sub_32),
uimm3:$imm)>;
diff --git a/llvm/test/CodeGen/LoongArch/lasx/ir-instruction/shuffle-as-xvinsve0.ll b/llvm/test/CodeGen/LoongArch/lasx/ir-instruction/shuffle-as-xvinsve0.ll
index b6c9c4da05e5a..d5a7dbf5d57af 100644
--- a/llvm/test/CodeGen/LoongArch/lasx/ir-instruction/shuffle-as-xvinsve0.ll
+++ b/llvm/test/CodeGen/LoongArch/lasx/ir-instruction/shuffle-as-xvinsve0.ll
@@ -1,6 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
-; RUN: llc --mtriple=loongarch32 --mattr=+32s,+lasx < %s | FileCheck %s --check-prefixes=CHECK,LA32
-; RUN: llc --mtriple=loongarch64 --mattr=+lasx < %s | FileCheck %s --check-prefixes=CHECK,LA64
+; RUN: llc --mtriple=loongarch32 --mattr=+32s,+lasx < %s | FileCheck %s
+; RUN: llc --mtriple=loongarch64 --mattr=+lasx < %s | FileCheck %s
;; xvinsve0.w
define void @xvinsve0_v8i32_l_0(ptr %d, ptr %a, ptr %b) nounwind {
@@ -8,10 +8,8 @@ define void @xvinsve0_v8i32_l_0(ptr %d, ptr %a, ptr %b) nounwind {
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xvld $xr0, $a1, 0
; CHECK-NEXT: xvld $xr1, $a2, 0
-; CHECK-NEXT: pcalau12i $a1, %pc_hi20(.LCPI0_0)
-; CHECK-NEXT: xvld $xr2, $a1, %pc_lo12(.LCPI0_0)
-; CHECK-NEXT: xvshuf.w $xr2, $xr1, $xr0
-; CHECK-NEXT: xvst $xr2, $a0, 0
+; CHECK-NEXT: xvinsve0.w $xr0, $xr1, 0
+; CHECK-NEXT: xvst $xr0, $a0, 0
; CHECK-NEXT: ret
entry:
%va = load <8 x i32>, ptr %a
@@ -26,10 +24,8 @@ define void @xvinsve0_v8i32_l_1(ptr %d, ptr %a, ptr %b) nounwind {
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xvld $xr0, $a1, 0
; CHECK-NEXT: xvld $xr1, $a2, 0
-; CHECK-NEXT: pcalau12i $a1, %pc_hi20(.LCPI1_0)
-; CHECK-NEXT: xvld $xr2, $a1, %pc_lo12(.LCPI1_0)
-; CHECK-NEXT: xvshuf.w $xr2, $xr1, $xr0
-; CHECK-NEXT: xvst $xr2, $a0, 0
+; CHECK-NEXT: xvinsve0.w $xr0, $xr1, 1
+; CHECK-NEXT: xvst $xr0, $a0, 0
; CHECK-NEXT: ret
entry:
%va = load <8 x i32>, ptr %a
@@ -44,10 +40,8 @@ define void @xvinsve0_v8i32_l_2(ptr %d, ptr %a, ptr %b) nounwind {
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xvld $xr0, $a1, 0
; CHECK-NEXT: xvld $xr1, $a2, 0
-; CHECK-NEXT: pcalau12i $a1, %pc_hi20(.LCPI2_0)
-; CHECK-NEXT: xvld $xr2, $a1, %pc_lo12(.LCPI2_0)
-; CHECK-NEXT: xvshuf.w $xr2, $xr1, $xr0
-; CHECK-NEXT: xvst $xr2, $a0, 0
+; CHECK-NEXT: xvinsve0.w $xr0, $xr1, 2
+; CHECK-NEXT: xvst $xr0, $a0, 0
; CHECK-NEXT: ret
entry:
%va = load <8 x i32>, ptr %a
@@ -62,10 +56,8 @@ define void @xvinsve0_v8i32_l_3(ptr %d, ptr %a, ptr %b) nounwind {
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xvld $xr0, $a1, 0
; CHECK-NEXT: xvld $xr1, $a2, 0
-; CHECK-NEXT: pcalau12i $a1, %pc_hi20(.LCPI3_0)
-; CHECK-NEXT: xvld $xr2, $a1, %pc_lo12(.LCPI3_0)
-; CHECK-NEXT: xvshuf.w $xr2, $xr1, $xr0
-; CHECK-NEXT: xvst $xr2, $a0, 0
+; CHECK-NEXT: xvinsve0.w $xr0, $xr1, 3
+; CHECK-NEXT: xvst $xr0, $a0, 0
; CHECK-NEXT: ret
entry:
%va = load <8 x i32>, ptr %a
@@ -76,52 +68,13 @@ entry:
}
define void @xvinsve0_v8i32_l_4(ptr %d, ptr %a, ptr %b) nounwind {
-; LA32-LABEL: xvinsve0_v8i32_l_4:
-; LA32: # %bb.0: # %entry
-; LA32-NEXT: ld.w $a2, $a2, 0
-; LA32-NEXT: xvld $xr0, $a1, 0
-; LA32-NEXT: vinsgr2vr.w $vr1, $a2, 0
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 5
-; LA32-NEXT: vinsgr2vr.w $vr1, $a1, 1
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 6
-; LA32-NEXT: vinsgr2vr.w $vr1, $a1, 2
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 7
-; LA32-NEXT: vinsgr2vr.w $vr1, $a1, 3
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 0
-; LA32-NEXT: vinsgr2vr.w $vr2, $a1, 0
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 1
-; LA32-NEXT: vinsgr2vr.w $vr2, $a1, 1
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 2
-; LA32-NEXT: vinsgr2vr.w $vr2, $a1, 2
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 3
-; LA32-NEXT: vinsgr2vr.w $vr2, $a1, 3
-; LA32-NEXT: xvpermi.q $xr2, $xr1, 2
-; LA32-NEXT: xvst $xr2, $a0, 0
-; LA32-NEXT: ret
-;
-; LA64-LABEL: xvinsve0_v8i32_l_4:
-; LA64: # %bb.0: # %entry
-; LA64-NEXT: xvld $xr0, $a2, 0
-; LA64-NEXT: xvld $xr1, $a1, 0
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 0
-; LA64-NEXT: vinsgr2vr.w $vr0, $a1, 0
-; LA64-NEXT: xvpickve2gr.w $a1, $xr1, 5
-; LA64-NEXT: vinsgr2vr.w $vr0, $a1, 1
-; LA64-NEXT: xvpickve2gr.w $a1, $xr1, 6
-; LA64-NEXT: vinsgr2vr.w $vr0, $a1, 2
-; LA64-NEXT: xvpickve2gr.w $a1, $xr1, 7
-; LA64-NEXT: vinsgr2vr.w $vr0, $a1, 3
-; LA64-NEXT: xvpickve2gr.w $a1, $xr1, 0
-; LA64-NEXT: vinsgr2vr.w $vr2, $a1, 0
-; LA64-NEXT: xvpickve2gr.w $a1, $xr1, 1
-; LA64-NEXT: vinsgr2vr.w $vr2, $a1, 1
-; LA64-NEXT: xvpickve2gr.w $a1, $xr1, 2
-; LA64-NEXT: vinsgr2vr.w $vr2, $a1, 2
-; LA64-NEXT: xvpickve2gr.w $a1, $xr1, 3
-; LA64-NEXT: vinsgr2vr.w $vr2, $a1, 3
-; LA64-NEXT: xvpermi.q $xr2, $xr0, 2
-; LA64-NEXT: xvst $xr2, $a0, 0
-; LA64-NEXT: ret
+; CHECK-LABEL: xvinsve0_v8i32_l_4:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: xvld $xr0, $a1, 0
+; CHECK-NEXT: xvld $xr1, $a2, 0
+; CHECK-NEXT: xvinsve0.w $xr0, $xr1, 4
+; CHECK-NEXT: xvst $xr0, $a0, 0
+; CHECK-NEXT: ret
entry:
%va = load <8 x i32>, ptr %a
%vb = load <8 x i32>, ptr %b
@@ -131,52 +84,13 @@ entry:
}
define void @xvinsve0_v8i32_l_5(ptr %d, ptr %a, ptr %b) nounwind {
-; LA32-LABEL: xvinsve0_v8i32_l_5:
-; LA32: # %bb.0: # %entry
-; LA32-NEXT: xvld $xr0, $a1, 0
-; LA32-NEXT: ld.w $a1, $a2, 0
-; LA32-NEXT: xvpickve2gr.w $a2, $xr0, 4
-; LA32-NEXT: vinsgr2vr.w $vr1, $a2, 0
-; LA32-NEXT: vinsgr2vr.w $vr1, $a1, 1
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 6
-; LA32-NEXT: vinsgr2vr.w $vr1, $a1, 2
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 7
-; LA32-NEXT: vinsgr2vr.w $vr1, $a1, 3
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 0
-; LA32-NEXT: vinsgr2vr.w $vr2, $a1, 0
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 1
-; LA32-NEXT: vinsgr2vr.w $vr2, $a1, 1
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 2
-; LA32-NEXT: vinsgr2vr.w $vr2, $a1, 2
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 3
-; LA32-NEXT: vinsgr2vr.w $vr2, $a1, 3
-; LA32-NEXT: xvpermi.q $xr2, $xr1, 2
-; LA32-NEXT: xvst $xr2, $a0, 0
-; LA32-NEXT: ret
-;
-; LA64-LABEL: xvinsve0_v8i32_l_5:
-; LA64: # %bb.0: # %entry
-; LA64-NEXT: xvld $xr0, $a1, 0
-; LA64-NEXT: xvld $xr1, $a2, 0
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 4
-; LA64-NEXT: vinsgr2vr.w $vr2, $a1, 0
-; LA64-NEXT: xvpickve2gr.w $a1, $xr1, 0
-; LA64-NEXT: vinsgr2vr.w $vr2, $a1, 1
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 6
-; LA64-NEXT: vinsgr2vr.w $vr2, $a1, 2
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 7
-; LA64-NEXT: vinsgr2vr.w $vr2, $a1, 3
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 0
-; LA64-NEXT: vinsgr2vr.w $vr1, $a1, 0
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 1
-; LA64-NEXT: vinsgr2vr.w $vr1, $a1, 1
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 2
-; LA64-NEXT: vinsgr2vr.w $vr1, $a1, 2
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 3
-; LA64-NEXT: vinsgr2vr.w $vr1, $a1, 3
-; LA64-NEXT: xvpermi.q $xr1, $xr2, 2
-; LA64-NEXT: xvst $xr1, $a0, 0
-; LA64-NEXT: ret
+; CHECK-LABEL: xvinsve0_v8i32_l_5:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: xvld $xr0, $a1, 0
+; CHECK-NEXT: xvld $xr1, $a2, 0
+; CHECK-NEXT: xvinsve0.w $xr0, $xr1, 5
+; CHECK-NEXT: xvst $xr0, $a0, 0
+; CHECK-NEXT: ret
entry:
%va = load <8 x i32>, ptr %a
%vb = load <8 x i32>, ptr %b
@@ -186,52 +100,13 @@ entry:
}
define void @xvinsve0_v8i32_l_6(ptr %d, ptr %a, ptr %b) nounwind {
-; LA32-LABEL: xvinsve0_v8i32_l_6:
-; LA32: # %bb.0: # %entry
-; LA32-NEXT: xvld $xr0, $a1, 0
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 4
-; LA32-NEXT: ld.w $a2, $a2, 0
-; LA32-NEXT: vinsgr2vr.w $vr1, $a1, 0
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 5
-; LA32-NEXT: vinsgr2vr.w $vr1, $a1, 1
-; LA32-NEXT: vinsgr2vr.w $vr1, $a2, 2
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 7
-; LA32-NEXT: vinsgr2vr.w $vr1, $a1, 3
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 0
-; LA32-NEXT: vinsgr2vr.w $vr2, $a1, 0
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 1
-; LA32-NEXT: vinsgr2vr.w $vr2, $a1, 1
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 2
-; LA32-NEXT: vinsgr2vr.w $vr2, $a1, 2
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 3
-; LA32-NEXT: vinsgr2vr.w $vr2, $a1, 3
-; LA32-NEXT: xvpermi.q $xr2, $xr1, 2
-; LA32-NEXT: xvst $xr2, $a0, 0
-; LA32-NEXT: ret
-;
-; LA64-LABEL: xvinsve0_v8i32_l_6:
-; LA64: # %bb.0: # %entry
-; LA64-NEXT: xvld $xr0, $a1, 0
-; LA64-NEXT: xvld $xr1, $a2, 0
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 4
-; LA64-NEXT: vinsgr2vr.w $vr2, $a1, 0
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 5
-; LA64-NEXT: vinsgr2vr.w $vr2, $a1, 1
-; LA64-NEXT: xvpickve2gr.w $a1, $xr1, 0
-; LA64-NEXT: vinsgr2vr.w $vr2, $a1, 2
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 7
-; LA64-NEXT: vinsgr2vr.w $vr2, $a1, 3
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 0
-; LA64-NEXT: vinsgr2vr.w $vr1, $a1, 0
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 1
-; LA64-NEXT: vinsgr2vr.w $vr1, $a1, 1
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 2
-; LA64-NEXT: vinsgr2vr.w $vr1, $a1, 2
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 3
-; LA64-NEXT: vinsgr2vr.w $vr1, $a1, 3
-; LA64-NEXT: xvpermi.q $xr1, $xr2, 2
-; LA64-NEXT: xvst $xr1, $a0, 0
-; LA64-NEXT: ret
+; CHECK-LABEL: xvinsve0_v8i32_l_6:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: xvld $xr0, $a1, 0
+; CHECK-NEXT: xvld $xr1, $a2, 0
+; CHECK-NEXT: xvinsve0.w $xr0, $xr1, 6
+; CHECK-NEXT: xvst $xr0, $a0, 0
+; CHECK-NEXT: ret
entry:
%va = load <8 x i32>, ptr %a
%vb = load <8 x i32>, ptr %b
@@ -241,52 +116,13 @@ entry:
}
define void @xvinsve0_v8i32_l_7(ptr %d, ptr %a, ptr %b) nounwind {
-; LA32-LABEL: xvinsve0_v8i32_l_7:
-; LA32: # %bb.0: # %entry
-; LA32-NEXT: xvld $xr0, $a1, 0
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 4
-; LA32-NEXT: vinsgr2vr.w $vr1, $a1, 0
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 5
-; LA32-NEXT: ld.w $a2, $a2, 0
-; LA32-NEXT: vinsgr2vr.w $vr1, $a1, 1
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 6
-; LA32-NEXT: vinsgr2vr.w $vr1, $a1, 2
-; LA32-NEXT: vinsgr2vr.w $vr1, $a2, 3
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 0
-; LA32-NEXT: vinsgr2vr.w $vr2, $a1, 0
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 1
-; LA32-NEXT: vinsgr2vr.w $vr2, $a1, 1
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 2
-; LA32-NEXT: vinsgr2vr.w $vr2, $a1, 2
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 3
-; LA32-NEXT: vinsgr2vr.w $vr2, $a1, 3
-; LA32-NEXT: xvpermi.q $xr2, $xr1, 2
-; LA32-NEXT: xvst $xr2, $a0, 0
-; LA32-NEXT: ret
-;
-; LA64-LABEL: xvinsve0_v8i32_l_7:
-; LA64: # %bb.0: # %entry
-; LA64-NEXT: xvld $xr0, $a1, 0
-; LA64-NEXT: xvld $xr1, $a2, 0
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 4
-; LA64-NEXT: vinsgr2vr.w $vr2, $a1, 0
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 5
-; LA64-NEXT: vinsgr2vr.w $vr2, $a1, 1
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 6
-; LA64-NEXT: vinsgr2vr.w $vr2, $a1, 2
-; LA64-NEXT: xvpickve2gr.w $a1, $xr1, 0
-; LA64-NEXT: vinsgr2vr.w $vr2, $a1, 3
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 0
-; LA64-NEXT: vinsgr2vr.w $vr1, $a1, 0
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 1
-; LA64-NEXT: vinsgr2vr.w $vr1, $a1, 1
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 2
-; LA64-NEXT: vinsgr2vr.w $vr1, $a1, 2
-; LA64-NEXT: xvpickve2gr.w $a1, $xr0, 3
-; LA64-NEXT: vinsgr2vr.w $vr1, $a1, 3
-; LA64-NEXT: xvpermi.q $xr1, $xr2, 2
-; LA64-NEXT: xvst $xr1, $a0, 0
-; LA64-NEXT: ret
+; CHECK-LABEL: xvinsve0_v8i32_l_7:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: xvld $xr0, $a1, 0
+; CHECK-NEXT: xvld $xr1, $a2, 0
+; CHECK-NEXT: xvinsve0.w $xr0, $xr1, 7
+; CHECK-NEXT: xvst $xr0, $a0, 0
+; CHECK-NEXT: ret
entry:
%va = load <8 x i32>, ptr %a
%vb = load <8 x i32>, ptr %b
@@ -300,10 +136,8 @@ define void @xvinsve0_v8f32_l(ptr %d, ptr %a, ptr %b) nounwind {
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xvld $xr0, $a1, 0
; CHECK-NEXT: xvld $xr1, $a2, 0
-; CHECK-NEXT: pcalau12i $a1, %pc_hi20(.LCPI8_0)
-; CHECK-NEXT: xvld $xr2, $a1, %pc_lo12(.LCPI8_0)
-; CHECK-NEXT: xvshuf.w $xr2, $xr1, $xr0
-; CHECK-NEXT: xvst $xr2, $a0, 0
+; CHECK-NEXT: xvinsve0.w $xr0, $xr1, 0
+; CHECK-NEXT: xvst $xr0, $a0, 0
; CHECK-NEXT: ret
entry:
%va = load <8 x float>, ptr %a
@@ -318,10 +152,8 @@ define void @xvinsve0_v8i32_h_0(ptr %d, ptr %a, ptr %b) nounwind {
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xvld $xr0, $a1, 0
; CHECK-NEXT: xvld $xr1, $a2, 0
-; CHECK-NEXT: pcalau12i $a1, %pc_hi20(.LCPI9_0)
-; CHECK-NEXT: xvld $xr2, $a1, %pc_lo12(.LCPI9_0)
-; CHECK-NEXT: xvshuf.w $xr2, $xr1, $xr0
-; CHECK-NEXT: xvst $xr2, $a0, 0
+; CHECK-NEXT: xvinsve0.w $xr1, $xr0, 0
+; CHECK-NEXT: xvst $xr1, $a0, 0
; CHECK-NEXT: ret
entry:
%va = load <8 x i32>, ptr %a
@@ -336,10 +168,8 @@ define void @xvinsve0_v8i32_h_1(ptr %d, ptr %a, ptr %b) nounwind {
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xvld $xr0, $a1, 0
; CHECK-NEXT: xvld $xr1, $a2, 0
-; CHECK-NEXT: pcalau12i $a1, %pc_hi20(.LCPI10_0)
-; CHECK-NEXT: xvld $xr2, $a1, %pc_lo12(.LCPI10_0)
-; CHECK-NEXT: xvshuf.w $xr2, $xr1, $xr0
-; CHECK-NEXT: xvst $xr2, $a0, 0
+; CHECK-NEXT: xvinsve0.w $xr1, $xr0, 1
+; CHECK-NEXT: xvst $xr1, $a0, 0
; CHECK-NEXT: ret
entry:
%va = load <8 x i32>, ptr %a
@@ -354,10 +184,8 @@ define void @xvinsve0_v8i32_h_2(ptr %d, ptr %a, ptr %b) nounwind {
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xvld $xr0, $a1, 0
; CHECK-NEXT: xvld $xr1, $a2, 0
-; CHECK-NEXT: pcalau12i $a1, %pc_hi20(.LCPI11_0)
-; CHECK-NEXT: xvld $xr2, $a1, %pc_lo12(.LCPI11_0)
-; CHECK-NEXT: xvshuf.w $xr2, $xr1, $xr0
-; CHECK-NEXT: xvst $xr2, $a0, 0
+; CHECK-NEXT: xvinsve0.w $xr1, $xr0, 2
+; CHECK-NEXT: xvst $xr1, $a0, 0
; CHECK-NEXT: ret
entry:
%va = load <8 x i32>, ptr %a
@@ -372,10 +200,8 @@ define void @xvinsve0_v8i32_h_3(ptr %d, ptr %a, ptr %b) nounwind {
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xvld $xr0, $a1, 0
; CHECK-NEXT: xvld $xr1, $a2, 0
-; CHECK-NEXT: pcalau12i $a1, %pc_hi20(.LCPI12_0)
-; CHECK-NEXT: xvld $xr2, $a1, %pc_lo12(.LCPI12_0)
-; CHECK-NEXT: xvshuf.w $xr2, $xr1, $xr0
-; CHECK-NEXT: xvst $xr2, $a0, 0
+; CHECK-NEXT: xvinsve0.w $xr1, $xr0, 3
+; CHECK-NEXT: xvst $xr1, $a0, 0
; CHECK-NEXT: ret
entry:
%va = load <8 x i32>, ptr %a
@@ -386,52 +212,13 @@ entry:
}
define void @xvinsve0_v8i32_h_4(ptr %d, ptr %a, ptr %b) nounwind {
-; LA32-LABEL: xvinsve0_v8i32_h_4:
-; LA32: # %bb.0: # %entry
-; LA32-NEXT: ld.w $a1, $a1, 0
-; LA32-NEXT: xvld $xr0, $a2, 0
-; LA32-NEXT: vinsgr2vr.w $vr1, $a1, 0
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 5
-; LA32-NEXT: vinsgr2vr.w $vr1, $a1, 1
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 6
-; LA32-NEXT: vinsgr2vr.w $vr1, $a1, 2
-; LA32-NEXT: xvpickve2gr.w $a1, $xr0, 7
-; LA32-NEXT: vinsgr2vr.w $vr1, $a1, 3
-; LA32-NEX...
[truncated]
zhaoqi5 (Author): ... Sorry for my mistake. Please forgive me for disturbing all of you.