16 changes: 16 additions & 0 deletions llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -4069,6 +4069,22 @@ let Predicates = [HasSVE2_or_SME] in {
let AddedComplexity = 2 in {
def : Pat<(nxv16i8 (AArch64ext nxv16i8:$zn1, nxv16i8:$zn2, (i32 imm0_255:$imm))),
(EXT_ZZI_B (REG_SEQUENCE ZPR2, $zn1, zsub0, $zn2, zsub1), imm0_255:$imm)>;

foreach VT = [nxv16i8] in
def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_255 i32:$index)))),
(EXT_ZZI_B (REG_SEQUENCE ZPR2, $Z1, zsub0, $Z2, zsub1), imm0_255:$index)>;
Comment on lines +4073 to +4075

Collaborator:
Why does this result in different output than the pattern above? Is this because of the change you've made in #151729?

I would actually expect vector_splice and AArch64ISD::EXT to have the same semantics (and if there isn't some subtle difference between AArch64ISD::EXT and ISD::VECTOR_SPLICE, then I think we should remove the former)

Contributor Author:

What do you mean by different output? Do you mean that if you replace the splice intrinsics with AArch64's EXT intrinsics, then llvm/test/CodeGen/AArch64/sve-vector-splice.ll has different CHECK lines?

For a generic splice with two inputs, I'd expect the output to be the same. The change I made in the first PR is only for "subvector-extract" splice instructions created when lowering vector_extract, where we can mark the second input as undef.

When you say removing the former, do you mean removing the pattern? Or the intrinsic altogether? I would need to refresh my brain after the weekend, but I think llvm's vector_splice, AArch64 EXT and AArch64 SPLICE all have slightly different semantics (especially for negative indices).
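The splice semantics being discussed can be sketched with a simplified fixed-length model of ISD::VECTOR_SPLICE (following its LangRef description; this is an illustration, not the actual lowering code):

```python
def vector_splice(a, b, imm):
    """Model of ISD::VECTOR_SPLICE on fixed-length vectors.

    For imm >= 0 the result is VL consecutive elements of a ++ b
    starting at element imm; a negative imm counts back from the
    end of a (imm = -N takes the trailing N elements of a followed
    by the leading VL-N elements of b).
    """
    vl = len(a)
    start = imm if imm >= 0 else vl + imm
    return (a + b)[start:start + vl]

# AArch64's EXT instruction only encodes a non-negative byte offset,
# which is one reason negative splice indices need different lowering.
print(vector_splice([0, 1, 2, 3], [4, 5, 6, 7], 1))   # [1, 2, 3, 4]
print(vector_splice([0, 1, 2, 3], [4, 5, 6, 7], -1))  # [3, 4, 5, 6]
```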

Collaborator:

> What do you mean by different output? Do you mean that if you replace the splice intrinsics with AArch64's EXT intrinsics, then llvm/test/CodeGen/AArch64/sve-vector-splice.ll has different CHECK lines?

What I meant was that if the output would be the same, you probably wouldn't have added this pattern. So I'm basically asking "why wouldn't the above pattern already cover this?".

> When you say removing the former, do you mean removing the pattern?

I mean removing AArch64ISD::EXT in favour of ISD::VECTOR_SPLICE. The EXT and SPLICE SVE instructions are indeed different (the former takes an immediate, the latter a predicate), but I think the AArch64ISD::EXT and ISD::VECTOR_SPLICE SelectionDAG nodes are practically the same. Before SVE we didn't have to create a new ISD node for this because ISD::VECTOR_SHUFFLE described this pattern sufficiently, but that couldn't be used for scalable vectors and so we added the generic ISD::VECTOR_SPLICE. At the time there probably wasn't an incentive to replace uses of AArch64ISD::EXT by ISD::VECTOR_SPLICE, but if code-gen is different depending on which node we try to match, then I think there's an incentive to merge the two.

Contributor Author:

I'll check about removing the EXT intrinsic then. 👍 I needed the new pattern because SDAG lowers the vector_extract nodes to vector_splice, not AArch64's EXT.

Collaborator:

Before I send you off on a wild goose chase: it seems I made this mistake in my thinking before, as there is a subtle difference between the two: #114411 (comment)

Contributor Author:

Right, I guess I'll leave the intrinsic then.


foreach VT = [nxv8i16, nxv8f16, nxv8bf16] in
def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_127 i32:$index)))),
(EXT_ZZI_B (REG_SEQUENCE ZPR2, $Z1, zsub0, $Z2, zsub1), imm0_255:$index)>;

foreach VT = [nxv4i32, nxv4f16, nxv4f32, nxv4bf16] in
def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_63 i32:$index)))),
(EXT_ZZI_B (REG_SEQUENCE ZPR2, $Z1, zsub0, $Z2, zsub1), imm0_255:$index)>;

foreach VT = [nxv2i64, nxv2f16, nxv2f32, nxv2f64, nxv2bf16] in
def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_31 i32:$index)))),
(EXT_ZZI_B (REG_SEQUENCE ZPR2, $Z1, zsub0, $Z2, zsub1), imm0_255:$index)>;
}
} // End HasSVE2_or_SME
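The per-type immediate ranges above (0-255 for byte elements, 0-127 for halfwords, 0-63 for words, 0-31 for doublewords) are consistent with EXT_ZZI_B taking a byte offset: assuming the sve_ext_imm_0_* predicates scale the element index by the element size, a hypothetical check looks like this (illustrative only, not the TableGen code):

```python
# Hypothetical helper mirroring what the sve_ext_imm_0_* predicates
# appear to enforce: EXT_ZZI_B encodes a *byte* immediate in [0, 255],
# so an element-indexed splice is only matchable when
# index * sizeof(element) still fits in that range.
def ext_byte_imm(index, elt_bytes):
    imm = index * elt_bytes
    assert 0 <= imm <= 255, "splice index not representable as an EXT imm"
    return imm

print(ext_byte_imm(127, 2))  # 254, largest index allowed by sve_ext_imm_0_127
print(ext_byte_imm(31, 8))   # 248, largest index allowed by sve_ext_imm_0_31
```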

10 changes: 5 additions & 5 deletions llvm/test/CodeGen/AArch64/get-active-lane-mask-extract.ll
@@ -192,7 +192,7 @@ define void @test_fixed_extract(i64 %i, i64 %n) #0 {
; CHECK-SVE2p1-NEXT: mov z1.s, p0/z, #1 // =0x1
; CHECK-SVE2p1-NEXT: fmov s0, w8
; CHECK-SVE2p1-NEXT: mov v0.s[1], v1.s[1]
; CHECK-SVE2p1-NEXT: ext z1.b, z1.b, z0.b, #8
; CHECK-SVE2p1-NEXT: ext z1.b, { z1.b, z2.b }, #8
; CHECK-SVE2p1-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-SVE2p1-NEXT: // kill: def $d1 killed $d1 killed $z1
; CHECK-SVE2p1-NEXT: b use
@@ -202,12 +202,12 @@ define void @test_fixed_extract(i64 %i, i64 %n) #0 {
; CHECK-SME2-NEXT: whilelo p0.s, x0, x1
; CHECK-SME2-NEXT: cset w8, mi
; CHECK-SME2-NEXT: mov z1.s, p0/z, #1 // =0x1
; CHECK-SME2-NEXT: fmov s2, w8
; CHECK-SME2-NEXT: fmov s3, w8
; CHECK-SME2-NEXT: mov z0.s, z1.s[1]
; CHECK-SME2-NEXT: zip1 z0.s, z2.s, z0.s
; CHECK-SME2-NEXT: ext z1.b, z1.b, z0.b, #8
; CHECK-SME2-NEXT: // kill: def $d0 killed $d0 killed $z0
; CHECK-SME2-NEXT: ext z1.b, { z1.b, z2.b }, #8
; CHECK-SME2-NEXT: // kill: def $d1 killed $d1 killed $z1
; CHECK-SME2-NEXT: zip1 z0.s, z3.s, z0.s
; CHECK-SME2-NEXT: // kill: def $d0 killed $d0 killed $z0
; CHECK-SME2-NEXT: b use
%r = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 %i, i64 %n)
%v0 = call <2 x i1> @llvm.vector.extract.v2i1.nxv4i1.i64(<vscale x 4 x i1> %r, i64 0)
91 changes: 42 additions & 49 deletions llvm/test/CodeGen/AArch64/sve-fixed-length-partial-reduce.ll
@@ -109,14 +109,13 @@ define <16 x i16> @two_way_i8_i16_vl256(ptr %accptr, ptr %uptr, ptr %sptr) vscal
; SME-LABEL: two_way_i8_i16_vl256:
; SME: // %bb.0:
; SME-NEXT: ldr z0, [x0]
; SME-NEXT: ldr z1, [x1]
; SME-NEXT: ldr z2, [x2]
; SME-NEXT: umlalb z0.h, z2.b, z1.b
; SME-NEXT: umlalt z0.h, z2.b, z1.b
; SME-NEXT: mov z1.d, z0.d
; SME-NEXT: ext z1.b, z1.b, z0.b, #16
; SME-NEXT: // kill: def $q0 killed $q0 killed $z0
; SME-NEXT: // kill: def $q1 killed $q1 killed $z1
; SME-NEXT: ldr z2, [x1]
; SME-NEXT: ldr z3, [x2]
; SME-NEXT: umlalb z0.h, z3.b, z2.b
; SME-NEXT: umlalt z0.h, z3.b, z2.b
; SME-NEXT: ext z2.b, { z0.b, z1.b }, #16
; SME-NEXT: // kill: def $q0 killed $q0 killed $z0_z1
; SME-NEXT: mov z1.d, z2.d
Contributor Author:

This is one example where we would gain by having subreg liveness.

Currently the ret instruction has an implicit use of z0 and z1 for ABI reasons. This forces a use of all aliasing registers, including z0_z1, which will be considered live from umlalt z0.h, z3.b, z2.b. As a consequence, ext z2.b, { z0.b, z1.b }, #16 cannot be rewritten directly as ext z1.b, { z0.b, z1.b }, #16 as it would create an interference. With subreg liveness enabled, we would see there is no interference for z0_z1.hi.

; SME-NEXT: ret
%acc = load <16 x i16>, ptr %accptr
%u = load <32 x i8>, ptr %uptr
Expand Down Expand Up @@ -232,14 +231,13 @@ define <8 x i32> @two_way_i16_i32_vl256(ptr %accptr, ptr %uptr, ptr %sptr) vscal
; SME-LABEL: two_way_i16_i32_vl256:
; SME: // %bb.0:
; SME-NEXT: ldr z0, [x0]
; SME-NEXT: ldr z1, [x1]
; SME-NEXT: ldr z2, [x2]
; SME-NEXT: umlalb z0.s, z2.h, z1.h
; SME-NEXT: umlalt z0.s, z2.h, z1.h
; SME-NEXT: mov z1.d, z0.d
; SME-NEXT: ext z1.b, z1.b, z0.b, #16
; SME-NEXT: // kill: def $q0 killed $q0 killed $z0
; SME-NEXT: // kill: def $q1 killed $q1 killed $z1
; SME-NEXT: ldr z2, [x1]
; SME-NEXT: ldr z3, [x2]
; SME-NEXT: umlalb z0.s, z3.h, z2.h
; SME-NEXT: umlalt z0.s, z3.h, z2.h
; SME-NEXT: ext z2.b, { z0.b, z1.b }, #16
; SME-NEXT: // kill: def $q0 killed $q0 killed $z0_z1
; SME-NEXT: mov z1.d, z2.d
; SME-NEXT: ret
%acc = load <8 x i32>, ptr %accptr
%u = load <16 x i16>, ptr %uptr
Expand Down Expand Up @@ -355,14 +353,13 @@ define <4 x i64> @two_way_i32_i64_vl256(ptr %accptr, ptr %uptr, ptr %sptr) vscal
; SME-LABEL: two_way_i32_i64_vl256:
; SME: // %bb.0:
; SME-NEXT: ldr z0, [x0]
; SME-NEXT: ldr z1, [x1]
; SME-NEXT: ldr z2, [x2]
; SME-NEXT: umlalb z0.d, z2.s, z1.s
; SME-NEXT: umlalt z0.d, z2.s, z1.s
; SME-NEXT: mov z1.d, z0.d
; SME-NEXT: ext z1.b, z1.b, z0.b, #16
; SME-NEXT: // kill: def $q0 killed $q0 killed $z0
; SME-NEXT: // kill: def $q1 killed $q1 killed $z1
; SME-NEXT: ldr z2, [x1]
; SME-NEXT: ldr z3, [x2]
; SME-NEXT: umlalb z0.d, z3.s, z2.s
; SME-NEXT: umlalt z0.d, z3.s, z2.s
; SME-NEXT: ext z2.b, { z0.b, z1.b }, #16
; SME-NEXT: // kill: def $q0 killed $q0 killed $z0_z1
; SME-NEXT: mov z1.d, z2.d
; SME-NEXT: ret
%acc = load <4 x i64>, ptr %accptr
%u = load <8 x i32>, ptr %uptr
Expand Down Expand Up @@ -644,13 +641,12 @@ define <8 x i32> @four_way_i8_i32_vl256(ptr %accptr, ptr %uptr, ptr %sptr) vscal
; SME-LABEL: four_way_i8_i32_vl256:
; SME: // %bb.0:
; SME-NEXT: ldr z0, [x0]
; SME-NEXT: ldr z1, [x1]
; SME-NEXT: ldr z2, [x2]
; SME-NEXT: udot z0.s, z2.b, z1.b
; SME-NEXT: mov z1.d, z0.d
; SME-NEXT: ext z1.b, z1.b, z0.b, #16
; SME-NEXT: // kill: def $q0 killed $q0 killed $z0
; SME-NEXT: // kill: def $q1 killed $q1 killed $z1
; SME-NEXT: ldr z2, [x1]
; SME-NEXT: ldr z3, [x2]
; SME-NEXT: udot z0.s, z3.b, z2.b
; SME-NEXT: ext z2.b, { z0.b, z1.b }, #16
; SME-NEXT: // kill: def $q0 killed $q0 killed $z0_z1
; SME-NEXT: mov z1.d, z2.d
; SME-NEXT: ret
%acc = load <8 x i32>, ptr %accptr
%u = load <32 x i8>, ptr %uptr
Expand Down Expand Up @@ -689,13 +685,12 @@ define <8 x i32> @four_way_i8_i32_vl256_usdot(ptr %accptr, ptr %uptr, ptr %sptr)
; SME-LABEL: four_way_i8_i32_vl256_usdot:
; SME: // %bb.0:
; SME-NEXT: ldr z0, [x0]
; SME-NEXT: ldr z1, [x1]
; SME-NEXT: ldr z2, [x2]
; SME-NEXT: usdot z0.s, z1.b, z2.b
; SME-NEXT: mov z1.d, z0.d
; SME-NEXT: ext z1.b, z1.b, z0.b, #16
; SME-NEXT: // kill: def $q0 killed $q0 killed $z0
; SME-NEXT: // kill: def $q1 killed $q1 killed $z1
; SME-NEXT: ldr z2, [x1]
; SME-NEXT: ldr z3, [x2]
; SME-NEXT: usdot z0.s, z2.b, z3.b
; SME-NEXT: ext z2.b, { z0.b, z1.b }, #16
; SME-NEXT: // kill: def $q0 killed $q0 killed $z0_z1
; SME-NEXT: mov z1.d, z2.d
; SME-NEXT: ret
%acc = load <8 x i32>, ptr %accptr
%u = load <32 x i8>, ptr %uptr
Expand Down Expand Up @@ -822,13 +817,12 @@ define <4 x i64> @four_way_i16_i64_vl256(ptr %accptr, ptr %uptr, ptr %sptr) vsca
; SME-LABEL: four_way_i16_i64_vl256:
; SME: // %bb.0:
; SME-NEXT: ldr z0, [x0]
; SME-NEXT: ldr z1, [x1]
; SME-NEXT: ldr z2, [x2]
; SME-NEXT: udot z0.d, z2.h, z1.h
; SME-NEXT: mov z1.d, z0.d
; SME-NEXT: ext z1.b, z1.b, z0.b, #16
; SME-NEXT: // kill: def $q0 killed $q0 killed $z0
; SME-NEXT: // kill: def $q1 killed $q1 killed $z1
; SME-NEXT: ldr z2, [x1]
; SME-NEXT: ldr z3, [x2]
; SME-NEXT: udot z0.d, z3.h, z2.h
; SME-NEXT: ext z2.b, { z0.b, z1.b }, #16
; SME-NEXT: // kill: def $q0 killed $q0 killed $z0_z1
; SME-NEXT: mov z1.d, z2.d
; SME-NEXT: ret
%acc = load <4 x i64>, ptr %accptr
%u = load <16 x i16>, ptr %uptr
Expand Down Expand Up @@ -999,10 +993,9 @@ define <4 x i64> @four_way_i8_i64_vl256(ptr %accptr, ptr %uptr, ptr %sptr) vscal
; SME-NEXT: ldr z0, [x0]
; SME-NEXT: uaddwb z0.d, z0.d, z2.s
; SME-NEXT: uaddwt z0.d, z0.d, z2.s
; SME-NEXT: mov z1.d, z0.d
; SME-NEXT: ext z1.b, z1.b, z0.b, #16
; SME-NEXT: // kill: def $q0 killed $q0 killed $z0
; SME-NEXT: // kill: def $q1 killed $q1 killed $z1
; SME-NEXT: ext z2.b, { z0.b, z1.b }, #16
; SME-NEXT: // kill: def $q0 killed $q0 killed $z0_z1
; SME-NEXT: mov z1.d, z2.d
; SME-NEXT: ret
%acc = load <4 x i64>, ptr %accptr
%u = load <32 x i8>, ptr %uptr
17 changes: 8 additions & 9 deletions llvm/test/CodeGen/AArch64/sve-pr92779.ll
@@ -5,16 +5,15 @@ define void @main(ptr %0) {
; CHECK-LABEL: main:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: movi v0.2d, #0000000000000000
; CHECK-NEXT: movi v1.2d, #0000000000000000
; CHECK-NEXT: ptrue p0.d, vl1
; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
; CHECK-NEXT: uzp1 v0.2s, v1.2s, v0.2s
; CHECK-NEXT: neg v0.2s, v0.2s
; CHECK-NEXT: smov x8, v0.s[0]
; CHECK-NEXT: smov x9, v0.s[1]
; CHECK-NEXT: mov z1.d, p0/m, x8
; CHECK-NEXT: mov z1.d, p0/m, x9
; CHECK-NEXT: str z1, [x0]
; CHECK-NEXT: ext z2.b, { z0.b, z1.b }, #8
; CHECK-NEXT: uzp1 v2.2s, v0.2s, v2.2s
; CHECK-NEXT: neg v2.2s, v2.2s
; CHECK-NEXT: smov x8, v2.s[0]
; CHECK-NEXT: smov x9, v2.s[1]
; CHECK-NEXT: mov z0.d, p0/m, x8
; CHECK-NEXT: mov z0.d, p0/m, x9
; CHECK-NEXT: str z0, [x0]
; CHECK-NEXT: ret
"entry":
%1 = bitcast <vscale x 2 x i64> zeroinitializer to <vscale x 4 x i32>
@@ -1,5 +1,5 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mattr=+sve -force-streaming-compatible < %s | FileCheck %s
; RUN: llc -mattr=+sve2 -force-streaming-compatible < %s | FileCheck %s
; RUN: llc -mattr=+sme -force-streaming < %s | FileCheck %s
; RUN: llc -force-streaming-compatible < %s | FileCheck %s --check-prefix=NONEON-NOSVE

@@ -228,25 +228,25 @@ define <4 x i256> @load_sext_v4i32i256(ptr %ap) {
; CHECK-LABEL: load_sext_v4i32i256:
; CHECK: // %bb.0:
; CHECK-NEXT: ldr q0, [x0]
; CHECK-NEXT: sunpklo z1.d, z0.s
; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
; CHECK-NEXT: sunpklo z2.d, z0.s
; CHECK-NEXT: ext z0.b, { z0.b, z1.b }, #8
; CHECK-NEXT: sunpklo z0.d, z0.s
; CHECK-NEXT: fmov x9, d1
; CHECK-NEXT: mov z1.d, z1.d[1]
; CHECK-NEXT: fmov x11, d0
; CHECK-NEXT: mov z0.d, z0.d[1]
; CHECK-NEXT: fmov x9, d2
; CHECK-NEXT: mov z2.d, z2.d[1]
; CHECK-NEXT: asr x10, x9, #63
; CHECK-NEXT: fmov x11, d2
; CHECK-NEXT: stp x9, x10, [x8]
; CHECK-NEXT: fmov x9, d1
; CHECK-NEXT: fmov x9, d0
; CHECK-NEXT: mov z0.d, z0.d[1]
; CHECK-NEXT: asr x12, x11, #63
; CHECK-NEXT: stp x10, x10, [x8, #16]
; CHECK-NEXT: stp x11, x12, [x8, #64]
; CHECK-NEXT: stp x11, x12, [x8, #32]
; CHECK-NEXT: fmov x11, d0
; CHECK-NEXT: asr x10, x9, #63
; CHECK-NEXT: stp x12, x12, [x8, #80]
; CHECK-NEXT: stp x10, x10, [x8, #48]
; CHECK-NEXT: stp x12, x12, [x8, #48]
; CHECK-NEXT: stp x10, x10, [x8, #80]
; CHECK-NEXT: asr x12, x11, #63
; CHECK-NEXT: stp x9, x10, [x8, #32]
; CHECK-NEXT: stp x9, x10, [x8, #64]
; CHECK-NEXT: stp x12, x12, [x8, #112]
; CHECK-NEXT: stp x11, x12, [x8, #96]
; CHECK-NEXT: ret
@@ -1,5 +1,5 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mattr=+sve -force-streaming-compatible < %s | FileCheck %s
; RUN: llc -mattr=+sve2 -force-streaming-compatible < %s | FileCheck %s
; RUN: llc -mattr=+sme -force-streaming < %s | FileCheck %s
; RUN: llc -force-streaming-compatible < %s | FileCheck %s --check-prefix=NONEON-NOSVE

@@ -78,8 +78,8 @@ define <4 x i8> @extract_subvector_v8i8(<8 x i8> %op) {
define <8 x i8> @extract_subvector_v16i8(<16 x i8> %op) {
; CHECK-LABEL: extract_subvector_v16i8:
; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0_z1
; CHECK-NEXT: ext z0.b, { z0.b, z1.b }, #8
; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
; CHECK-NEXT: ret
;
@@ -119,7 +119,7 @@ define <2 x i16> @extract_subvector_v4i16(<4 x i16> %op) {
; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
; CHECK-NEXT: uunpklo z0.s, z0.h
; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
; CHECK-NEXT: ext z0.b, { z0.b, z1.b }, #8
; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
; CHECK-NEXT: ret
;
@@ -138,8 +138,8 @@ define <2 x i16> @extract_subvector_v4i16(<4 x i16> %op) {
define <4 x i16> @extract_subvector_v8i16(<8 x i16> %op) {
; CHECK-LABEL: extract_subvector_v8i16:
; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0_z1
; CHECK-NEXT: ext z0.b, { z0.b, z1.b }, #8
; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
; CHECK-NEXT: ret
;
@@ -198,8 +198,8 @@ define <1 x i32> @extract_subvector_v2i32(<2 x i32> %op) {
define <2 x i32> @extract_subvector_v4i32(<4 x i32> %op) {
; CHECK-LABEL: extract_subvector_v4i32:
; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0_z1
; CHECK-NEXT: ext z0.b, { z0.b, z1.b }, #8
; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
; CHECK-NEXT: ret
;
@@ -237,8 +237,8 @@ define void @extract_subvector_v8i32(ptr %a, ptr %b) {
define <1 x i64> @extract_subvector_v2i64(<2 x i64> %op) {
; CHECK-LABEL: extract_subvector_v2i64:
; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0_z1
; CHECK-NEXT: ext z0.b, { z0.b, z1.b }, #8
; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
; CHECK-NEXT: ret
;
@@ -297,8 +297,8 @@ define <2 x half> @extract_subvector_v4f16(<4 x half> %op) {
define <4 x half> @extract_subvector_v8f16(<8 x half> %op) {
; CHECK-LABEL: extract_subvector_v8f16:
; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0_z1
; CHECK-NEXT: ext z0.b, { z0.b, z1.b }, #8
; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
; CHECK-NEXT: ret
;
@@ -357,8 +357,8 @@ define <1 x float> @extract_subvector_v2f32(<2 x float> %op) {
define <2 x float> @extract_subvector_v4f32(<4 x float> %op) {
; CHECK-LABEL: extract_subvector_v4f32:
; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0_z1
; CHECK-NEXT: ext z0.b, { z0.b, z1.b }, #8
; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
; CHECK-NEXT: ret
;
@@ -396,8 +396,8 @@ define void @extract_subvector_v8f32(ptr %a, ptr %b) {
define <1 x double> @extract_subvector_v2f64(<2 x double> %op) {
; CHECK-LABEL: extract_subvector_v2f64:
; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0_z1
; CHECK-NEXT: ext z0.b, { z0.b, z1.b }, #8
; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
; CHECK-NEXT: ret
;