[AArch64][ISel] Select constructive SVE2 ext instruction #151730
```diff
@@ -109,14 +109,13 @@ define <16 x i16> @two_way_i8_i16_vl256(ptr %accptr, ptr %uptr, ptr %sptr) vscal
 ; SME-LABEL: two_way_i8_i16_vl256:
 ; SME:       // %bb.0:
 ; SME-NEXT:    ldr z0, [x0]
-; SME-NEXT:    ldr z1, [x1]
-; SME-NEXT:    ldr z2, [x2]
-; SME-NEXT:    umlalb z0.h, z2.b, z1.b
-; SME-NEXT:    umlalt z0.h, z2.b, z1.b
-; SME-NEXT:    mov z1.d, z0.d
-; SME-NEXT:    ext z1.b, z1.b, z0.b, #16
-; SME-NEXT:    // kill: def $q0 killed $q0 killed $z0
-; SME-NEXT:    // kill: def $q1 killed $q1 killed $z1
+; SME-NEXT:    ldr z2, [x1]
+; SME-NEXT:    ldr z3, [x2]
+; SME-NEXT:    umlalb z0.h, z3.b, z2.b
+; SME-NEXT:    umlalt z0.h, z3.b, z2.b
+; SME-NEXT:    ext z2.b, { z0.b, z1.b }, #16
+; SME-NEXT:    // kill: def $q0 killed $q0 killed $z0_z1
+; SME-NEXT:    mov z1.d, z2.d
```
Author (Contributor): This is one example where we would gain by having subreg liveness. Currently the …
```diff
 ; SME-NEXT:    ret
 %acc = load <16 x i16>, ptr %accptr
 %u = load <32 x i8>, ptr %uptr
```
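The pattern repeated in every hunk of this file is the same: the destructive SVE `ext` overwrites its first source operand, so the old code needed a `mov` to preserve the input, while the constructive SVE2 `ext` reads a consecutive register pair `{ zn, zn+1 }` and writes an independent destination. A small Python model of the byte-level behaviour of the two forms (an illustration with byte lists standing in for `z` registers, not code from the patch):

```python
def ext_destructive(zdn, zm, imm):
    """SVE EXT (destructive): result is bytes imm.. of zdn ++ zm,
    truncated to one vector length; the result overwrites zdn."""
    return (zdn + zm)[imm:imm + len(zdn)]

def ext_constructive(zn_pair, imm):
    """SVE2 EXT (constructive): reads the register pair { zn, zn+1 }
    and writes a separate destination, so no copy of the input is needed."""
    lo, hi = zn_pair
    return (lo + hi)[imm:imm + len(lo)]

z0 = list(range(32))   # a 256-bit vector modeled as 32 bytes
z1 = [0] * 32          # second register of the pair (contents irrelevant here)

# Old code: copy z0 first (mov z1.d, z0.d), then destructive ext on the copy.
old = ext_destructive(z0[:], z0, 16)
# New code: one constructive ext over the pair, no mov required.
new = ext_constructive((z0, z1), 16)

# Only the low 128 bits (16 bytes) are kept as the returned q register,
# and both forms produce the same high half of z0 there.
assert old[:16] == new[:16] == list(range(16, 32))
```

Either way the extracted low 16 bytes are the high 128-bit half of `z0`, which is exactly what the `// kill` annotations narrow down to a `q` register.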
```diff
@@ -232,14 +231,13 @@ define <8 x i32> @two_way_i16_i32_vl256(ptr %accptr, ptr %uptr, ptr %sptr) vscal
 ; SME-LABEL: two_way_i16_i32_vl256:
 ; SME:       // %bb.0:
 ; SME-NEXT:    ldr z0, [x0]
-; SME-NEXT:    ldr z1, [x1]
-; SME-NEXT:    ldr z2, [x2]
-; SME-NEXT:    umlalb z0.s, z2.h, z1.h
-; SME-NEXT:    umlalt z0.s, z2.h, z1.h
-; SME-NEXT:    mov z1.d, z0.d
-; SME-NEXT:    ext z1.b, z1.b, z0.b, #16
-; SME-NEXT:    // kill: def $q0 killed $q0 killed $z0
-; SME-NEXT:    // kill: def $q1 killed $q1 killed $z1
+; SME-NEXT:    ldr z2, [x1]
+; SME-NEXT:    ldr z3, [x2]
+; SME-NEXT:    umlalb z0.s, z3.h, z2.h
+; SME-NEXT:    umlalt z0.s, z3.h, z2.h
+; SME-NEXT:    ext z2.b, { z0.b, z1.b }, #16
+; SME-NEXT:    // kill: def $q0 killed $q0 killed $z0_z1
+; SME-NEXT:    mov z1.d, z2.d
 ; SME-NEXT:    ret
 %acc = load <8 x i32>, ptr %accptr
 %u = load <16 x i16>, ptr %uptr
```
```diff
@@ -355,14 +353,13 @@ define <4 x i64> @two_way_i32_i64_vl256(ptr %accptr, ptr %uptr, ptr %sptr) vscal
 ; SME-LABEL: two_way_i32_i64_vl256:
 ; SME:       // %bb.0:
 ; SME-NEXT:    ldr z0, [x0]
-; SME-NEXT:    ldr z1, [x1]
-; SME-NEXT:    ldr z2, [x2]
-; SME-NEXT:    umlalb z0.d, z2.s, z1.s
-; SME-NEXT:    umlalt z0.d, z2.s, z1.s
-; SME-NEXT:    mov z1.d, z0.d
-; SME-NEXT:    ext z1.b, z1.b, z0.b, #16
-; SME-NEXT:    // kill: def $q0 killed $q0 killed $z0
-; SME-NEXT:    // kill: def $q1 killed $q1 killed $z1
+; SME-NEXT:    ldr z2, [x1]
+; SME-NEXT:    ldr z3, [x2]
+; SME-NEXT:    umlalb z0.d, z3.s, z2.s
+; SME-NEXT:    umlalt z0.d, z3.s, z2.s
+; SME-NEXT:    ext z2.b, { z0.b, z1.b }, #16
+; SME-NEXT:    // kill: def $q0 killed $q0 killed $z0_z1
+; SME-NEXT:    mov z1.d, z2.d
 ; SME-NEXT:    ret
 %acc = load <4 x i64>, ptr %accptr
 %u = load <8 x i32>, ptr %uptr
```
```diff
@@ -644,13 +641,12 @@ define <8 x i32> @four_way_i8_i32_vl256(ptr %accptr, ptr %uptr, ptr %sptr) vscal
 ; SME-LABEL: four_way_i8_i32_vl256:
 ; SME:       // %bb.0:
 ; SME-NEXT:    ldr z0, [x0]
-; SME-NEXT:    ldr z1, [x1]
-; SME-NEXT:    ldr z2, [x2]
-; SME-NEXT:    udot z0.s, z2.b, z1.b
-; SME-NEXT:    mov z1.d, z0.d
-; SME-NEXT:    ext z1.b, z1.b, z0.b, #16
-; SME-NEXT:    // kill: def $q0 killed $q0 killed $z0
-; SME-NEXT:    // kill: def $q1 killed $q1 killed $z1
+; SME-NEXT:    ldr z2, [x1]
+; SME-NEXT:    ldr z3, [x2]
+; SME-NEXT:    udot z0.s, z3.b, z2.b
+; SME-NEXT:    ext z2.b, { z0.b, z1.b }, #16
+; SME-NEXT:    // kill: def $q0 killed $q0 killed $z0_z1
+; SME-NEXT:    mov z1.d, z2.d
 ; SME-NEXT:    ret
 %acc = load <8 x i32>, ptr %accptr
 %u = load <32 x i8>, ptr %uptr
```
```diff
@@ -689,13 +685,12 @@ define <8 x i32> @four_way_i8_i32_vl256_usdot(ptr %accptr, ptr %uptr, ptr %sptr)
 ; SME-LABEL: four_way_i8_i32_vl256_usdot:
 ; SME:       // %bb.0:
 ; SME-NEXT:    ldr z0, [x0]
-; SME-NEXT:    ldr z1, [x1]
-; SME-NEXT:    ldr z2, [x2]
-; SME-NEXT:    usdot z0.s, z1.b, z2.b
-; SME-NEXT:    mov z1.d, z0.d
-; SME-NEXT:    ext z1.b, z1.b, z0.b, #16
-; SME-NEXT:    // kill: def $q0 killed $q0 killed $z0
-; SME-NEXT:    // kill: def $q1 killed $q1 killed $z1
+; SME-NEXT:    ldr z2, [x1]
+; SME-NEXT:    ldr z3, [x2]
+; SME-NEXT:    usdot z0.s, z2.b, z3.b
+; SME-NEXT:    ext z2.b, { z0.b, z1.b }, #16
+; SME-NEXT:    // kill: def $q0 killed $q0 killed $z0_z1
+; SME-NEXT:    mov z1.d, z2.d
 ; SME-NEXT:    ret
 %acc = load <8 x i32>, ptr %accptr
 %u = load <32 x i8>, ptr %uptr
```
```diff
@@ -822,13 +817,12 @@ define <4 x i64> @four_way_i16_i64_vl256(ptr %accptr, ptr %uptr, ptr %sptr) vsca
 ; SME-LABEL: four_way_i16_i64_vl256:
 ; SME:       // %bb.0:
 ; SME-NEXT:    ldr z0, [x0]
-; SME-NEXT:    ldr z1, [x1]
-; SME-NEXT:    ldr z2, [x2]
-; SME-NEXT:    udot z0.d, z2.h, z1.h
-; SME-NEXT:    mov z1.d, z0.d
-; SME-NEXT:    ext z1.b, z1.b, z0.b, #16
-; SME-NEXT:    // kill: def $q0 killed $q0 killed $z0
-; SME-NEXT:    // kill: def $q1 killed $q1 killed $z1
+; SME-NEXT:    ldr z2, [x1]
+; SME-NEXT:    ldr z3, [x2]
+; SME-NEXT:    udot z0.d, z3.h, z2.h
+; SME-NEXT:    ext z2.b, { z0.b, z1.b }, #16
+; SME-NEXT:    // kill: def $q0 killed $q0 killed $z0_z1
+; SME-NEXT:    mov z1.d, z2.d
 ; SME-NEXT:    ret
 %acc = load <4 x i64>, ptr %accptr
 %u = load <16 x i16>, ptr %uptr
```
```diff
@@ -999,10 +993,9 @@ define <4 x i64> @four_way_i8_i64_vl256(ptr %accptr, ptr %uptr, ptr %sptr) vscal
 ; SME-NEXT:    ldr z0, [x0]
 ; SME-NEXT:    uaddwb z0.d, z0.d, z2.s
 ; SME-NEXT:    uaddwt z0.d, z0.d, z2.s
-; SME-NEXT:    mov z1.d, z0.d
-; SME-NEXT:    ext z1.b, z1.b, z0.b, #16
-; SME-NEXT:    // kill: def $q0 killed $q0 killed $z0
-; SME-NEXT:    // kill: def $q1 killed $q1 killed $z1
+; SME-NEXT:    ext z2.b, { z0.b, z1.b }, #16
+; SME-NEXT:    // kill: def $q0 killed $q0 killed $z0_z1
+; SME-NEXT:    mov z1.d, z2.d
 ; SME-NEXT:    ret
 %acc = load <4 x i64>, ptr %accptr
 %u = load <32 x i8>, ptr %uptr
```
Reviewer: Why does this result in different output from the pattern above? Is this because of the change you've made in #151729? I would actually expect `vector_splice` and `AArch64ISD::EXT` to have the same semantics (and if there isn't some subtle difference between `AArch64ISD::EXT` and `ISD::VECTOR_SPLICE`, then I think we should remove the former).
Author: What do you mean by different output? Do you mean that if you replace the splice intrinsics with AArch64's EXT intrinsics, then `llvm/test/CodeGen/AArch64/sve-vector-splice.ll` has different CHECK lines? For a generic splice with two inputs, I'd expect the output to be the same. The change I made in the first PR is only for "subvector-extract" splices created when lowering `vector_extract`, where we can mark the second input as `undef`.

When you say removing the former, do you mean removing the pattern, or the intrinsic altogether? I would need to refresh my brain after the weekend, but I think LLVM's `vector_splice`, AArch64 `EXT` and AArch64 `SPLICE` all have slightly different semantics (especially for negative indices).
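For reference, the generic semantics being discussed: per the LLVM LangRef, `llvm.vector.splice(a, b, imm)` concatenates `a ++ b` and extracts `len(a)` elements starting at index `imm`, where a negative `imm` selects the last `-imm` elements of `a` first. A small Python model of that rule (an illustration of the LangRef description, not LLVM code; it does not capture whatever subtlety distinguishes `AArch64ISD::EXT`):

```python
def vector_splice(a, b, imm):
    """Model of ISD::VECTOR_SPLICE / llvm.vector.splice semantics:
    concatenate a ++ b, then extract len(a) elements starting at imm;
    a negative imm starts at len(a) + imm, i.e. the trailing -imm
    elements of a followed by leading elements of b."""
    assert len(a) == len(b)
    start = imm if imm >= 0 else len(a) + imm
    return (a + b)[start:start + len(a)]

# Positive index: drop the first element of a, pull one element from b.
print(vector_splice([0, 1, 2, 3], [4, 5, 6, 7], 1))   # [1, 2, 3, 4]
# Negative index: keep the last element of a, then continue into b.
print(vector_splice([0, 1, 2, 3], [4, 5, 6, 7], -1))  # [3, 4, 5, 6]
```

The negative-index case is exactly where the node semantics are most likely to diverge, which matches the caveat in the reply above.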
Reviewer: What I meant was that if the output were the same, you probably wouldn't have added this pattern. So I'm basically asking "why wouldn't the above pattern already cover this?".

I mean removing `AArch64ISD::EXT` in favour of `ISD::VECTOR_SPLICE`. The EXT and SPLICE SVE instructions are indeed different (the former takes an immediate, the latter a predicate), but I think the `AArch64ISD::EXT` and `ISD::VECTOR_SPLICE` SelectionDAG nodes are practically the same. Before SVE we didn't have to create a new ISD node for this because `ISD::VECTOR_SHUFFLE` described this pattern sufficiently, but that couldn't be used for scalable vectors and so we added the generic `ISD::VECTOR_SPLICE`. At the time there probably wasn't an incentive to replace uses of `AArch64ISD::EXT` by `ISD::VECTOR_SPLICE`, but if code-gen is different depending on which node we try to match, then I think there's an incentive to merge the two.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check about removing the EXT intrinsic then. 👍 I needed the new pattern because SDAG lowers the
vector_extractnodes tovector_splice, not AArch64'sEXT.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before I send you off on a wild goose chase, it seems I made this mistake in thinking before as in there is a subtle difference between the two: #114411 (comment)
Author: Right, I guess I'll leave the intrinsic then.