-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[InstCombine] Pull extract through broadcast #143380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[InstCombine] Pull extract through broadcast #143380
Conversation
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
|
@llvm/pr-subscribers-llvm-transforms Author: None (agorenstein-nvidia) ChangesThe change adds a new instcombine pattern, and associated test, for patterns like this: The shufflevector has an all-zeros mask, so the extractelement simply must be the first element of %1, so we transform this to Full diff: https://github.com/llvm/llvm-project/pull/143380.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
index f946c3856948b..f2fb26c3ffae8 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
@@ -542,6 +542,12 @@ Instruction *InstCombinerImpl::visitExtractElementInst(ExtractElementInst &EI) {
}
}
} else if (auto *SVI = dyn_cast<ShuffleVectorInst>(I)) {
+ // extractelt (shufflevector %v1, %v2, zeroinitializer) ->
+ // extractelt %v1, 0
+ if (isa<FixedVectorType>(SVI->getType()))
+ if (all_of(SVI->getShuffleMask(), [](int Elt) { return Elt == 0; }))
+ return ExtractElementInst::Create(SVI->getOperand(0), Builder.getInt64(0));
+
// If this is extracting an element from a shufflevector, figure out where
// it came from and extract from the appropriate input element instead.
// Restrict the following transformation to fixed-length vector.
diff --git a/llvm/test/Transforms/InstCombine/vec_extract_through_broadcast.ll b/llvm/test/Transforms/InstCombine/vec_extract_through_broadcast.ll
new file mode 100644
index 0000000000000..ec80fbf01a3c9
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/vec_extract_through_broadcast.ll
@@ -0,0 +1,12 @@
+; RUN: opt -passes=instcombine -S < %s | FileCheck %s
+
+define float @extract_from_zero_init_shuffle(<2 x float> %1, i64 %idx) {
+; CHECK-LABEL: @extract_from_zero_init_shuffle(
+; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[W:%.*]], i64 0
+; CHECK-NEXT: ret float [[TMP1]]
+;
+ %3 = shufflevector <2 x float> %1, <2 x float> poison, <4 x i32> zeroinitializer
+ %4 = extractelement <4 x float> %3, i64 %idx
+ ret float %4
+}
+
|
|
(Per github-actions bot, it seems I don't have the permissions to add reviewers directly. So...) Adding comment to request also my colleague @Prince781 to review this work. Thanks. |
nikic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a generally profitable vector transform to me, but adding some more people to confirm.
|
Thanks for the feedback on how to take this further, @nikic and @RKSimon ! I've just pushed some changes with the intention of realizing those extensions. I've added additional tests, including a couple negative cases (inspired by RKSimon's mention of I'll add that we can assume |
|
✅ With the latest revision this PR passed the undef deprecator. |
…sing meaningful global instead of magic constant for shufflemask poison-indicator
…ack removing negative tests
Prince781
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
| // getSplatIndex doesn't distinguish between the all-poison splat and | ||
| // a non-splat mask. However, if Index is -1, we still want to propagate | ||
| // that poison value. | ||
| int SrcIdx = -2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely happy with this. I've also considered either a tiny helper function like bool test(int* outParam), or carrying along bool ValidSrcIdx, or rewriting getSplatIndex to be able to meaningfully return the all-poison splat indicator, and this seemed the best option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use std::optional<int> SrcIdx instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agorenstein-nvidia did std::optional not help simplify this logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion and ping! I was out from Thursday until yesterday [edit: inclusive]; this remains top of my "priority queue" and I'm integrating your suggestions now. Provisionally it looks like that exactly captures the intent, and clearly I hadn't considered this approach before. I'll finish putting this change in, and your test-improvement-suggestions (and run the tests/linter!) and push ASAP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been done in the latest push, please take a look when you get the chance. It passes my local versions of git-clang-format and the lit tests, so hopefully that carries through to the CI here.
| } | ||
|
|
||
| define i32 @extractelement_shuffle_invalid_index(i32 %v) { | ||
| ; CHECK-LABEL: @extractelement_shuffle_invalid_index( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we optimize, but also in doing so miss the poison "opportunity" (in that the illegal -1 index-usage is removed). So I believe this is legal, but perhaps not optimal. However, I'm a bit confused: it is the case that earlier in our transformation we explicitly look for invalid indices in extractelements:
// InstSimplify should handle cases where the index is invalid.
// For fixed-length vector, it's invalid to extract out-of-range element.
Without tracing through, it looks like this may "just" be an order-of-transformations issue. I'm inclined to preserve the behavior. I've simply changed the test to reflect the new output; is there a preferred way of updating/changing these tests in this sort of situation?
| ; CHECK-NEXT: [[IN:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[V:%.*]], i64 0 | ||
| ; CHECK-NEXT: [[SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[IN]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer | ||
| ; CHECK-NEXT: [[R:%.*]] = extractelement <vscale x 4 x i32> [[SPLAT]], i64 4 | ||
| ; CHECK-NEXT: ret i32 [[R]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is analogous to the other comment; it may be that the extractelement ... i32 4 is out of bounds (when vscale=1, so to speak). However, we are able to know that any valid index into %splat must be %v, so we just return %v. So, similar questions as the other comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be useful to add a comment above the test explaining the reasoning,similar to your comment here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did so! Thanks.
|
Ping |
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
I noticed shortly after this ping someone (?) triggered some test actions, which revealed clang-format errors. I fixed them, and haven't seen any other issues. Now I believe I've addressed all outstanding comments/issues, and while I have questions about some code, they're of the sort "can this code be written better?" rather than change-of-behavior. For these reasons I believe this change is all-set, pending approvals---is there anything more I can do? Happy to take on as much as possible, please let me know. |
| ; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[W:%.*]], i64 0 | ||
| ; CHECK-NEXT: ret float [[TMP1]] | ||
| ; | ||
| %3 = shufflevector <2 x float> %1, <2 x float> poison, <4 x i32> zeroinitializer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(style) avoid using numbers for variable names (%vec, %shuffle and %extract would be better)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in latest push, please take a look. Thanks.
RKSimon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM @nikic any more comments? whether we're happy with out of bounds extraction from a known splat (possibly with undef shuffle indices) seems to be the only outstanding issue
define i4 @src(i4 %x) {
%v = insertelement <4 x i4> poison, i4 %x, i32 0
%s = shufflevector <4 x i4> %v, <4 x i4> poison, 0, 0, 4294967295, 0
%e = extractelement <4 x i4> %s, i32 4294967295
ret i4 %e
}
=>
define i4 @tgt(i4 %x) {
ret i4 %x
}
Transformation seems to be correct!
| ; CHECK-NEXT: [[IN:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[V:%.*]], i64 0 | ||
| ; CHECK-NEXT: [[SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[IN]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer | ||
| ; CHECK-NEXT: [[R:%.*]] = extractelement <vscale x 4 x i32> [[SPLAT]], i64 4 | ||
| ; CHECK-NEXT: ret i32 [[R]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be useful to add a comment above the test explaining the reasoning,similar to your comment here
nikic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
@agorenstein-nvidia Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
The change adds a new instcombine pattern, and associated test, for patterns like this:
The shufflevector has a splat, or broadcast, mask, so the extractelement simply must be the first element of %1, so we transform this to