Skip to content

Conversation

@agorenstein-nvidia
Copy link
Contributor

@agorenstein-nvidia agorenstein-nvidia commented Jun 9, 2025

The change adds a new instcombine pattern, and associated test, for patterns like this:

  %3 = shufflevector <2 x float> %1, <2 x float> poison, <4 x i32> zeroinitializer
  %4 = extractelement <4 x float> %3, i64 %idx

The shufflevector has a splat, or broadcast, mask, so the extractelement simply must be the first element of %1, so we transform this to

  %2 = extractelement <2 x float> %1, i64 0

@agorenstein-nvidia agorenstein-nvidia requested a review from nikic as a code owner June 9, 2025 13:56
@github-actions
Copy link

github-actions bot commented Jun 9, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels Jun 9, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 9, 2025

@llvm/pr-subscribers-llvm-transforms

Author: None (agorenstein-nvidia)

Changes

The change adds a new instcombine pattern, and associated test, for patterns like this:

  %3 = shufflevector &lt;2 x float&gt; %1, &lt;2 x float&gt; poison, &lt;4 x i32&gt; zeroinitializer
  %4 = extractelement &lt;4 x float&gt; %3, i64 %idx

The shufflevector has an all-zeros mask, so the extractelement simply must be the first element of %1, so we transform this to

  %2 = extractelement &lt;2 x float&gt; %1, i64 0

Full diff: https://github.com/llvm/llvm-project/pull/143380.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp (+6)
  • (added) llvm/test/Transforms/InstCombine/vec_extract_through_broadcast.ll (+12)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
index f946c3856948b..f2fb26c3ffae8 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
@@ -542,6 +542,12 @@ Instruction *InstCombinerImpl::visitExtractElementInst(ExtractElementInst &EI) {
         }
       }
     } else if (auto *SVI = dyn_cast<ShuffleVectorInst>(I)) {
+      // extractelt (shufflevector %v1, %v2, zeroinitializer) ->
+      // extractelt %v1, 0
+      if (isa<FixedVectorType>(SVI->getType()))
+        if (all_of(SVI->getShuffleMask(), [](int Elt) { return Elt == 0; }))
+          return ExtractElementInst::Create(SVI->getOperand(0), Builder.getInt64(0));
+
       // If this is extracting an element from a shufflevector, figure out where
       // it came from and extract from the appropriate input element instead.
       // Restrict the following transformation to fixed-length vector.
diff --git a/llvm/test/Transforms/InstCombine/vec_extract_through_broadcast.ll b/llvm/test/Transforms/InstCombine/vec_extract_through_broadcast.ll
new file mode 100644
index 0000000000000..ec80fbf01a3c9
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/vec_extract_through_broadcast.ll
@@ -0,0 +1,12 @@
+; RUN: opt -passes=instcombine -S < %s | FileCheck %s
+
+define float @extract_from_zero_init_shuffle(<2 x float> %1, i64 %idx) {
+; CHECK-LABEL: @extract_from_zero_init_shuffle(
+; CHECK-NEXT:    [[TMP1:%.*]] = extractelement <2 x float> [[W:%.*]], i64 0
+; CHECK-NEXT:    ret float [[TMP1]]
+;
+  %3 = shufflevector <2 x float> %1, <2 x float> poison, <4 x i32> zeroinitializer
+  %4 = extractelement <4 x float> %3, i64 %idx
+  ret float %4
+}
+

@agorenstein-nvidia
Copy link
Contributor Author

(Per github-actions bot, it seems I don't have the permissions to add reviewers directly. So...) Adding comment to request also my colleague @Prince781 to review this work. Thanks.

@nikic nikic requested review from Prince781, RKSimon and davemgreen June 9, 2025 14:03
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a generally profitable vector transform to me, but adding some more people to confirm.

@agorenstein-nvidia
Copy link
Contributor Author

Thanks for the feedback on how to take this further, @nikic and @RKSimon ! I've just pushed some changes with the intention of realizing those extensions. I've added additional tests, including a couple negative cases (inspired by RKSimon's mention of undef).

I'll add that we can assume idx is valid (or at least, not-known-to-be-invalid) by earlier checks in the same instcombine routine, so this shouldn't be over-aggressive in removing poison opportunities when replacing idx with mask[0] (so-to-speak).

@github-actions
Copy link

github-actions bot commented Jun 9, 2025

✅ With the latest revision this PR passed the undef deprecator.

…sing meaningful global instead of magic constant for shufflemask poison-indicator
Copy link
Contributor

@Prince781 Prince781 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

// getSplatIndex doesn't distinguish between the all-poison splat and
// a non-splat mask. However, if Index is -1, we still want to propagate
// that poison value.
int SrcIdx = -2;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely happy with this. I've also considered either a tiny helper function like bool test(int* outParam), or carrying along bool ValidSrcIdx, or rewriting getSplatIndex to be able to meaningfully return the all-poison splat indicator, and this seemed the best option.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use std::optional<int> SrcIdx instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agorenstein-nvidia did std::optional not help simplify this logic?

Copy link
Contributor Author

@agorenstein-nvidia agorenstein-nvidia Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion and ping! I was out from Thursday until yesterday [edit: inclusive]; this remains top of my "priority queue" and I'm integrating your suggestions now. Provisionally it looks like that exactly captures the intent, and clearly I hadn't considered this approach before. I'll finish putting this change in, and your test-improvement-suggestions (and run the tests/linter!) and push ASAP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been done in the latest push, please take a look when you get the chance. It passes my local versions of git-clang-format and the lit tests, so hopefully that carries through to the CI here.

}

define i32 @extractelement_shuffle_invalid_index(i32 %v) {
; CHECK-LABEL: @extractelement_shuffle_invalid_index(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we optimize, but also in doing so miss the poison "opportunity" (in that the illegal -1 index-usage is removed). So I believe this is legal, but perhaps not optimal. However, I'm a bit confused: it is the case that earlier in our transformation we explicitly look for invalid indices in extractelements:

    // InstSimplify should handle cases where the index is invalid.
    // For fixed-length vector, it's invalid to extract out-of-range element.

Without tracing through, it looks like this may "just" be an order-of-transformations issue. I'm inclined to preserve the behavior. I've simply changed the test to reflect the new output; is there a preferred way of updating/changing these tests in this sort of situation?

; CHECK-NEXT: [[IN:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[V:%.*]], i64 0
; CHECK-NEXT: [[SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[IN]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
; CHECK-NEXT: [[R:%.*]] = extractelement <vscale x 4 x i32> [[SPLAT]], i64 4
; CHECK-NEXT: ret i32 [[R]]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is analogous to the other comment; it may be that the extractelement ... i32 4 is out of bounds (when vscale=1, so to speak). However, we are able to know that any valid index into %splat must be %v, so we just return %v. So, similar questions as the other comment.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be useful to add a comment above the test explaining the reasoning,similar to your comment here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did so! Thanks.

@Prince781 Prince781 self-requested a review June 10, 2025 15:58
@agorenstein-nvidia
Copy link
Contributor Author

Ping

@github-actions
Copy link

github-actions bot commented Jun 16, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@agorenstein-nvidia
Copy link
Contributor Author

Ping

I noticed shortly after this ping someone (?) triggered some test actions, which revealed clang-format errors. I fixed them, and haven't seen any other issues. Now I believe I've addressed all outstanding comments/issues, and while I have questions about some code, they're of the sort "can this code be written better?" rather than change-of-behavior. For these reasons I believe this change is all-set, pending approvals---is there anything more I can do? Happy to take on as much as possible, please let me know.

; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[W:%.*]], i64 0
; CHECK-NEXT: ret float [[TMP1]]
;
%3 = shufflevector <2 x float> %1, <2 x float> poison, <4 x i32> zeroinitializer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(style) avoid using numbers for variable names (%vec, %shuffle and %extract would be better)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in latest push, please take a look. Thanks.

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @nikic any more comments? whether we're happy with out of bounds extraction from a known splat (possibly with undef shuffle indices) seems to be the only outstanding issue

define i4 @src(i4 %x) {
  %v = insertelement <4 x i4> poison, i4 %x, i32 0
  %s = shufflevector <4 x i4> %v, <4 x i4> poison, 0, 0, 4294967295, 0
  %e = extractelement <4 x i4> %s, i32 4294967295
  ret i4 %e
}
=>
define i4 @tgt(i4 %x) {
  ret i4 %x
}
Transformation seems to be correct!

; CHECK-NEXT: [[IN:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[V:%.*]], i64 0
; CHECK-NEXT: [[SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[IN]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
; CHECK-NEXT: [[R:%.*]] = extractelement <vscale x 4 x i32> [[SPLAT]], i64 4
; CHECK-NEXT: ret i32 [[R]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be useful to add a comment above the test explaining the reasoning,similar to your comment here

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nikic nikic merged commit b0473c5 into llvm:main Jul 4, 2025
7 checks passed
@github-actions
Copy link

github-actions bot commented Jul 4, 2025

@agorenstein-nvidia Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants