Skip to content

Conversation

@RKSimon
Copy link
Collaborator

@RKSimon RKSimon commented Mar 6, 2025

The plan is to remove the vXi64 cross lane shuffle constraint entirely, but this special case was easy to handle while I fight the remaining regressions.

@llvmbot
Copy link
Member

llvmbot commented Mar 6, 2025

@llvm/pr-subscribers-backend-x86

Author: Simon Pilgrim (RKSimon)

Changes

The plan is to remove the vXi64 constraint entirely, but this special case was easy to handle while I fight the remaining regressions.


Full diff: https://github.com/llvm/llvm-project/pull/130115.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+3-2)
  • (modified) llvm/test/CodeGen/X86/vector-partial-undef.ll (+1-2)
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 1697da09e0f72..e418bfa8b20e8 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -6153,12 +6153,13 @@ static bool getFauxShuffleMask(SDValue N, const APInt &DemandedElts,
       return true;
     }
     // Handle CONCAT(SUB0, SUB1).
-    // Limit this to vXi64 vector cases to make the most of cross lane shuffles.
+    // Limit this to vXi64/splat vector cases to make the most of cross lane shuffles.
     if (Depth > 0 && InsertIdx == NumSubElts && NumElts == (2 * NumSubElts) &&
-        NumBitsPerElt == 64 && Src.getOpcode() == ISD::INSERT_SUBVECTOR &&
+        Src.getOpcode() == ISD::INSERT_SUBVECTOR &&
         Src.getOperand(0).isUndef() &&
         Src.getOperand(1).getValueType() == SubVT &&
         Src.getConstantOperandVal(2) == 0 &&
+        (NumBitsPerElt == 64 || Src.getOperand(1) == Sub) &&
         SDNode::areOnlyUsersOf({N.getNode(), Src.getNode()}, Sub.getNode())) {
       for (int i = 0; i != (int)NumSubElts; ++i)
         Mask.push_back(i);
diff --git a/llvm/test/CodeGen/X86/vector-partial-undef.ll b/llvm/test/CodeGen/X86/vector-partial-undef.ll
index 4753dba2d468f..7c12e5295257c 100644
--- a/llvm/test/CodeGen/X86/vector-partial-undef.ll
+++ b/llvm/test/CodeGen/X86/vector-partial-undef.ll
@@ -150,8 +150,7 @@ define <8 x i32> @xor_undef_elts_alt(<4 x i32> %x) {
 ; AVX-LABEL: xor_undef_elts_alt:
 ; AVX:       # %bb.0:
 ; AVX-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm0, %ymm0
-; AVX-NEXT:    vmovaps {{.*#+}} ymm1 = [6,1,5,4,3,2,0,7]
+; AVX-NEXT:    vmovaps {{.*#+}} ymm1 = [2,1,1,0,3,2,0,3]
 ; AVX-NEXT:    vpermps %ymm0, %ymm1, %ymm0
 ; AVX-NEXT:    vxorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
 ; AVX-NEXT:    retq

…ctor(undef,sub,0),sub,c) 'subvector splat' patterns

The plan is to remove the vXi64 constraint entirely, but this special case was easy to handle while I fight the remaining regressions.
@RKSimon RKSimon force-pushed the x86-faux-subvector-splat branch from efafd89 to 6a56610 Compare March 6, 2025 15:13
@RKSimon RKSimon merged commit ea59d17 into llvm:main Mar 6, 2025
8 of 10 checks passed
@RKSimon RKSimon deleted the x86-faux-subvector-splat branch March 6, 2025 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants