
Commit 7d3af6b

[X86] SimplifyDemandedBitsForTargetNode - generalize X86ISD::VSRAI handling when only demanding 'known signbits'
If we only demand bits that already match the sign bit, then we don't need to shift. This generalizes an existing pattern that only handled the case where the sign bit alone was demanded, so that VSRAI now matches what we already do for ISD::SRA.
1 parent: 2e5a5fd

2 files changed, +5 -5 lines

llvm/lib/Target/X86/X86ISelLowering.cpp

Lines changed: 5 additions & 2 deletions

```diff
@@ -44615,8 +44615,11 @@ bool X86TargetLowering::SimplifyDemandedBitsForTargetNode(
 
 APInt DemandedMask = OriginalDemandedBits << ShAmt;
 
-// If we just want the sign bit then we don't need to shift it.
-if (OriginalDemandedBits.isSignMask())
+// If we only want bits that already match the signbit then we don't need
+// to shift.
+unsigned NumHiDemandedBits = BitWidth - OriginalDemandedBits.countr_zero();
+if (TLO.DAG.ComputeNumSignBits(Op0, OriginalDemandedElts, Depth + 1) >=
+    NumHiDemandedBits)
   return TLO.CombineTo(Op, Op0);
 
 // fold (VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1
```

llvm/test/CodeGen/X86/combine-pack.ll

Lines changed: 0 additions & 3 deletions

```diff
@@ -5,22 +5,19 @@
 
 declare <8 x i16> @llvm.x86.sse2.packssdw.128(<4 x i32>, <4 x i32>)
 
-; TODO: Failure to remove unnecessary signsplat
 define <8 x i16> @combine_packss_v4i32_signsplat(<4 x i32> %a0, <4 x i32> %a1) {
 ; SSE-LABEL: combine_packss_v4i32_signsplat:
 ; SSE: # %bb.0:
 ; SSE-NEXT: pcmpgtd %xmm1, %xmm0
 ; SSE-NEXT: pcmpeqd %xmm1, %xmm1
 ; SSE-NEXT: packssdw %xmm1, %xmm0
-; SSE-NEXT: psraw $15, %xmm0
 ; SSE-NEXT: retq
 ;
 ; AVX-LABEL: combine_packss_v4i32_signsplat:
 ; AVX: # %bb.0:
 ; AVX-NEXT: vpcmpgtd %xmm1, %xmm0, %xmm0
 ; AVX-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1
 ; AVX-NEXT: vpackssdw %xmm1, %xmm0, %xmm0
-; AVX-NEXT: vpsraw $15, %xmm0, %xmm0
 ; AVX-NEXT: retq
 %cmp = icmp sgt <4 x i32> %a0, %a1
 %ext = sext <4 x i1> %cmp to <4 x i32>
```
