Commit 8d48d69

[X86] canCreateUndefOrPoisonForTargetNode/isGuaranteedNotToBeUndefOrPoisonForTargetNode - add X86ISD::INSERTPS handling (#161234)
X86ISD::INSERTPS shuffles can't create undef/poison themselves, allowing us to fold freeze(insertps(x,y,i)) -> insertps(freeze(x),freeze(y),i).
1 parent f735250 commit 8d48d69
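
To illustrate why the fold in the commit message is sound, here is a minimal stand-alone sketch. It is not LLVM code: `Lane`, `Vec4`, and the `insertps` and `freeze` helpers are hypothetical names, poison is modelled as an empty `std::optional`, and `freeze` is simplified to always pick 0.0f (real `freeze` may pick any fixed value). The immediate layout follows the documented INSERTPS encoding: bits 7:6 select the source lane, bits 5:4 the destination lane, and bits 3:0 form the zero mask.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Hypothetical model: a lane is either a concrete float or poison (nullopt).
using Lane = std::optional<float>;
using Vec4 = std::array<Lane, 4>;

// Simplified INSERTPS: copy one lane of y into one lane of x, then zero the
// lanes named by the low four immediate bits. Every result lane is either a
// copied input lane or a constant zero, so no new poison can appear.
Vec4 insertps(const Vec4 &x, const Vec4 &y, uint8_t imm) {
  Vec4 r = x;
  r[(imm >> 4) & 3] = y[(imm >> 6) & 3]; // dst lane <- src lane
  for (int i = 0; i < 4; ++i)
    if (imm & (1u << i))
      r[i] = 0.0f; // zero mask overrides whatever was there
  return r;
}

// Simplified freeze: replace each poison lane with a fixed value.
// Assumption: 0.0f is used here only to keep the model deterministic.
Vec4 freeze(const Vec4 &v) {
  Vec4 r = v;
  for (Lane &l : r)
    if (!l)
      l = 0.0f;
  return r;
}
```

In this model, `freeze(insertps(x, y, i))` and `insertps(freeze(x), freeze(y), i)` agree for every immediate, because poison can only reach the result through a lane that was copied from an operand. With the test's immediate of 16 (source lane 0, destination lane 1, no zeroing) the result is `x[0], y[0], x[2], x[3]`, which matches the `xmm0 = xmm0[0],xmm1[0],xmm0[2,3]` comment in the removed CHECK lines.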

2 files changed: +3 -4 lines changed

llvm/lib/Target/X86/X86ISelLowering.cpp

Lines changed: 2 additions & 0 deletions
@@ -45169,6 +45169,7 @@ bool X86TargetLowering::isGuaranteedNotToBeUndefOrPoisonForTargetNode(
   case X86ISD::Wrapper:
   case X86ISD::WrapperRIP:
     return true;
+  case X86ISD::INSERTPS:
   case X86ISD::BLENDI:
   case X86ISD::PSHUFB:
   case X86ISD::PSHUFD:
@@ -45239,6 +45240,7 @@ bool X86TargetLowering::canCreateUndefOrPoisonForTargetNode(
   case X86ISD::BLENDV:
     return false;
   // SSE target shuffles.
+  case X86ISD::INSERTPS:
   case X86ISD::PSHUFB:
   case X86ISD::PSHUFD:
   case X86ISD::UNPCKL:

llvm/test/CodeGen/X86/vector-shuffle-combining-sse41.ll

Lines changed: 1 addition & 4 deletions
@@ -62,15 +62,12 @@ define <4 x i32> @combine_blend_of_permutes_v4i32(<2 x i64> %a0, <2 x i64> %a1)
 define <4 x float> @freeze_insertps(<4 x float> %a0, <4 x float> %a1) {
 ; SSE-LABEL: freeze_insertps:
 ; SSE: # %bb.0:
-; SSE-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
-; SSE-NEXT: insertps {{.*#+}} xmm1 = xmm0[1],xmm1[1,2,3]
 ; SSE-NEXT: movaps %xmm1, %xmm0
 ; SSE-NEXT: retq
 ;
 ; AVX-LABEL: freeze_insertps:
 ; AVX: # %bb.0:
-; AVX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
-; AVX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[1],xmm1[1,2,3]
+; AVX-NEXT: vmovaps %xmm1, %xmm0
 ; AVX-NEXT: retq
   %s0 = call <4 x float> @llvm.x86.sse41.insertps(<4 x float> %a0, <4 x float> %a1, i8 16)
   %f0 = freeze <4 x float> %s0
