
Commit d770567

[X86] SimplifyDemandedVectorEltsForTargetNode - don't split X86ISD::CVTTP2UI nodes without AVX512VL (#154504)
Unlike CVTTP2SI, CVTTP2UI is only available on AVX512 targets, so there is no AVX1 variant to fall back to when we split a 512-bit vector; we can only use the 128/256-bit variants if we have AVX512VL.
Fixes #154492
1 parent dc23869 commit d770567


2 files changed: +25 -1 lines changed


llvm/lib/Target/X86/X86ISelLowering.cpp

Lines changed: 5 additions & 1 deletion
@@ -44195,8 +44195,12 @@ bool X86TargetLowering::SimplifyDemandedVectorEltsForTargetNode(
   }
   // Conversions.
   // TODO: Add more CVT opcodes when we have test coverage.
-  case X86ISD::CVTTP2SI:
   case X86ISD::CVTTP2UI: {
+    if (!Subtarget.hasVLX())
+      break;
+    [[fallthrough]];
+  }
+  case X86ISD::CVTTP2SI: {
     if (Op.getOperand(0).getValueType().getVectorElementType() == MVT::f16 &&
         !Subtarget.hasVLX())
       break;

llvm/test/CodeGen/X86/pr154492.ll

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=x86_64-- -mattr=+avx512f | FileCheck %s --check-prefix=AVX512F
+; RUN: llc < %s -mtriple=x86_64-- -mattr=+avx512vl | FileCheck %s --check-prefix=AVX512VL
+
+define <16 x i32> @PR154492() {
+; AVX512F-LABEL: PR154492:
+; AVX512F: # %bb.0:
+; AVX512F-NEXT: vxorps %xmm0, %xmm0, %xmm0
+; AVX512F-NEXT: vcvttps2udq %zmm0, %zmm0
+; AVX512F-NEXT: vmovaps %ymm0, %ymm0
+; AVX512F-NEXT: retq
+;
+; AVX512VL-LABEL: PR154492:
+; AVX512VL: # %bb.0:
+; AVX512VL-NEXT: vxorps %xmm0, %xmm0, %xmm0
+; AVX512VL-NEXT: vcvttps2udq %ymm0, %ymm0
+; AVX512VL-NEXT: retq
+  %res = call <16 x i32> @llvm.x86.avx512.mask.cvttps2udq.512(<16 x float> zeroinitializer, <16 x i32> zeroinitializer, i16 255, i32 4)
+  ret <16 x i32> %res
+}
