
Commit 833b904

AMDGPU: Custom lower vector fptrunc of f32 -> f16
GFX950+ supports v_cvt_pk_f16_f32. However, the current implementation of vector fptrunc lowering fully scalarizes the vector, and the scalar conversions are not always combined into the packed instruction. We made v2f32 -> v2f16 legal in #139956. This work extends that to wider vectors: instead of fully scalarizing, we split the vector into packs (v2f32 -> v2f16) so that the packed conversion can always be generated. NOTE: minor changes
1 parent 4ecace6 commit 833b904


1 file changed, +2 -2 lines changed


llvm/lib/Target/AMDGPU/SIISelLowering.cpp

Lines changed: 2 additions & 2 deletions
@@ -6906,7 +6906,7 @@ SDValue SITargetLowering::splitFP_ROUNDVectorOp(SDValue Op,
                                                  SelectionDAG &DAG) const {
   EVT DstVT = Op.getValueType();
   unsigned NumElts = DstVT.getVectorNumElements();
-  assert(isPowerOf2_32(NumElts) && "Number of elements must be power of 2");
+  assert(NumElts > 2 && isPowerOf2_32(NumElts));
 
   auto [Lo, Hi] = DAG.SplitVectorOperand(Op.getNode(), 0);
 
@@ -6930,7 +6930,7 @@ SDValue SITargetLowering::lowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const {
     assert(Subtarget->hasCvtPkF16F32Inst() && "support v_cvt_pk_f16_f32");
     if (SrcVT.getScalarType() != MVT::f32)
       return SDValue();
-    return DstVT == MVT::v2f16 ? Op : splitFP_ROUNDVectorOp(Op, DAG);
+    return SrcVT == MVT::v2f32 ? Op : splitFP_ROUNDVectorOp(Op, DAG);
   }
 
   if (SrcVT.getScalarType() != MVT::f64)
