Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 22 additions & 0 deletions llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7608,6 +7608,22 @@ LegalizerHelper::lowerU64ToF64BitFloatOps(MachineInstr &MI) {
return Legalized;
}

/// i64->fp16 itofp can be lowered to i64->f64,f64->f32,f32->f16. We cannot
/// convert fpround f64->f16 without double-rounding, so we manually perform the
/// lowering here where we know it is valid.
static LegalizerHelper::LegalizeResult
loweri64tof16ITOFP(MachineInstr &MI, Register Dst, LLT DstTy, Register Src,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this under lower and not narrow scalar?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was wondering which was better. It felt more like a lower than a generic narrow. If a target wanted a single fptrunc (because they had an instruction for it), then they wouldn't want it to be split. So they would opt for the narrow. As far as I understand the narrows don't make extra legalization queries, there isn't a way to communicate two different types to them, and the alternative of a target hook doesn't sound the best. Maybe I'm missing something about how this should work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The promote has a type parameter that tells the new type to use. I don't see what information is missing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIR: We are trying to lower a vNi64->vNf16 uitofp, without scalarizing. For example a v4i64->v4f16 uitofp turns into

ucvtf v0.2d, v0.2d
ucvtf v1.2d, v1.2d
fcvtn v0.2s, v0.2d
fcvtn2 v0.4s, v1.2d
fcvtn v0.4h, v0.4s

So a v4f16 fptrunc(v4f32 fptrunc(v4f64 uitofp(v4i64))) (where the v4i64 is two vectors). If we initially say "widen type0 to f64" we get v4f16 fptrunc(v4f64 uitofp(v4i64)) and the v4f16 fptrunc(v4f64 needs to scalarize. If we say "widen type0 to f32" we get v4f16 fptrunc(v4f32 uitofp(v4i64)) and the v4f32 uitofp(v4i64 needs to scalarize (the point of this patch). That is what I meant by "there isn't a way to communicate two different types", you can't give it "widen to f64 via f32 please".

Note: for scalar we want it to become f16 fptrunc(f64 uitofp(i64)) because we have an instruction for f64->f16 fptrunc. We don't want it to become two fptruncs.

(Whilst thinking about this, I wondered if we could lower to uqxtn+uqxtn2+scvtf+fcvtn for fp16 instead, but that would involve a TRUNCATE_USAT_U that we don't have for globalisel yet).

Currently we have it that lower means legalize via two steps, widen means lower directly, which might not be the best but fits into "lower does something different". What was the concrete suggestion for how this should work?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the answer is "Check that the G_FPTRUNC is not legal in widenScalar for G_UITOFP" then I can give that a go, we just don't call other legalization actions in most of LegalizerHelper at the moment. The vector types will need to split even if the vector types are not legal (wider vectors etc).

LLT SrcTy, MachineIRBuilder &MIRBuilder) {
auto M1 = MI.getOpcode() == TargetOpcode::G_UITOFP
? MIRBuilder.buildUITOFP(SrcTy, Src)
: MIRBuilder.buildSITOFP(SrcTy, Src);
LLT S32Ty = SrcTy.changeElementSize(32);
auto M2 = MIRBuilder.buildFPTrunc(S32Ty, M1);
MIRBuilder.buildFPTrunc(Dst, M2);
MI.eraseFromParent();
return LegalizerHelper::Legalized;
}

LegalizerHelper::LegalizeResult LegalizerHelper::lowerUITOFP(MachineInstr &MI) {
auto [Dst, DstTy, Src, SrcTy] = MI.getFirst2RegLLTs();

Expand All @@ -7619,6 +7635,9 @@ LegalizerHelper::LegalizeResult LegalizerHelper::lowerUITOFP(MachineInstr &MI) {
return Legalized;
}

if (DstTy.getScalarSizeInBits() == 16 && SrcTy.getScalarSizeInBits() == 64)
return loweri64tof16ITOFP(MI, Dst, DstTy, Src, SrcTy, MIRBuilder);

if (SrcTy != LLT::scalar(64))
return UnableToLegalize;

Expand Down Expand Up @@ -7650,6 +7669,9 @@ LegalizerHelper::LegalizeResult LegalizerHelper::lowerSITOFP(MachineInstr &MI) {
return Legalized;
}

if (DstTy.getScalarSizeInBits() == 16 && SrcTy.getScalarSizeInBits() == 64)
return loweri64tof16ITOFP(MI, Dst, DstTy, Src, SrcTy, MIRBuilder);

if (SrcTy != S64)
return UnableToLegalize;

Expand Down
16 changes: 14 additions & 2 deletions llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -917,16 +917,28 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
.moreElementsToNextPow2(1)
.widenScalarOrEltToNextPow2OrMinSize(1)
.minScalar(1, s32)
.lowerIf([](const LegalityQuery &Query) {
return Query.Types[1].isVector() &&
Query.Types[1].getScalarSizeInBits() == 64 &&
Query.Types[0].getScalarSizeInBits() == 16;
})
.widenScalarOrEltToNextPow2OrMinSize(0, /*MinSize=*/HasFP16 ? 16 : 32)
.scalarizeIf(
// v2i64->v2f32 needs to scalarize to avoid double-rounding issues.
[](const LegalityQuery &Query) {
return Query.Types[0].getScalarSizeInBits() == 32 &&
Query.Types[1].getScalarSizeInBits() == 64;
},
0)
.widenScalarIf(
[=](const LegalityQuery &Query) {
[](const LegalityQuery &Query) {
return Query.Types[1].getScalarSizeInBits() <= 64 &&
Query.Types[0].getScalarSizeInBits() <
Query.Types[1].getScalarSizeInBits();
},
LegalizeMutations::changeElementSizeTo(0, 1))
.widenScalarIf(
[=](const LegalityQuery &Query) {
[](const LegalityQuery &Query) {
return Query.Types[0].getScalarSizeInBits() <= 64 &&
Query.Types[0].getScalarSizeInBits() >
Query.Types[1].getScalarSizeInBits();
Expand Down
Loading
Loading