-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[AMDGPU] expand-fp: Change frem expansion criterion #158285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[AMDGPU] expand-fp: Change frem expansion criterion #158285
Conversation
The existing condition for checking whether or not to expand an frem instruction in the pass is not sufficiently precise. Right now, it is sufficient to ensure the correct working of the pass. But this is only true in conjunction with the existing check for the MaxLegalFpConvertBitWidth value which happens to exit early on targets on which the frem condition is insufficient. The correct working of the pass should not rely on this interaction. The possibility of using the pass for handling further expansions:(e.g. merging the very similar ExpandLargDivRem into it) is also limited by this. This patch changes the pass to expand frem for a target iff the target's legalization action for the instruction with the scalar type corresponding to the instruction type is LibCall but the libcall does not exist. The legalization action for frem in the AMDGPU backend is adjusted accordingly.
@llvm/pr-subscribers-backend-amdgpu Author: Frederik Harwath (frederik-h) ChangesThe existing condition for checking whether or not to expand an frem instruction in the pass is not sufficiently precise. Right now, it is sufficient to ensure the correct working of the pass. But this is only true in conjunction with the existing check for the MaxLegalFpConvertBitWidth value which happens to exit early on targets on which the frem condition is insufficient. The correct working of the pass should not rely on this interaction. The possibility of using the pass for handling further expansions (e.g. merging the very similar ExpandLargeDivRem into it) is also limited by this. This patch changes the pass to expand frem for a target iff the target's legalization action for the instruction with the scalar type corresponding to the instruction type is LibCall but the libcall does not exist. The legalization action for frem in the AMDGPU backend is adjusted accordingly. Full diff: https://github.com/llvm/llvm-project/pull/158285.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/ExpandFp.cpp b/llvm/lib/CodeGen/ExpandFp.cpp
index 9cc6c6a706c58..6f4f049cc7f8e 100644
--- a/llvm/lib/CodeGen/ExpandFp.cpp
+++ b/llvm/lib/CodeGen/ExpandFp.cpp
@@ -979,14 +979,22 @@ static RTLIB::Libcall fremToLibcall(Type *Ty) {
llvm_unreachable("Unknown floating point type");
}
-/* Return true if, according to \p LibInfo, the target either directly
- supports the frem instruction for the \p Ty, has a custom lowering,
- or uses a libcall. */
-static bool targetSupportsFrem(const TargetLowering &TLI, Type *Ty) {
- if (!TLI.isOperationExpand(ISD::FREM, EVT::getEVT(Ty)))
- return true;
-
- return TLI.getLibcallName(fremToLibcall(Ty->getScalarType()));
+/// Return true if the pass should expand a "frem" instruction of the
+/// given \p Ty for the target represented by \p TLI. Expansion
+/// should happen if the legalization for the scalar type uses a
+/// non-existing libcall. The scalar type is considered because it is
+/// easier to do so and it is highly unlikely that a vector type can
+/// be legalized without a libcall if the scalar type cannot.
+static bool shouldExpandFremType(const TargetLowering &TLI, Type *Ty) {
+ Type *ScalarTy = Ty->getScalarType();
+ EVT VT = EVT::getEVT(ScalarTy);
+
+ TargetLowering::LegalizeAction LA = TLI.getOperationAction(ISD::FREM, VT);
+ if (LA != TargetLowering::LegalizeAction::LibCall)
+ return false;
+
+ bool MissingLibcall = !TLI.getLibcallName(fremToLibcall(ScalarTy));
+ return MissingLibcall && FRemExpander::canExpandType(ScalarTy);
}
static bool runImpl(Function &F, const TargetLowering &TLI,
@@ -1000,8 +1008,8 @@ static bool runImpl(Function &F, const TargetLowering &TLI,
if (ExpandFpConvertBits != llvm::IntegerType::MAX_INT_BITS)
MaxLegalFpConvertBitWidth = ExpandFpConvertBits;
- if (MaxLegalFpConvertBitWidth >= llvm::IntegerType::MAX_INT_BITS)
- return false;
+ bool TargetSkipExpandLargeFp =
+ MaxLegalFpConvertBitWidth >= llvm::IntegerType::MAX_INT_BITS;
for (auto &I : instructions(F)) {
switch (I.getOpcode()) {
@@ -1011,8 +1019,7 @@ static bool runImpl(Function &F, const TargetLowering &TLI,
if (Ty->isScalableTy())
continue;
- if (targetSupportsFrem(TLI, Ty) ||
- !FRemExpander::canExpandType(Ty->getScalarType()))
+ if (!shouldExpandFremType(TLI, Ty))
continue;
Replace.push_back(&I);
@@ -1022,6 +1029,9 @@ static bool runImpl(Function &F, const TargetLowering &TLI,
}
case Instruction::FPToUI:
case Instruction::FPToSI: {
+ if (TargetSkipExpandLargeFp)
+ continue;
+
// TODO: This pass doesn't handle scalable vectors.
if (I.getOperand(0)->getType()->isScalableTy())
continue;
@@ -1039,6 +1049,9 @@ static bool runImpl(Function &F, const TargetLowering &TLI,
}
case Instruction::UIToFP:
case Instruction::SIToFP: {
+ if (TargetSkipExpandLargeFp)
+ continue;
+
// TODO: This pass doesn't handle scalable vectors.
if (I.getOperand(0)->getType()->isScalableTy())
continue;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index 5c9b616e9bc21..3892d7949a0fc 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -423,7 +423,7 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
setOperationAction({ISD::LRINT, ISD::LLRINT}, {MVT::f16, MVT::f32, MVT::f64},
Expand);
- setOperationAction(ISD::FREM, {MVT::f16, MVT::f32, MVT::f64}, Expand);
+ setOperationAction(ISD::FREM, {MVT::f16, MVT::f32, MVT::f64}, LibCall);
if (Subtarget->has16BitInsts()) {
setOperationAction(ISD::IS_FPCLASS, {MVT::f16, MVT::f32, MVT::f64}, Legal);
|
… expansion This commit adds a function to check if a target needs expansion of any type of frem instructions. In conjunction with the trivial check if the large fp conversion expansions are necessary, this can be used to perform any early exit from the pass if no expansions are needed for a target.
* Use constexpr std::array instead of inline const SmallVector. * shouldExpandFremType - Remove LibCall action handling - Add assert to document that vectors are not handled - additionally: Inline variable
✅ With the latest revision this PR passed the C/C++ code formatter. |
The compiler in the Windows build failed to deduced the type in the auto.
llvm/lib/CodeGen/ExpandFp.cpp
Outdated
auto Libcall = getFremLibcallForType(VT); | ||
return Libcall.has_value() && !TLI.getLibcallName(*Libcall); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pass shouldn't really need to consider the libcall name. Can you fix codegen to use the libcall action for frem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pass shouldn't really need to consider the libcall name. Can you fix codegen to use the libcall action for frem
Here you wrote:
Expand is the correct action. Every other target should be using LibCall. I.e. change the default action to Libcall. This is another stalled migration, Expand should imply emit standalone code that doesn't depend on a function
That is, you suggest that I change the legalization action for ISD::FREM
for every target that currently uses Expand
to LibCall
(unless it provides a custom lowering function, I suppose; in this case Custom
, as e.g. used by the NVPTX target, would be correct?) and this pass would simply start handling frem for the types that it supports if the target uses Expand
for frem instructions of this type. Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is, you suggest that I change the legalization action for ISD::FREM for every target that currently uses Expand to LibCall (unless it provides a custom lowering function, I suppose; in this case Custom, as e.g. used by the NVPTX target, would be correct?)
Yes. However, for the PTX case, that is the same bug. The custom expansion is the same buggy expansion you're fixing. As a hack until that's also converted, you could just ignore custom too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have implemented the change now. I think this should go into a separate PR if we want to keep it this way as it affects many targets. I also think it is a bit confusing to make this "Expand" -> "LibCall" change for ISD::FREM
only. Several times, I had to exclude it from a list of instructions which should probably all be handled in the same way.
The existing condition for checking whether or not to expand an frem instruction in the pass is not sufficiently precise. Right now, it is sufficient to ensure the correct working of the pass for the targets currently using the pass. But this is only true in conjunction with the existing check for the MaxLegalFpConvertBitWidth value which happens to exit early on targets on which the frem condition is insufficient.
The correct working of the pass should not rely on this interaction. The possibility of using the pass for handling further expansions (e.g. merging the very similar ExpandLargeDivRem into it) is also limited by this.
Make the expansion criterion more precise and use it to exit early from the pass run if no expansions are required for a target.