Skip to content

Conversation

frederik-h
Copy link
Contributor

@frederik-h frederik-h commented Sep 12, 2025

The existing condition for checking whether or not to expand an frem instruction in the pass is not sufficiently precise. Right now, it is sufficient to ensure the correct working of the pass for the targets currently using the pass. But this is only true in conjunction with the existing check for the MaxLegalFpConvertBitWidth value which happens to exit early on targets on which the frem condition is insufficient.

The correct working of the pass should not rely on this interaction. The possibility of using the pass for handling further expansions (e.g. merging the very similar ExpandLargeDivRem into it) is also limited by this.

Make the expansion criterion more precise and use it to exit early from the pass run if no expansions are required for a target.

The existing condition for checking whether or not to expand an frem
instruction in the pass is not sufficiently precise.  Right now, it is
sufficient to ensure the correct working of the pass. But this is only
true in conjunction with the existing check for the
MaxLegalFpConvertBitWidth value which happens to exit early on targets
on which the frem condition is insufficient.

The correct working of the pass should not rely on this interaction.
The possibility of using the pass for handling further
expansions:(e.g. merging the very similar ExpandLargDivRem into it) is
also limited by this.

This patch changes the pass to expand frem for a target iff the
target's legalization action for the instruction with the scalar type
corresponding to the instruction type is LibCall but the libcall does
not exist. The legalization action for frem in the AMDGPU backend is
adjusted accordingly.
@llvmbot
Copy link
Member

llvmbot commented Sep 12, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Frederik Harwath (frederik-h)

Changes

The existing condition for checking whether or not to expand an frem instruction in the pass is not sufficiently precise. Right now, it is sufficient to ensure the correct working of the pass. But this is only true in conjunction with the existing check for the MaxLegalFpConvertBitWidth value which happens to exit early on targets on which the frem condition is insufficient.

The correct working of the pass should not rely on this interaction. The possibility of using the pass for handling further expansions (e.g. merging the very similar ExpandLargeDivRem into it) is also limited by this.

This patch changes the pass to expand frem for a target iff the target's legalization action for the instruction with the scalar type corresponding to the instruction type is LibCall but the libcall does not exist. The legalization action for frem in the AMDGPU backend is adjusted accordingly.


Full diff: https://github.com/llvm/llvm-project/pull/158285.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/ExpandFp.cpp (+25-12)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp (+1-1)
diff --git a/llvm/lib/CodeGen/ExpandFp.cpp b/llvm/lib/CodeGen/ExpandFp.cpp
index 9cc6c6a706c58..6f4f049cc7f8e 100644
--- a/llvm/lib/CodeGen/ExpandFp.cpp
+++ b/llvm/lib/CodeGen/ExpandFp.cpp
@@ -979,14 +979,22 @@ static RTLIB::Libcall fremToLibcall(Type *Ty) {
   llvm_unreachable("Unknown floating point type");
 }
 
-/* Return true if, according to \p LibInfo, the target either directly
-   supports the frem instruction for the \p Ty, has a custom lowering,
-   or uses a libcall. */
-static bool targetSupportsFrem(const TargetLowering &TLI, Type *Ty) {
-  if (!TLI.isOperationExpand(ISD::FREM, EVT::getEVT(Ty)))
-    return true;
-
-  return TLI.getLibcallName(fremToLibcall(Ty->getScalarType()));
+/// Return true if the pass should expand a "frem" instruction of the
+/// given \p Ty for the target represented by \p TLI. Expansion
+/// should happen if the legalization for the scalar type uses a
+/// non-existing libcall. The scalar type is considered because it is
+/// easier to do so and it is highly unlikely that a vector type can
+/// be legalized without a libcall if the scalar type cannot.
+static bool shouldExpandFremType(const TargetLowering &TLI, Type *Ty) {
+  Type *ScalarTy = Ty->getScalarType();
+  EVT VT = EVT::getEVT(ScalarTy);
+
+  TargetLowering::LegalizeAction LA = TLI.getOperationAction(ISD::FREM, VT);
+  if (LA != TargetLowering::LegalizeAction::LibCall)
+    return false;
+
+  bool MissingLibcall = !TLI.getLibcallName(fremToLibcall(ScalarTy));
+  return MissingLibcall && FRemExpander::canExpandType(ScalarTy);
 }
 
 static bool runImpl(Function &F, const TargetLowering &TLI,
@@ -1000,8 +1008,8 @@ static bool runImpl(Function &F, const TargetLowering &TLI,
   if (ExpandFpConvertBits != llvm::IntegerType::MAX_INT_BITS)
     MaxLegalFpConvertBitWidth = ExpandFpConvertBits;
 
-  if (MaxLegalFpConvertBitWidth >= llvm::IntegerType::MAX_INT_BITS)
-    return false;
+  bool TargetSkipExpandLargeFp =
+      MaxLegalFpConvertBitWidth >= llvm::IntegerType::MAX_INT_BITS;
 
   for (auto &I : instructions(F)) {
     switch (I.getOpcode()) {
@@ -1011,8 +1019,7 @@ static bool runImpl(Function &F, const TargetLowering &TLI,
       if (Ty->isScalableTy())
         continue;
 
-      if (targetSupportsFrem(TLI, Ty) ||
-          !FRemExpander::canExpandType(Ty->getScalarType()))
+      if (!shouldExpandFremType(TLI, Ty))
         continue;
 
       Replace.push_back(&I);
@@ -1022,6 +1029,9 @@ static bool runImpl(Function &F, const TargetLowering &TLI,
     }
     case Instruction::FPToUI:
     case Instruction::FPToSI: {
+      if (TargetSkipExpandLargeFp)
+        continue;
+
       // TODO: This pass doesn't handle scalable vectors.
       if (I.getOperand(0)->getType()->isScalableTy())
         continue;
@@ -1039,6 +1049,9 @@ static bool runImpl(Function &F, const TargetLowering &TLI,
     }
     case Instruction::UIToFP:
     case Instruction::SIToFP: {
+      if (TargetSkipExpandLargeFp)
+        continue;
+
       // TODO: This pass doesn't handle scalable vectors.
       if (I.getOperand(0)->getType()->isScalableTy())
         continue;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index 5c9b616e9bc21..3892d7949a0fc 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -423,7 +423,7 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
   setOperationAction({ISD::LRINT, ISD::LLRINT}, {MVT::f16, MVT::f32, MVT::f64},
                      Expand);
 
-  setOperationAction(ISD::FREM, {MVT::f16, MVT::f32, MVT::f64}, Expand);
+  setOperationAction(ISD::FREM, {MVT::f16, MVT::f32, MVT::f64}, LibCall);
 
   if (Subtarget->has16BitInsts()) {
     setOperationAction(ISD::IS_FPCLASS, {MVT::f16, MVT::f32, MVT::f64}, Legal);

… expansion

This commit adds a function to check if a target needs expansion
of any type of frem instructions. In conjunction with the trivial
check if the large fp conversion expansions are necessary, this
can be used to perform any early exit from the pass if no
expansions are needed for a target.
@frederik-h frederik-h requested a review from arsenm September 16, 2025 09:55
* Use constexpr std::array instead of inline const SmallVector.
* shouldExpandFremType
  - Remove LibCall action handling
  - Add assert to document that vectors are not handled
  - additionally: Inline variable
Copy link

github-actions bot commented Sep 16, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

The compiler in the Windows build failed to deduced the type in the auto.
Comment on lines 118 to 119
auto Libcall = getFremLibcallForType(VT);
return Libcall.has_value() && !TLI.getLibcallName(*Libcall);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pass shouldn't really need to consider the libcall name. Can you fix codegen to use the libcall action for frem

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pass shouldn't really need to consider the libcall name. Can you fix codegen to use the libcall action for frem

Here you wrote:

Expand is the correct action. Every other target should be using LibCall. I.e. change the default action to Libcall. This is another stalled migration, Expand should imply emit standalone code that doesn't depend on a function

That is, you suggest that I change the legalization action for ISD::FREM for every target that currently uses Expand to LibCall (unless it provides a custom lowering function, I suppose; in this case Custom, as e.g. used by the NVPTX target, would be correct?) and this pass would simply start handling frem for the types that it supports if the target uses Expand for frem instructions of this type. Is this correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is, you suggest that I change the legalization action for ISD::FREM for every target that currently uses Expand to LibCall (unless it provides a custom lowering function, I suppose; in this case Custom, as e.g. used by the NVPTX target, would be correct?)

Yes. However, for the PTX case, that is the same bug. The custom expansion is the same buggy expansion you're fixing. As a hack until that's also converted, you could just ignore custom too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have implemented the change now. I think this should go into a separate PR if we want to keep it this way as it affects many targets. I also think it is a bit confusing to make this "Expand" -> "LibCall" change for ISD::FREM only. Several times, I had to exclude it from a list of instructions which should probably all be handled in the same way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants