[RFC][llvm] Added llvm.loop.vectorize.reassociate_fpreductions.enable metadata. #141685
|
@llvm/pr-subscribers-llvm-transforms
Author: Slava Zakharin (vzakhari)
Changes
This metadata allows unsafe reassociations of computations during the loop vectorization.
RFC: https://discourse.llvm.org/t/llvm-proposing-llvm-loop-vectorize-reassociation-enable-metadata/86573
Full diff: https://github.com/llvm/llvm-project/pull/141685.diff
5 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 8c0a046d3a7e9..5d4ef30ae7e3e 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -7593,6 +7593,24 @@ Note that setting ``llvm.loop.interleave.count`` to 1 disables interleaving
multiple iterations of the loop. If ``llvm.loop.interleave.count`` is set to 0
then the interleave count will be determined automatically.
+'``llvm.loop.vectorize.reassociate_fpreductions.enable``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata selectively allows or disallows reassociating floating-point
+reductions, which otherwise may be unsafe to reassociate, during the loop
+vectorization. For example, a floating point ``ADD`` reduction without
+``reassoc`` fast-math flags may be vectorized provided that this metadata
+allows it. The first operand is the string
+``llvm.loop.vectorize.reassociate_fpreductions.enable``
+and the second operand is a bit. If the bit operand value is 1 unsafe
+reduction reassociations are enabled. A value of 0 disables unsafe
+reduction reassociations.
+
+.. code-block:: llvm
+
+ !0 = !{!"llvm.loop.vectorize.reassociate_fpreductions.enable", i1 0}
+ !1 = !{!"llvm.loop.vectorize.reassociate_fpreductions.enable", i1 1}
+
'``llvm.loop.vectorize.enable``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
index d654ac3ec9273..5911501ca2d3e 100644
--- a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
+++ b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
@@ -64,7 +64,8 @@ class LoopVectorizeHints {
HK_FORCE,
HK_ISVECTORIZED,
HK_PREDICATE,
- HK_SCALABLE
+ HK_SCALABLE,
+ HK_REASSOCIATE_FP_REDUCTIONS,
};
/// Hint - associates name and validation with the hint value.
@@ -97,6 +98,10 @@ class LoopVectorizeHints {
/// Says whether we should use fixed width or scalable vectorization.
Hint Scalable;
+ /// Says whether unsafe reassociation of reductions is allowed
+ /// during the loop vectorization.
+ Hint ReassociateFPReductions;
+
/// Return the loop metadata prefix.
static StringRef Prefix() { return "llvm.loop."; }
@@ -162,6 +167,13 @@ class LoopVectorizeHints {
return (ScalableForceKind)Scalable.Value == SK_FixedWidthOnly;
}
+ enum ForceKind getReassociateFPReductions() const {
+ if ((ForceKind)ReassociateFPReductions.Value == FK_Undefined &&
+ hasDisableAllTransformsHint(TheLoop))
+ return FK_Disabled;
+ return (ForceKind)ReassociateFPReductions.Value;
+ }
+
/// If hints are provided that force vectorization, use the AlwaysPrint
/// pass name to force the frontend to print the diagnostic.
const char *vectorizeAnalysisPassName() const;
@@ -173,6 +185,10 @@ class LoopVectorizeHints {
/// error accumulates in the loop.
bool allowReordering() const;
+ /// Returns true iff the loop hints allow reassociating floating-point
+ /// reductions for the purpose of vectorization.
+ bool allowFPReductionReassociation() const;
+
bool isPotentiallyUnsafe() const {
// Avoid FP vectorization if the target is unsure about proper support.
// This may be related to the SIMD unit in the target not handling
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index 8e09e6f8d4935..dffff6f7278a1 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -97,6 +97,7 @@ bool LoopVectorizeHints::Hint::validate(unsigned Val) {
case HK_ISVECTORIZED:
case HK_PREDICATE:
case HK_SCALABLE:
+ case HK_REASSOCIATE_FP_REDUCTIONS:
return (Val == 0 || Val == 1);
}
return false;
@@ -112,6 +113,8 @@ LoopVectorizeHints::LoopVectorizeHints(const Loop *L,
IsVectorized("isvectorized", 0, HK_ISVECTORIZED),
Predicate("vectorize.predicate.enable", FK_Undefined, HK_PREDICATE),
Scalable("vectorize.scalable.enable", SK_Unspecified, HK_SCALABLE),
+ ReassociateFPReductions("vectorize.reassociate_fpreductions.enable",
+ FK_Undefined, HK_REASSOCIATE_FP_REDUCTIONS),
TheLoop(L), ORE(ORE) {
// Populate values with existing loop metadata.
getHintsFromMetadata();
@@ -254,6 +257,11 @@ bool LoopVectorizeHints::allowReordering() const {
EC.getKnownMinValue() > 1);
}
+bool LoopVectorizeHints::allowFPReductionReassociation() const {
+ return HintsAllowReordering &&
+ getReassociateFPReductions() == LoopVectorizeHints::FK_Enabled;
+}
+
void LoopVectorizeHints::getHintsFromMetadata() {
MDNode *LoopID = TheLoop->getLoopID();
if (!LoopID)
@@ -300,8 +308,13 @@ void LoopVectorizeHints::setHint(StringRef Name, Metadata *Arg) {
return;
unsigned Val = C->getZExtValue();
- Hint *Hints[] = {&Width, &Interleave, &Force,
- &IsVectorized, &Predicate, &Scalable};
+ Hint *Hints[] = {&Width,
+ &Interleave,
+ &Force,
+ &IsVectorized,
+ &Predicate,
+ &Scalable,
+ &ReassociateFPReductions};
for (auto *H : Hints) {
if (Name == H->Name) {
if (H->validate(Val))
@@ -1311,22 +1324,25 @@ bool LoopVectorizationLegality::canVectorizeFPMath(
return true;
// If the above is false, we have ExactFPMath & do not allow reordering.
- // If the EnableStrictReductions flag is set, first check if we have any
- // Exact FP induction vars, which we cannot vectorize.
- if (!EnableStrictReductions ||
- any_of(getInductionVars(), [&](auto &Induction) -> bool {
+ // First check if we have any Exact FP induction vars, which we cannot
+ // vectorize.
+ if (any_of(getInductionVars(), [&](auto &Induction) -> bool {
InductionDescriptor IndDesc = Induction.second;
return IndDesc.getExactFPMathInst();
}))
return false;
- // We can now only vectorize if all reductions with Exact FP math also
- // have the isOrdered flag set, which indicates that we can move the
- // reduction operations in-loop.
- return (all_of(getReductionVars(), [&](auto &Reduction) -> bool {
- const RecurrenceDescriptor &RdxDesc = Reduction.second;
- return !RdxDesc.hasExactFPMath() || RdxDesc.isOrdered();
- }));
+ // We can now only vectorize if EnableStrictReductions flag is set and
+ // all reductions with Exact FP math also have the isOrdered flag set,
+ // which indicates that we can move the reduction operations in-loop.
+ // If the hints allow reassociating FP reductions, then skip
+ // all the checks.
+ return (Hints->allowFPReductionReassociation() ||
+ all_of(getReductionVars(), [&](auto &Reduction) -> bool {
+ const RecurrenceDescriptor &RdxDesc = Reduction.second;
+ return !RdxDesc.hasExactFPMath() ||
+ (EnableStrictReductions && RdxDesc.isOrdered());
+ }));
}
bool LoopVectorizationLegality::isInvariantStoreOfReduction(StoreInst *SI) {
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 2fe59a464457f..c64caa92d9290 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1019,9 +1019,10 @@ class LoopVectorizationCostModel {
/// Returns true if we should use strict in-order reductions for the given
/// RdxDesc. This is true if the -enable-strict-reductions flag is passed,
/// the IsOrdered flag of RdxDesc is set and we do not allow reordering
- /// of FP operations.
+ /// of FP operations or FP reductions.
bool useOrderedReductions(const RecurrenceDescriptor &RdxDesc) const {
- return !Hints->allowReordering() && RdxDesc.isOrdered();
+ return !Hints->allowReordering() &&
+ !Hints->allowFPReductionReassociation() && RdxDesc.isOrdered();
}
/// \returns The smallest bitwidth each instruction can be represented with.
diff --git a/llvm/test/Transforms/LoopVectorize/reduction-reassociate.ll b/llvm/test/Transforms/LoopVectorize/reduction-reassociate.ll
new file mode 100644
index 0000000000000..08b08d2d405b6
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/reduction-reassociate.ll
@@ -0,0 +1,47 @@
+; Check that the loop with a floating-point reduction is vectorized
+; due to llvm.loop.vectorize.reassociate_fpreductions.enable metadata.
+; RUN: opt -passes=loop-vectorize -S < %s 2>&1 | FileCheck %s
+
+source_filename = "FIRModule"
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+; Function Attrs: nofree norecurse nosync nounwind memory(argmem: readwrite)
+define void @test_(ptr captures(none) %0, ptr readonly captures(none) %1) local_unnamed_addr #0 {
+; CHECK-LABEL: define void @test_(
+; CHECK: fadd contract <4 x float> {{.*}}
+; CHECK: call contract float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> {{.*}})
+;
+ %invariant.gep = getelementptr i8, ptr %1, i64 -4
+ %.promoted = load float, ptr %0, align 4
+ br label %3
+
+3: ; preds = %2, %3
+ %indvars.iv = phi i64 [ 1, %2 ], [ %indvars.iv.next, %3 ]
+ %4 = phi float [ %.promoted, %2 ], [ %6, %3 ]
+ %gep = getelementptr float, ptr %invariant.gep, i64 %indvars.iv
+ %5 = load float, ptr %gep, align 4
+ %6 = fadd contract float %4, %5
+ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+ %exitcond.not = icmp eq i64 %indvars.iv.next, 1001
+ br i1 %exitcond.not, label %7, label %3, !llvm.loop !2
+
+7: ; preds = %3
+ %.lcssa = phi float [ %6, %3 ]
+ store float %.lcssa, ptr %0, align 4
+ ret void
+}
+
+attributes #0 = { nofree norecurse nosync nounwind memory(argmem: readwrite) "target-cpu"="x86-64" }
+
+!llvm.ident = !{!0}
+!llvm.module.flags = !{!1}
+
+!0 = !{!"flang version 21.0.0"}
+!1 = !{i32 2, !"Debug Info Version", i32 3}
+!2 = distinct !{!2, !3}
+!3 = !{!"llvm.loop.vectorize.reassociate_fpreductions.enable", i1 true}
+
+; CHECK-NOT: llvm.loop.vectorize.reassociate_fpreductions.enable
+; CHECK: !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: !{!"llvm.loop.unroll.runtime.disable"}
|
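The decision made at the end of `canVectorizeFPMath` in the diff above can be read as a small truth table. The following is a minimal standalone sketch of that logic, not LLVM code: `HintsModel`, `ReductionModel`, `resolveReassociateHint` and `reductionsAreVectorizable` are hypothetical names introduced only for illustration, and `HintsAllowReordering` (referenced but not defined in the diff) is modeled here as a plain field. It covers only the final reduction check, after the earlier early returns and the FP induction check.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for LoopVectorizeHints::ForceKind (FK_Undefined, ...).
enum class ForceKind { Undefined, Disabled, Enabled };

// Hypothetical model of the two RecurrenceDescriptor properties the diff reads.
struct ReductionModel {
  bool HasExactFPMath; // reduction lacks the fast-math flags needed to reorder
  bool IsOrdered;      // reduction can be kept in-loop as an ordered reduction
};

// Hypothetical model of the hint state consulted by the new code.
struct HintsModel {
  ForceKind ReassociateFPReductions = ForceKind::Undefined;
  bool DisableAllTransforms = false; // models hasDisableAllTransformsHint()
  bool HintsAllowReordering = true;  // assumption: modeled as a plain field
};

// Mirrors getReassociateFPReductions(): an unset hint becomes Disabled
// when all loop transforms are disabled for the loop.
ForceKind resolveReassociateHint(const HintsModel &H) {
  if (H.ReassociateFPReductions == ForceKind::Undefined &&
      H.DisableAllTransforms)
    return ForceKind::Disabled;
  return H.ReassociateFPReductions;
}

// Mirrors allowFPReductionReassociation().
bool allowFPReductionReassociation(const HintsModel &H) {
  return H.HintsAllowReordering &&
         resolveReassociateHint(H) == ForceKind::Enabled;
}

// Mirrors the final return of canVectorizeFPMath() after the patch.
bool reductionsAreVectorizable(const HintsModel &H,
                               bool EnableStrictReductions,
                               const std::vector<ReductionModel> &Rdxs) {
  if (allowFPReductionReassociation(H))
    return true; // the hint skips the per-reduction checks
  return std::all_of(Rdxs.begin(), Rdxs.end(), [&](const ReductionModel &R) {
    return !R.HasExactFPMath || (EnableStrictReductions && R.IsOrdered);
  });
}
```

In this model, an exact-FP ordered reduction is accepted either when strict reductions are enabled or when the new hint is set, and the hint has no effect when `HintsAllowReordering` is false.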
This metadata allows unsafe reassociations of computations during the loop vectorization. For example, it allows vectorizing loops with floating-point reductions without the need to compile the whole function/program with -fassociative-math.
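To illustrate why this reassociation is considered unsafe, here is a small sketch that is not part of the patch. It compares a strict left-to-right `float` sum with a sum that keeps four independent partial sums, roughly the shape a `<4 x float>` accumulator gives the reduction. The function names `sumInOrder` and `sumFourLanes` are hypothetical, and the lane-combining order is only one possible choice.

```cpp
#include <cstddef>
#include <vector>

// Strict left-to-right sum: what the scalar loop computes when the
// reduction may not be reassociated.
float sumInOrder(const std::vector<float> &Xs) {
  float Acc = 0.0f;
  for (float X : Xs)
    Acc += X;
  return Acc;
}

// Reassociated sum: four independent partial sums (one per vector lane),
// combined once after the main loop. Leftover elements are added in order.
float sumFourLanes(const std::vector<float> &Xs) {
  float Lanes[4] = {0.0f, 0.0f, 0.0f, 0.0f};
  std::size_t I = 0;
  for (; I + 4 <= Xs.size(); I += 4)
    for (std::size_t L = 0; L < 4; ++L)
      Lanes[L] += Xs[I + L];
  float Acc = (Lanes[0] + Lanes[1]) + (Lanes[2] + Lanes[3]);
  for (; I < Xs.size(); ++I)
    Acc += Xs[I];
  return Acc;
}
```

For the input `{1e8f, 1.0f, -1e8f, 1.0f}`, on a target that evaluates `float` arithmetic in IEEE single precision, the in-order sum is `1.0f` while the four-lane sum is `0.0f`, because `1e8f + 1.0f` rounds back to `1e8f`. The metadata lets a frontend accept this kind of difference for selected loops only.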
|
This LGTM as I believe it to be a reasonable behaviour, but should probably get approval from one of the others. |
|
Hi @fhahn @jdoerfert can you please give some feedback for the updated changes? |
|
Hello @fhahn, can you please re-review this? |
|
Friendly ping. |
|
@fhahn, friendly ping. |
For example, reassociation of floating point reduction
in a loop with ``!{!"llvm.loop.vectorize.enable", i1 1}`` metadata is allowed
regardless of the value of
``llvm.loop.vectorize.reassociate_fpreductions.enable``.
Just a quick thought - what would you expect to happen for nested loops where the reduction variable is used at all levels of the loop? For example,
float v = 0;
for (...) {
  for (...) {
    for (...) {
      v += ...;
    }
    v += ...;
  }
  v += ...;
}
Suppose the metadata is only added to the outer loop, but not the inner loops. It's possible that the inner loops get fully unrolled such that only the outer loop remains by the time we run the loop vectoriser. Is it valid to still reassociate? If so, that implies all inner loops must inherit the property from the outer loop. Would you consider it a bug to add it to the outer loop, but not the inner loops? Alternatively, if the inner loops do not get unrolled, would it be legal for the vectoriser to walk up to the outermost loop and use the metadata on the outermost loop to reassociate reductions on the innermost, etc?
Thank you for the example!
Since the user allowed reassociation for the reduction computation in the outer loop (e.g. via an option), we may think of the reduction computations as "inaccurate" already. So we are free to do any reassociations even for the code in the inner loops (if they are unrolled) or not do it (if they are not unrolled).
Following the same logic, it should be legal for the vectorizer to walk up to the outermost loop and use the metadata to reassociate reductions in the inner loops.
So far I am planning to inject the metadata based on a command-line option, so a module will have the metadata consistently attached to all loops. The situation you described may occur due to LTO, and I think it is hard to provide fine-grained controls such as "compute this part of the reduction without reassociation, and this part with reassociation". So, basically, the outer loop "wins", and the only way to prevent this is to use noinline (effectively disabling all optimizations in the outer loop).
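A minimal sketch of the "walk up the loop nest" lookup described above, written as a standalone C model rather than against LLVM's loop APIs. The `struct loop`, the tri-state enum, and the function name are all hypothetical, and the rule encoded here (an enable on any enclosing loop permits reassociation in the inner loop) is only one reading of this comment.

```c
#include <stddef.h>

/* Hypothetical model of the metadata state attached to a single loop. */
enum reassoc_md { MD_ABSENT, MD_DISABLE, MD_ENABLE };

/* Hypothetical loop-nest node: each loop points at its enclosing loop. */
struct loop {
    const struct loop *parent;  /* NULL for the outermost loop */
    enum reassoc_md md;
};

/* Returns 1 if the loop itself or any enclosing loop carries the
 * "enable" value, i.e. an outer enable "wins" for the inner loops. */
int any_enclosing_enable(const struct loop *l) {
    for (; l != NULL; l = l->parent)
        if (l->md == MD_ENABLE)
            return 1;
    return 0;
}
```

Under this rule an inner loop with no metadata inside an enabled outer loop is treated as enabled. The rule does not say what an explicit 0 on an outer loop should mean for an enabled inner loop.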
That seems reasonable and thanks for explaining. I think it's worth explicitly stating this in the LangRef because once the metadata exists in LLVM it could be used by other frontends. For example, I can imagine in future someone may add a C level pragma that maps to this.
However, I think the reverse is also true. Suppose in your outer loop you set llvm.loop.vectorize.reassociate_fpreductions.enable to 0, that should override any inner loop that sets it to 1 for consistency.
> However, I think the reverse is also true. Suppose in your outer loop you set llvm.loop.vectorize.reassociate_fpreductions.enable to 0, that should override any inner loop that sets it to 1 for consistency.
Hmm, that does not sound right to me. If the inner loop computes a different reduction than the outer loop, then the metadata should probably not apply to the inner loop, e.g.:
double s1 = 0.0;
for (...) {
  double s2 = 0.0;
  for (...) {
    s2 += ...;
  }
  s1 += ...;
}
Do you think it will be more consistent to propagate the metadata's "enable" effect to the whole loop-nest regardless of which loop it is set on?
P.S. I am on vacation for 1.5 weeks, and I won't be able to reply to the comments during my absence. Sorry for the inconvenience.
Well, I guess that depends upon how you want this metadata to behave and what you want to achieve. My point really was that it should be consistent in my opinion - it would seem odd to permit llvm.loop.vectorize.reassociate_fpreductions.enable=1 to override inner loops, but not permit llvm.loop.vectorize.reassociate_fpreductions.enable=0 given something has gone to the effort of explicitly adding it. Of course if the metadata is completely missing from the outer loop (surely the common case?), then it cannot override any metadata on inner loops anyway. I think whatever behaviour we decide upon should be documented explicitly in the LangRef to avoid confusion.
Sorry for the long delay. I finally found time to get back to this. I promised to show how the NVHPC compiler works, and I have some details now.
Nvfortran has an option -Mvect=assoc/noassoc that allows/disallows vectorizing FP reductions. Nvfortran may not be a great example of how a mix of different options works with cross-module inlining, because it appears to rely only on whatever options are in effect during the compilation that follows the cross-module function inlining.
I tried the following example:
callee.f90:
subroutine inner(y,s)
  real :: y(*), s
  do j=1,100
    s=s+y(j)
  end do
end subroutine inner
caller.f90:
subroutine test(x,y,s)
  interface
    subroutine inner(y,s)
      real :: y(*), s
    end subroutine inner
  end interface
  real :: x(*), y(*), s
  do i=1,100
    call inner(y,s)
    s=s+x(i)
  end do
end subroutine test
The first step is to create an inlining "library" for the callee.f90: nvfortran -cpp -O3 callee.f90 -Minfo=all -Mvect=assoc/noassoc -c -Mextract=lib:reductions
The second step is to use the inlining "library" during the compilation of the caller.f90: nvfortran -cpp -O3 caller.f90 -Minfo=all -Mvect=assoc/noassoc -Minline=lib:reductions -c
Regardless of the -Mvect=assoc/noassoc option used during the first step, the vectorization decision is based on the option value used during the second step. I.e. -Mvect=assoc results in the inner loop being vectorized, and -Mvect=noassoc disables vectorization.
Besides the reordering of the reduction computations, nvfortran does not apply any other FP math reassociations.
The most common use case that I anticipate for NVHPC users is that most of the code is compiled with FP reduction reassociation allowed, while some accuracy-critical loops with reductions need to be compiled without it. One way to do this is to extract such loops into separate functions/modules and compile them without reduction reassociation. Then, after cross-module inlining, the reduction computations within these loops are not supposed to be reassociated (even if they are loops with constant trip counts that may be completely unrolled and end up inside outer loops of a caller compiled with more relaxed reduction behavior).
In this usage model, it is expected that the metadata is set to either 1 or 0 for all the loops, but how can we define the metadata merging rules?
For correctness, it sounds like the inner loops should maintain their 0 value even when completely unrolled, so 0 (or the absence of metadata) should propagate outwards and override any 1 on the outer loops. And 1 cannot be propagated outward to override any outer 0 (or the absence of metadata).
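The merging rule stated above can be sketched as a small C function. This is only a model of the rule, with a hypothetical tri-state enum and function name, and it is not part of the patch or of any existing LLVM API. It computes the value an outer loop should carry after an inner loop has been folded into it, for example by complete unrolling.

```c
/* Hypothetical model of the metadata state attached to a single loop. */
enum reassoc_md { MD_ABSENT, MD_DISABLE, MD_ENABLE };

/* Conservative merge: the outer loop stays enabled only if the inner
 * loop was enabled too. An inner 0 (or missing metadata) overrides an
 * outer 1, while an inner 1 never upgrades an outer 0 or absence. */
enum reassoc_md merge_into_outer(enum reassoc_md outer,
                                 enum reassoc_md inner) {
    if (outer != MD_ENABLE)
        return outer;           /* outer already forbids reassociation */
    return inner == MD_ENABLE ? MD_ENABLE : MD_DISABLE;
}
```

In effect this is a logical AND over the nest with "absent" treated as 0, so the result can only become more restrictive as loops are merged.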
I am not sure now where such metadata propagation can be done reliably, given that different passes may do function inlining. It does not seem feasible to require that the metadata propagation be run after each pass that may change the loop nesting. Can this be done in the vectorizer itself by querying the whole loop nest in which the loop being vectorized is located?
You brought up a great point, and I do not know how to address it properly.
I am wondering now if the approach suggested during the vectorizer meeting is more viable: someone (sorry, I do not remember the name) suggested a FastMathFlag attached to FP operations that would allow their reassociation only if it is required for vectorizing reductions. It sounds more consistent, but maybe someone can find drawbacks in it as well.
I think I need to collect more performance and correctness data before pushing this forward, and the LTO aspect is not something that I am concerned about right now. Would it be acceptable to add an engineering option that allows reduction reassociation, so that I can experiment with multiple benchmarks and bring back some factual data? (This was one of the suggestions during the vectorizer meeting as well.)
RFC: https://discourse.llvm.org/t/llvm-proposing-llvm-loop-vectorize-reassociation-enable-metadata/86573