-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[RFC][llvm] Added llvm.loop.vectorize.reassociate_fpreductions.enable metadata. #141685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 3 commits
1619ad6
9511b6e
5ba9cbd
91f390e
676dede
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -7593,6 +7593,24 @@ Note that setting ``llvm.loop.interleave.count`` to 1 disables interleaving | |
| multiple iterations of the loop. If ``llvm.loop.interleave.count`` is set to 0 | ||
| then the interleave count will be determined automatically. | ||
|
|
||
| '``llvm.loop.vectorize.reassociate_fpreductions.enable``' Metadata | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| This metadata selectively allows or disallows reassociating floating-point | ||
| reductions, which otherwise may be unsafe to reassociate, during the loop | ||
| vectorization. For example, a floating point ``ADD`` reduction without | ||
| ``reassoc`` fast-math flags may be vectorized provided that this metadata | ||
| allows it. The first operand is the string | ||
| ``llvm.loop.vectorize.reassociate_fpreductions.enable`` | ||
| and the second operand is a bit. If the bit operand value is 1 unsafe | ||
| reduction reassociations are enabled. A value of 0 disables unsafe | ||
| reduction reassociations. | ||
vzakhari marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
|
||
| .. code-block:: llvm | ||
|
|
||
| !0 = !{!"llvm.loop.vectorize.reassociate_fpreductions.enable", i1 0} | ||
| !1 = !{!"llvm.loop.vectorize.reassociate_fpreductions.enable", i1 1} | ||
|
|
||
| '``llvm.loop.vectorize.enable``' Metadata | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just a quick thought - what would you expect to happen for nested loops where the reduction variable is used at all levels of the loop? For example, Suppose the metadata is only added to the outer loop, but not the inner loops. It's possible that the inner loops get fully unrolled such that only the outer loop remains by the time we run the loop vectoriser. Is it valid to still reassociate? If so, that implies all inner loops must inherit the property from the outer loop. Would you consider it a bug to add it to the outer loop, but not the inner loops? Alternatively, if the inner loops do not get unrolled, would it be legal for the vectoriser to walk up to the outermost loop and use the metadata on the outermost loop to reassociate reductions on the innermost, etc?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you for the example! Since the user allowed reassociation for the reduction computation in the outer loop (e.g. via an option), we may think of the reduction computations as "inaccurate" already. So we are free to do any reassociations even for the code in the inner loops (if they are unrolled) or not do it (if they are not unrolled). Sticking to the same logic, it should be legal for vectorizer to walk up to the outermost loop and use the metadata to reassociate reductions on the inner loops. So far I am planning to inject the metadata based on the command line option, so a module will have the metadata consistently attached to all loops. The situation you described may occur due to LTO, and I think it is hard to provide finegrain controls such as "compute this part of the reduction without reassociation, and this part with reassociation". So, basically, the outer loop "wins" and the only way to prevent this is to use
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That seems reasonable and thanks for explaining. I think it's worth explicitly stating this in the LangRef because once the metadata exists in LLVM it could be used by other frontends. For example, I can imagine in future someone may add a C level pragma that maps to this.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. However, I think the reverse is also true. Suppose in your outer loop you set llvm.loop.vectorize.reassociate_fpreductions.enable to 0, that should override any inner loop that sets it to 1 for consistency.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Hmm, that does not sound right to me. If the inner loop computes a different reduction than the outer loop, then the metadata should probably not apply to the inner loop, e.g.: Do you think it will be more consistent to propagate the metadata's "enable" effect to the whole loop-nest regardless of which loop it is set on? P.S. I am on vacation for 1.5 weeks, and I won't be able to reply to the comments during my absence. Sorry for the inconvenience.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, I guess that depends upon how you want this metadata to behave and what you want to achieve. My point really was that it should be consistent in my opinion - it would seem odd to permit
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry for the long delay. I finally found time to get back to this. I promised to show how NVHPC compiler works, and I have some details now. Nvfortran has an option I tried the following example: callee.f90: caller.f90: The first step is to create an inlining "library" for the callee.f90: The second step is to use the inlining "library" during the compilation of the caller.f90: Regardless of the Besides the reordering of the reduction computations, nvfortran does not apply any other FP math reassociations. The most usual use-case that I anticipate the NVHPC users may want is that most of the code is compiled with allowing FP reductions reassociation. But then some accuracy-critical loops with reductions may need to be compiled without reductions reassociation. One way to do this is to extract such loops into separate functions/module and compile them without reductions reassociation. Then after the cross-module inlining, the reduction computations within these loops are not supposed to be reassociated (even if they are loops with constant trip counts that may be completely unrolled and appear inside the outer loops existing in the caller compiled with more relaxed reduction behavior). In this usage model, it is expected that the metadata is set to either 1 or 0 for all the loops, but how we can define the metadata merging rules? For correctness, it sounds like the inner loops should maintain their I am not sure now where such metadata propagation can be made reliably, given that different passes may do function inlining. It does not seem feasible to require that the metadata propagation is run after each such pass that may change the loop nesting. Can this be done in vectorizer itself by querying the whole loop nest where the loop being vectorized is located? You brought up a great point, and I do not know how to address it properly. I am wondering now if the approach suggested during the vectorizer meeting is more viable: (sorry, I did not remember the name of the person) suggested a FastMathFlag to be attached to FP operations that will allow their reassociation only if it is required for vectorizing reductions. It sounds to be more consistent, but maybe someone can find drawbacks in it as well. I think I need to collect more performance and correctness data before pushing this forward, and the LTO aspect is not a thing that I am concerned about right now. Would that be acceptable to add an engineering option that allows reductions reassociation, so that I can experiment with multiple benchmarks and bring back some factual data? (this was one of the suggestions during the vectorizer meeting as well) |
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| ; Check that the loop with a floating-point reduction is vectorized | ||
| ; due to llvm.loop.vectorize.reassociate_fpreductions.enable metadata. | ||
| ; RUN: opt -passes=loop-vectorize -S < %s 2>&1 | FileCheck %s | ||
vzakhari marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| source_filename = "FIRModule" | ||
| target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128" | ||
| target triple = "x86_64-unknown-linux-gnu" | ||
vzakhari marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| ; Function Attrs: nofree norecurse nosync nounwind memory(argmem: readwrite) | ||
| define void @test_(ptr captures(none) %0, ptr readonly captures(none) %1) local_unnamed_addr #0 { | ||
| ; CHECK-LABEL: define void @test_( | ||
| ; CHECK: fadd contract <4 x float> {{.*}} | ||
| ; CHECK: call contract float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> {{.*}}) | ||
| ; | ||
| %invariant.gep = getelementptr i8, ptr %1, i64 -4 | ||
| %.promoted = load float, ptr %0, align 4 | ||
vzakhari marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| br label %3 | ||
|
|
||
| 3: ; preds = %2, %3 | ||
vzakhari marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| %indvars.iv = phi i64 [ 1, %2 ], [ %indvars.iv.next, %3 ] | ||
vzakhari marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| %4 = phi float [ %.promoted, %2 ], [ %6, %3 ] | ||
vzakhari marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| %gep = getelementptr float, ptr %invariant.gep, i64 %indvars.iv | ||
| %5 = load float, ptr %gep, align 4 | ||
| %6 = fadd contract float %4, %5 | ||
| %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 | ||
| %exitcond.not = icmp eq i64 %indvars.iv.next, 1001 | ||
| br i1 %exitcond.not, label %7, label %3, !llvm.loop !2 | ||
|
|
||
| 7: ; preds = %3 | ||
| %.lcssa = phi float [ %6, %3 ] | ||
| store float %.lcssa, ptr %0, align 4 | ||
| ret void | ||
vzakhari marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| } | ||
|
|
||
| attributes #0 = { nofree norecurse nosync nounwind memory(argmem: readwrite) "target-cpu"="x86-64" } | ||
|
|
||
| !llvm.ident = !{!0} | ||
| !llvm.module.flags = !{!1} | ||
|
|
||
| !0 = !{!"flang version 21.0.0"} | ||
| !1 = !{i32 2, !"Debug Info Version", i32 3} | ||
vzakhari marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| !2 = distinct !{!2, !3} | ||
| !3 = !{!"llvm.loop.vectorize.reassociate_fpreductions.enable", i1 true} | ||
vzakhari marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| ; CHECK-NOT: llvm.loop.vectorize.reassociate_fpreductions.enable | ||
| ; CHECK: !{!"llvm.loop.isvectorized", i32 1} | ||
| ; CHECK: !{!"llvm.loop.unroll.runtime.disable"} | ||
Uh oh!
There was an error while loading. Please reload this page.