-
Notifications
You must be signed in to change notification settings - Fork 15k
[flang] add -floop-interchange and enable it with opt levels #140182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-clang Author: Sebastian Pop (sebpop) ChangesHi, two patches enable the use of -floop-interchange from the flang driver and enable LLVM's loop interchange at levels -O2, -O3, -Ofast, and -Os. Full diff: https://github.com/llvm/llvm-project/pull/140182.diff 9 Files Affected:
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 11677626dbf1f..287a00863bb35 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4141,9 +4141,9 @@ def ftrap_function_EQ : Joined<["-"], "ftrap-function=">, Group<f_Group>,
HelpText<"Issue call to specified function rather than a trap instruction">,
MarshallingInfoString<CodeGenOpts<"TrapFuncName">>;
def floop_interchange : Flag<["-"], "floop-interchange">, Group<f_Group>,
- HelpText<"Enable the loop interchange pass">, Visibility<[ClangOption, CC1Option]>;
+ HelpText<"Enable the loop interchange pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
def fno_loop_interchange: Flag<["-"], "fno-loop-interchange">, Group<f_Group>,
- HelpText<"Disable the loop interchange pass">, Visibility<[ClangOption, CC1Option]>;
+ HelpText<"Disable the loop interchange pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
def funroll_loops : Flag<["-"], "funroll-loops">, Group<f_Group>,
HelpText<"Turn on loop unroller">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
def fno_unroll_loops : Flag<["-"], "fno-unroll-loops">, Group<f_Group>,
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index e4bad39f8332a..89f4ebd519ebf 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -3152,3 +3152,16 @@ void tools::handleVectorizeSLPArgs(const ArgList &Args,
options::OPT_fno_slp_vectorize, EnableSLPVec))
CmdArgs.push_back("-vectorize-slp");
}
+
+void tools::handleInterchangeLoopsArgs(const ArgList &Args,
+ ArgStringList &CmdArgs) {
+ // FIXME: instead of relying on shouldEnableVectorizerAtOLevel, we may want to
+ // implement a separate function to infer loop interchange from opt level.
+ // For now, enable loop-interchange at the same opt levels as loop-vectorize.
+ bool EnableInterch = shouldEnableVectorizerAtOLevel(Args, false);
+ OptSpecifier interchangeAliasOption =
+ EnableInterch ? options::OPT_O_Group : options::OPT_floop_interchange;
+ if (Args.hasFlag(options::OPT_floop_interchange, interchangeAliasOption,
+ options::OPT_fno_loop_interchange, EnableInterch))
+ CmdArgs.push_back("-floop-interchange");
+}
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.h b/clang/lib/Driver/ToolChains/CommonArgs.h
index 96bc0619dcbc0..6d36a0e8bf493 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.h
+++ b/clang/lib/Driver/ToolChains/CommonArgs.h
@@ -259,6 +259,10 @@ void renderCommonIntegerOverflowOptions(const llvm::opt::ArgList &Args,
bool shouldEnableVectorizerAtOLevel(const llvm::opt::ArgList &Args,
bool isSlpVec);
+/// Enable -floop-interchange based on the optimization level selected.
+void handleInterchangeLoopsArgs(const llvm::opt::ArgList &Args,
+ llvm::opt::ArgStringList &CmdArgs);
+
/// Enable -fvectorize based on the optimization level selected.
void handleVectorizeLoopsArgs(const llvm::opt::ArgList &Args,
llvm::opt::ArgStringList &CmdArgs);
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index b1ca747e68b89..54176381b6e5b 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -152,6 +152,7 @@ void Flang::addCodegenOptions(const ArgList &Args,
!stackArrays->getOption().matches(options::OPT_fno_stack_arrays))
CmdArgs.push_back("-fstack-arrays");
+ handleInterchangeLoopsArgs(Args, CmdArgs);
handleVectorizeLoopsArgs(Args, CmdArgs);
handleVectorizeSLPArgs(Args, CmdArgs);
diff --git a/flang/include/flang/Frontend/CodeGenOptions.def b/flang/include/flang/Frontend/CodeGenOptions.def
index d9dbd274e83e5..7ced60f512219 100644
--- a/flang/include/flang/Frontend/CodeGenOptions.def
+++ b/flang/include/flang/Frontend/CodeGenOptions.def
@@ -35,6 +35,7 @@ CODEGENOPT(PrepareForThinLTO , 1, 0) ///< Set when -flto=thin is enabled on the
CODEGENOPT(StackArrays, 1, 0) ///< -fstack-arrays (enable the stack-arrays pass)
CODEGENOPT(VectorizeLoop, 1, 0) ///< Enable loop vectorization.
CODEGENOPT(VectorizeSLP, 1, 0) ///< Enable SLP vectorization.
+CODEGENOPT(InterchangeLoops, 1, 0) ///< Enable loop interchange.
CODEGENOPT(LoopVersioning, 1, 0) ///< Enable loop versioning.
CODEGENOPT(UnrollLoops, 1, 0) ///< Enable loop unrolling
CODEGENOPT(AliasAnalysis, 1, 0) ///< Enable alias analysis pass
diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp
index 28f2f69f23baf..0bdbb616136f1 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -269,6 +269,9 @@ static void parseCodeGenArgs(Fortran::frontend::CodeGenOptions &opts,
clang::driver::options::OPT_fno_stack_arrays, false))
opts.StackArrays = 1;
+ if (args.getLastArg(clang::driver::options::OPT_floop_interchange))
+ opts.InterchangeLoops = 1;
+
if (args.getLastArg(clang::driver::options::OPT_vectorize_loops))
opts.VectorizeLoop = 1;
diff --git a/flang/lib/Frontend/FrontendActions.cpp b/flang/lib/Frontend/FrontendActions.cpp
index c1f47b12abee2..7c936ee23009d 100644
--- a/flang/lib/Frontend/FrontendActions.cpp
+++ b/flang/lib/Frontend/FrontendActions.cpp
@@ -915,6 +915,7 @@ void CodeGenAction::runOptimizationPipeline(llvm::raw_pwrite_stream &os) {
if (ci.isTimingEnabled())
si.getTimePasses().setOutStream(ci.getTimingStreamLLVM());
pto.LoopUnrolling = opts.UnrollLoops;
+ pto.LoopInterchange = opts.InterchangeLoops;
pto.LoopInterleaving = opts.UnrollLoops;
pto.LoopVectorization = opts.VectorizeLoop;
pto.SLPVectorization = opts.VectorizeSLP;
diff --git a/flang/lib/Semantics/expression.cpp b/flang/lib/Semantics/expression.cpp
index e139bda7e4950..35eb7b61429fb 100644
--- a/flang/lib/Semantics/expression.cpp
+++ b/flang/lib/Semantics/expression.cpp
@@ -421,7 +421,8 @@ static void CheckSubscripts(
static void CheckSubscripts(
semantics::SemanticsContext &context, CoarrayRef &ref) {
- const Symbol &coarraySymbol{ref.GetBase().GetLastSymbol()};
+ const auto &base = ref.GetBase();
+ const Symbol &coarraySymbol{base.GetLastSymbol()};
Shape lb, ub;
if (FoldSubscripts(context, coarraySymbol, ref.subscript(), lb, ub)) {
ValidateSubscripts(context, coarraySymbol, ref.subscript(), lb, ub);
diff --git a/flang/test/Driver/loop-interchange.f90 b/flang/test/Driver/loop-interchange.f90
new file mode 100644
index 0000000000000..d5d62e9a777d2
--- /dev/null
+++ b/flang/test/Driver/loop-interchange.f90
@@ -0,0 +1,13 @@
+! RUN: %flang -### -S -floop-interchange %s 2>&1 | FileCheck -check-prefix=CHECK-LOOP-INTERCHANGE %s
+! RUN: %flang -### -S -fno-loop-interchange %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-INTERCHANGE %s
+! RUN: %flang -### -S -O0 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-INTERCHANGE %s
+! RUN: %flang -### -S -O1 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-INTERCHANGE %s
+! RUN: %flang -### -S -O2 %s 2>&1 | FileCheck -check-prefix=CHECK-LOOP-INTERCHANGE %s
+! RUN: %flang -### -S -O3 %s 2>&1 | FileCheck -check-prefix=CHECK-LOOP-INTERCHANGE %s
+! RUN: %flang -### -S -Os %s 2>&1 | FileCheck -check-prefix=CHECK-LOOP-INTERCHANGE %s
+! RUN: %flang -### -S -Oz %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-INTERCHANGE %s
+! CHECK-LOOP-INTERCHANGE: "-floop-interchange"
+! CHECK-NO-LOOP-INTERCHANGE-NOT: "-floop-interchange"
+
+program test
+end program
|
|
@llvm/pr-subscribers-flang-driver Author: Sebastian Pop (sebpop) ChangesHi, two patches enable the use of -floop-interchange from the flang driver and enable LLVM's loop interchange at levels -O2, -O3, -Ofast, and -Os. Full diff: https://github.com/llvm/llvm-project/pull/140182.diff 9 Files Affected:
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 11677626dbf1f..287a00863bb35 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4141,9 +4141,9 @@ def ftrap_function_EQ : Joined<["-"], "ftrap-function=">, Group<f_Group>,
HelpText<"Issue call to specified function rather than a trap instruction">,
MarshallingInfoString<CodeGenOpts<"TrapFuncName">>;
def floop_interchange : Flag<["-"], "floop-interchange">, Group<f_Group>,
- HelpText<"Enable the loop interchange pass">, Visibility<[ClangOption, CC1Option]>;
+ HelpText<"Enable the loop interchange pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
def fno_loop_interchange: Flag<["-"], "fno-loop-interchange">, Group<f_Group>,
- HelpText<"Disable the loop interchange pass">, Visibility<[ClangOption, CC1Option]>;
+ HelpText<"Disable the loop interchange pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
def funroll_loops : Flag<["-"], "funroll-loops">, Group<f_Group>,
HelpText<"Turn on loop unroller">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
def fno_unroll_loops : Flag<["-"], "fno-unroll-loops">, Group<f_Group>,
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index e4bad39f8332a..89f4ebd519ebf 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -3152,3 +3152,16 @@ void tools::handleVectorizeSLPArgs(const ArgList &Args,
options::OPT_fno_slp_vectorize, EnableSLPVec))
CmdArgs.push_back("-vectorize-slp");
}
+
+void tools::handleInterchangeLoopsArgs(const ArgList &Args,
+ ArgStringList &CmdArgs) {
+ // FIXME: instead of relying on shouldEnableVectorizerAtOLevel, we may want to
+ // implement a separate function to infer loop interchange from opt level.
+ // For now, enable loop-interchange at the same opt levels as loop-vectorize.
+ bool EnableInterch = shouldEnableVectorizerAtOLevel(Args, false);
+ OptSpecifier interchangeAliasOption =
+ EnableInterch ? options::OPT_O_Group : options::OPT_floop_interchange;
+ if (Args.hasFlag(options::OPT_floop_interchange, interchangeAliasOption,
+ options::OPT_fno_loop_interchange, EnableInterch))
+ CmdArgs.push_back("-floop-interchange");
+}
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.h b/clang/lib/Driver/ToolChains/CommonArgs.h
index 96bc0619dcbc0..6d36a0e8bf493 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.h
+++ b/clang/lib/Driver/ToolChains/CommonArgs.h
@@ -259,6 +259,10 @@ void renderCommonIntegerOverflowOptions(const llvm::opt::ArgList &Args,
bool shouldEnableVectorizerAtOLevel(const llvm::opt::ArgList &Args,
bool isSlpVec);
+/// Enable -floop-interchange based on the optimization level selected.
+void handleInterchangeLoopsArgs(const llvm::opt::ArgList &Args,
+ llvm::opt::ArgStringList &CmdArgs);
+
/// Enable -fvectorize based on the optimization level selected.
void handleVectorizeLoopsArgs(const llvm::opt::ArgList &Args,
llvm::opt::ArgStringList &CmdArgs);
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index b1ca747e68b89..54176381b6e5b 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -152,6 +152,7 @@ void Flang::addCodegenOptions(const ArgList &Args,
!stackArrays->getOption().matches(options::OPT_fno_stack_arrays))
CmdArgs.push_back("-fstack-arrays");
+ handleInterchangeLoopsArgs(Args, CmdArgs);
handleVectorizeLoopsArgs(Args, CmdArgs);
handleVectorizeSLPArgs(Args, CmdArgs);
diff --git a/flang/include/flang/Frontend/CodeGenOptions.def b/flang/include/flang/Frontend/CodeGenOptions.def
index d9dbd274e83e5..7ced60f512219 100644
--- a/flang/include/flang/Frontend/CodeGenOptions.def
+++ b/flang/include/flang/Frontend/CodeGenOptions.def
@@ -35,6 +35,7 @@ CODEGENOPT(PrepareForThinLTO , 1, 0) ///< Set when -flto=thin is enabled on the
CODEGENOPT(StackArrays, 1, 0) ///< -fstack-arrays (enable the stack-arrays pass)
CODEGENOPT(VectorizeLoop, 1, 0) ///< Enable loop vectorization.
CODEGENOPT(VectorizeSLP, 1, 0) ///< Enable SLP vectorization.
+CODEGENOPT(InterchangeLoops, 1, 0) ///< Enable loop interchange.
CODEGENOPT(LoopVersioning, 1, 0) ///< Enable loop versioning.
CODEGENOPT(UnrollLoops, 1, 0) ///< Enable loop unrolling
CODEGENOPT(AliasAnalysis, 1, 0) ///< Enable alias analysis pass
diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp
index 28f2f69f23baf..0bdbb616136f1 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -269,6 +269,9 @@ static void parseCodeGenArgs(Fortran::frontend::CodeGenOptions &opts,
clang::driver::options::OPT_fno_stack_arrays, false))
opts.StackArrays = 1;
+ if (args.getLastArg(clang::driver::options::OPT_floop_interchange))
+ opts.InterchangeLoops = 1;
+
if (args.getLastArg(clang::driver::options::OPT_vectorize_loops))
opts.VectorizeLoop = 1;
diff --git a/flang/lib/Frontend/FrontendActions.cpp b/flang/lib/Frontend/FrontendActions.cpp
index c1f47b12abee2..7c936ee23009d 100644
--- a/flang/lib/Frontend/FrontendActions.cpp
+++ b/flang/lib/Frontend/FrontendActions.cpp
@@ -915,6 +915,7 @@ void CodeGenAction::runOptimizationPipeline(llvm::raw_pwrite_stream &os) {
if (ci.isTimingEnabled())
si.getTimePasses().setOutStream(ci.getTimingStreamLLVM());
pto.LoopUnrolling = opts.UnrollLoops;
+ pto.LoopInterchange = opts.InterchangeLoops;
pto.LoopInterleaving = opts.UnrollLoops;
pto.LoopVectorization = opts.VectorizeLoop;
pto.SLPVectorization = opts.VectorizeSLP;
diff --git a/flang/lib/Semantics/expression.cpp b/flang/lib/Semantics/expression.cpp
index e139bda7e4950..35eb7b61429fb 100644
--- a/flang/lib/Semantics/expression.cpp
+++ b/flang/lib/Semantics/expression.cpp
@@ -421,7 +421,8 @@ static void CheckSubscripts(
static void CheckSubscripts(
semantics::SemanticsContext &context, CoarrayRef &ref) {
- const Symbol &coarraySymbol{ref.GetBase().GetLastSymbol()};
+ const auto &base = ref.GetBase();
+ const Symbol &coarraySymbol{base.GetLastSymbol()};
Shape lb, ub;
if (FoldSubscripts(context, coarraySymbol, ref.subscript(), lb, ub)) {
ValidateSubscripts(context, coarraySymbol, ref.subscript(), lb, ub);
diff --git a/flang/test/Driver/loop-interchange.f90 b/flang/test/Driver/loop-interchange.f90
new file mode 100644
index 0000000000000..d5d62e9a777d2
--- /dev/null
+++ b/flang/test/Driver/loop-interchange.f90
@@ -0,0 +1,13 @@
+! RUN: %flang -### -S -floop-interchange %s 2>&1 | FileCheck -check-prefix=CHECK-LOOP-INTERCHANGE %s
+! RUN: %flang -### -S -fno-loop-interchange %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-INTERCHANGE %s
+! RUN: %flang -### -S -O0 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-INTERCHANGE %s
+! RUN: %flang -### -S -O1 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-INTERCHANGE %s
+! RUN: %flang -### -S -O2 %s 2>&1 | FileCheck -check-prefix=CHECK-LOOP-INTERCHANGE %s
+! RUN: %flang -### -S -O3 %s 2>&1 | FileCheck -check-prefix=CHECK-LOOP-INTERCHANGE %s
+! RUN: %flang -### -S -Os %s 2>&1 | FileCheck -check-prefix=CHECK-LOOP-INTERCHANGE %s
+! RUN: %flang -### -S -Oz %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-INTERCHANGE %s
+! CHECK-LOOP-INTERCHANGE: "-floop-interchange"
+! CHECK-NO-LOOP-INTERCHANGE-NOT: "-floop-interchange"
+
+program test
+end program
|
|
For more context, this is part of our loop-interchange enablement story, see our RFC here: https://discourse.llvm.org/t/enabling-loop-interchange/82589. We have fixed all the compile-time issues and loop-interchange issues that we are aware of, and would like to enable this in the C/C++ flow, see here: #124911. As part of this work, we also promised to fix DependenceAnalysis. The last DA correctness corner-case that is being worked on is: #123436. This is a corner-case for C/C++ related to type punning, different offset sizes that won't be a problem in Fortran. Therefore, we think that enabling interchange and dependence analysis for Fortran makes sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR. Do you have any compilation time and performance data?
flang/lib/Semantics/expression.cpp
Outdated
| const auto &base = ref.GetBase(); | ||
| const Symbol &coarraySymbol{base.GetLastSymbol()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Is this an unrelated change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been submitted separately #138793
Without this change I cannot build flang on arm64-linux ubuntu 24.04 machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I requested a review for #138793. It's probably best to proceed with this after that has been merged.
@madhur13490 did several changes to loop interchange to optimize the overall compilation time with the pass. I believe Madhur has only looked at c/c++ benchmarks and not at how loop interchange would impact flang. I think that if compilation time is good for c/c++, it should also be good for fortran. On the perf side, I was looking if we can already catch swim from cpu2000, and that fails with not enough data to infer number of iterations. I will be working on adding assume (N < 1335) based on analyzing array decls and infer loop bounds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test that ensures that the loop-interchange pass is added to the pipeline. Perhaps something like flang/test/Driver/slp-vectorize.f90
flang/lib/Semantics/expression.cpp
Outdated
| const auto &base = ref.GetBase(); | ||
| const Symbol &coarraySymbol{base.GetLastSymbol()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I requested a review for #138793. It's probably best to proceed with this after that has been merged.
This information is a bit spread out in the other tickets that I linked earlier, so to summarise that, compile times look really good and increases very minimal after the work that Madhur did. In #124911, I wrote:
Regarding performance, as I also wrote in that ticket, loop-interchange has a lot of potential. It triggers a lot of times e.g. in the LLVM test-suite, see this #124911 (comment). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably deserves a mention in the Flang ReleaseNotes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bikeshedding names: I would add 4 characters "ange" to the variable name: EnableInterch -> EnableInterchange :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: interchangeAliasOption - > InterchangeAliasOption ?
This patch allows flang to recognize the flags -floop-interchange and -fno-loop-interchange. -floop-interchange adds the loop interchange pass to the pass pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
|
To Flang community: |
|
Flang's policy is not really different from Clang's, but I can raise it in the next community call |
Thanks. I think this should be removed from LLVM 21. |
|
Does this affect clang as well? If we do decide to disable it, will it only be in flang, or in both clang and flang? If the latter, we should check with the clang developers as well. |
|
It is only enabled in flang, and not in clang at the moment. I think it doesn't affect to clang. |
You're pointing to 4 patches, one of them I raised which I found by using different fuzzers. In my fuzzing exercise, I found ~30 issues, and I found a lot more issues in the loop vectoriser than there are issues in DA. Are we going to disable the vectoriser because we raise issues against it of which some of them are still open? That's a rhetorical question. My point is, this needs to be judged on a case by case basis, and we shouldn't just point to a list of less than a handful issues, that is not going to be useful. |
|
I'm not trying to emphasize the number of issues raised on GitHub, the real concern is their substance. What I pointed out in the issues represents only a small portion of the overall problem. There are fundamental flaws in DA's implementation, particularly in how it handles wrapping of SCEVs. These issues aren't caught by assertions and can silently lead to incorrect transformations. I don't know how many miscompiles exist, but it would not be a small number. The situation seems to differ from that of the vectorizer. DA was already broken when this PR was submitted. I believe it's premature to enable any passes that rely on DA by default. This would not directly reflect the severity of the issue, but it's the conclusion I reached after thoroughly examining DA for a month. There are simply too many bugs. Also note that properly fixing these issues will take considerable time, as we (or at least I) still aren't sure what the best approach to solving them is. |
So the real question here is: how important or realistic is this for Fortran? |
This is wrong, we fixed all outstanding bugs reported against DA at the time when we enabled interchange in flang.
If we don't enable interchange by default then we don't get more bugs reported against it because the pass is not getting used.
too many is not substantiated.
How many of the current 4 bugs you point to are related to flang? |
|
Instead of spreading FUD around, please be more precise and open bug reports for the following:
Tell me more about it in a separate bug, and let's see how we can work towards fixing it.
What are you speaking about?
Please provide data. |
I’m referring to the bugs that existed at that time but were not reported. The issues that were raised do not represent all the defects.
I don’t think enabling passes just to get more bug reports is a good strategy. At the very least, we should carefully check the existing code beforehand. To be clear: all my bug reports are what I found by reading the code. I didn't use flang to find them.
Are there any guarantees that Flang won’t generate the IR I mentioned in the issues? If so, that’s fine, but I don’t think any such guarantees exist.
At the very least I can't help feeling anxious, since there seem to be so many issues that it’s really hard to imagine what might happen.
Here are some examples that immediately come to my mind (not limited to them):
But the most serious issue is that DA doesn’t handle overflow at all.
I'm saying that DA can miss dependency, and it can result in incorrect legality check in LoopInterchange.
I don’t have any data, so I'm fine if the Flang community is okay with it. However, I strongly think these issues also affect Flang. In the end, there’s no data showing that these issues are unrelated to Flang. |
|
Thank you for raising these issues to my attention, I will be working on fixing these bugs. I believe those bugs may not impact Fortran programs as generated by flang:
|
|
Weighing in with my personal opinion here:
In this particular case, the pass has been merged for a long time. In that time we have built a lot of applications and benchmarks at -O3 in our internal CI and not seen any issues known to arise from loop interchange. Presumably other organisations have been testing Flang during this time too. @kasuga-fj thank you very much for reporting this. It is helpful for the whole community to know that there could be issues here so that we know where to look if any miss-compilations are discovered. Your careful review of DA looks like a great step forward. If the reported issues break something you are doing with Flang then I will be happy to consider disabling the pass. |
If you look over the code you can see that no attention was paid to integer wrapping behavior. For instance,
There is
@kasuga-fj pointed out bugs in the current implementation. Whether those are sufficient to justify changing default behavior is subjective. Generally assume best intentions. |
I agree, and I will keep this in mind.
For GCC, I adapted the Omega test (from Bill Pugh) to produce the same representation as the classic dependence tests. And we got more confidence that the dependence analysis implementation was correct. I removed the extra checks gcc-mirror/gcc@49b8fe6 10 years down the road. Maybe we could do the same to build trust in the results provided by LLVM's DA. |
|
@tblah Thank you for the comments! This is exactly what I wanted to know! First of all, I'm not strongly requesting to revert this PR. The primary purpose of my first comment is to share information, and of course, I think it's up to the Flang developers to make the final decision (especially since I don't know much about Flang myself).
This makes a lot of sense to me. Thanks for sharing the contexts. Let me share a few more details in the hope that they will be helpful.
; for (i = 0; i < 100; i++) for (int j = 0; j < 100; j++)
; a[j & 1][i & 1] = 42;
define void @f(ptr %a) {
entry:
br label %loop.i.header
loop.i.header:
%i = phi i64 [ 0, %entry ], [ %i.next, %loop.i.latch ]
%and.i = and i64 %i, 1
br label %loop.j
loop.j:
%j = phi i64 [ 0, %loop.i.header ], [ %j.next, %loop.j ]
%and.j = and i64 %j, 1
%idx = getelementptr [2 x [2 x i8]], ptr %a, i64 0, i64 %and.i, i64 %and.j
store i8 42, ptr %idx
%j.next = add i64 %j, 1
%exitcond.j = icmp eq i64 %j.next, 100
br i1 %exitcond.j, label %loop.i.latch, label %loop.j
loop.i.latch:
%i.next = add i64 %i, 1
%exitcond.i = icmp eq i64 %i.next, 100
br i1 %exitcond.i, label %exit, label %loop.i.header
exit:
ret void
}Current DA result in "there are no loop-carried dependencies for the store", which is obviously incorrect (godbolt). Honestly, I'm no longer sure what's realistic here, or how many similar cases might exist... I believe resolving this will take some time.
|
|
I posted a patch that fixes your above testcase. |
TBH, nobody writes such code: |
|
To be honest, I am very surprised that this PR was submitted, let alone landed. We have explicitly not approved enabling LoopInterchange by default in LLVM due to outstanding issues. So instead you go ahead and enable it directly in Flang instead, without even notifying any of the people who were involved in the original discussions. That's not how things should work. Please revert this patch ASAP and request an LLVM 21 backport. Edit: It should also be fine to just disable loop interchange by default, while retaining the |
(Thank you very much for finding this PR...) |
|
I’ll leave the revert up to the author. |
|
@nikic: I feel that all nuance has been lost here in this discussion, so I would like to bring some of that back:
If you feel this should be reverted based on the grounds of process or inclusion, okay, then it is what it is, fair enough. |
|
@sjoerdmeijer Let me address two points separately. The first is the original submission of this PR. There was an existing PR to enable loop interchange in #124911, and discussion for this enablement was consolidated there. I can see no evidence that reviewers at the time believed that loop interchange / dependence analysis is ready to be enabled. You could have posted a comment there asking whether people think it's okay to enable it just for Flang due to difference language characteristics -- and I expect you'd have gotten a fairly clear "no" on making this language-dependent. But that did not happen. I guess this was just a miscommunication. Now, regarding the issues in dependence analysis that have been found more recently: Yes, I believe these issues are quite severe and justify a revert. It's not that these issues are easy to trigger, but that they point to some rather fundamental issues in the dependence analysis implementation, which will likely require non-trivial changes to fully address. I don't want to backport all necessary changes to LLVM 21. And yes, there is a certain double standard when it comes to issues in a newly enabled pass, and issues in a pass that has already been enabled for a very long time. In the latter case, it would take some rather extreme circumstances for us to disable the pass entirely, while in the former case this is the default response for non-trivial issues. |
|
As this is time critical (needs to land prior to the LLVM 21 release tomorrow) I've submitted a PR myself: #155279 |
Disable loop interchange by default, while keeping the ability to explicitly enable using `-floop-interchange`. This matches Clang. See discussion on #140182.
Disable loop interchange by default, while keeping the ability to explicitly enable using `-floop-interchange`. This matches Clang. See discussion on llvm/llvm-project#140182.
Disable loop interchange by default, while keeping the ability to explicitly enable using `-floop-interchange`. This matches Clang. See discussion on llvm#140182. (cherry picked from commit 8849750)
Disable loop interchange by default, while keeping the ability to explicitly enable using `-floop-interchange`. This matches Clang. See discussion on llvm/llvm-project#140182. (cherry picked from commit 8849750)
Hi,
two patches enable the use of -floop-interchange from the flang driver and enable LLVM's loop interchange at levels -O2, -O3, -Ofast, and -Os.