@tarunprabhu
Contributor

@tarunprabhu tarunprabhu commented Oct 22, 2025

For the LLVM passes, the implementation mirrors that in clang. A number of
FIR and MLIR passes that may result in an increase in code size are not run
when either of these options is provided. Explicit speedup and size levels are
used instead of isOptimizingForSize since the latter assumes that speedupLevel
is 0 if optimizing for size. However, optimizing for size implies that the
speedup level is 2.
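
A minimal sketch of the idea in the description above: keeping the speedup level and the size level as two separate values lets a pass be gated on either one. The `OptLevels` struct, the `makeLevels` helper, and both predicates are hypothetical names chosen for illustration; they are not flang's or LLVM's actual types.

```cpp
#include <cassert>

// Hypothetical stand-in for an optimization-level descriptor.
struct OptLevels {
  unsigned speedup; // 0-3, as in -O0 .. -O3
  unsigned size;    // 0 = none, 1 = -Os, 2 = -Oz
};

// Builds a descriptor after checking that both levels are in range.
inline OptLevels makeLevels(unsigned speedup, unsigned size) {
  assert(speedup <= 3 && size <= 2);
  return OptLevels{speedup, size};
}

// As stated in the description, -Os and -Oz imply a speedup level of 2.
const OptLevels levelO2 = makeLevels(2, 0);
const OptLevels levelOs = makeLevels(2, 1);
const OptLevels levelOz = makeLevels(2, 2);

// A pass that may grow the code runs only when optimizing for speed
// and no size level was requested.
inline bool runsSizeIncreasingPass(const OptLevels &l) {
  return l.speedup > 0 && l.size == 0;
}

// A pass with no expected size cost depends on the speedup level alone.
inline bool runsSizeNeutralPass(const OptLevels &l) {
  return l.speedup > 0;
}
```

With these definitions, `runsSizeIncreasingPass` is true for `levelO2` and false for `levelOs` and `levelOz`, while `runsSizeNeutralPass` is true for all three.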

Fixes #62268

@llvmbot llvmbot added flang:driver flang Flang issues not falling into any other category labels Oct 22, 2025
@tarunprabhu tarunprabhu changed the title [flang] Enable -Os and -Oz in flang [flang][Driver] Enable -Os and -Oz in flang Oct 22, 2025
@llvmbot
Member

llvmbot commented Oct 22, 2025

@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-driver

Author: Tarun Prabhu (tarunprabhu)

Changes

The implementation adheres closely to the implementation in clang. The effect on the pass pipelines has been tested.

Fixes #62268


Full diff: https://github.com/llvm/llvm-project/pull/164707.diff

4 Files Affected:

  • (modified) flang/include/flang/Frontend/CodeGenOptions.def (+2)
  • (modified) flang/lib/Frontend/CompilerInvocation.cpp (+23)
  • (modified) flang/lib/Frontend/FrontendActions.cpp (+10-1)
  • (modified) flang/test/Driver/default-optimization-pipelines.f90 (+13-1)
diff --git a/flang/include/flang/Frontend/CodeGenOptions.def b/flang/include/flang/Frontend/CodeGenOptions.def
index dc3da7ba5c7f3..5892d3ef24e5a 100644
--- a/flang/include/flang/Frontend/CodeGenOptions.def
+++ b/flang/include/flang/Frontend/CodeGenOptions.def
@@ -20,6 +20,8 @@ CODEGENOPT(Name, Bits, Default)
 #endif
 
 CODEGENOPT(OptimizationLevel, 2, 0) ///< The -O[0-3] option specified.
+/// The -Os (==1) or -Oz (==2) option is specified.
+CODEGENOPT(OptimizeSize, 2, 0) 
 
 CODEGENOPT(DebugPassManager, 1, 0) ///< Prints debug information for the new
                                    ///< pass manager.
diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp
index 548ca675db5ea..6dfff3b001aea 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -114,6 +114,10 @@ static unsigned getOptimizationLevel(llvm::opt::ArgList &args,
 
     assert(a->getOption().matches(clang::driver::options::OPT_O));
 
+    llvm::StringRef s(a->getValue());
+    if (s == "s" || s == "z")
+      return 2;
+
     return getLastArgIntValue(args, clang::driver::options::OPT_O, defaultOpt,
                               diags);
   }
@@ -121,6 +125,24 @@ static unsigned getOptimizationLevel(llvm::opt::ArgList &args,
   return defaultOpt;
 }
 
+/// Extracts the size-optimization level from \a args
+static unsigned getOptimizationLevelSize(llvm::opt::ArgList &args) {
+  if (llvm::opt::Arg *a =
+          args.getLastArg(clang::driver::options::OPT_O_Group)) {
+    if (a->getOption().matches(clang::driver::options::OPT_O)) {
+      switch (a->getValue()[0]) {
+      default:
+        return 0;
+      case 's':
+        return 1;
+      case 'z':
+        return 2;
+      }
+    }
+  }
+  return 0;
+}
+
 bool Fortran::frontend::parseDiagnosticArgs(clang::DiagnosticOptions &opts,
                                             llvm::opt::ArgList &args) {
   opts.ShowColors = parseShowColorsArgs(args);
@@ -273,6 +295,7 @@ static void parseCodeGenArgs(Fortran::frontend::CodeGenOptions &opts,
                              llvm::opt::ArgList &args,
                              clang::DiagnosticsEngine &diags) {
   opts.OptimizationLevel = getOptimizationLevel(args, diags);
+  opts.OptimizeSize = getOptimizationLevelSize(args);
 
   if (args.hasFlag(clang::driver::options::OPT_fdebug_pass_manager,
                    clang::driver::options::OPT_fno_debug_pass_manager, false))
diff --git a/flang/lib/Frontend/FrontendActions.cpp b/flang/lib/Frontend/FrontendActions.cpp
index 0c630d2ba876d..d6fb98bc930c6 100644
--- a/flang/lib/Frontend/FrontendActions.cpp
+++ b/flang/lib/Frontend/FrontendActions.cpp
@@ -602,7 +602,16 @@ mapToLevel(const Fortran::frontend::CodeGenOptions &opts) {
   case 1:
     return llvm::OptimizationLevel::O1;
   case 2:
-    return llvm::OptimizationLevel::O2;
+    switch (opts.OptimizeSize) {
+    default:
+      llvm_unreachable("Invalid optimization level for size!");
+    case 0:
+      return llvm::OptimizationLevel::O2;
+    case 1:
+      return llvm::OptimizationLevel::Os;
+    case 2:
+      return llvm::OptimizationLevel::Oz;
+    }
   case 3:
     return llvm::OptimizationLevel::O3;
   }
diff --git a/flang/test/Driver/default-optimization-pipelines.f90 b/flang/test/Driver/default-optimization-pipelines.f90
index 08e407f73da5c..18108cd632220 100644
--- a/flang/test/Driver/default-optimization-pipelines.f90
+++ b/flang/test/Driver/default-optimization-pipelines.f90
@@ -14,10 +14,16 @@
 ! RUN: %flang_fc1 -S -O2 %s -flto=full -fdebug-pass-manager -o /dev/null 2>&1 | FileCheck %s --check-prefix=CHECK-O2-LTO
 ! RUN: %flang_fc1 -S -O2 %s -flto=thin -fdebug-pass-manager -o /dev/null 2>&1 | FileCheck %s --check-prefix=CHECK-O2-THINLTO
 
-! Verify that only the left-most `-O{n}` is used
+! Verify that only the right-most `-O{n}` is used
 ! RUN: %flang -S -O2 -O0 %s -Xflang -fdebug-pass-manager -o /dev/null 2>&1 | FileCheck %s --check-prefix=CHECK-O0
 ! RUN: %flang_fc1 -S -O2 -O0 %s -fdebug-pass-manager -o /dev/null 2>&1 | FileCheck %s --check-prefix=CHECK-O0
 
+! Verify that passing -Os/-Oz have the desired effect on the pass pipelines.
+! RUN: %flang -S -Os %s -Xflang -fdebug-pass-manager -o /dev/null 2>&1 \
+! RUN:     | FileCheck %s --check-prefix=CHECK-OSIZE
+! RUN: %flang -S -Oz %s -Xflang -fdebug-pass-manager -o /dev/null 2>&1 \
+! RUN:     | FileCheck %s --check-prefix=CHECK-OSIZE
+
 ! CHECK-O0-NOT: Running pass: SimplifyCFGPass on simple_loop_
 ! CHECK-O0: Running analysis: TargetLibraryAnalysis on simple_loop_
 ! CHECK-O0-ANYLTO: Running pass: CanonicalizeAliasesPass on [module]
@@ -33,6 +39,12 @@
 ! CHECK-O2-THINLTO: Running pass: CanonicalizeAliasesPass on [module]
 ! CHECK-O2-THINLTO: Running pass: NameAnonGlobalPass on [module]
 
+! -Os/-Oz imply -O2, so check that a pass that runs on O2 is run. Then check
+! that passes like LibShrinkWrap, that should not be run when optimizing for
+! size, are not run (see llvm/lib/Passes/PassBuilderPipelines.cpp).
+! CHECK-OSIZE: Running pass: SimplifyCFGPass on simple_loop_
+! CHECK-OSIZE-NOT: Running pass: LibCallsShrinkWrapPass on simple_loop_
+
 subroutine simple_loop
   integer :: i
   do i=1,5
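
The two-step mapping in the diff above (option value to a speed level and a size level, then both to a pipeline level) can be sketched as a standalone program fragment. This is an illustration only: `PipelineLevel` is a hypothetical enum standing in for the `llvm::OptimizationLevel` values, the functions take a plain string instead of an `llvm::opt::ArgList`, and unrecognized values fall back to 0 here, whereas the real driver goes through `getLastArgIntValue`. The sketch also compares the whole value rather than only its first character.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum; the patch returns llvm::OptimizationLevel values.
enum class PipelineLevel { O0, O1, O2, O3, Os, Oz };

// Speed level: "s" and "z" are treated as 2, digits map to themselves.
// Assumption: any other value yields 0 in this sketch.
unsigned speedLevel(const std::string &value) {
  if (value == "s" || value == "z")
    return 2;
  if (value.size() == 1 && value[0] >= '0' && value[0] <= '3')
    return static_cast<unsigned>(value[0] - '0');
  return 0;
}

// Size level: 1 for -Os, 2 for -Oz, 0 otherwise.
unsigned sizeLevel(const std::string &value) {
  if (value == "s")
    return 1;
  if (value == "z")
    return 2;
  return 0;
}

// The size level only matters when the speed level is 2.
PipelineLevel mapToLevel(unsigned speed, unsigned size) {
  assert(speed <= 3 && size <= 2);
  switch (speed) {
  case 0:
    return PipelineLevel::O0;
  case 1:
    return PipelineLevel::O1;
  case 2:
    if (size == 1)
      return PipelineLevel::Os;
    if (size == 2)
      return PipelineLevel::Oz;
    return PipelineLevel::O2;
  default:
    return PipelineLevel::O3;
  }
}
```

For example, `mapToLevel(speedLevel("z"), sizeLevel("z"))` selects `PipelineLevel::Oz`, and the value `"2"` selects `PipelineLevel::O2`.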

@kiranchandramohan
Contributor

Could you post about this patch in https://discourse.llvm.org/t/code-size-optimization-flags-in-flang/69482, in case anyone has a different opinion?

@tarunprabhu tarunprabhu requested a review from ceseo October 23, 2025 00:28
@kiranchandramohan
Contributor

Do -Os and -Oz run with optimization level 2? At these optimization levels flang might be performing intrinsic inlining, copy-in inlining, math function inlining, loop versioning, etc., which might increase the size.

@tarunprabhu
Contributor Author

Do -Os and -Oz run with optimization level 2?

Yes. This intentionally mirrors clang's implementation. Some LLVM optimizations, such as IPSCCP and inlining, examine this level internally, but I haven't looked at exactly how they are affected.

We could do something different in flang to be more aggressive, but that would be a more involved change since we may need to build flang-specific pass pipelines. We could also disable some FIR-level optimizations, but I haven't looked into that.

Do you think that should be done as part of this PR?

@tblah
Contributor

tblah commented Oct 23, 2025

Do -Os and -Oz run with optimization level 2?

Yes. This intentionally mirrors clang's implementation. Some LLVM optimizations, such as IPSCCP and inlining, examine this level internally, but I haven't looked at exactly how they are affected.

We could do something different in flang to be more aggressive, but that would be a more involved change since we may need to build flang-specific pass pipelines. We could also disable some FIR-level optimizations, but I haven't looked into that.

Do you think that should be done as part of this PR?

I think what Kiran is referring to is the MLIR pass pipeline inside of flang. I don't think there is any equivalent to this in clang. See flang/lib/Optimizer/Pipelines.cpp. Some optional optimisations are only enabled when pc.OptLevel.isOptimizingForSpeed(). I suspect a subset of these (plus LoopVersioning and maybe others) should not be enabled when optimising for size. There are lit tests for the pipeline which should be updated to reflect what the pipeline is when optimising for size.

@tarunprabhu
Contributor Author

I think what Kiran is referring to is the MLIR pass pipeline inside of flang. I don't think there is any equivalent to this in clang. See flang/lib/Optimizer/Pipelines.cpp. Some optional optimisations are only enabled when pc.OptLevel.isOptimizingForSpeed(). I suspect a subset of these (plus LoopVersioning and maybe others) should not be enabled when optimising for size. There are lit tests for the pipeline which should be updated to reflect what the pipeline is when optimising for size.

Ah, thanks for clarifying. I'll take a look

@tarunprabhu tarunprabhu marked this pull request as draft October 23, 2025 18:14
@tarunprabhu
Contributor Author

Note to reviewers: Please disregard any changes that you see in this PR until I change it back out of draft mode.

@tarunprabhu tarunprabhu changed the title [flang][Driver] Enable -Os and -Oz in flang [DRAFT][flang][Driver] Enable -Os and -Oz in flang Oct 26, 2025
Tarun Prabhu added 2 commits November 7, 2025 05:44
For the LLVM passes, the implementation mirrors that in clang. A number of
FIR and MLIR passes that may result in an increase in code size are not run
when either of these options is provided. Explicit speedup and size levels are
used instead of isOptimizingForSize since the latter assumes that speedupLevel
is 0 if optimizing for size. However, optimizing for size implies that the
speedup level is 2.
@tarunprabhu tarunprabhu changed the title [DRAFT][flang][Driver] Enable -Os and -Oz in flang [flang][Driver] Enable -Os and -Oz in flang Nov 7, 2025
@tarunprabhu
Contributor Author

  • Several MLIR passes are now disabled if -Os or -Oz are provided.
  • The test of the MLIR pass pipeline has been completely overhauled to accommodate this.
  • The description of the PR has been updated to reflect these changes.

@tarunprabhu tarunprabhu marked this pull request as ready for review November 7, 2025 12:49
@tarunprabhu tarunprabhu requested a review from tblah November 7, 2025 12:50
@jeanPerier
Contributor

Several MLIR passes are now disabled if -Os or -Oz are provided.

Have you gathered any metrics on the executable size/perf impact?

For instance, I expect that createInlineElementals may actually reduce code size (it is fusing loops), while I agree createInlineHLFIRAssign probably increases it (but not using it may cost a lot of perf).

All in all, I am a bit worried about complicating the flang pipeline (at least the MLIR pipeline) for this feature, whose use case in the community of Fortran users is a bit unclear to me.

I do expect that a big part of the code-size impact currently comes from LLVM optimizations (inlining/loop versioning/unrolling).

More options are great on paper, but they also mean more complexity in the compiler: more technical debt, more sources of bugs, and more difficulty reproducing issues.

If you have metrics and a use case that justify modifying the MLIR pipeline (say, an x% executable size reduction at the cost of a y% slowdown on SPEC, where x% and y% are what users of -Oz/-Os want), I am fine with it.

@tarunprabhu
Contributor Author

Several MLIR passes are now disabled if -Os or -Oz are provided.

Have you gathered any metrics on the executable size/perf impact?

I have not. That's a good point. I'll see what I can put together.

I don't have a use-case beyond addressing the bug-report that was filed which referenced the SPEC 2017 benchmark suite. Perhaps @ceseo, who filed that report, has more insight.

I do expect that a big part of the code-size impact currently comes from LLVM optimizations (inlining/loop versioning/unrolling).

Do you think it would be worth splitting this into two PRs?

The first would accept the -Os and -Oz options and defer to the pass pipeline builder to determine which LLVM IR passes to schedule. The change to flang would be relatively minimal since the majority of the work is already done in LLVM's PassBuilder. This would be essentially the same as clang.

The second would tweak the MLIR pass pipeline, after a more careful analysis of the effects of the various passes (it may also be that we choose not to make any changes to the MLIR pass pipeline at all).

@tblah
Contributor

tblah commented Nov 11, 2025

Maybe we could just tweak the MLIR pipeline for the most obvious cases (LoopVersioning, Inlining) and be extremely conservative about anything with an uncertain payoff.
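
One way to read this suggestion is sketched below: only the passes with an obvious size cost are dropped when a size level is set, and everything with an uncertain payoff stays as it is at plain -O2. The function `buildOptionalPasses` and the pass name strings are hypothetical and chosen for illustration; they are not the identifiers used in flang/lib/Optimizer/Pipelines.cpp, and which passes belong in which group is an assumption drawn from the discussion above, not a measured result.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the optional passes to schedule for the given levels.
// speedup: 0-3; size: 0 = none, 1 = -Os, 2 = -Oz.
std::vector<std::string> buildOptionalPasses(unsigned speedup, unsigned size) {
  assert(speedup <= 3 && size <= 2);
  std::vector<std::string> passes;
  if (speedup == 0)
    return passes; // no optional optimizations at -O0

  // Uncertain payoff for size: keep unchanged (conservative choice).
  passes.push_back("inline-elementals");
  passes.push_back("inline-hlfir-assign");

  // Obvious candidates for size growth: skip when a size level is set.
  if (size == 0) {
    passes.push_back("loop-versioning");
    passes.push_back("inlining");
  }
  return passes;
}
```

Under these assumptions, `buildOptionalPasses(2, 0)` schedules four passes, while `buildOptionalPasses(2, 1)` and `buildOptionalPasses(2, 2)` schedule only the first two.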

Labels

flang:driver flang:fir-hlfir flang Flang issues not falling into any other category

Development

Successfully merging this pull request may close these issues.

[flang] Feature request: support for -Os/-Oz

5 participants