Understanding compiler options
There are cases where using only the -Ox options does not deliver the desired performance (either size or speed) for a compiled function or application. In these cases, we need to understand where the program's bottleneck is and whether it can be resolved by passing additional options to the compiler or by modifying the source code. In this section, we look at the compiler's command-line options and how they can help us achieve better results.
The first step in optimizing your code is to experiment with the architecture-independent optimizations. Almost every GCC pass (i.e., optimization) can be turned on or off, or steered using parameters. These optimizations use the notation -fxxxx, where xxxx is the GCC pass that is turned on. To turn off a GCC pass, we pass -fno-xxxx to the compiler. The same holds for other types of optimizations, such as the architecture-dependent ones. For more information about GCC options, please check the GCC manual. It is worth understanding how these options work in order to use them properly.
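As a minimal sketch of the -fxxxx / -fno-xxxx notation, the C file below shows a small kernel together with, in comments, command lines that switch a pass on or off for the whole file. The file name `kernel.c`, the function names, and the plain `gcc` driver name are assumptions made for illustration; substitute the cross-compiler driver of your own toolchain. The second function uses GCC's `optimize` function attribute to apply the same `-fno-` setting to a single function, which the GCC manual recommends for debugging and experiments only, not for production code.

```c
/* kernel.c (hypothetical file name)
 *
 * Whole-file control from the command line:
 *   gcc -O2 -S kernel.c                    baseline
 *   gcc -O2 -fno-ivopts -S kernel.c        same, with the ivopts pass off
 *   gcc -O2 -ftree-vectorize -S kernel.c   same, with vectorization on
 *   gcc -O2 -Q --help=optimizers           list which -f passes are enabled
 * Comparing the generated kernel.s files shows the effect of each pass.
 */
#include <stddef.h>

/* Compiled with whatever options are given on the command line. */
long dot(const int *a, const int *b, size_t n)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (long)a[i] * b[i];
    return acc;
}

/* GCC only: acts like -fno-ivopts for this one function.
 * Other compilers skip the attribute and use the global options. */
#if defined(__GNUC__) && !defined(__clang__)
__attribute__((optimize("no-ivopts")))
#endif
long dot_no_ivopts(const int *a, const int *b, size_t n)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (long)a[i] * b[i];
    return acc;
}
```

Both functions compute the same result; only the generated code may differ, so the comparison is done on the assembly output rather than on program behavior.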
To avoid drowning in the number of available options, I use the following options, most of them tree-related (each either on or off), for my day-to-day source code exploration:
-
-ftree-loop-ivcanon Create a canonical counter for number of iterations in loops for which determining number of iterations requires complicated analysis. Later optimizations then may determine the number easily. Useful especially in connection with unrolling.
-
-ftree-vectorize Perform loop vectorization on trees. This flag is enabled by default at -O3. This option can be useful even if the ARC processor doesn't have the SIMD extensions, as it performs extra code analysis that may benefit subsequent optimizations.
-
-ftree-loop-if-convert Attempt to transform conditional jumps in the innermost loops to branch-less equivalents. The intent is to remove control-flow from the innermost loops in order to improve the ability of the vectorization pass to handle these loops. This is enabled by default if vectorization is enabled.
-
-f(no-)tree-dominator-opts Perform a variety of simple scalar cleanups (constant/copy propagation, redundancy elimination, range propagation and expression simplification) based on a dominator tree traversal. This also performs jump threading (to reduce jumps to jumps). This flag is enabled by default at -O and higher.
-
-f(no-)ivopts Perform induction variable optimizations (strength reduction, induction variable merging and induction variable elimination) on trees. Disabling the ivopts optimization may increase the number of hardware loops recognized by the compiler.
-
-fselective-scheduling Schedule instructions using the selective scheduling algorithm. Selective scheduling runs instead of the first scheduler pass.
-
-fgcse Perform a global common subexpression elimination pass. This pass also performs global constant and copy propagation. It may be useful to disable this pass, especially when we want more SUB1/2/3 and ADD1/2/3 type operations to be generated.
-
-frename-registers Attempt to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization most benefits processors with lots of registers. Depending on the debug information format adopted by the target, however, it can make debugging impossible, since variables no longer stay in a “home register”. Enabled by default with -funroll-loops and -fpeel-loops.
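To make the -ftree-loop-if-convert description in the list above more concrete, here is a hand-written sketch of the kind of rewrite that if-conversion aims for. The function names are hypothetical, and the branch-free version is written by hand purely to illustrate the idea; it is not the code GCC actually emits.

```c
#include <stddef.h>

/* Innermost loop with control flow: each iteration takes one of two paths. */
void clamp_branchy(int *dst, const int *src, size_t n, int limit)
{
    for (size_t i = 0; i < n; i++) {
        if (src[i] > limit)
            dst[i] = limit;
        else
            dst[i] = src[i];
    }
}

/* Hand if-converted form: the loop body is straight-line code.
 * The comparison yields 0 or 1, so m is 0 or -1 (all bits set,
 * assuming two's complement) and acts as a select mask. */
void clamp_branchless(int *dst, const int *src, size_t n, int limit)
{
    for (size_t i = 0; i < n; i++) {
        int v = src[i];
        int m = -(v > limit);            /* -1 if v must be clamped, else 0 */
        dst[i] = (limit & m) | (v & ~m); /* pick limit or v without a jump */
    }
}
```

With the control flow gone, every iteration does the same work, which is the property the vectorizer needs. When experimenting, GCC's `-fopt-info-vec` option can be added to the command line to report which loops were vectorized.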