
[RISCV] Enable IPRA for RISC-V #126594

@mikhailramalho

Overview

This is a tracking issue for the enablement of IPRA (interprocedural register allocation) for the RISC-V backend. Enabling IPRA by default is of course predicated on validating both the correctness and the observed performance increase.

Status and tracking

IPRA can be enabled experimentally for RISC-V by passing -mllvm -enable-ipra -mllvm -enable-machine-outliner=never as flags to Clang built from HEAD. The tracked tasks are:

  • Implement getIPRACSRegs hook to fix miscompile due to missed save/restore of ra: "[RISCV] Implement getIPRACSRegs hook" #125586 (see also "[IPRA][RISCV] ra save/restore unexpectedly optimised out of function" #124932).
  • Resolve negative interaction between machine-outliner and IPRA: "Crash with -enable-ipra for trivial internal linkage functions for RISC-V/AArch64/Arm" #119556.
    • -enable-machine-outliner=never is a usable but temporary workaround.
    • As noted in the linked issue, the same problem exists for other targets that enable the MachineOutliner, such as AArch64.
  • Ensure SPEC and the llvm-test-suite pass with IPRA enabled.
  • Collect static data on the impact of enabling IPRA (e.g. number of functions optimized) for SPEC or other benchmarks.
  • Collect runtime data on the impact of enabling IPRA for SPEC or other benchmarks.
    • Execution time on SpacemiT and instruction count.
    • Compare against a baseline that unconditionally does setRequiresCodeGenSCCOrder() in TargetPassConfig to exclude differences due solely to the reordering of code due to that option.
  • Investigate any performance anomalies or regressions
  • (Optionally) Explore whether there is any pessimisation that prevents IPRA from being used on functions that should be able to use it.
  • Benchmark data allowing, propose IPRA be enabled by default for RISC-V

IPRA is enabled by default for AMDGPU but not for other targets. There was a brief discussion on enabling it for X86.

Data

All data was generated using commit 83fa117 plus the following patch:

diff --git a/llvm/lib/CodeGen/TargetPassConfig.cpp b/llvm/lib/CodeGen/TargetPassConfig.cpp
index 5d9da9df9092..08f711bdae82 100644
--- a/llvm/lib/CodeGen/TargetPassConfig.cpp
+++ b/llvm/lib/CodeGen/TargetPassConfig.cpp
@@ -599,7 +599,7 @@ TargetPassConfig::TargetPassConfig(TargetMachine &TM, PassManagerBase &PM)
     TM.Options.EnableIPRA |= TM.useIPRA();
   }
 
-  if (TM.Options.EnableIPRA)
+  // if (TM.Options.EnableIPRA)
     setRequiresCodeGenSCCOrder();
 
   if (EnableGlobalISelAbort.getNumOccurrences())

Static Analysis

  • The following table shows the NumCSROpt (Number of functions optimized for callee saved registers), as reported by RegUsageInfoCollector, on SPEC built with -march=rva22u64_v+ipra+lto.
  • Data from rva22u64_v and rva23u64 are the same.
  • Enabling LTO substantially increases the scope for IPRA to take place: the geomean shows a 215.6% increase in NumCSROpt when LTO is enabled (see appendix).
  • Without LTO, IPRA is used at least once in 19 SPEC benchmarks; with LTO, in 26 SPEC benchmarks, out of 32 benchmarks in total.
  • Adding -fno-semantic-interposition does not affect the static data.

$ ./utils/compare.py -a -m ip-regalloc.NumCSROpt rva22_v_ipra_flto.json
Tests: 32
Metric: ip-regalloc.NumCSROpt

Program                                       ip-regalloc.NumCSROpt
                                              rva22_v_ipra_flto    
FP2017rate/526.blender_r/526.blender_r        239.00               
INT2017speed/602.gcc_s/602.gcc_s               61.00               
INT2017rate/502.gcc_r/502.gcc_r                61.00               
FP2017rate/510.parest_r/510.parest_r           48.00               
FP2017rate/511.povray_r/511.povray_r           41.00               
INT2017spe...23.xalancbmk_s/623.xalancbmk_s    24.00               
INT2017rat...23.xalancbmk_r/523.xalancbmk_r    24.00               
FP2017rate/538.imagick_r/538.imagick_r         23.00               
FP2017speed/638.imagick_s/638.imagick_s        23.00               
INT2017spe...00.perlbench_s/600.perlbench_s    21.00               
INT2017rat...00.perlbench_r/500.perlbench_r    21.00               
INT2017speed/641.leela_s/641.leela_s           12.00               
INT2017rate/541.leela_r/541.leela_r            12.00               
INT2017rat...31.deepsjeng_r/531.deepsjeng_r    11.00               
INT2017spe...31.deepsjeng_s/631.deepsjeng_s    11.00               
FP2017rate/508.namd_r/508.namd_r                9.00               
INT2017rate/520.omnetpp_r/520.omnetpp_r         6.00               
INT2017spe...ed/620.omnetpp_s/620.omnetpp_s     6.00               
INT2017rate/525.x264_r/525.x264_r               5.00               
INT2017speed/625.x264_s/625.x264_s              5.00               
INT2017rate/557.xz_r/557.xz_r                   3.00               
INT2017speed/657.xz_s/657.xz_s                  3.00               
FP2017rate/519.lbm_r/519.lbm_r                  2.00               
FP2017speed/619.lbm_s/619.lbm_s                 2.00               
FP2017speed/644.nab_s/644.nab_s                 1.00               
FP2017rate/544.nab_r/544.nab_r                  1.00               
INT2017speed/605.mcf_s/605.mcf_s                                   
FP2017rate...97.specrand_fr/997.specrand_fr                        
INT2017rate/505.mcf_r/505.mcf_r                                    
INT2017rat...99.specrand_ir/999.specrand_ir                        
FP2017spee...96.specrand_fs/996.specrand_fs                        
INT2017spe...98.specrand_is/998.specrand_is                        
      ip-regalloc.NumCSROpt
run       rva22_v_ipra_flto
count  26.000000           
mean   25.961538           
std    46.851664           
min    1.000000            
25%    5.000000            
50%    11.500000           
75%    23.750000           
max    239.000000    
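
The summary block above covers only the 26 benchmarks with a nonzero count; the six rows with an empty cell are excluded. As a cross-check, the figures can be reproduced from the table with the Python standard library. This is a minimal sketch, not what compare.py runs internally: the `lto_counts` list is copied by hand from the table, and the "inclusive" quantile method is an assumption chosen because it matches the quartiles printed above.

```python
import statistics

# NumCSROpt with LTO for the 26 benchmarks that report a nonzero value,
# copied by hand from the table above (illustrative variable name).
lto_counts = [239, 61, 61, 48, 41, 24, 24, 23, 23, 21, 21, 12, 12,
              11, 11, 9, 6, 6, 5, 5, 3, 3, 2, 2, 1, 1]

count = len(lto_counts)
mean = statistics.mean(lto_counts)
std = statistics.stdev(lto_counts)  # sample standard deviation (n - 1)

# Assumption: linear interpolation between data points ("inclusive"),
# which reproduces the 25% / 50% / 75% rows of the summary.
q1, median, q3 = statistics.quantiles(lto_counts, n=4, method="inclusive")

print(f"count {count}")
print(f"mean  {mean:.6f}")
print(f"std   {std:.6f}")
print(f"min   {min(lto_counts)}  25% {q1}  50% {median}  75% {q3}  max {max(lto_counts)}")
```

The mean is dominated by 526.blender_r; the median of 11.5 is a better indication of the typical number of functions optimized per benchmark.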

Dynamic Runtime Data

The static count of functions optimised is helpful for understanding the degree to which IPRA changes code generation at all, but runtime performance data is needed to evaluate the extent to which it helps performance.

In the following tables:

  • Prev is SPEC built with -march=rva22u64_v -fuse-ld=lld -O3
  • Current is SPEC built with -march=rva22u64_v -fuse-ld=lld -O3 -mllvm -enable-ipra -Wl,-mllvm,-enable-ipra

The data is available here: https://lnt.lukelau.me/db_default/v4/nts/192?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=194&baseline=192&submit=Update

Exec time:

Image

Code size:

Image

Appendix: separate compilation vs -flto

$ ./utils/compare.py -a rva22_v_ipra.json vs rva22_v_ipra_flto.json -m ip-regalloc.NumCSROpt
Tests: 32
Metric: ip-regalloc.NumCSROpt

Program                                       ip-regalloc.NumCSROpt               
                                              lhs                   rhs    diff   
FP2017rate/508.namd_r/508.namd_r                0.00                  9.00    inf%
INT2017rate/520.omnetpp_r/520.omnetpp_r         0.00                  6.00    inf%
INT2017speed/641.leela_s/641.leela_s            0.00                 12.00    inf%
FP2017rate/519.lbm_r/519.lbm_r                  0.00                  2.00    inf%
INT2017spe...ed/620.omnetpp_s/620.omnetpp_s     0.00                  6.00    inf%
INT2017rate/541.leela_r/541.leela_r             0.00                 12.00    inf%
FP2017speed/619.lbm_s/619.lbm_s                 0.00                  2.00    inf%
INT2017spe...23.xalancbmk_s/623.xalancbmk_s     1.00                 24.00 2300.0%
INT2017rat...23.xalancbmk_r/523.xalancbmk_r     1.00                 24.00 2300.0%
FP2017rate/510.parest_r/510.parest_r            6.00                 48.00  700.0%
FP2017rate/511.povray_r/511.povray_r            6.00                 41.00  583.3%
INT2017rat...31.deepsjeng_r/531.deepsjeng_r     2.00                 11.00  450.0%
INT2017spe...31.deepsjeng_s/631.deepsjeng_s     2.00                 11.00  450.0%
FP2017rate/526.blender_r/526.blender_r         67.00                239.00  256.7%
INT2017rate/557.xz_r/557.xz_r                   1.00                  3.00  200.0%
INT2017speed/657.xz_s/657.xz_s                  1.00                  3.00  200.0%
INT2017speed/602.gcc_s/602.gcc_s               24.00                 61.00  154.2%
INT2017rate/502.gcc_r/502.gcc_r                24.00                 61.00  154.2%
FP2017rate/538.imagick_r/538.imagick_r         11.00                 23.00  109.1%
FP2017speed/638.imagick_s/638.imagick_s        11.00                 23.00  109.1%
INT2017rat...00.perlbench_r/500.perlbench_r    14.00                 21.00   50.0%
INT2017spe...00.perlbench_s/600.perlbench_s    14.00                 21.00   50.0%
INT2017rate/525.x264_r/525.x264_r               4.00                  5.00   25.0%
INT2017speed/625.x264_s/625.x264_s              4.00                  5.00   25.0%
FP2017rate/544.nab_r/544.nab_r                  1.00                  1.00    0.0%
FP2017speed/644.nab_s/644.nab_s                 1.00                  1.00    0.0%
FP2017rate...97.specrand_fr/997.specrand_fr     0.00                  0.00        
FP2017spee...96.specrand_fs/996.specrand_fs     0.00                  0.00        
INT2017rate/505.mcf_r/505.mcf_r                 0.00                  0.00        
INT2017rat...99.specrand_ir/999.specrand_ir     0.00                  0.00        
INT2017speed/605.mcf_s/605.mcf_s                0.00                  0.00        
INT2017spe...98.specrand_is/998.specrand_is     0.00                  0.00        
                           Geomean difference                               215.6%
l/r          lhs         rhs       diff
count  32.000000   32.000000  26.000000
mean    6.093750   21.093750        inf
std    12.957398   43.315318        NaN
min     0.000000    0.000000   0.000000
25%     0.000000    1.750000   1.090909
50%     1.000000    7.500000   3.533582
75%     6.000000   23.000000        NaN
max    67.000000  239.000000        inf
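
The 215.6% geomean figure and the 19 / 26 benchmark counts quoted under Static Analysis can be cross-checked against the appendix table. The following Python sketch is not the compare.py implementation. It assumes the geomean is taken over the per-benchmark rhs/lhs ratios and restricted to benchmarks where both counts are nonzero, an assumption that happens to reproduce the reported number. `COUNTS` and `geomean_increase` are names made up for illustration.

```python
import math

# (lhs, rhs) = NumCSROpt with separate compilation and with LTO,
# copied by hand from the appendix table above.
COUNTS = {
    "508.namd_r": (0, 9), "520.omnetpp_r": (0, 6), "641.leela_s": (0, 12),
    "519.lbm_r": (0, 2), "620.omnetpp_s": (0, 6), "541.leela_r": (0, 12),
    "619.lbm_s": (0, 2),
    "623.xalancbmk_s": (1, 24), "523.xalancbmk_r": (1, 24),
    "510.parest_r": (6, 48), "511.povray_r": (6, 41),
    "531.deepsjeng_r": (2, 11), "631.deepsjeng_s": (2, 11),
    "526.blender_r": (67, 239),
    "557.xz_r": (1, 3), "657.xz_s": (1, 3),
    "602.gcc_s": (24, 61), "502.gcc_r": (24, 61),
    "538.imagick_r": (11, 23), "638.imagick_s": (11, 23),
    "500.perlbench_r": (14, 21), "600.perlbench_s": (14, 21),
    "525.x264_r": (4, 5), "625.x264_s": (4, 5),
    "544.nab_r": (1, 1), "644.nab_s": (1, 1),
    "997.specrand_fr": (0, 0), "996.specrand_fs": (0, 0),
    "505.mcf_r": (0, 0), "999.specrand_ir": (0, 0),
    "605.mcf_s": (0, 0), "998.specrand_is": (0, 0),
}


def geomean_increase(pairs):
    """Geometric-mean percentage increase of rhs over lhs.

    Assumption: pairs with a zero on either side are skipped, since their
    ratio is undefined or infinite (the inf% and blank rows in the table).
    """
    ratios = [rhs / lhs for lhs, rhs in pairs if lhs > 0 and rhs > 0]
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0


without_lto = sum(1 for lhs, _ in COUNTS.values() if lhs > 0)
with_lto = sum(1 for _, rhs in COUNTS.values() if rhs > 0)
increase = geomean_increase(COUNTS.values())

print(f"IPRA used in {without_lto} benchmarks without LTO, "
      f"{with_lto} with LTO, out of {len(COUNTS)}")
print(f"geomean increase in NumCSROpt: {increase:.1f}%")
```

Under that assumption, the seven benchmarks that go from 0 to a nonzero count with LTO contribute nothing to the geomean, so 215.6% arguably understates how much LTO widens the scope for IPRA.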
