This document outlines our implementation of the two bonus components for this project:
- Intermediate code optimisation through control and data flow analysis
- Register allocation and final code optimisation
All optimisations were carefully selected from the official LLVM documentation, with particular reference to the LLVM Passes guide. The following sections present code snippets from ast.cpp that implement these optimisations, along with explanations of their purpose and usage.
First, we initialise all required optimisation passes:
// Create a PassRegistry object to keep track of all passes
llvm::PassRegistry &Registry = *llvm::PassRegistry::getPassRegistry();
// Initialise IR optimisation passes (these are the O0/1/2/3 passes)
llvm::initializeCore(Registry);We implement four distinct optimisation levels (O0-O3) accessible via the -O<level> argument in our danac.sh compiler wrapper. Each level includes all optimisations from previous levels with additional enhancements.
TheFPM->add(llvm::createPromoteMemoryToRegisterPass()); // mem2reg
TheFPM->add(llvm::createInstructionNamerPass()); // better names for debuggingPurpose and Benefits:
PromoteMemoryToRegisterPass: Transforms stack-allocated variables into register-based SSA (Static Single Assignment) form, eliminating unnecessary memory operations and improving performanceInstructionNamerPass: Enhances debuggability by generating human-readable variable names in the IR
if(optLevel >= 1) {
TheFPM->add(llvm::createCFGSimplificationPass()); // Control flow graph cleanup
TheFPM->add(llvm::createEarlyCSEPass()); // Common subexpression elimination
TheFPM->add(llvm::createLowerExpectIntrinsicPass()); // Branch prediction hints
TheFPM->add(llvm::createReassociatePass()); // Expression reassociation
TheFPM->add(llvm::createLoopRotatePass()); // Loop rotation
TheFPM->add(llvm::createLICMPass()); // Loop invariant code motion
TheFPM->add(llvm::createInstructionCombiningPass()); // Instruction combining
TheFPM->add(llvm::createCFGSimplificationPass()); // Additional CFG cleanup
TheFPM->add(llvm::createTailCallEliminationPass()); // Tail call optimisation
}Purpose and Benefits:
CFGSimplificationPass: Performs dead code elimination and basic block merging to reduce code complexity and make it easier for the rest of the passes to process the codeEarlyCSEPass: Eliminates redundant computations by preventing duplicate calculations of expressionsLowerExpectIntrinsicPass: Converts the llvm's intrinsics into branch weight metadata so it helps the processor increase the branch prediction accuracyReassociatePass: Reassociates expressions in a way that is easier for the LICM pass to process themLoopRotatePass: Converts loops into do/while loops to enable better loop optimisationLICMPass: Performs Loop Invariant Code Motion, attempting to remove as much code from the body of a loop as possible. With this way, it eliminates redundant calculations within loop bodiesInstructionCombiningPass: Combines instructions to form simpler ones e.g algebraic simplification for efficiencyTailCallEliminationPass: Tries to convert self recursion functions into loops to reduce overhead
if(optLevel >= 2) {
TheFPM->add(llvm::createGVNPass()); // Global Value Numbering
TheFPM->add(llvm::createSCCPPass()); // Sparse Conditional Constant Propagation
TheFPM->add(llvm::createAggressiveDCEPass()); // Dead Code Elimination
TheFPM->add(llvm::createInstructionCombiningPass()); // Instruction combining
TheFPM->add(llvm::createCFGSimplificationPass()); // Control flow graph cleanup
}Purpose and Benefits:
GVNPass: Eliminates fully and partially redundant instructions and rudundant loadsSCCPPass: Propagates constant values through the control flow graph, even across conditional branches enabling more dead code eliminationAggressiveDCEPass: Performs dead code elimination more aggresively than CFGSimplificationPass and is recommended by the docs after SCCPPass
Level 3 represents our most aggressive optimisation tier, focusing on advanced loop transformations and maximal code performance. These optimisations build upon all previous levels to extract the highest possible execution speed, particularly for compute-intensive workloads.
if (optLevel >= 3) {
TheFPM->add(llvm::createLoopUnrollAndJamPass(false)); // Loop unrolling and Jammin
TheFPM->add(llvm::createInstructionCombiningPass()); // Instruction combining
TheFPM->add(llvm::createCFGSimplificationPass()); // Control flow graph cleanup
}Purpose and Benefits:
LoopUnrollAndJamPass: Performs two complementary loop transformations: 1) Replicates loop bodies to reduce branch overhead and 2) fuses adjacent loops to improve data locality and cache utilization. Thefalseargument disables runtime unrolling heuristics
We apply all the passes, of the input level, in each function separately:
for(llvm::Function &F : *TheModule) {
TheFPM->run(F);
}LLVM handles internally the register allocation and the final code optimisation. All we have to do is to initilise a TargetMachine object to enable platform-specific optimisations. We also use the machines DataLayout to give LLVM information about the target's data layout e.g endianess, pointer sizes, etc.
llvm::TargetOptions TO;
std::unique_ptr<llvm::TargetMachine> TM(
T->createTargetMachine(TargetTriple, "generic", "", TO, llvm::Reloc::PIC_)
);
TheModule->setDataLayout(TM->createDataLayout());Next, we have to initialise LLVM's backend passes. The default register allocator is Greedy, a basic graph-coloring-style allocation and we do not change it.
llvm::initializeCodeGen(Registry);In order to apply all of the above, we are using TargetMachine's addPassesToEmitFile function. The default compiler mode generates IR, ASM and object files. The implementations are simillar, thus here, we show only the implementation for the optimised final code(Assembly) generation:
std::error_code EC;
llvm::raw_fd_ostream asmFile("output_opt.asm", EC, llvm::sys::fs::OF_Text);
if(EC) {
llvm::errs() << "ASM file open error: " << EC.message() << "\n";
exit(1);
}
llvm::legacy::PassManager PM;
if(TM->addPassesToEmitFile(PM, asmFile, nullptr, llvm::CGFT_AssemblyFile)) {
llvm::errs() << "Cannot emit assembly for this target\n";
exit(1);
}
PM.run(*TheModule);The following image generated with
python3 testing/test_compiler.pyillustrates the execution times of the programs under all optimisation levels implemented.
