Optimisations

This document outlines our implementation of the two bonus components for this project:

Intermediate code optimisation through control and data flow analysis
Register allocation and final code optimisation

All optimisations were carefully selected from the official LLVM documentation, with particular reference to the LLVM Passes guide. The following sections present code snippets from ast.cpp that implement these optimisations, along with explanations of their purpose and usage.

Intermediate code optimisation through control and data flow analysis

Initialisation

First, we initialise all required optimisation passes:

// Create a PassRegistry object to keep track of all passes
llvm::PassRegistry &Registry = *llvm::PassRegistry::getPassRegistry();

// Initialise IR optimisation passes (these are the O0/1/2/3 passes)
llvm::initializeCore(Registry);

Optimisation Levels

We implement four distinct optimisation levels (O0-O3) accessible via the -O<level> argument in our danac.sh compiler wrapper. Each level includes all optimisations from previous levels with additional enhancements.

Level 0 (`-O0`) - default level if no argument is passed

TheFPM->add(llvm::createPromoteMemoryToRegisterPass()); // mem2reg
TheFPM->add(llvm::createInstructionNamerPass());        // better names for debugging

Purpose and Benefits:

PromoteMemoryToRegisterPass: Transforms stack-allocated variables into register-based SSA (Static Single Assignment) form, eliminating unnecessary memory operations and improving performance
InstructionNamerPass: Enhances debuggability by generating human-readable variable names in the IR

Level 1 (`-O1`)

if(optLevel >= 1) {
    TheFPM->add(llvm::createCFGSimplificationPass());       // Control flow graph cleanup
    TheFPM->add(llvm::createEarlyCSEPass());                // Common subexpression elimination
    TheFPM->add(llvm::createLowerExpectIntrinsicPass());    // Branch prediction hints
    TheFPM->add(llvm::createReassociatePass());             // Expression reassociation
    TheFPM->add(llvm::createLoopRotatePass());              // Loop rotation
    TheFPM->add(llvm::createLICMPass());                    // Loop invariant code motion
    TheFPM->add(llvm::createInstructionCombiningPass());    // Instruction combining
    TheFPM->add(llvm::createCFGSimplificationPass());       // Additional CFG cleanup
    TheFPM->add(llvm::createTailCallEliminationPass());     // Tail call optimisation
}

Purpose and Benefits:

CFGSimplificationPass: Performs dead code elimination and basic block merging to reduce code complexity and make it easier for the rest of the passes to process the code
EarlyCSEPass: Eliminates redundant computations by preventing duplicate calculations of expressions
LowerExpectIntrinsicPass: Converts the llvm's intrinsics into branch weight metadata so it helps the processor increase the branch prediction accuracy
ReassociatePass: Reassociates expressions in a way that is easier for the LICM pass to process them
LoopRotatePass: Converts loops into do/while loops to enable better loop optimisation
LICMPass: Performs Loop Invariant Code Motion, attempting to remove as much code from the body of a loop as possible. With this way, it eliminates redundant calculations within loop bodies
InstructionCombiningPass: Combines instructions to form simpler ones e.g algebraic simplification for efficiency
TailCallEliminationPass: Tries to convert self recursion functions into loops to reduce overhead

Level 2 (`-O2`)

if(optLevel >= 2) {
    TheFPM->add(llvm::createGVNPass());                      // Global Value Numbering
    TheFPM->add(llvm::createSCCPPass());                     // Sparse Conditional Constant Propagation
    TheFPM->add(llvm::createAggressiveDCEPass());            // Dead Code Elimination
    TheFPM->add(llvm::createInstructionCombiningPass());     // Instruction combining
    TheFPM->add(llvm::createCFGSimplificationPass());        // Control flow graph cleanup
}

Purpose and Benefits:

GVNPass: Eliminates fully and partially redundant instructions and rudundant loads
SCCPPass: Propagates constant values through the control flow graph, even across conditional branches enabling more dead code elimination
AggressiveDCEPass: Performs dead code elimination more aggresively than CFGSimplificationPass and is recommended by the docs after SCCPPass

Level 3 (`-O3`)

Level 3 represents our most aggressive optimisation tier, focusing on advanced loop transformations and maximal code performance. These optimisations build upon all previous levels to extract the highest possible execution speed, particularly for compute-intensive workloads.

if (optLevel >= 3) {
    TheFPM->add(llvm::createLoopUnrollAndJamPass(false));      // Loop unrolling and Jammin
    TheFPM->add(llvm::createInstructionCombiningPass());       // Instruction combining
    TheFPM->add(llvm::createCFGSimplificationPass());          // Control flow graph cleanup
}

Purpose and Benefits:

LoopUnrollAndJamPass: Performs two complementary loop transformations: 1) Replicates loop bodies to reduce branch overhead and 2) fuses adjacent loops to improve data locality and cache utilization. The false argument disables runtime unrolling heuristics

Apply optimisations

We apply all the passes, of the input level, in each function separately:

for(llvm::Function &F : *TheModule) {
    TheFPM->run(F);
}

Register allocation and final code optimisation

LLVM handles internally the register allocation and the final code optimisation. All we have to do is to initilise a TargetMachine object to enable platform-specific optimisations. We also use the machines DataLayout to give LLVM information about the target's data layout e.g endianess, pointer sizes, etc.

llvm::TargetOptions TO;
std::unique_ptr<llvm::TargetMachine> TM(
    T->createTargetMachine(TargetTriple, "generic", "", TO, llvm::Reloc::PIC_)
);
TheModule->setDataLayout(TM->createDataLayout());

Next, we have to initialise LLVM's backend passes. The default register allocator is Greedy, a basic graph-coloring-style allocation and we do not change it.

llvm::initializeCodeGen(Registry);

In order to apply all of the above, we are using TargetMachine's addPassesToEmitFile function. The default compiler mode generates IR, ASM and object files. The implementations are simillar, thus here, we show only the implementation for the optimised final code(Assembly) generation:

std::error_code EC;
llvm::raw_fd_ostream asmFile("output_opt.asm", EC, llvm::sys::fs::OF_Text);
if(EC) { 
    llvm::errs() << "ASM file open error: " << EC.message() << "\n"; 
    exit(1); 
}
           
llvm::legacy::PassManager PM;
if(TM->addPassesToEmitFile(PM, asmFile, nullptr, llvm::CGFT_AssemblyFile)) {
    llvm::errs() << "Cannot emit assembly for this target\n"; 
    exit(1);
}
PM.run(*TheModule);

Runtime Comparison of Optimized Programs

The following image generated with

python3 testing/test_compiler.py

illustrates the execution times of the programs under all optimisation levels implemented.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimisations

Intermediate code optimisation through control and data flow analysis

Initialisation

Optimisation Levels

Level 0 (`-O0`) - default level if no argument is passed

Level 1 (`-O1`)

Level 2 (`-O2`)

Level 3 (`-O3`)

Apply optimisations

Register allocation and final code optimisation

Runtime Comparison of Optimized Programs

FilesExpand file tree

OPTIMISATIONS.md

Latest commit

History

OPTIMISATIONS.md

File metadata and controls

Optimisations

Intermediate code optimisation through control and data flow analysis

Initialisation

Optimisation Levels

Level 0 (-O0) - default level if no argument is passed

Level 1 (-O1)

Level 2 (-O2)

Level 3 (-O3)

Apply optimisations

Register allocation and final code optimisation

Runtime Comparison of Optimized Programs

Level 0 (`-O0`) - default level if no argument is passed

Level 1 (`-O1`)

Level 2 (`-O2`)

Level 3 (`-O3`)