This repository contains the implementation of a small imperative language called MiniImp, developed as part of the Languages, Compilers and Interpreters course at the University of Pisa (A.Y. 2024/2025).
The project is split into two components:
- đ§ Miniimp-Interpreter â a big-step interpreter that parses and executes MiniImp programs directly.
- âď¸ miniimp_compiler â a compiler that translates MiniImp programs to a lowâlevel MiniRISC language, performs several analyses and optimizations, and emits linear MiniRISC code.
MY_MINIIMP_PROJECT/
â
âââ Miniimp-Interpreter/
â âââ core.ml
â âââ main.ml
â âââ miniimp_lexer.mll
â âââ miniimp_parser.mly
â âââ dune
â âââ dune-project
â âââ tests/
â âââ test1.miniimp
â âââ test2.miniimp
â âââ test3.miniimp
â âââ test4.miniimp
â âââ test5.miniimp
â âââ test6.miniimp
â âââ test7.miniimp
â
âââ miniimp_compiler/
âââ core.ml
âââ defined_variables.ml
âââ ImpCFGtoRISCCFG.ml
âââ liveness.ml
âââ miniImpCFG.ml
âââ miniRISC.ml
âââ miniRISCCFGtoCode.ml
âââ minirisc_allocator.ml
âââ minirisc_code.ml
âââ minirisc_optimizer.ml
âââ main.ml
âââ miniimp_lexer.mll
âââ miniimp_parser.mly
âââ dune
âââ dune-project
âââ tests/
âââ test1.miniimp
âââ test2.miniimp
âââ test3.miniimp
âââ test4.miniimp
âââ test5.miniimp
âââ test6.miniimp
âââ test7.miniimp
MiniImp is a tiny imperative language supporting:
- Integer and boolean values
- Arithmetic operators:
+,-,*,/,% - Boolean operators:
and,or,not - Comparisons:
<,> - Variable assignment
- Sequencing with
; - Control flow:
if ... then ... else ...while ... do ...
A typical program has the form:
def main with input in output out as
out := (in + 3) * 2
The main block declares a single input variable and a single output variable and then runs a command.
The interpreter evaluates MiniImp programs directly using a bigâstep semantics.
The core module defines:
- Abstract syntax
aexpfor arithmetic expressionsbexpfor boolean expressionsexpras a unified expression type wrappingaexpandbexpcomfor commands (Skip,Assign,Seq,If,While)programas aMainnode containing the input variable, output variable, and the main command
- Values and environment
IntVal of intandBoolVal of bool- The environment
envis a list of(variable, value)pairs lookupto fetch the current value of a variableupdateto bind / rebind a variable in the environment
- Evaluation functions
eval_aexpfor arithmetic expressionseval_bexpfor boolean expressionseval_comfor commands (updates the environment)eval_prgfor an entire program, starting from an initial value bound to the input variable
The lexer is written with OCamllex and turns the source text into tokens:
- Keywords:
def,main,input,output,skip,if,then,else,while,do - Symbols and operators:
+,-,*,/,%,<,>,(,),:=,; - Boolean operators:
and,or,not - Identifiers and integer literals
- Invalid characters raise a
LexingError.
The parser is implemented with Menhir. It builds an AST for programs, commands and expressions and carefully resolves typical grammar ambiguities:
-
Arithmetic precedence
*,/,%have higher precedence than+and-, and all are leftâassociative:%left PLUS MINUS %left TIMES DIV MOD
so
x + y * zis parsed asx + (y * z). -
Boolean precedence
Comparisons (<,>) bind more tightly thanand/or, sox < y and z > wis parsed as(x < y) and (z > w). -
notprecedence
NOTis declared nonâassociative and always applies to the following expression, avoiding parses like(NOT x) < y. -
Command sequencing
;is declared leftâassociative to ensure leftâtoâright execution:%left SEMICOLON com: | c1 = com SEMICOLON c2 = com { Seq (c1, c2) }
-
Nonâassociative comparisons
<and>are marked nonâassociative to reject chains such asx < y < z. -
Dangling
else
ELSEis nonâassociative and always attaches to the nearestif, removing the classic danglingâelse ambiguity. -
Parentheses
Parenthesised expressions have highest precedence and can override default rules.
The main executable:
- Asks the user whether the input value is an
intorbool. - Repeatedly reads and validates the input value.
- Loads and parses a
.miniimpfile into an AST. - Evaluates the program using
eval_prg. - Prints the final value of the output variable.
Example run (interpreter):
cd Miniimp-Interpreter
dune build
dune exec ./main.exe tests/test1.miniimpThe tests/ folder contains a set of MiniImp programs used during development:
- Basic arithmetic and assignment â e.g.
out := (in + 3) * 2 - Conditionals â
if in < 10 then ... else ... - While loops â factorialâlike computations using
while in > 0 do ... - Input validation â sequences of invalid
int/boolinputs to stressâtest error handling
You can open these files to see concrete language examples.
The compiler translates MiniImp programs into a lowâlevel MiniRISC representation, runs several static analyses and optimizations, and finally linearizes the controlâflow graph into executable MiniRISC code.
The main function in miniimp_compiler/main.ml orchestrates the following steps:
- Argument parsing â reads:
- number of hardware registers to use,
- whether to check for undefined variables,
- whether to enable optimization,
- input
.miniimpprogram path, - output
.miniriscfile path.
- Parsing MiniImp â reuses the lexer and parser to build the MiniImp AST.
- MiniImp CFG construction (
miniImpCFG.ml) â builds a maximal basic block controlâflow graph. - MiniRISC CFG translation (
ImpCFGtoRISCCFG.ml) â maps the MiniImp CFG to a MiniRISC CFG. - Definedâvariables analysis (
defined_variables.ml) â checks that no register is used before being defined. - Liveness analysis (
liveness.ml) â computes which registers are live at each point. - Registerâmerging optimization (
minirisc_optimizer.ml) â greedily merges registers whose live ranges do not overlap. - Register allocation & spilling (
minirisc_allocator.ml) â decides which logical registers stay in hardware registers and which are spilled to memory. - CFG linearization (
miniRISCCFGtoCode.ml) â converts the MiniRISC CFG into a linear list of labeled basic blocks with explicit jumps. - Prettyâprinting (
minirisc_code.ml,miniRISC.ml) â prints the final MiniRISC program to the output file.
The CFG uses maximal basic blocks:
- A node
Block(id, commands)contains a maximal sequence of commands with no internal branching. - Directed edges
Flow(Block i, Block j)represent possible control transfers.
Design highlights:
- Sequences of simple assignments are grouped into a single block for efficiency.
ifstatements create a decision block and two successor blocks (then,else), which reconverge.whileloops are represented with:- a condition block,
- a body block,
- and an exit block with backâedges modelling iteration.
Utility functions support prettyâprinting and inspection of the CFG when debugging.
The target MiniRISC language is a small RISCâstyle instruction set with:
- Unary operations (
urop) â e.g.Not,Copy. - Binary operations (
brop) â arithmetic and logical operations on two registers. - Binaryâimmediate operations (
biop) â operations mixing a register and an immediate integer. - Memory operations
Load,Storebetween registers and memory,LoadIto load an immediate constant into a register.
- Controlâflow instructions
Jump labelâ unconditional jump,CJump reg label_true label_falseâ conditional jump based on a register.
A MiniRISC program consists of:
- labeled basic blocks,
- a distinguished entry and exit label,
- and an explicit list of controlâflow edges.
Helper functions convert operations, commands, blocks and programs into readable strings.
Key ideas:
- Register assignment
r0is reserved for the input value,r1for the output,- generalâpurpose registers start from
r2, - special temporaries
r101andr102are reserved for helper computations and spilling.
- Expression translation
- Arithmetic expressions:
- constants become
LoadI, - variables reuse their assigned registers,
- mixed register/immediate operations use
biopforms likeAddI.
- constants become
- Boolean expressions:
- booleans are encoded as
0/1, - logical ops map to
And,Or,Not, - comparisons like
<becomeBrop(Less, ...).
- booleans are encoded as
- Arithmetic expressions:
- Command translation
skipâNop,- assignments evaluate the rightâhand side into the register mapped to the leftâhand variable,
- control structures are already lowered at the CFG level and finalized later through explicit jumps.
Each MiniImp basic block becomes a labeled MiniRISC block containing the translated commands.
This module implements a standard backward dataâflow analysis:
- For each block it computes:
in[b]â registers live at entry,out[b]â registers live at exit.
- Uses:
used_registersâ registers read by an instruction,defined_registersâ registers written by an instruction.
- Iterates over the CFG until a fixpoint is reached:
out[b]is the union ofinsets of successors,in[b] = used[b] ⪠(out[b] \ defined[b]).
The analysis result is later used by the optimizer and allocator.
This analysis ensures that no register is used before being defined:
- Tracks two sets per block:
in_setâ registers definitely defined at block entry,out_setâ registers defined at exit.
- The entry block starts with the input register (
r0) defined. - After reaching a fixpoint, a verification pass checks that every register in
used_registersbelongs to the currentin_set; otherwise an error is raised.
Goal: reduce the number of logical registers by merging those whose live ranges do not overlap.
- Computes live ranges for each register from the liveness information.
- Uses a greedy algorithm to assign multiple nonâoverlapping registers to the same representative.
- Keeps special registers (
r_in,r_out, etc.) fixed. - Rewrites the program by applying the final mapping to every instruction.
Even when no merges are possible, the analysis provides insight into register pressure.
Given a limit on the number of physical registers:
- Counts how often each register is used throughout the CFG.
- Selects a keep set (registers remaining in hardware) based on usage frequency, always keeping
r0,r1and special temporaries. - The remaining registers are spilled:
- each spilled register is mapped to a dedicated memory slot (e.g. address
8001,8002, âŚ), - instructions that read/write these registers are rewritten to include the necessary
Load/Storesequences, - helper registers
r101andr102are used to implement the loads/stores.
- each spilled register is mapped to a dedicated memory slot (e.g. address
The result is a MiniRISC CFG that respects the chosen hardware register budget.
Finally, the MiniRISC CFG is turned into a linear program:
- Each labeled block is output once, in an order compatible with the CFG.
- For each block, the module inspects its outgoing edges and emits:
- no jump for terminal blocks,
Jump lfor a single successor,CJumpplus fallâthrough / jump for two successors.
- The final output is a sequence of labeled blocks ready to be run on a MiniRISC interpreter or simulator.
Both components use Dune as the build system.
- OCaml (e.g. via
opam) - Dune
cd Miniimp-Interpreter
# Build
dune build
# Run on one of the example programs
dune exec ./main.exe tests/test1.miniimpDuring execution the interpreter will:
- Ask whether the input value is
intorbool, - Read and validate the input,
- Evaluate the MiniImp program and print the final output.
cd miniimp_compiler
# Build
dune buildThe compiler executable expects:
dune exec ./main.exe <numOfRegisters> <CheckUndefinedVars> <EnableOptimization> <program.miniimp> <OutputPath>Example:
dune exec ./main.exe 5 true true ./tests/sum.miniimp ./Output/output.miniriscnumOfRegistersâ maximum number of physical registers.CheckUndefinedVarsâtrue/falseto enable definedâvariables analysis.EnableOptimizationâtrue/falseto enable registerâmerging optimization.program.miniimpâ path to the MiniImp source.OutputPathâ path where the final.miniriscprogram will be written.
This project was developed by Shahd Amer as part of the Languages, Compilers and Interpreters course in the Computer Science (Software Program) at the University of Pisa.
It showcases:
- parser and interpreter construction in OCaml,
- CFGâbased compilation,
- classic static analyses (liveness, definedâvariables),
- register allocation and spilling,
- and the translation from a structured imperative language to a lowâlevel RISCâstyle IR.
Feel free to explore the tests and source code if you are interested in language implementation and compiler design.