This repository contains the implementation of a small imperative language called MiniImp, developed as part of the Languages, Compilers and Interpreters course at the University of Pisa (A.Y. 2024/2025).
The project is split into two components:
- 🧠 Miniimp-Interpreter – a big-step interpreter that parses and executes MiniImp programs directly.
- ⚙️ miniimp_compiler – a compiler that translates MiniImp programs to a low‑level MiniRISC language, performs several analyses and optimizations, and emits linear MiniRISC code.
MY_MINIIMP_PROJECT/
│
├── Miniimp-Interpreter/
│ ├── core.ml
│ ├── main.ml
│ ├── miniimp_lexer.mll
│ ├── miniimp_parser.mly
│ ├── dune
│ ├── dune-project
│ └── tests/
│ ├── test1.miniimp
│ ├── test2.miniimp
│ ├── test3.miniimp
│ ├── test4.miniimp
│ ├── test5.miniimp
│ ├── test6.miniimp
│ └── test7.miniimp
│
└── miniimp_compiler/
├── core.ml
├── defined_variables.ml
├── ImpCFGtoRISCCFG.ml
├── liveness.ml
├── miniImpCFG.ml
├── miniRISC.ml
├── miniRISCCFGtoCode.ml
├── minirisc_allocator.ml
├── minirisc_code.ml
├── minirisc_optimizer.ml
├── main.ml
├── miniimp_lexer.mll
├── miniimp_parser.mly
├── dune
├── dune-project
└── tests/
├── test1.miniimp
├── test2.miniimp
├── test3.miniimp
├── test4.miniimp
├── test5.miniimp
├── test6.miniimp
└── test7.miniimp
MiniImp is a tiny imperative language supporting:
- Integer and boolean values
- Arithmetic operators:
+,-,*,/,% - Boolean operators:
and,or,not - Comparisons:
<,> - Variable assignment
- Sequencing with
; - Control flow:
if ... then ... else ...while ... do ...
A typical program has the form:
def main with input in output out as
out := (in + 3) * 2
The main block declares a single input variable and a single output variable and then runs a command.
The interpreter evaluates MiniImp programs directly using a big‑step semantics.
The core module defines:
- Abstract syntax
aexpfor arithmetic expressionsbexpfor boolean expressionsexpras a unified expression type wrappingaexpandbexpcomfor commands (Skip,Assign,Seq,If,While)programas aMainnode containing the input variable, output variable, and the main command
- Values and environment
IntVal of intandBoolVal of bool- The environment
envis a list of(variable, value)pairs lookupto fetch the current value of a variableupdateto bind / rebind a variable in the environment
- Evaluation functions
eval_aexpfor arithmetic expressionseval_bexpfor boolean expressionseval_comfor commands (updates the environment)eval_prgfor an entire program, starting from an initial value bound to the input variable
The lexer is written with OCamllex and turns the source text into tokens:
- Keywords:
def,main,input,output,skip,if,then,else,while,do - Symbols and operators:
+,-,*,/,%,<,>,(,),:=,; - Boolean operators:
and,or,not - Identifiers and integer literals
- Invalid characters raise a
LexingError.
The parser is implemented with Menhir. It builds an AST for programs, commands and expressions and carefully resolves typical grammar ambiguities:
-
Arithmetic precedence
*,/,%have higher precedence than+and-, and all are left‑associative:%left PLUS MINUS %left TIMES DIV MOD
so
x + y * zis parsed asx + (y * z). -
Boolean precedence
Comparisons (<,>) bind more tightly thanand/or, sox < y and z > wis parsed as(x < y) and (z > w). -
notprecedence
NOTis declared non‑associative and always applies to the following expression, avoiding parses like(NOT x) < y. -
Command sequencing
;is declared left‑associative to ensure left‑to‑right execution:%left SEMICOLON com: | c1 = com SEMICOLON c2 = com { Seq (c1, c2) }
-
Non‑associative comparisons
<and>are marked non‑associative to reject chains such asx < y < z. -
Dangling
else
ELSEis non‑associative and always attaches to the nearestif, removing the classic dangling‑else ambiguity. -
Parentheses
Parenthesised expressions have highest precedence and can override default rules.
The main executable:
- Asks the user whether the input value is an
intorbool. - Repeatedly reads and validates the input value.
- Loads and parses a
.miniimpfile into an AST. - Evaluates the program using
eval_prg. - Prints the final value of the output variable.
Example run (interpreter):
cd Miniimp-Interpreter
dune build
dune exec ./main.exe tests/test1.miniimpThe tests/ folder contains a set of MiniImp programs used during development:
- Basic arithmetic and assignment – e.g.
out := (in + 3) * 2 - Conditionals –
if in < 10 then ... else ... - While loops – factorial‑like computations using
while in > 0 do ... - Input validation – sequences of invalid
int/boolinputs to stress‑test error handling
You can open these files to see concrete language examples.
The compiler translates MiniImp programs into a low‑level MiniRISC representation, runs several static analyses and optimizations, and finally linearizes the control‑flow graph into executable MiniRISC code.
The main function in miniimp_compiler/main.ml orchestrates the following steps:
- Argument parsing – reads:
- number of hardware registers to use,
- whether to check for undefined variables,
- whether to enable optimization,
- input
.miniimpprogram path, - output
.miniriscfile path.
- Parsing MiniImp – reuses the lexer and parser to build the MiniImp AST.
- MiniImp CFG construction (
miniImpCFG.ml) – builds a maximal basic block control‑flow graph. - MiniRISC CFG translation (
ImpCFGtoRISCCFG.ml) – maps the MiniImp CFG to a MiniRISC CFG. - Defined‑variables analysis (
defined_variables.ml) – checks that no register is used before being defined. - Liveness analysis (
liveness.ml) – computes which registers are live at each point. - Register‑merging optimization (
minirisc_optimizer.ml) – greedily merges registers whose live ranges do not overlap. - Register allocation & spilling (
minirisc_allocator.ml) – decides which logical registers stay in hardware registers and which are spilled to memory. - CFG linearization (
miniRISCCFGtoCode.ml) – converts the MiniRISC CFG into a linear list of labeled basic blocks with explicit jumps. - Pretty‑printing (
minirisc_code.ml,miniRISC.ml) – prints the final MiniRISC program to the output file.
The CFG uses maximal basic blocks:
- A node
Block(id, commands)contains a maximal sequence of commands with no internal branching. - Directed edges
Flow(Block i, Block j)represent possible control transfers.
Design highlights:
- Sequences of simple assignments are grouped into a single block for efficiency.
ifstatements create a decision block and two successor blocks (then,else), which reconverge.whileloops are represented with:- a condition block,
- a body block,
- and an exit block with back‑edges modelling iteration.
Utility functions support pretty‑printing and inspection of the CFG when debugging.
The target MiniRISC language is a small RISC‑style instruction set with:
- Unary operations (
urop) – e.g.Not,Copy. - Binary operations (
brop) – arithmetic and logical operations on two registers. - Binary‑immediate operations (
biop) – operations mixing a register and an immediate integer. - Memory operations
Load,Storebetween registers and memory,LoadIto load an immediate constant into a register.
- Control‑flow instructions
Jump label– unconditional jump,CJump reg label_true label_false– conditional jump based on a register.
A MiniRISC program consists of:
- labeled basic blocks,
- a distinguished entry and exit label,
- and an explicit list of control‑flow edges.
Helper functions convert operations, commands, blocks and programs into readable strings.
Key ideas:
- Register assignment
r0is reserved for the input value,r1for the output,- general‑purpose registers start from
r2, - special temporaries
r101andr102are reserved for helper computations and spilling.
- Expression translation
- Arithmetic expressions:
- constants become
LoadI, - variables reuse their assigned registers,
- mixed register/immediate operations use
biopforms likeAddI.
- constants become
- Boolean expressions:
- booleans are encoded as
0/1, - logical ops map to
And,Or,Not, - comparisons like
<becomeBrop(Less, ...).
- booleans are encoded as
- Arithmetic expressions:
- Command translation
skip→Nop,- assignments evaluate the right‑hand side into the register mapped to the left‑hand variable,
- control structures are already lowered at the CFG level and finalized later through explicit jumps.
Each MiniImp basic block becomes a labeled MiniRISC block containing the translated commands.
This module implements a standard backward data‑flow analysis:
- For each block it computes:
in[b]– registers live at entry,out[b]– registers live at exit.
- Uses:
used_registers– registers read by an instruction,defined_registers– registers written by an instruction.
- Iterates over the CFG until a fixpoint is reached:
out[b]is the union ofinsets of successors,in[b] = used[b] ∪ (out[b] \ defined[b]).
The analysis result is later used by the optimizer and allocator.
This analysis ensures that no register is used before being defined:
- Tracks two sets per block:
in_set– registers definitely defined at block entry,out_set– registers defined at exit.
- The entry block starts with the input register (
r0) defined. - After reaching a fixpoint, a verification pass checks that every register in
used_registersbelongs to the currentin_set; otherwise an error is raised.
Goal: reduce the number of logical registers by merging those whose live ranges do not overlap.
- Computes live ranges for each register from the liveness information.
- Uses a greedy algorithm to assign multiple non‑overlapping registers to the same representative.
- Keeps special registers (
r_in,r_out, etc.) fixed. - Rewrites the program by applying the final mapping to every instruction.
Even when no merges are possible, the analysis provides insight into register pressure.
Given a limit on the number of physical registers:
- Counts how often each register is used throughout the CFG.
- Selects a keep set (registers remaining in hardware) based on usage frequency, always keeping
r0,r1and special temporaries. - The remaining registers are spilled:
- each spilled register is mapped to a dedicated memory slot (e.g. address
8001,8002, …), - instructions that read/write these registers are rewritten to include the necessary
Load/Storesequences, - helper registers
r101andr102are used to implement the loads/stores.
- each spilled register is mapped to a dedicated memory slot (e.g. address
The result is a MiniRISC CFG that respects the chosen hardware register budget.
Finally, the MiniRISC CFG is turned into a linear program:
- Each labeled block is output once, in an order compatible with the CFG.
- For each block, the module inspects its outgoing edges and emits:
- no jump for terminal blocks,
Jump lfor a single successor,CJumpplus fall‑through / jump for two successors.
- The final output is a sequence of labeled blocks ready to be run on a MiniRISC interpreter or simulator.
Both components use Dune as the build system.
- OCaml (e.g. via
opam) - Dune
cd Miniimp-Interpreter
# Build
dune build
# Run on one of the example programs
dune exec ./main.exe tests/test1.miniimpDuring execution the interpreter will:
- Ask whether the input value is
intorbool, - Read and validate the input,
- Evaluate the MiniImp program and print the final output.
cd miniimp_compiler
# Build
dune buildThe compiler executable expects:
dune exec ./main.exe <numOfRegisters> <CheckUndefinedVars> <EnableOptimization> <program.miniimp> <OutputPath>Example:
dune exec ./main.exe 5 true true ./tests/sum.miniimp ./Output/output.miniriscnumOfRegisters– maximum number of physical registers.CheckUndefinedVars–true/falseto enable defined‑variables analysis.EnableOptimization–true/falseto enable register‑merging optimization.program.miniimp– path to the MiniImp source.OutputPath– path where the final.miniriscprogram will be written.
This project was developed by Shahd Amer as part of the Languages, Compilers and Interpreters course in the Computer Science (Software Program) at the University of Pisa.
It showcases:
- parser and interpreter construction in OCaml,
- CFG‑based compilation,
- classic static analyses (liveness, defined‑variables),
- register allocation and spilling,
- and the translation from a structured imperative language to a low‑level RISC‑style IR.
Feel free to explore the tests and source code if you are interested in language implementation and compiler design.