Skip to content

Latest commit

 

History

History
408 lines (303 loc) · 14.6 KB

File metadata and controls

408 lines (303 loc) · 14.6 KB

MiniImp Language — Interpreter & Compiler (OCaml)

This repository contains the implementation of a small imperative language called MiniImp, developed as part of the Languages, Compilers and Interpreters course at the University of Pisa (A.Y. 2024/2025).

The project is split into two components:

  • 🧠 Miniimp-Interpreter – a big-step interpreter that parses and executes MiniImp programs directly.
  • ⚙️ miniimp_compiler – a compiler that translates MiniImp programs to a low‑level MiniRISC language, performs several analyses and optimizations, and emits linear MiniRISC code.

🗂 Repository Structure

MY_MINIIMP_PROJECT/
│
├── Miniimp-Interpreter/
│   ├── core.ml
│   ├── main.ml
│   ├── miniimp_lexer.mll
│   ├── miniimp_parser.mly
│   ├── dune
│   ├── dune-project
│   └── tests/
│       ├── test1.miniimp
│       ├── test2.miniimp
│       ├── test3.miniimp
│       ├── test4.miniimp
│       ├── test5.miniimp
│       ├── test6.miniimp
│       └── test7.miniimp
│
└── miniimp_compiler/
    ├── core.ml
    ├── defined_variables.ml
    ├── ImpCFGtoRISCCFG.ml
    ├── liveness.ml
    ├── miniImpCFG.ml
    ├── miniRISC.ml
    ├── miniRISCCFGtoCode.ml
    ├── minirisc_allocator.ml
    ├── minirisc_code.ml
    ├── minirisc_optimizer.ml
    ├── main.ml
    ├── miniimp_lexer.mll
    ├── miniimp_parser.mly
    ├── dune
    ├── dune-project
    └── tests/
        ├── test1.miniimp
        ├── test2.miniimp
        ├── test3.miniimp
        ├── test4.miniimp
        ├── test5.miniimp
        ├── test6.miniimp
        └── test7.miniimp

💡 MiniImp Language Overview

MiniImp is a tiny imperative language supporting:

  • Integer and boolean values
  • Arithmetic operators: +, -, *, /, %
  • Boolean operators: and, or, not
  • Comparisons: <, >
  • Variable assignment
  • Sequencing with ;
  • Control flow:
    • if ... then ... else ...
    • while ... do ...

A typical program has the form:

def main with input in output out as
  out := (in + 3) * 2

The main block declares a single input variable and a single output variable and then runs a command.


🧠 Miniimp-Interpreter

The interpreter evaluates MiniImp programs directly using a big‑step semantics.

Core module (core.ml)

The core module defines:

  • Abstract syntax
    • aexp for arithmetic expressions
    • bexp for boolean expressions
    • expr as a unified expression type wrapping aexp and bexp
    • com for commands (Skip, Assign, Seq, If, While)
    • program as a Main node containing the input variable, output variable, and the main command
  • Values and environment
    • IntVal of int and BoolVal of bool
    • The environment env is a list of (variable, value) pairs
    • lookup to fetch the current value of a variable
    • update to bind / rebind a variable in the environment
  • Evaluation functions
    • eval_aexp for arithmetic expressions
    • eval_bexp for boolean expressions
    • eval_com for commands (updates the environment)
    • eval_prg for an entire program, starting from an initial value bound to the input variable

Lexer (miniimp_lexer.mll)

The lexer is written with OCamllex and turns the source text into tokens:

  • Keywords: def, main, input, output, skip, if, then, else, while, do
  • Symbols and operators: +, -, *, /, %, <, >, (, ), :=, ;
  • Boolean operators: and, or, not
  • Identifiers and integer literals
  • Invalid characters raise a LexingError.

Parser (miniimp_parser.mly)

The parser is implemented with Menhir. It builds an AST for programs, commands and expressions and carefully resolves typical grammar ambiguities:

  • Arithmetic precedence
    *, /, % have higher precedence than + and -, and all are left‑associative:

    %left PLUS MINUS
    %left TIMES DIV MOD

    so x + y * z is parsed as x + (y * z).

  • Boolean precedence
    Comparisons (<, >) bind more tightly than and / or, so x < y and z > w is parsed as (x < y) and (z > w).

  • not precedence
    NOT is declared non‑associative and always applies to the following expression, avoiding parses like (NOT x) < y.

  • Command sequencing
    ; is declared left‑associative to ensure left‑to‑right execution:

    %left SEMICOLON
    com:
      | c1 = com SEMICOLON c2 = com { Seq (c1, c2) }
  • Non‑associative comparisons
    < and > are marked non‑associative to reject chains such as x < y < z.

  • Dangling else
    ELSE is non‑associative and always attaches to the nearest if, removing the classic dangling‑else ambiguity.

  • Parentheses
    Parenthesised expressions have highest precedence and can override default rules.

Main program (main.ml)

The main executable:

  1. Asks the user whether the input value is an int or bool.
  2. Repeatedly reads and validates the input value.
  3. Loads and parses a .miniimp file into an AST.
  4. Evaluates the program using eval_prg.
  5. Prints the final value of the output variable.

Example run (interpreter):

cd Miniimp-Interpreter
dune build
dune exec ./main.exe tests/test1.miniimp

Tests

The tests/ folder contains a set of MiniImp programs used during development:

  • Basic arithmetic and assignment – e.g. out := (in + 3) * 2
  • Conditionalsif in < 10 then ... else ...
  • While loops – factorial‑like computations using while in > 0 do ...
  • Input validation – sequences of invalid int / bool inputs to stress‑test error handling

You can open these files to see concrete language examples.


⚙️ miniimp_compiler

The compiler translates MiniImp programs into a low‑level MiniRISC representation, runs several static analyses and optimizations, and finally linearizes the control‑flow graph into executable MiniRISC code.

High‑level compilation pipeline

The main function in miniimp_compiler/main.ml orchestrates the following steps:

  1. Argument parsing – reads:
    • number of hardware registers to use,
    • whether to check for undefined variables,
    • whether to enable optimization,
    • input .miniimp program path,
    • output .minirisc file path.
  2. Parsing MiniImp – reuses the lexer and parser to build the MiniImp AST.
  3. MiniImp CFG construction (miniImpCFG.ml) – builds a maximal basic block control‑flow graph.
  4. MiniRISC CFG translation (ImpCFGtoRISCCFG.ml) – maps the MiniImp CFG to a MiniRISC CFG.
  5. Defined‑variables analysis (defined_variables.ml) – checks that no register is used before being defined.
  6. Liveness analysis (liveness.ml) – computes which registers are live at each point.
  7. Register‑merging optimization (minirisc_optimizer.ml) – greedily merges registers whose live ranges do not overlap.
  8. Register allocation & spilling (minirisc_allocator.ml) – decides which logical registers stay in hardware registers and which are spilled to memory.
  9. CFG linearization (miniRISCCFGtoCode.ml) – converts the MiniRISC CFG into a linear list of labeled basic blocks with explicit jumps.
  10. Pretty‑printing (minirisc_code.ml, miniRISC.ml) – prints the final MiniRISC program to the output file.

Control‑Flow Graph for MiniImp (miniImpCFG.ml)

The CFG uses maximal basic blocks:

  • A node Block(id, commands) contains a maximal sequence of commands with no internal branching.
  • Directed edges Flow(Block i, Block j) represent possible control transfers.

Design highlights:

  • Sequences of simple assignments are grouped into a single block for efficiency.
  • if statements create a decision block and two successor blocks (then, else), which reconverge.
  • while loops are represented with:
    • a condition block,
    • a body block,
    • and an exit block with back‑edges modelling iteration.

Utility functions support pretty‑printing and inspection of the CFG when debugging.

MiniRISC language (miniRISC.ml)

The target MiniRISC language is a small RISC‑style instruction set with:

  • Unary operations (urop) – e.g. Not, Copy.
  • Binary operations (brop) – arithmetic and logical operations on two registers.
  • Binary‑immediate operations (biop) – operations mixing a register and an immediate integer.
  • Memory operations
    • Load, Store between registers and memory,
    • LoadI to load an immediate constant into a register.
  • Control‑flow instructions
    • Jump label – unconditional jump,
    • CJump reg label_true label_false – conditional jump based on a register.

A MiniRISC program consists of:

  • labeled basic blocks,
  • a distinguished entry and exit label,
  • and an explicit list of control‑flow edges.

Helper functions convert operations, commands, blocks and programs into readable strings.

Translation from MiniImp CFG to MiniRISC CFG (ImpCFGtoRISCCFG.ml)

Key ideas:

  • Register assignment
    • r0 is reserved for the input value,
    • r1 for the output,
    • general‑purpose registers start from r2,
    • special temporaries r101 and r102 are reserved for helper computations and spilling.
  • Expression translation
    • Arithmetic expressions:
      • constants become LoadI,
      • variables reuse their assigned registers,
      • mixed register/immediate operations use biop forms like AddI.
    • Boolean expressions:
      • booleans are encoded as 0 / 1,
      • logical ops map to And, Or, Not,
      • comparisons like < become Brop(Less, ...).
  • Command translation
    • skipNop,
    • assignments evaluate the right‑hand side into the register mapped to the left‑hand variable,
    • control structures are already lowered at the CFG level and finalized later through explicit jumps.

Each MiniImp basic block becomes a labeled MiniRISC block containing the translated commands.

Liveness analysis (liveness.ml)

This module implements a standard backward data‑flow analysis:

  • For each block it computes:
    • in[b] – registers live at entry,
    • out[b] – registers live at exit.
  • Uses:
    • used_registers – registers read by an instruction,
    • defined_registers – registers written by an instruction.
  • Iterates over the CFG until a fixpoint is reached:
    • out[b] is the union of in sets of successors,
    • in[b] = used[b] ∪ (out[b] \ defined[b]).

The analysis result is later used by the optimizer and allocator.

Defined‑variables analysis (defined_variables.ml)

This analysis ensures that no register is used before being defined:

  • Tracks two sets per block:
    • in_set – registers definitely defined at block entry,
    • out_set – registers defined at exit.
  • The entry block starts with the input register (r0) defined.
  • After reaching a fixpoint, a verification pass checks that every register in used_registers belongs to the current in_set; otherwise an error is raised.

Register‑merging optimization (minirisc_optimizer.ml)

Goal: reduce the number of logical registers by merging those whose live ranges do not overlap.

  • Computes live ranges for each register from the liveness information.
  • Uses a greedy algorithm to assign multiple non‑overlapping registers to the same representative.
  • Keeps special registers (r_in, r_out, etc.) fixed.
  • Rewrites the program by applying the final mapping to every instruction.

Even when no merges are possible, the analysis provides insight into register pressure.

Register allocation and spilling (minirisc_allocator.ml)

Given a limit on the number of physical registers:

  • Counts how often each register is used throughout the CFG.
  • Selects a keep set (registers remaining in hardware) based on usage frequency, always keeping r0, r1 and special temporaries.
  • The remaining registers are spilled:
    • each spilled register is mapped to a dedicated memory slot (e.g. address 8001, 8002, …),
    • instructions that read/write these registers are rewritten to include the necessary Load / Store sequences,
    • helper registers r101 and r102 are used to implement the loads/stores.

The result is a MiniRISC CFG that respects the chosen hardware register budget.

CFG linearization (miniRISCCFGtoCode.ml)

Finally, the MiniRISC CFG is turned into a linear program:

  • Each labeled block is output once, in an order compatible with the CFG.
  • For each block, the module inspects its outgoing edges and emits:
    • no jump for terminal blocks,
    • Jump l for a single successor,
    • CJump plus fall‑through / jump for two successors.
  • The final output is a sequence of labeled blocks ready to be run on a MiniRISC interpreter or simulator.

🏃 How to Build and Run

Both components use Dune as the build system.

Requirements

  • OCaml (e.g. via opam)
  • Dune

Interpreter

cd Miniimp-Interpreter

# Build
dune build

# Run on one of the example programs
dune exec ./main.exe tests/test1.miniimp

During execution the interpreter will:

  1. Ask whether the input value is int or bool,
  2. Read and validate the input,
  3. Evaluate the MiniImp program and print the final output.

Compiler

cd miniimp_compiler

# Build
dune build

The compiler executable expects:

dune exec ./main.exe <numOfRegisters> <CheckUndefinedVars> <EnableOptimization> <program.miniimp> <OutputPath>

Example:

dune exec ./main.exe 5 true true ./tests/sum.miniimp ./Output/output.minirisc
  • numOfRegisters – maximum number of physical registers.
  • CheckUndefinedVarstrue / false to enable defined‑variables analysis.
  • EnableOptimizationtrue / false to enable register‑merging optimization.
  • program.miniimp – path to the MiniImp source.
  • OutputPath – path where the final .minirisc program will be written.

🎓 Project Context

This project was developed by Shahd Amer as part of the Languages, Compilers and Interpreters course in the Computer Science (Software Program) at the University of Pisa.

It showcases:

  • parser and interpreter construction in OCaml,
  • CFG‑based compilation,
  • classic static analyses (liveness, defined‑variables),
  • register allocation and spilling,
  • and the translation from a structured imperative language to a low‑level RISC‑style IR.

Feel free to explore the tests and source code if you are interested in language implementation and compiler design.