- Writing Interpreter in Go / Writing Compiler in Go's Monkey Language but in OCaml
- Marmoset is a small monkey, sometimes called Pug Monkey, and I do like pugs π
First you need to install OCaml, opam and dune:
Then you can clone this repository and run the following commands:
git clone git@github.com:zlw/marmoset.gitAnd then install the dependencies:
make installfaster compilation, slower runtime performance
make buildslower compilation, faster runtime performance
make releasemake unit- Lexer
- Parser
- Evaluator
- Compiler
| Feature | Interpreter | Compiler |
|---|---|---|
| Bindings | β | β |
| Conditionals | β | β |
| Strings | β | β |
| Integers | β | β |
| Arithmetic +-/* | β | β |
| Arrays | β | β |
| Hashes | β | β |
| Functions | β | β |
| First class functions | β | β |
| Higher order functions | β | β |
| Closures | β | β |
| Recursion | β | β |
| Built-In Functions | β | β |
| Macros | β | β |
| Feature | Interpreter | Compiler |
|---|---|---|
| Floats | β | β |
| Float Arithmetic | β | β |
| String Indexing | β | β |
| String Concatenation | β | β |
| String Equality | β | β |
| Negative Indexing | β | β |
| Comments | β | β |
The compiler and VM follow "Writing a Compiler in Go" by Thorsten Ball, but with some OCaml-specific considerations.
While OCaml excels at functional programming, the VM implementation uses mutation in a few places for performance:
- Compiler: Uses
Dynarray(dynamic arrays) andBufferfor building bytecode, and mutable fields for tracking compilation state - VM: Uses mutable arrays for the stack, globals, and call frames
This is intentional. A purely functional VM with immutable data structures would allocate more and be slower. The mutation is localized and doesn't leak into the rest of the codebase.
Recursive Fibonacci benchmark (fib(35)):
| Implementation | Go | OCaml |
|---|---|---|
| Tree-walking interpreter | ~8.8s | ~4.9s |
| Bytecode VM | ~2.8s | ~2.0s |
OCaml's tree-walker is ~1.8x faster than Go's thanks to efficient pattern matching and algebraic data types. The bytecode VM is ~29% faster than Go after optimization:
-
[@inline]hints on hot functions - The biggest win. OCaml's compiler is conservative about inlining; Go is more aggressive. Adding[@inline]topush,pop,current_frame,execute_binary_op, etc. gave ~16% speedup. -
Bytes.unsafe_get/Array.unsafe_get- Skip bounds checking in the VM loop. Safe because we trust our own compiler's bytecode. ~3% speedup. -
Obj.magicfor opcode dispatch - Convert int to opcode variant without pattern matching throughof_int. OCaml represents simple variants as integers internally, so this is safe. Preserves exhaustiveness checking in the main dispatch.
OCaml's interpreter is ~1.8x faster than Go's because:
- Pattern matching compiles to efficient jump tables
- Algebraic data types have no runtime type assertions
- The GC is optimized for functional allocation patterns
This means the VM has less relative speedup over the interpreter compared to Go, but in absolute terms both implementations are fast.
- Cleanup pyramid of doom in
Parser- Use
Result.bind - Propagate parsing errors instead of crashing
- Use
- Add system tests
- test runner
- test cases (maybe reuse some from Crafting Interpreters?)
- Add support for negative array indexing
- Add support for string indexing
- Positive
- Negative
- Add support for Floats
- Support Integer/Float arithmetic (currently there's Int/Int and Float/Float)
Monkey supports closures and first class functions. It would be interesting to add some functional programming features to it:
- immutability
- static typing with HindleyβMilner style type inference
- pattern matching
We those in place, it would be a JS-looking language with a ML core π€
Currently, out-of-bounds array access and missing hash keys return null (matching canonical Monkey). This is a footgun in dynamic languages.
When adding a static type system, consider:
- Option types:
arr[i]returnsOption<T>, forces explicitmatch/unwrap - Dependent types: Prove bounds at compile time (e.g.,
Vec<T, N>where index must be< N) - Refinement types:
arr[i]wherei : { n : Int | 0 <= n < len(arr) } - Gradual typing: Allow both safe (
arr.get(i) -> Option) and unsafe (arr[i] -> T) with different syntax
Interesting type system concepts to explore:
- Hindley-Milner type inference (ML, Haskell)
- Bidirectional type checking (modern approach, easier to implement)
- Algebraic data types with exhaustiveness checking
- Row polymorphism for extensible records/hashes
- Effect systems for tracking errors, IO, etc.
- Linear/affine types for resource management (Rust-style ownership)
- Dependent types (Idris, Agda) - types that depend on values
Resources:
- "Types and Programming Languages" (Pierce) - the bible
- "Practical Foundations for Programming Languages" (Harper)
- Bidirectional typing: https://arxiv.org/abs/1908.05839