zlw/monkey-ml

Monkey Language but in OCaml 🐡 🐫

Installation

First, install OCaml, opam, and dune.

Then clone this repository:

git clone git@github.com:zlw/marmoset.git

And then install the dependencies:

make install

Build

Dev

faster compilation, slower runtime performance

make build

Release

slower compilation, faster runtime performance

make release

Run tests

make unit

Progress

  • Lexer
  • Parser
  • Evaluator
  • Compiler

Features

Feature | Interpreter | Compiler
Bindings | ✅ | ✅
Conditionals | ✅ | ✅
Strings | ✅ | ✅
Integers | ✅ | ✅
Arithmetic +-/* | ✅ | ✅
Arrays | ✅ | ✅
Hashes | ✅ | ✅
Functions | ✅ | ✅
First class functions | ✅ | ✅
Higher order functions | ✅ | ✅
Closures | ✅ | ✅
Recursion | ✅ | ✅
Built-In Functions | ✅ | ✅
Macros | ❌ | ❌

Additional Features (not present in the original implementation)

Feature | Interpreter | Compiler
Floats | ✅ | ✅
Float Arithmetic | ✅ | ✅
String Indexing | ✅ | ✅
String Concatenation | ✅ | ✅
String Equality | ✅ | ✅
Negative Indexing | ✅ | ✅
Comments | ✅ | ✅

Compiler & VM Implementation Notes

The compiler and VM follow "Writing a Compiler in Go" by Thorsten Ball, but with some OCaml-specific considerations.

Why the code looks "imperative"

While OCaml excels at functional programming, the VM implementation uses mutation in a few places for performance:

  • Compiler: Uses Dynarray (dynamic arrays) and Buffer for building bytecode, and mutable fields for tracking compilation state
  • VM: Uses mutable arrays for the stack, globals, and call frames

This is intentional. A purely functional VM with immutable data structures would allocate more and be slower. The mutation is localized and doesn't leak into the rest of the codebase.
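As a rough illustration of the two bullets above, here is a minimal sketch of a compiler side that appends to a Buffer and a VM side that mutates a preallocated stack. The two-opcode instruction set and the names emit_const, emit_add, push, pop and run are hypothetical and are not taken from this repository.

```ocaml
(* Hypothetical two-opcode instruction set, for illustration only. *)
let op_const = 0 (* followed by a 2-byte big-endian operand *)
let op_add = 1

(* Compiler side: append instructions to a growable Buffer. *)
let emit_const buf n =
  Buffer.add_uint8 buf op_const;
  Buffer.add_uint16_be buf n

let emit_add buf = Buffer.add_uint8 buf op_add

(* VM side: a preallocated stack that is mutated in place. *)
type vm = { stack : int array; mutable sp : int }

let push vm v =
  vm.stack.(vm.sp) <- v;
  vm.sp <- vm.sp + 1

let pop vm =
  vm.sp <- vm.sp - 1;
  vm.stack.(vm.sp)

(* Decode and execute until the end of the bytecode, then return the top of
   the stack. *)
let run (code : Bytes.t) : int =
  let vm = { stack = Array.make 256 0; sp = 0 } in
  let ip = ref 0 in
  while !ip < Bytes.length code do
    let op = Bytes.get_uint8 code !ip in
    if op = op_const then begin
      push vm (Bytes.get_uint16_be code (!ip + 1));
      ip := !ip + 3
    end else if op = op_add then begin
      let b = pop vm in
      let a = pop vm in
      push vm (a + b);
      incr ip
    end else failwith "unknown opcode"
  done;
  pop vm
```

The mutation stays inside run: callers hand in immutable bytecode and get back a plain value, which is the sense in which it is localized.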

Performance

Recursive Fibonacci benchmark (fib(35)):

Implementation Go OCaml
Tree-walking interpreter ~8.8s ~4.9s
Bytecode VM ~2.8s ~2.0s

OCaml's tree-walker is ~1.8x faster than Go's (~4.9s vs ~8.8s) thanks to efficient pattern matching and algebraic data types. After optimization, the bytecode VM takes ~29% less time than Go's (~2.0s vs ~2.8s). The optimizations were:

  1. [@inline] hints on hot functions - The biggest win. OCaml's compiler is conservative about inlining; Go is more aggressive. Adding [@inline] to push, pop, current_frame, execute_binary_op, etc. gave ~16% speedup.

  2. Bytes.unsafe_get / Array.unsafe_get - Skip bounds checking in the VM loop. Safe because we trust our own compiler's bytecode. ~3% speedup.

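  3. Obj.magic for opcode dispatch - Converts an int to the opcode variant without going through a pattern-matching of_int. OCaml represents constant (argument-free) variant constructors as integers internally, so this is safe as long as the byte is a valid opcode, which holds for bytecode emitted by our own compiler. The main dispatch is still an ordinary match, so exhaustiveness checking is preserved.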
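The three techniques can be sketched together on a toy opcode set. This is an illustration under assumptions, not the repository's code: the opcode type and the functions of_int, unsafe_of_int, fetch and count_adds are hypothetical. Only [@inline], Bytes.unsafe_get and Obj.magic are the real OCaml features named above.

```ocaml
(* Hypothetical opcode set. Constant constructors are numbered 0, 1, 2, ...
   in declaration order. *)
type opcode = Op_constant | Op_add | Op_pop

(* Checked conversion: the safe baseline that Obj.magic replaces. *)
let of_int = function
  | 0 -> Some Op_constant
  | 1 -> Some Op_add
  | 2 -> Some Op_pop
  | _ -> None

(* Unchecked conversion: only valid when 0 <= n <= 2. *)
let[@inline] unsafe_of_int (n : int) : opcode = Obj.magic n

(* Fetch one byte without a bounds check; the caller must keep ip in range. *)
let[@inline] fetch (code : Bytes.t) (ip : int) : int =
  Char.code (Bytes.unsafe_get code ip)

(* The dispatch is an ordinary match, so the compiler still warns when an
   opcode is left unhandled. *)
let count_adds (code : Bytes.t) : int =
  let adds = ref 0 in
  for ip = 0 to Bytes.length code - 1 do
    match unsafe_of_int (fetch code ip) with
    | Op_add -> incr adds
    | Op_constant | Op_pop -> ()
  done;
  !adds
```

Feeding unsafe_of_int a value outside the declared range is undefined behaviour, which is why the approach depends on trusting the bytecode producer.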

Why OCaml's tree-walker is so fast

OCaml's interpreter is ~1.8x faster than Go's because:

  • Pattern matching compiles to efficient jump tables
  • Algebraic data types have no runtime type assertions
  • The GC is optimized for functional allocation patterns

This means the VM has less relative speedup over the interpreter compared to Go, but in absolute terms both implementations are fast.
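To make the first two bullets concrete, here is a minimal tree-walking evaluator over a hypothetical AST that is much smaller than Monkey's. The expr and value types are assumptions for illustration. Dispatch on the node kind is a single match on the constructor tag, with no casts or interface lookups.

```ocaml
(* Hypothetical AST, far smaller than the real Monkey AST. *)
type expr =
  | Int of int
  | Bool of bool
  | Add of expr * expr
  | Less of expr * expr
  | If of expr * expr * expr

type value = VInt of int | VBool of bool

let rec eval (e : expr) : value =
  match e with
  | Int n -> VInt n
  | Bool b -> VBool b
  | Add (a, b) ->
    (match eval a, eval b with
     | VInt x, VInt y -> VInt (x + y)
     | _ -> failwith "type error: + expects integers")
  | Less (a, b) ->
    (match eval a, eval b with
     | VInt x, VInt y -> VBool (x < y)
     | _ -> failwith "type error: < expects integers")
  | If (c, t, f) ->
    (match eval c with
     | VBool true -> eval t
     | VBool false -> eval f
     | VInt _ -> failwith "type error: condition must be a boolean")
```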

TODO

  • Cleanup pyramid of doom in Parser
    • Use Result.bind
    • Propagate parsing errors instead of crashing
  • Add system tests
    • test runner
    • test cases (maybe reuse some from Crafting Interpreters?)
  • Add support for negative array indexing
  • Add support for string indexing
    • Positive
    • Negative
  • Add support for Floats
    • Support Integer/Float arithmetic (currently there's Int/Int and Float/Float)
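For the first TODO item, one possible shape for the cleanup is to chain parser steps with Result.bind through a let* binding operator, so that the first error short-circuits instead of nesting matches or raising. The token and stmt types and the expect_* helpers below are hypothetical; the real parser's types will differ.

```ocaml
(* Hypothetical token and AST types, for illustration only. *)
type token = Let | Ident of string | Assign | Num of int

type stmt = Let_stmt of string * int

(* Binding operator backed by Result.bind (OCaml 4.08+). *)
let ( let* ) = Result.bind

let expect_let = function
  | Let :: rest -> Ok rest
  | _ -> Error "expected 'let'"

let expect_ident = function
  | Ident name :: rest -> Ok (name, rest)
  | _ -> Error "expected identifier"

let expect_assign = function
  | Assign :: rest -> Ok rest
  | _ -> Error "expected '='"

let expect_num = function
  | Num n :: rest -> Ok (n, rest)
  | _ -> Error "expected number"

(* Each step runs only if the previous one returned Ok, so the code stays
   flat and the first Error is returned to the caller. *)
let parse_let tokens =
  let* rest = expect_let tokens in
  let* (name, rest) = expect_ident rest in
  let* rest = expect_assign rest in
  let* (value, rest) = expect_num rest in
  Ok (Let_stmt (name, value), rest)
```

With this sketch, parse_let [Let; Ident "x"; Assign; Num 5] yields Ok (Let_stmt ("x", 5), []), while parse_let [Let; Assign] yields Error "expected identifier".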

Ideas

Monkey supports closures and first class functions. It would be interesting to add some functional programming features to it:

  • immutability
  • static typing with Hindley–Milner style type inference
  • pattern matching

With those in place, it would be a JS-looking language with an ML core 🤔

Type Safety

Currently, out-of-bounds array access and missing hash keys return null (matching canonical Monkey). This is a footgun in dynamic languages.

When adding a static type system, consider:

  1. Option types: arr[i] returns Option<T>, forces explicit match/unwrap
  2. Dependent types: Prove bounds at compile time (e.g., Vec<T, N> where index must be < N)
  3. Refinement types: arr[i] where i : { n : Int | 0 <= n < len(arr) }
  4. Gradual typing: Allow both safe (arr.get(i) -> Option) and unsafe (arr[i] -> T) with different syntax
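Option 1 can be sketched in OCaml itself, which is roughly what such a feature would look like on the host side. The get and describe helpers are hypothetical, and treating a negative index as counting from the end is an assumption about how negative indexing behaves.

```ocaml
(* Hypothetical helper: returns None instead of a null value. *)
let get (arr : 'a array) (i : int) : 'a option =
  let len = Array.length arr in
  (* Assumption: negative indices count back from the end. *)
  let j = if i < 0 then len + i else i in
  if j >= 0 && j < len then Some arr.(j) else None

(* The caller is forced to handle the missing case explicitly. *)
let describe (arr : int array) (i : int) : string =
  match get arr i with
  | Some v -> Printf.sprintf "arr[%d] = %d" i v
  | None -> Printf.sprintf "arr[%d] is out of bounds" i
```

Leaving out the None branch in describe would produce a non-exhaustive match warning, which is the compile-time nudge that a null-returning arr[i] cannot give.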

Interesting type system concepts to explore:

  • Hindley-Milner type inference (ML, Haskell)
  • Bidirectional type checking (modern approach, easier to implement)
  • Algebraic data types with exhaustiveness checking
  • Row polymorphism for extensible records/hashes
  • Effect systems for tracking errors, IO, etc.
  • Linear/affine types for resource management (Rust-style ownership)
  • Dependent types (Idris, Agda) - types that depend on values
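As a taste of the bidirectional approach, here is a minimal sketch for a tiny lambda calculus. The ty and term types are hypothetical and unrelated to Monkey's AST. The idea is that infer synthesizes a type from a term, check verifies a term against an expected type, and an annotation is the point where checking hands over to inference.

```ocaml
(* Hypothetical core calculus, not Monkey's AST. *)
type ty = TInt | TBool | TFun of ty * ty

type term =
  | Lit of int
  | True
  | False
  | Var of string
  | Lam of string * term (* unannotated lambda: can be checked, not inferred *)
  | App of term * term
  | Ann of term * ty (* annotation: lets a checked term be used in infer mode *)

let ( let* ) = Result.bind

(* infer: synthesize a type from the term itself. *)
let rec infer env = function
  | Lit _ -> Ok TInt
  | True | False -> Ok TBool
  | Var x ->
    (match List.assoc_opt x env with
     | Some t -> Ok t
     | None -> Error ("unbound variable " ^ x))
  | Ann (e, t) ->
    let* () = check env e t in
    Ok t
  | App (f, arg) ->
    let* tf = infer env f in
    (match tf with
     | TFun (a, b) ->
       let* () = check env arg a in
       Ok b
     | _ -> Error "applying a non-function")
  | Lam _ -> Error "cannot infer a lambda without an annotation"

(* check: verify the term against an expected type. *)
and check env e expected =
  match e, expected with
  | Lam (x, body), TFun (a, b) -> check ((x, a) :: env) body b
  | _ ->
    let* actual = infer env e in
    if actual = expected then Ok () else Error "type mismatch"
```

Unlike Hindley-Milner, this sketch has no unification step, so unannotated lambdas are rejected in infer mode. That trade-off is what keeps the implementation short.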

Resources:

  • "Types and Programming Languages" (Pierce) - the bible
  • "Practical Foundations for Programming Languages" (Harper)
  • Bidirectional typing: https://arxiv.org/abs/1908.05839
