| 1 | +--- |
| 2 | +title: "Lisp to JavaScript Compiler" |
| 3 | +date: "2024-05-26" |
| 4 | +tags: ["rust"] |
| 5 | +description: "Transpiling Lisp to JavaScript using Rust." |
| 6 | +--- |
| 7 | + |
| 8 | +I wrote a compiler that takes Lisp code and turns it into JavaScript. The compiler is a [~280 line Rust program](https://github.com/healeycodes/lisp-to-js) that I wrote over a few nights. When I started this project, I thought I understood how basic Lisp worked (I didn't) but I do now. |
| 9 | + |
| 10 | +The first step of this project involved choosing a Lisp to implement. There are many Lisps. I'm a big fan of Mary Rose Cook's [Little Lisp](https://maryrosecook.com/blog/post/little-lisp-interpreter) blog post so I decided to implement Little Lisp, plus or minus a few forms. |
| 11 | + |
| 12 | +In Lisp terminology, a "form" is a fundamental concept referring to any syntactic construct that can be evaluated to produce a result. |
| 13 | + |
| 14 | +My Lisp has these forms. |
| 15 | + |
| 16 | +```lisp |
| 17 | +; atoms |
| 18 | +1 ; f64 numbers |
| 19 | +a ; symbols |
| 20 | + |
| 21 | +; arithmetic expressions |
| 22 | +(+ 1 2) ; 3 |
| 23 | +(- 1 2) ; -1 |
| 24 | + |
| 25 | +; control flow expressions |
| 26 | +(< 1 2) ; true |
| 27 | +(> 1 2) ; false |
| 28 | +(if (< 1 2) (+ 10 10) (+ 10 5)) ; 20 |
| 29 | + |
| 30 | +; lambda expressions |
| 31 | +(lambda (x) (+ x x)) ; function that doubles |
| 32 | + |
| 33 | +; variable definition |
| 34 | +(let ((a 1)) (print a)) ; prints 1 |
| 35 | +(let ((double (lambda (x) (+ x x)))) (double 2)) ; 4 |
| 36 | +``` |
| 37 | + |
| 38 | +These forms are enough to write small programs that do meaningful calculations, like finding the Nth Fibonacci number, as we'll see later. |
| 39 | + |
| 40 | +## Parsing |
| 41 | + |
| 42 | +To transform a Lisp program into a JavaScript program, you need to convert it to an intermediate representation. I used an Abstract Syntax Tree (AST) for this purpose. |
| 43 | + |
| 44 | +During the compile step, I walk over this tree and output each node as JavaScript. |
| 45 | + |
| 46 | +The AST that's created for `(let ((double (lambda (x) (+ x x)))) (double 2))` looks something like this (I drew this manually but would love to have my compiler output this!): |
| 47 | + |
| 48 | +```text |
| 49 | +(let |
| 50 | + ├── Bindings |
| 51 | + │ └── (double |
| 52 | + │ ├── Variable: double |
| 53 | + │ └── Expression |
| 54 | + │ └── (lambda |
| 55 | + │ ├── Parameters |
| 56 | + │ │ └── (x) |
| 57 | + │ └── Body |
| 58 | + │ └── (+ |
| 59 | + │ ├── Expression: x |
| 60 | + │ └── Expression: x |
| 61 | + │ ) |
| 62 | + │ ) |
| 63 | + │ ) |
| 64 | + └── Body |
| 65 | + └── (double 2) |
| 66 | + ├── Function: double |
| 67 | + └── Expression: 2 |
| 68 | +) |
| 69 | +``` |
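A tree like the one above could be printed by a small recursive walk. The following is only a sketch under assumptions: `Node`, `node`, and `render` are hypothetical names, and the generic labelled tree stands in for the compiler's real AST types, which are shaped differently.

```rust
// A generic labelled tree (hypothetical; not the compiler's real AST types).
struct Node {
    label: String,
    children: Vec<Node>,
}

// Small constructor to keep tree literals readable.
fn node(label: &str, children: Vec<Node>) -> Node {
    Node {
        label: label.to_string(),
        children,
    }
}

// Depth-first walk that indents each label by two spaces per level.
fn render(tree: &Node, depth: usize, out: &mut String) {
    out.push_str(&"  ".repeat(depth));
    out.push_str(&tree.label);
    out.push('\n');
    for child in &tree.children {
        render(child, depth + 1, out);
    }
}
```

Calling `render(&tree, 0, &mut out)` on a `let` node with `Bindings` and `Body` children fills `out` with one indented line per node. Drawing the box characters (`├──`, `└──`) would need extra bookkeeping about which child is last, which this sketch leaves out.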
| 70 | + |
| 71 | +I used the [pom](https://github.com/J-F-Liu/pom) library to define parser combinators that consume Lisp syntax. |
| 72 | + |
| 73 | +I'll quote a section from the [pom docs](https://github.com/J-F-Liu/pom/blob/master/doc/article.md) that helped all this click for me: |
| 74 | + |
| 75 | +> A *parser* is a function which takes a *string* (a series of *symbols*) as input, and returns matching result as *output*. |
| 76 | +> |
| 77 | +> |
| 78 | +> A *combinator* is a higher-order function (a "functional") which takes zero or more functions (each of the same type) as input and returns a new function of the same type as output. |
| 79 | +> |
| 80 | +> A *parser combinator* is a higher-order function which takes parsers as input and returns a new parser as output. |
| 81 | + |
| 82 | +I previously used this library to write the parser for [a boolean expression engine](https://healeycodes.com/porting-boolrule-to-rust) and had a pretty good time with it. Like most of my Rust experience, I spent more time fixing compile errors than running, testing, and debugging. When it comes to parsers, I find fixing compile errors to be a tighter and more productive loop than running-and-fixing code in a more dynamic language. |
| 83 | + |
| 84 | +In my Lisp, I define a number as: |
| 85 | + |
| 86 | +1. Starts with `1-9` |
| 87 | +2. Maybe followed by any number of `0-9` characters |
| 88 | +3. Or just a single `0` |
| 89 | + |
| 90 | +These rules are encoded in a parser combinator function. |
| 91 | + |
| 92 | +```rust |
| 93 | +fn number<'a>() -> Parser<'a, u8, f64> { |
| 94 | + let number = one_of(b"123456789") - one_of(b"0123456789").repeat(0..) |
| 95 | + | sym(b'0'); |
| 96 | + number |
| 97 | + .collect() |
| 98 | + .convert(str::from_utf8) |
| 99 | + .convert(f64::from_str) |
| 100 | +} |
| 101 | +``` |
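To make the three rules concrete without pulling in pom, here is a minimal sketch that checks them by hand using only the standard library. The name `parse_number` is hypothetical. Unlike the combinator above, which consumes a matching prefix of the input, this version assumes it is handed one complete token and rejects anything else.

```rust
/// Parse a number using the three rules above, without a parser library.
/// Returns None unless the whole input is a valid number.
fn parse_number(input: &str) -> Option<f64> {
    let valid = match input.as_bytes() {
        // Rule 3: just a single `0`
        [b'0'] => true,
        // Rules 1 and 2: one of `1-9`, then any number of `0-9`
        [first, rest @ ..] => {
            (b'1'..=b'9').contains(first) && rest.iter().all(|b| b.is_ascii_digit())
        }
        // Empty input is not a number
        [] => false,
    };

    if valid {
        input.parse::<f64>().ok()
    } else {
        None
    }
}
```

With these rules `parse_number("42")` gives `Some(42.0)`, while `"007"` and `"x1"` are rejected. Negative numbers and decimals are not covered by the rules, so they are rejected here too.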
| 102 | + |
| 103 | +This number parser is combined with other parsers to find the atoms in my Lisp programs. |
| 104 | + |
| 105 | +```rust |
| 106 | +fn atom<'a>() -> Parser<'a, u8, Atom> { |
| 107 | + // Starts with optional space characters |
| 108 | + space() * |
| 109 | + |
| 110 | + // A number, mapped into my Atom enum |
| 111 | + (number().map(|n| Atom::Number(n)) |
| 112 | + |
| 113 | + // Or a symbol |
| 114 | + | symbol().map(|s| Atom::Symbol(s))) |
| 115 | + |
| 116 | + // Followed by optional space characters |
| 117 | + - space() |
| 118 | +} |
| 119 | +``` |
| 120 | + |
| 121 | +The resulting parser turns source code into the structs and enums that make up the AST. These structures are recursive. A function definition can define another function inside its body (and so on). |
| 122 | + |
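| 123 | +Rust only lets you define recursive data structures if the recursion goes through a pointer-like type such as a `Vec` or a `Box`. These keep their contents on the heap, so the enclosing type still has a fixed size that is known at compile time. |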
| 124 | + |
| 125 | +```rust |
| 126 | +enum Expression { |
| 127 | + Atom(Atom), |
| 128 | + List(Vec<Expression>), // <-- |
| 129 | + LetExpression(LetExpression), |
| 130 | + LambdaExpression(LambdaExpression), |
| 131 | + IfExpression(Box<IfExpression>), // <-- |
| 132 | + ArithmeticExpression(Box<ArithmeticExpression>), // <-- |
| 133 | +} |
| 134 | + |
| 135 | +struct LambdaExpression { |
| 136 | + parameters: Vec<String>, |
| 137 | + expressions: Vec<Expression>, // <-- |
| 138 | +} |
| 139 | +``` |
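The role of `Box` is easier to see on a type smaller than the real AST. The sketch below uses a hypothetical two-variant `Expr` enum (not the compiler's `Expression`) and assumes nothing beyond the standard library.

```rust
use std::mem::size_of;

// A toy expression type (hypothetical, much smaller than the post's AST).
// Without the `Box`, `Expr` would contain itself directly and the compiler
// would reject it with "recursive type has infinite size".
enum Expr {
    Number(f64),
    Add(Box<Expr>, Box<Expr>),
}

// Walk the tree recursively, following each Box to its heap-allocated child.
fn eval(expr: &Expr) -> f64 {
    match expr {
        Expr::Number(n) => *n,
        Expr::Add(left, right) => eval(left) + eval(right),
    }
}

// A Box is a single pointer, so `Expr` has a fixed size however deep the tree is.
fn box_is_pointer_sized() -> bool {
    size_of::<Box<Expr>>() == size_of::<usize>()
}

// Builds the tree for (+ 1 (+ 2 3))
fn sample() -> Expr {
    Expr::Add(
        Box::new(Expr::Number(1.0)),
        Box::new(Expr::Add(
            Box::new(Expr::Number(2.0)),
            Box::new(Expr::Number(3.0)),
        )),
    )
}
```

Here `eval(&sample())` walks two levels of nesting and returns `6.0`. A `Vec<Expr>` works for the same reason: the vector itself is a fixed-size handle to heap storage.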
| 140 | + |
| 141 | +For me, the hardest thing about writing a Lisp parser was probably off-by-one errors with the number of parentheses each form uses. I also didn't have a perfect understanding of how Lisp forms combine to produce programs, largely because I hadn't written a Lisp program before I started this project. |
| 142 | + |
| 143 | +## Generating JavaScript |
| 144 | + |
| 145 | +After the parser runs, we either have a valid AST or we've thrown an error that describes how the input failed to parse. |
| 146 | + |
| 147 | +Because Lisp programs are made up of expressions, and JavaScript has great support for expressions, I didn't have too hard a time generating code. In terms of effort, it was 20% learning the basics of Lisp, 70% writing a parser to build the AST and data structures, and then 10% code generation. |
| 148 | + |
| 149 | +Given the simple forms I chose for my Lisp, they end up mapping fairly one-to-one with JavaScript expressions. |
| 150 | + |
| 151 | +```jsx |
| 152 | +// (+ 1 2) |
| 153 | +1 + 2 |
| 154 | + |
| 155 | +// (print (+ 1 2)) |
| 156 | +console.log(1 + 2) |
| 157 | + |
| 158 | +// (let ((double (lambda (x) (+ x x)))) (double 2)) |
| 159 | +let double = x => x + x |
| 160 | +double(2) |
| 161 | +``` |
| 162 | + |
| 163 | +My compiler starts by defining `print`, and then iterates over the expressions that the parser found. |
| 164 | + |
| 165 | +```rust |
| 166 | +fn compile(program: Vec<Expression>) -> String { |
| 167 | + |
| 168 | + // Other built-ins can be manually added here |
| 169 | + let mut output = "/* lisp-to-js */ |
| 170 | +let print = console.log; |
| 171 | + |
| 172 | +" |
| 173 | + .to_string(); |
| 174 | + |
| 175 | + program.into_iter().for_each(|expression| { |
| 176 | + |
| 177 | + // I found it easier to write to a parent variable |
| 178 | + // which didn't feel very Rust-like but it worked for me! |
| 179 | + output.push_str(&compile_expression(expression)); |
| 180 | + }); |
| 181 | + |
| 182 | + output |
| 183 | +} |
| 184 | +``` |
| 185 | + |
| 186 | +The function that handles expressions, `compile_expression`, is pretty much just a long match expression. When there are sub-expressions (very common) it recursively calls itself, continuously building an output string of JavaScript. |
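The overall shape of such a function can be sketched on a cut-down AST. Everything here is an assumption for illustration: the `Expr` enum and `compile_expr` are hypothetical and far smaller than the real `Expression` type and `compile_expression`, and the output adds parentheses that the real compiler may not emit.

```rust
// A toy AST (hypothetical; the real compiler's types differ).
enum Expr {
    Number(f64),
    Symbol(String),
    Add(Vec<Expr>),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
    Call(String, Vec<Expr>),
}

// One match arm per node type; sub-expressions are compiled by recursion.
fn compile_expr(expr: &Expr) -> String {
    match expr {
        Expr::Number(n) => n.to_string(),
        Expr::Symbol(name) => name.clone(),
        Expr::Add(operands) => {
            let parts: Vec<String> = operands.iter().map(compile_expr).collect();
            format!("({})", parts.join(" + "))
        }
        Expr::If(check, then, otherwise) => format!(
            "({} ? {} : {})",
            compile_expr(check),
            compile_expr(then),
            compile_expr(otherwise)
        ),
        Expr::Call(name, args) => {
            let parts: Vec<String> = args.iter().map(compile_expr).collect();
            format!("{}({})", name, parts.join(", "))
        }
    }
}
```

For the tree representing `(print (+ 1 2))`, this sketch returns `print((1 + 2))`. Each arm only knows how to render its own node and leaves the rest to the recursive calls.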
| 187 | + |
| 188 | +The code generation logic was a lot of fun to write. I felt much more at home with JavaScript (compared to Lisp) and it was very much a dessert compared to battling types over in parse-land. |
| 189 | + |
| 190 | +I'll show a few of my favorite snippets here. |
| 191 | + |
| 192 | +Like supporting less-than expressions: |
| 193 | + |
| 194 | +```rust |
| 195 | +// input: (< 1 2 3) |
| 196 | +// output: 1 < 2 && 2 < 3 |
| 197 | + |
| 198 | +Op::LessThan => ret.push_str( |
| 199 | + &compiled_expressions |
| 200 | + .windows(2) // How cool is this std lib function!? |
| 201 | + .into_iter() |
| 202 | + .map(|expressions| expressions.join(" < ")) |
| 203 | + .collect::<Vec<String>>() |
| 204 | + .join(" && "), |
| 205 | +), |
| 206 | +``` |
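The match arm above is a fragment, so here is a standalone sketch of the same `windows(2)` idea that can be run in isolation. The function name `chain_comparison` and its `op` parameter are hypothetical, and the operands are assumed to be already-compiled JavaScript strings.

```rust
// Chain already-compiled operands with a binary operator, pairwise.
// `windows(2)` yields every overlapping pair: [a, b], [b, c], ...
fn chain_comparison(operands: &[String], op: &str) -> String {
    let separator = format!(" {} ", op);
    operands
        .windows(2)
        .map(|pair| pair.join(separator.as_str()))
        .collect::<Vec<String>>()
        .join(" && ")
}
```

Given the operands `1`, `2`, `3` and the operator `<`, this returns `1 < 2 && 2 < 3`. With fewer than two operands `windows(2)` yields nothing, so the result is an empty string; a real compiler would probably want to reject that case earlier.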
| 207 | + |
| 208 | +And here's what I mean about a one-to-one mapping of structures: |
| 209 | + |
| 210 | +```rust |
| 211 | +// input: (if (ex1) (ex2) (ex3)) |
| 212 | +// output: ex1 ? ex2 : ex3 |
| 213 | + |
| 214 | +Expression::IfExpression(if_expression) => ret.push_str(&format!( |
| 215 | + "{} ? {} : {}\n", |
| 216 | + compile_expression(if_expression.check), |
| 217 | + compile_expression(if_expression.r#true), |
| 218 | + compile_expression(if_expression.r#false) |
| 219 | +)), |
| 220 | +``` |
| 221 | + |
| 222 | +The same thing goes for lambda expressions in Lisp. JavaScript has those too (anonymous functions)! |
| 223 | + |
| 224 | +```rust |
| 225 | +// input: (lambda (a) a) |
| 226 | +// output: a => a |
| 227 | + |
| 228 | +Expression::LambdaExpression(lambda_expression) => { |
| 229 | + let params = lambda_expression.parameters.join(","); |
| 230 | + let mut body = "".to_string(); |
| 231 | + |
| 232 | + for expression in lambda_expression.expressions { |
| 233 | + body.push_str(&format!("{}\n", &compile_expression(expression))); |
| 234 | + } |
| 235 | + |
| 236 | + ret.push_str(&format!(" (({}) => {})\n", params, body)); |
| 237 | +} |
| 238 | +``` |
| 239 | + |
| 240 | +Usually, the first program I write with a new interpreter or compiler is a Fibonacci function. It's a good test for a range of functionality (variable binding, boolean logic, comparison, recursion, and sometimes performance too). |
| 241 | + |
| 242 | +Here's my compiler's output for a Fibonacci function with all the odd spacing and hanging commas that my compiler produces. |
| 243 | + |
| 244 | +```jsx |
| 245 | +/* |
| 246 | +(let ((fib (lambda (n) |
| 247 | + (if (< n 2) |
| 248 | + n |
| 249 | + (+ (fib (- n 1)) (fib (- n 2))))))) |
| 250 | +(print (fib 10))) |
| 251 | +*/ |
| 252 | + |
| 253 | +let print = console.log; |
| 254 | + |
| 255 | +let fib = ((n) => n < 2 ? n : ( fib (( n -1), )+ fib (( n -2), )) |
| 256 | + |
| 257 | +) |
| 258 | +; print ( fib (10, ), ) |
| 259 | +``` |
| 260 | + |
| 261 | +You might think at this point: hm, this kinda thing seems hard to debug at runtime … what happens if you write a Lisp program with valid syntax but that will cause an error? |
| 262 | + |
| 263 | +Well, the errors end up being quite useful! This is probably because there's only so much that can go wrong with such a limited number of forms. |
| 264 | + |
| 265 | +```lisp |
| 266 | +(hm 1 2) ; ReferenceError: hm is not defined |
| 267 | +(+ two 2) ; ReferenceError: two is not defined |
| 268 | + |
| 269 | +(let ((a a)) ()) |
| 270 | +; let a = a ; |
| 271 | +; ^ |
| 272 | +; |
| 273 | +; ReferenceError: Cannot access 'a' before initialization |
| 274 | +``` |
| 275 | + |
| 276 | +## Bonus: Compiling to Native |
| 277 | + |
| 278 | +I've been following the development of [Porffor](https://github.com/CanadaHonk/porffor) — an ahead-of-time optimizing JavaScript engine. While it's limited in the kind of JavaScript that it supports at the moment, the subset of JavaScript that *my* compiler produces is also limited. |
| 279 | + |
| 280 | +Porffor can take my compiler's output (JavaScript) and compile it into a C program! |
| 281 | + |
| 282 | +```bash |
| 283 | +$ npm i -g porffor@latest |
| 284 | + |
| 285 | +# generate a C file from my compiler's output |
| 286 | +$ porf c fib10.js out.c |
| 287 | + |
| 288 | +# compile it for native |
| 289 | +$ gcc out.c -o out |
| 290 | + |
| 291 | +$ ./out |
| 292 | +55 # it works! |
| 293 | +``` |