Commit 26f451c

committed
add lisp compiler post
1 parent 4cdf553 commit 26f451c

3 files changed: +300 -0 lines changed

data/posts.ts

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ export const popularPosts = [
 
 // Good posts/highly viewed posts (not in any specific order)
 export const postStars = [
+  "lisp-to-javascript-compiler",
   "compressing-cs2-demos",
   "a-custom-webassembly-compiler",
   "rendering-counter-strike-demos-in-the-browser",

data/projects.ts

Lines changed: 6 additions & 0 deletions
@@ -17,6 +17,12 @@ export default [
     desc: "A text editor for macOS. Built using the Ebitengine game engine.",
     to: "/making-a-text-editor-with-a-game-engine",
   },
+  {
+    name: "lisp-to-js",
+    link: "https://github.com/healeycodes/lisp-to-js",
+    desc: "A Lisp to JavaScript compiler written in Rust. Supports a variant of Little Lisp.",
+    to: "/lisp-to-javascript-compiler",
+  },
   {
     name: "jar",
     link: "https://github.com/healeycodes/jar",
Lines changed: 293 additions & 0 deletions
@@ -0,0 +1,293 @@
---
title: "Lisp to JavaScript Compiler"
date: "2024-05-26"
tags: ["rust"]
description: "Transpiling Lisp to JavaScript using Rust."
---

I wrote a compiler that takes Lisp code and turns it into JavaScript. The compiler is a [~280-line Rust program](https://github.com/healeycodes/lisp-to-js) that I wrote over a few nights. When I started this project, I thought I understood how basic Lisp worked (I didn't) but I do now.

The first step of this project involved choosing a Lisp to implement. There are many Lisps. I'm a big fan of Mary Rose Cook's [Little Lisp](https://maryrosecook.com/blog/post/little-lisp-interpreter) blog post so I decided to implement Little Lisp, plus or minus a few forms.

In Lisp terminology, a "form" is any syntactic construct that can be evaluated to produce a result.

My Lisp has these forms:

```lisp
; atoms
1 ; f64 numbers
a ; symbols

; arithmetic expressions
(+ 1 2) ; 3
(- 1 2) ; -1

; control flow expressions
(< 1 2) ; true
(> 1 2) ; false
(if (< 1 2) (+ 10 10) (+ 10 5)) ; 20

; lambda expressions
(lambda (x) (+ x x)) ; function that doubles

; variable definition
(let ((a 1)) (print a)) ; prints 1
(let ((double (lambda (x) (+ x x)))) (double 2)) ; 4
```

These are enough forms to write small programs that do meaningful calculations, like finding the Nth Fibonacci number, as we'll see later.

## Parsing

To transform a Lisp program into a JavaScript program, you need to convert it to an intermediate representation. I used an Abstract Syntax Tree (AST) for this purpose.

During the compile step, I walk over this tree and output each node as JavaScript.

The AST that's created for `(let ((double (lambda (x) (+ x x)))) (double 2))` looks something like this (I drew this manually but would love to have my compiler output this!):

```text
(let
├── Bindings
│   └── (double
│       ├── Variable: double
│       └── Expression
│           └── (lambda
│               ├── Parameters
│               │   └── (x)
│               └── Body
│                   └── (+
│                       ├── Expression: x
│                       └── Expression: x
│                   )
│           )
│   )
└── Body
    └── (double 2)
        ├── Function: double
        └── Expression: 2
)
```
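
Getting a compiler to print a tree like this doesn't take much code. The sketch below is not part of lisp-to-js: it assumes a simplified, hypothetical `Sexp` type (numbers, symbols, and lists) rather than the real AST, and renders it as an indented outline with a recursive `render` function. For `(+ 1 2)` it produces a `List` line followed by three indented children.

```rust
// Hypothetical, simplified tree type for illustration only;
// the real AST in lisp-to-js has more specific node kinds.
enum Sexp {
    Number(f64),
    Symbol(String),
    List(Vec<Sexp>),
}

/// Appends an indented, one-node-per-line rendering of `node` to `out`.
fn render(node: &Sexp, depth: usize, out: &mut String) {
    let indent = "  ".repeat(depth);
    match node {
        Sexp::Number(n) => out.push_str(&format!("{}Number: {}\n", indent, n)),
        Sexp::Symbol(s) => out.push_str(&format!("{}Symbol: {}\n", indent, s)),
        Sexp::List(items) => {
            out.push_str(&format!("{}List\n", indent));
            // Children are rendered one level deeper than their parent
            for item in items {
                render(item, depth + 1, out);
            }
        }
    }
}
```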

I used the [pom](https://github.com/J-F-Liu/pom) library to define parser combinators that consume Lisp syntax.

I'll quote a section from the [pom docs](https://github.com/J-F-Liu/pom/blob/master/doc/article.md) that helped all this click for me:

> *parser* is a function which takes a *string* (a series of *symbols*) as input, and returns matching result as *output*.
>
> *combinator* is a higher-order function (a "functional") which takes zero or more functions (each of the same type) as input and returns a new function of the same type as output.
>
> *parser combinator* is a higher-order function which takes parsers as input and returns a new parser as output.

I previously used this library to write the parser for [a boolean expression engine](https://healeycodes.com/porting-boolrule-to-rust) and had a pretty good time with it. As with most of my Rust experience, I spent more time fixing compile errors than running, testing, and debugging. When it comes to parsers, I find fixing compile errors to be a tighter and more productive loop than running and fixing code in a more dynamic language.

In my Lisp, I define a number as:

1. Starts with `1-9`
2. Maybe followed by any number of `0-9` characters
3. Or just a single `0`

These rules are encoded in a parser combinator function.

```rust
use pom::parser::*;
use std::str::{self, FromStr};

fn number<'a>() -> Parser<'a, u8, f64> {
    let number = one_of(b"123456789") - one_of(b"0123456789").repeat(0..)
        | sym(b'0');
    number
        .collect()
        .convert(str::from_utf8)
        .convert(f64::from_str)
}
```
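
To make those three rules concrete without pulling in pom, here's a hand-rolled sketch using only the standard library. `is_number` and `parse_number` are hypothetical helpers, not functions from lisp-to-js, and unlike a combinator they check a whole string rather than consuming a prefix of the input. Under this rule `0` and `42` are numbers, while `007`, `1.5`, and `-1` are not.

```rust
/// Returns true if `input` is either a single `0`, or a digit `1-9`
/// followed by any number of `0-9` digits.
/// (Hypothetical helper for illustration; the real parser uses pom.)
fn is_number(input: &str) -> bool {
    let bytes = input.as_bytes();
    match bytes {
        // Rule 3: just a single zero
        [b'0'] => true,
        // Rules 1 and 2: a leading 1-9, then zero or more 0-9
        [b'1'..=b'9', rest @ ..] => rest.iter().all(|b| b.is_ascii_digit()),
        // Anything else: empty input, leading zeros, signs, decimals
        _ => false,
    }
}

/// Converts matched text into an f64, mirroring the final `convert` step.
fn parse_number(input: &str) -> Option<f64> {
    if is_number(input) {
        input.parse::<f64>().ok()
    } else {
        None
    }
}
```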

This number parser is combined with other parsers to find the atoms in my Lisp programs.

```rust
fn atom<'a>() -> Parser<'a, u8, Atom> {
    // Starts with optional space characters
    space() *

    // A number, mapped into my Atom enum
    (number().map(|n| Atom::Number(n))

    // Or a symbol
    | symbol().map(|s| Atom::Symbol(s)))

    // Followed by optional space characters
    - space()
}
```

The resulting parser turns source code into the structs and enums that make up the AST. These structures are recursive. A function definition can define another function inside its body (and so on).

Rust only lets you define recursive data structures if the recursion goes through a type like `Vec` or `Box`. These store their contents on the heap behind a pointer, which gives the enclosing type a size that's known at compile time.

```rust
enum Expression {
    Atom(Atom),
    List(Vec<Expression>), // <--
    LetExpression(LetExpression),
    LambdaExpression(LambdaExpression),
    IfExpression(Box<IfExpression>), // <--
    ArithmeticExpression(Box<ArithmeticExpression>), // <--
}

struct LambdaExpression {
    parameters: Vec<String>,
    expressions: Vec<Expression>, // <--
}
```
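
The reason for the indirection: an enum that directly contained itself would have no finite size, while a `Box` or `Vec` is a fixed-size handle to heap data. The sketch below uses a cut-down, hypothetical `Expr` type (not the compiler's real `Expression`) with a small recursive `eval`, just to show the shape compiling and being walked.

```rust
// A deliberately tiny AST (not the post's real types) showing why
// recursive variants need indirection.
enum Expr {
    Number(f64),
    // Without Box, Expr would contain itself directly and the compiler
    // would reject it (error E0072: recursive type has infinite size).
    Add(Box<Expr>, Box<Expr>),
    // Vec also keeps its elements on the heap, so this is fine too
    List(Vec<Expr>),
}

/// Recursively evaluates the tree; a list evaluates to the sum of its items.
fn eval(expr: &Expr) -> f64 {
    match expr {
        Expr::Number(n) => *n,
        Expr::Add(left, right) => eval(left) + eval(right),
        Expr::List(items) => items.iter().map(eval).sum::<f64>(),
    }
}
```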

For me, the hardest thing about writing a Lisp parser was probably off-by-one errors with the number of parentheses each form uses. I also didn't have a perfect understanding of how Lisp forms combine to produce programs, largely because I hadn't written a Lisp program before I started this project.

## Generating JavaScript

After the parser runs, we either have a valid AST or we've thrown an error that describes how the input failed to parse.

Because Lisp programs are made up of expressions and JavaScript has great support for expressions, generating code wasn't too hard. In terms of effort, it was 20% learning the basics of Lisp, 70% writing a parser to build the AST and data structures, and 10% code generation.

The simple forms I chose for my Lisp map fairly one-to-one onto JavaScript expressions.

```jsx
// (+ 1 2)
1 + 2

// (print (+ 1 2))
console.log(1 + 2)

// (let ((double (lambda (x) (+ x x)))) (double 2))
let double = x => x + x
double(2)
```

My compiler starts by defining `print`, and then iterates over the expressions that the parser found.

```rust
fn compile(program: Vec<Expression>) -> String {

    // Other built-ins can be manually added here
    let mut output = "/* lisp-to-js */
let print = console.log;

"
    .to_string();

    program.into_iter().for_each(|expression| {

        // I found it easier to write to a parent variable
        // which didn't feel very Rust-like but it worked for me!
        output.push_str(&compile_expression(expression));
    });

    output
}
```
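
For comparison, here's one way the same loop could be written without the mutable parent variable, by mapping each expression to a string and collecting. This is a sketch, not the compiler's code: `PRELUDE` and the generic `compile_expression` parameter are assumptions made so the example stands alone.

```rust
// Assumed to match the header string that the original `compile` starts with
const PRELUDE: &str = "/* lisp-to-js */\nlet print = console.log;\n\n";

/// Compiles every top-level expression and concatenates the results.
/// `compile_expression` is passed in so this sketch doesn't depend on the AST.
fn compile<E>(program: Vec<E>, compile_expression: impl Fn(E) -> String) -> String {
    // String implements FromIterator<String>, so collect() concatenates
    let body: String = program.into_iter().map(compile_expression).collect();
    format!("{}{}", PRELUDE, body)
}
```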

The function that handles expressions, `compile_expression`, is pretty much just a long match expression. When there are sub-expressions (very common), it calls itself recursively, building up an output string of JavaScript.

The code generation logic was a lot of fun to write. I felt much more at home with JavaScript (compared to Lisp) and it was very much a dessert compared to battling types over in parse-land.

I'll show a few of my favorite snippets here.

Like supporting less-than expressions:

```rust
// input: (< 1 2 3)
// output: 1 < 2 && 2 < 3

Op::LessThan => ret.push_str(
    &compiled_expressions
        .windows(2) // How cool is this std lib function!?
        .into_iter()
        .map(|expressions| expressions.join(" < "))
        .collect::<Vec<String>>()
        .join(" && "),
),
```
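
Here's the same `windows(2)` trick as a standalone function you can run. `compile_less_than` is a hypothetical name, and the operands are assumed to be already-compiled JavaScript strings. With fewer than two operands `windows(2)` yields nothing, so this sketch returns an empty string; the post doesn't show how the real compiler treats that case.

```rust
/// Joins already-compiled operands of a chained `<` into JavaScript.
/// (Hypothetical standalone version of the match arm above.)
fn compile_less_than(operands: &[String]) -> String {
    operands
        .windows(2) // overlapping pairs: [a, b], [b, c], ...
        .map(|pair| pair.join(" < "))
        .collect::<Vec<String>>()
        .join(" && ")
}
```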

And here's what I mean about a one-to-one mapping of structures:

```rust
// input: (if (ex1) (ex2) (ex3))
// output: ex1 ? ex2 : ex3

Expression::IfExpression(if_expression) => ret.push_str(&format!(
    "{} ? {} : {}\n",
    compile_expression(if_expression.check),
    compile_expression(if_expression.r#true),
    compile_expression(if_expression.r#false)
)),
```

The same thing goes for lambda expressions in Lisp. JavaScript has those too (anonymous functions)!

```rust
// input: (lambda (a) a)
// output: a => a

Expression::LambdaExpression(lambda_expression) => {
    let params = lambda_expression.parameters.join(",");
    let mut body = "".to_string();

    for expression in lambda_expression.expressions {
        body.push_str(&format!("{}\n", &compile_expression(expression)));
    }

    ret.push_str(&format!(" (({}) => {})\n", params, body));
}
```
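
Put together, the overall shape is a single recursive `match`. The sketch below uses a cut-down, hypothetical `Node` enum instead of the real `Expression` type, and it wraps output in parentheses rather than reproducing the compiler's exact spacing. Compiling `(lambda (x) (+ x x))` with it gives `((x) => (x + x))`.

```rust
// Simplified stand-ins for the post's AST; all names here are assumptions.
enum Node {
    Number(f64),
    Symbol(String),
    Add(Vec<Node>),
    If(Box<Node>, Box<Node>, Box<Node>),
    Lambda(Vec<String>, Box<Node>),
    Call(Box<Node>, Vec<Node>),
}

/// Turns a node into a JavaScript expression, recursing into sub-expressions.
fn compile_expression(node: &Node) -> String {
    match node {
        Node::Number(n) => n.to_string(),
        Node::Symbol(s) => s.clone(),
        Node::Add(operands) => {
            let parts: Vec<String> = operands.iter().map(compile_expression).collect();
            format!("({})", parts.join(" + "))
        }
        Node::If(check, then, otherwise) => format!(
            "({} ? {} : {})",
            compile_expression(check),
            compile_expression(then),
            compile_expression(otherwise)
        ),
        Node::Lambda(params, body) => {
            format!("(({}) => {})", params.join(", "), compile_expression(body))
        }
        Node::Call(callee, args) => {
            let args: Vec<String> = args.iter().map(compile_expression).collect();
            format!("{}({})", compile_expression(callee), args.join(", "))
        }
    }
}
```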

Usually, the first program I write with a new interpreter or compiler is a Fibonacci function. It's a good test for a range of functionality (variable binding, boolean logic, comparison, recursion, and sometimes performance too).

Here's my compiler's output for a Fibonacci function, complete with all the odd spacing and hanging commas it produces.

```jsx
/*
(let ((fib (lambda (n)
    (if (< n 2)
        n
        (+ (fib (- n 1)) (fib (- n 2)))))))
(print (fib 10)))
*/

let print = console.log;

let fib = ((n) => n < 2 ? n : ( fib (( n -1), )+ fib (( n -2), ))

)
; print ( fib (10, ), )
```

You might think at this point: hm, this kinda thing seems hard to debug at runtime … what happens if you write a Lisp program with valid syntax that still causes an error when it runs?

Well, the errors end up being quite useful! This is probably because there's only so much that can go wrong with such a limited number of forms.

```lisp
(hm 1 2) ; ReferenceError: hm is not defined
(+ two 2) ; ReferenceError: two is not defined

(let ((a a)) ())
; let a = a ;
;         ^
;
; ReferenceError: Cannot access 'a' before initialization
```

## Bonus: Compiling to Native

I've been following the development of [Porffor](https://github.com/CanadaHonk/porffor), an ahead-of-time optimizing JavaScript engine. While it's limited in the kind of JavaScript that it supports at the moment, the subset of JavaScript that *my* compiler produces is also limited.

Porffor can take my compiler's output (JavaScript) and compile it into a C program!

```bash
$ npm i -g porffor@latest

# generate a C file from my compiler's output
$ porf c fib10.js out.c

# compile it for native
$ gcc out.c -o out

$ ./out
55 # it works!
```
