posts/compiling-a-forth.md
89 additions & 77 deletions
@@ -140,9 +140,9 @@ With our list of tokens, we're ready to start generating bytecode for the VM.
 ## Generating Bytecode

-Usually, in a compiler, the step after tokenization is _parsing_ where an abstract syntax tree is built. However, the feature set of my Forth is so small, that I decided to generate bytecode directly from the list of tokens.
+Usually, in a compiler, the step after tokenization is called _parsing_ where an abstract syntax tree is built. However, the feature set of my Forth is so small, that I decided to generate bytecode directly from the list of tokens.

-_After_ bytecode generation, my VM needs two things:
+After bytecode generation, my VM needs two things:

 - A list of operations for the VM's instruction pointer to navigate
 - The number of variables that the program refers to
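
The two outputs listed above can be pictured as a single value handed from the compiler to the VM. This is a minimal sketch, not the post's actual types: the names `Op`, `Program`, and `allocateMemory` are hypothetical, and only the op names (`lit`, `add`, `dup2`, `eq`, `rs_push`, `rs_pop`) are taken from the bytecode shown in this post.

```typescript
// Hypothetical shape of one bytecode operation (an assumption, not the post's type).
type Op =
  | { op: "lit"; value: number }
  | { op: "add" }
  | { op: "dup2" }
  | { op: "eq" }
  | { op: "rs_push" }
  | { op: "rs_pop" };

// Hypothetical container for the two things the VM needs.
interface Program {
  bytecode: Op[];        // operations the instruction pointer walks through
  variableCount: number; // number of variables the program refers to
}

// With the count known up front, a VM could reserve one zeroed slot per variable.
function allocateMemory(program: Program): number[] {
  return new Array<number>(program.variableCount).fill(0);
}

// Hand-built program for the Forth source `1 2 +`, which uses no variables.
const program: Program = {
  bytecode: [
    { op: "lit", value: 1 },
    { op: "lit", value: 2 },
    { op: "add" },
  ],
  variableCount: 0,
};

const memory = allocateMemory(program);
```

Carrying the variable count alongside the operations lets the VM size its memory once before execution starts, rather than growing it as variables are encountered.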
@@ -194,95 +194,107 @@ The bytecode generation step scans through the list of tokens and, as it process
 Identifier tokens are either variable references, or words (function calls).

 ```tsx
-let index = 0;
-while (index < tokens.length) {
-  const token = tokens[index];
-
-  if (token.type === "identifier") {
-    if (token.value === "VARIABLE") {
-      const nextToken = tokens[index + 1];
+function compile(tokens: Token[]) {

-      // Store a binding of variable name to memory address
Setting up the `DO`/`LOOP` bytecode generation was the trickiest part of this project. It's a minefield of possible off-by-one errors. It's also not easy to read and understand but I've chosen to put it here anyway because even just glancing over it should help you understand how the loop variables (limit, iterator) and instruction pointer jumps are combined to execute loops in Forth.
 ```tsx
-  // ..
+    // .. still inside compile()

-  if (token.type === "do") {
-    index++;
-
-    // Expect: DS has [limit, start] (start is top)
-    // Move both to RS: start then limit (RS top becomes limit)
-    bytecode.push({ op: "rs_push" }) // start -> RS
-    bytecode.push({ op: "rs_push" }) // limit -> RS
-
-    // Mark first instruction of loop body
-    loopStart.push(bytecode.length);
-    continue;
-  }
-
-  if (token.type === "loop") {
+    if (token.type === "do") {
+      index++;
+
+      // Expect: DS has [limit, start] (start is top)
+      // Move both to RS: start then limit (RS top becomes limit)
+      bytecode.push({ op: "rs_push" }) // start -> RS
+      bytecode.push({ op: "rs_push" }) // limit -> RS
+
+      // Mark first instruction of loop body
+      loopStart.push(bytecode.length);
+      continue;
+    }
+
+    if (token.type === "loop") {

-    // Pop limit and i from RS (RS top is limit)
-    bytecode.push({ op: "rs_pop" }) // limit -> DS
-    bytecode.push({ op: "rs_pop" }) // i -> DS
-
-    // Increment i
-    bytecode.push({ op: "lit", value: 1 })
-    bytecode.push({ op: "add" }) // i on DS
-
-    // Duplicate i and limit for compare and possible restore
-    bytecode.push({ op: "dup2" })
-    bytecode.push({ op: "eq" }) // eq flag on DS
-
-    const loopStartAddress = loopStart.pop(); // first instr of loop body
-
-    // Branch: continue when not equal (eq==0), exit when equal