Commit 79ff4c1

Merge pull request #65 from healeycodes/forth-fixes-1
better code snippets for forth post
2 parents e333384 + 7b8d577 commit 79ff4c1

2 files changed: 90 additions, 78 deletions


components/visuals/forth/components.tsx

Lines changed: 1 addition & 1 deletion
@@ -789,7 +789,7 @@ function renderCompiler(highlightRange: { start: number, end: number }, tokens:
 const valuePart = `\u00A0${tokenStr}`;
 const prefixPart = `(${prefix})`;
 const totalBeforePrefix = indexPart.length + valuePart.length;
-const paddingNeeded = Math.max(0, 13 - totalBeforePrefix);
+const paddingNeeded = Math.max(0, 14 - totalBeforePrefix);
 const padding = '\u00A0'.repeat(paddingNeeded);

 return (
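
Note on the padding change: the hunk moves the column at which the `(prefix)` label starts from 13 to 14 characters. Below is a minimal sketch of that arithmetic, assuming the row is a plain concatenation of the four parts. `formatRow`, its `column` parameter, and the sample inputs are hypothetical, since the diff shows neither how `indexPart` is built nor how the parts are rendered.

```typescript
// Hypothetical helper: joining the parts into one string is an assumption,
// the real component returns JSX that is not visible in this hunk.
function formatRow(indexPart: string, tokenStr: string, prefix: string, column = 14): string {
  const valuePart = `\u00A0${tokenStr}`;
  const prefixPart = `(${prefix})`;
  const totalBeforePrefix = indexPart.length + valuePart.length;
  // Never negative: rows already wider than the column get no padding.
  const paddingNeeded = Math.max(0, column - totalBeforePrefix);
  const padding = "\u00A0".repeat(paddingNeeded);
  return indexPart + valuePart + padding + prefixPart;
}

const shortRow = formatRow("03", "DUP", "word");
const longRow = formatRow("12", "VERYLONGTOKENNAME", "word");
console.log(shortRow.indexOf("(")); // 14
console.log(longRow.indexOf("("));  // 20
```

Short rows line up at the chosen column, while rows that are already wider than it simply push the label to the right.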

posts/compiling-a-forth.md

Lines changed: 89 additions & 77 deletions
@@ -140,9 +140,9 @@ With our list of tokens, we're ready to start generating bytecode for the VM.

 ## Generating Bytecode

-Usually, in a compiler, the step after tokenization is _parsing_ where an abstract syntax tree is built. However, the feature set of my Forth is so small, that I decided to generate bytecode directly from the list of tokens.
+Usually, in a compiler, the step after tokenization is called _parsing_ where an abstract syntax tree is built. However, the feature set of my Forth is so small, that I decided to generate bytecode directly from the list of tokens.

-_After_ bytecode generation, my VM needs two things:
+After bytecode generation, my VM needs two things:

 - A list of operations for the VM's instruction pointer to navigate
 - The number of variables that the program refers to
@@ -194,95 +194,107 @@ The bytecode generation step scans through the list of tokens and, as it process
 Identifier tokens are either variable references, or words (function calls).

 ```tsx
-let index = 0;
-while (index < tokens.length) {
-  const token = tokens[index];
-
-  if (token.type === "identifier") {
-    if (token.value === "VARIABLE") {
-      const nextToken = tokens[index + 1];
+function compile(tokens: Token[]) {

-      // Store a binding of variable name to memory address
-      variableTable[nextToken.value] = Object.keys(variableTable).length;
-      index += 2;
-      continue;
-    }
+  // Bytecode that runs in the VM
+  const bytecode: Bytecode[] = [];
+
+  // Word -> bytecode offsets (for calls)
+  const wordTable: { [key: string]: number } = {};
+
+  // Variable -> memory address
+  const variableTable: { [key: string]: number } = {};
+
+  // ..
+
+  let index = 0;
+  while (index < tokens.length) {
+    const token = tokens[index];
+
+    if (token.type === "identifier") {
+      if (token.value === "VARIABLE") {
+        const nextToken = tokens[index + 1];

-    // If the variable has been declared as a word like `: FIB10`
-    // then we have previously stored the bytecode offset which we
-    // will set the instruction pointer to at runtime
-    if (wordTable[token.value] !== undefined) {
-      bytecode.push({ op: "call", address: wordTable[token.value] });
+        // Store a binding of variable name to memory address
+        variableTable[nextToken.value] = Object.keys(variableTable).length;
+        index += 2;
+        continue;
+      }
+
+      // If the variable has been declared as a word like `: FIB10`
+      // then we have previously stored the bytecode offset which we
+      // will set the instruction pointer to at runtime
+      if (wordTable[token.value] !== undefined) {
+        bytecode.push({ op: "call", address: wordTable[token.value] });
+        index++;
+        continue;
+      }
+
+      // If it's not a variable declaration, or a word, then we
+      // look up the memory address
+      bytecode.push({ op: "lit", value: variableTable[token.value] });
       index++;
       continue;
     }

-    // If it's not a variable declaration, or a word, then we
-    // look up the memory address
-    bytecode.push({ op: "lit", value: variableTable[token.value] });
-    index++;
-    continue;
-  }
-
-  // ..
-}
+    // ..
 ```

 Setting up the `DO`/`LOOP` bytecode generation was the trickiest part of this project. It's a minefield of possible off-by-one errors. It's also not easy to read and understand but I've chosen to put it here anyway because even just glancing over it should help you understand how the loop variables (limit, iterator) and instruction pointer jumps are combined to execute loops in Forth.

 ```tsx
-// ..
+    // .. still inside compile()

-if (token.type === "do") {
-  index++;
-
-  // Expect: DS has [limit, start] (start is top)
-  // Move both to RS: start then limit (RS top becomes limit)
-  bytecode.push({ op: "rs_push" }) // start -> RS
-  bytecode.push({ op: "rs_push" }) // limit -> RS
-
-  // Mark first instruction of loop body
-  loopStart.push(bytecode.length);
-  continue;
-}
-
-if (token.type === "loop") {
+    if (token.type === "do") {
+      index++;
+
+      // Expect: DS has [limit, start] (start is top)
+      // Move both to RS: start then limit (RS top becomes limit)
+      bytecode.push({ op: "rs_push" }) // start -> RS
+      bytecode.push({ op: "rs_push" }) // limit -> RS
+
+      // Mark first instruction of loop body
+      loopStart.push(bytecode.length);
+      continue;
+    }
+
+    if (token.type === "loop") {

-  // Pop limit and i from RS (RS top is limit)
-  bytecode.push({ op: "rs_pop" }) // limit -> DS
-  bytecode.push({ op: "rs_pop" }) // i -> DS
-
-  // Increment i
-  bytecode.push({ op: "lit", value: 1 })
-  bytecode.push({ op: "add" }) // i on DS
-
-  // Duplicate i and limit for compare and possible restore
-  bytecode.push({ op: "dup2" })
-  bytecode.push({ op: "eq" }) // eq flag on DS
-
-  const loopStartAddress = loopStart.pop(); // first instr of loop body
-
-  // Branch: continue when not equal (eq==0), exit when equal
-  const continueAddress = bytecode.length + 4; // skip equal-path (2 drops + jmp)
-  bytecode.push({ op: "jz", address: continueAddress })
-
-  // Equal path (fallthrough): cleanup and exit
-  bytecode.push({ op: "drop" }) // drop i
-  bytecode.push({ op: "drop" }) // drop limit
-  const afterBlockAddress = bytecode.length + 1 /* jmp */ + 3 /* continue block */;
-  bytecode.push({ op: "jmp", address: afterBlockAddress })
-
-  // Continue path:
-  // address == continueAddress
-  bytecode.push({ op: "rs_push" }) // i -> RS (top)
-  bytecode.push({ op: "rs_push" }) // limit -> RS
-  bytecode.push({ op: "jmp", address: loopStartAddress })
-
-  index++;
-  continue;
-}
+      // Pop limit and i from RS (RS top is limit)
+      bytecode.push({ op: "rs_pop" }) // limit -> DS
+      bytecode.push({ op: "rs_pop" }) // i -> DS
+
+      // Increment i
+      bytecode.push({ op: "lit", value: 1 })
+      bytecode.push({ op: "add" }) // i on DS
+
+      // Duplicate i and limit for compare and possible restore
+      bytecode.push({ op: "dup2" })
+      bytecode.push({ op: "eq" }) // eq flag on DS
+
+      const loopStartAddress = loopStart.pop(); // first instr of loop body
+
+      // Branch: continue when not equal (eq==0), exit when equal
+      const continueAddress = bytecode.length + 4; // skip equal-path (2 drops + jmp)
+      bytecode.push({ op: "jz", address: continueAddress })
+
+      // Equal path (fallthrough): cleanup and exit
+      bytecode.push({ op: "drop" }) // drop i
+      bytecode.push({ op: "drop" }) // drop limit
+      const afterBlockAddress = bytecode.length + 1 /* jmp */ + 3 /* continue block */;
+      bytecode.push({ op: "jmp", address: afterBlockAddress })
+
+      // Continue path:
+      // address == continueAddress
+      bytecode.push({ op: "rs_push" }) // i -> RS (top)
+      bytecode.push({ op: "rs_push" }) // limit -> RS
+      bytecode.push({ op: "jmp", address: loopStartAddress })
+
+      index++;
+      continue;
+    }

-// ..
+    // .. trimmed other tokens, see source
 ```

 The rest of the token branches are more straightforward. Tokens like dot, store, load, and print all map directly to bytecode operations.
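
Note on the `loop` branch: the two computed jump targets (`continueAddress` and `afterBlockAddress`) can be checked by emitting the same instruction sequence and running it. What follows is a minimal sketch, not the post's VM. The `Op` type, `emitLoop`, `run`, and the `trace` op are hypothetical, and it assumes that `eq` pushes 1 for equal and 0 otherwise, and that `jz` pops the flag and jumps when it is 0.

```typescript
// Hypothetical op set, named after the ops that appear in the diff.
type Op =
  | { op: "lit"; value: number }
  | { op: "jz" | "jmp"; address: number }
  | { op: "add" | "eq" | "dup2" | "drop" | "rs_push" | "rs_pop" | "trace" };

function pop(stack: number[]): number {
  const value = stack.pop();
  if (value === undefined) throw new Error("stack underflow");
  return value;
}

// Wraps `body` in the DO/LOOP instruction layout shown in the diff.
function emitLoop(body: Op[]): Op[] {
  const bytecode: Op[] = [];
  bytecode.push({ op: "rs_push" }); // start -> RS
  bytecode.push({ op: "rs_push" }); // limit -> RS
  const loopStartAddress = bytecode.length;
  bytecode.push(...body);

  bytecode.push({ op: "rs_pop" }); // limit -> DS
  bytecode.push({ op: "rs_pop" }); // i -> DS
  bytecode.push({ op: "lit", value: 1 });
  bytecode.push({ op: "add" }); // i + 1 on DS
  bytecode.push({ op: "dup2" });
  bytecode.push({ op: "eq" });

  const continueAddress = bytecode.length + 4; // jz + 2 drops + jmp
  bytecode.push({ op: "jz", address: continueAddress });
  bytecode.push({ op: "drop" });
  bytecode.push({ op: "drop" });
  const afterBlockAddress = bytecode.length + 1 + 3; // jmp + continue block
  bytecode.push({ op: "jmp", address: afterBlockAddress });
  bytecode.push({ op: "rs_push" }); // i -> RS
  bytecode.push({ op: "rs_push" }); // limit -> RS
  bytecode.push({ op: "jmp", address: loopStartAddress });
  return bytecode;
}

// Tiny interpreter: `ds` is the data stack, `rs` the return stack.
function run(bytecode: Op[], ds: number[]) {
  const rs: number[] = [];
  const trace: number[] = [];
  let ip = 0;
  while (ip < bytecode.length) {
    const ins = bytecode[ip];
    ip++;
    switch (ins.op) {
      case "lit": ds.push(ins.value); break;
      case "add": ds.push(pop(ds) + pop(ds)); break;
      case "eq": ds.push(pop(ds) === pop(ds) ? 1 : 0); break;
      case "dup2": { const b = pop(ds); const a = pop(ds); ds.push(a, b, a, b); break; }
      case "drop": pop(ds); break;
      case "rs_push": rs.push(pop(ds)); break;
      case "rs_pop": ds.push(pop(rs)); break;
      case "jz": if (pop(ds) === 0) ip = ins.address; break;
      case "jmp": ip = ins.address; break;
      // RS holds [i, limit] inside the body, so i sits just below the top.
      case "trace": trace.push(rs[rs.length - 2]); break;
    }
  }
  return { ds, rs, trace };
}

const program = emitLoop([{ op: "trace" }]);
const result = run(program, [3, 0]); // limit 3, start 0 (start on top)
console.log(result.trace); // [0, 1, 2]
```

With a limit of 3 and a start of 0 the body runs for i = 0, 1, 2, and both stacks end empty, which is what the two `drop`s on the exit path are for. In this 16-instruction program the `jz` lands on address 13 and the exit `jmp` on address 16, one past the last instruction.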
