Commit 230381a

Update README and add function calls lowering documentation
1 parent a611230 commit 230381a

2 files changed

+304
-5
lines changed

core/wasm-codegen/README.md

Lines changed: 18 additions & 5 deletions
@@ -21,17 +21,22 @@ Typed AST (TypedContext)
2121
### Compilation Phases
2222

2323
1. **AST Traversal** - Walk typed AST and visit function definitions
24-
2. **Local Pre-scan** - Walk the entire function body once to collect all `let` and `const`
24+
2. **Function name pre-scan** - Build `func_name_to_idx` map from function names to WASM
25+
function section indices before the main compilation pass. This enables forward references:
26+
a caller defined before its callee in source can still emit a valid `call` instruction.
27+
See [docs/function-calls-lowering.md](docs/function-calls-lowering.md).
28+
3. **Local Pre-scan** - Walk the entire function body once to collect all `let` and `const`
2529
declarations and assign them sequential WASM local indices before any instructions are
2630
emitted. This step is mandatory because the WebAssembly binary format requires all local
2731
declarations to appear at the very start of a function body, before the instruction
2832
sequence. See [docs/local-variables-lowering.md](docs/local-variables-lowering.md) for a
2933
detailed explanation.
30-
3. **Instruction Emission** - Lower functions, statements, and expressions to WASM
34+
4. **Instruction Emission** - Lower functions, statements, and expressions to WASM
3135
instructions. `let` definitions are lowered via a push instruction followed by
3236
`local.set`; `const` definitions use the same path. Supported initializer expression
33-
kinds are literals, identifiers, and uzumaki (`@`) expressions.
34-
4. **Module Assembly** - Assemble TypeSection, FunctionSection, ExportSection, CodeSection,
37+
kinds are literals, identifiers, uzumaki (`@`) expressions, and function calls. Function
38+
calls push arguments in positional order and emit a `call <func_idx>` instruction.
39+
5. **Module Assembly** - Assemble TypeSection, FunctionSection, ExportSection, CodeSection,
3540
and NameSection into a complete WASM binary
3641

3742
## Non-Deterministic Extensions
@@ -180,7 +185,7 @@ The `codegen` function:
180185

181186
- **Multi-file support** - Only single-file compilation is fully implemented
182187
- **Top-level constructs** - Only function definitions are compiled; type definitions, constants at module level, and other top-level items are not yet supported
183-
- **Expression types** - Limited support for complex expressions (binary operations, function calls, structs, arrays)
188+
- **Expression types** - Limited support for complex expressions (binary operations, structs, arrays). Plain identifier-based function calls are supported; method calls (`obj.method()`), associated function calls (`Type::func()`), and higher-order function calls are not yet implemented.
184189
- **Type system** - Generic types, custom types, and function types are not yet fully implemented
185190

186191
## Documentation
@@ -190,11 +195,15 @@ Detailed design documents live in `docs/`:
190195
- [docs/local-variables-lowering.md](docs/local-variables-lowering.md) - The two-pass
191196
approach for lowering `let`/`const` locals, supported initializer kinds, and the
192197
`lower_literal` type-dispatch logic for sub-i32 types.
198+
- [docs/function-calls-lowering.md](docs/function-calls-lowering.md) - Forward-reference
199+
pre-scan, parameter index interlock with locals, call lowering pipeline, drop emission
200+
rules, and known limitations.
193201

194202
## Module Organization
195203

196204
- `lib.rs` - Public API and AST traversal
197205
- `compiler.rs` - WASM instruction emission and module assembly
206+
- `errors.rs` - `CodegenError` enum for function call lowering failures
198207
- `output.rs` - `CodegenOutput` containing WASM bytes and metadata
199208
- `target.rs` - Compilation target definitions (`Wasm32`, `Soroban`)
200209

@@ -218,6 +227,10 @@ Test data includes:
218227
identifier initializers (validated against `inf_wasmparser` and compared byte-for-byte)
219228
- `local_variables_exec.inf` - Wasmtime execution tests that verify the correct WASM value
220229
is returned for each `let` binding form
230+
- `fn_params.inf` - Functions with typed parameters (i32, i64, bool, multi-param); verifies
231+
parameter-to-local-index mapping and WASM type signatures
232+
- `fn_calls.inf` - Function call scenarios including no-arg calls, arg passing, forward
233+
references, and `let`-from-call; validated and executed via wasmtime
221234

222235
## Related Resources
223236

Lines changed: 286 additions & 0 deletions
@@ -0,0 +1,286 @@
1+
# Function Calls Lowering
2+
3+
This document describes how Inference function calls are compiled to WebAssembly `call`
4+
instructions, covering the forward-reference pre-scan, the interlock between parameter
5+
indices and body-local indices, the call lowering pipeline, drop emission rules, and known
6+
limitations.
7+
8+
## Prerequisites
9+
10+
Readers should be familiar with:
11+
12+
- The WebAssembly binary format, specifically function indices, type signatures, and the
13+
`call` instruction (see
14+
[WebAssembly spec, section 5.4.1](https://webassembly.github.io/spec/core/binary/instructions.html))
15+
- Inference function syntax (see
16+
[Inference Language Specification](https://github.com/Inferara/inference-language-spec))
17+
- The overall compilation pipeline described in `core/wasm-codegen/README.md`
18+
- Local variable lowering described in `docs/local-variables-lowering.md`
19+
20+
## Why Forward References Require a Pre-Scan
21+
22+
In WebAssembly, the `call` instruction takes a function index, an integer that identifies
23+
the callee by its position in the WASM function section. The function section is ordered by
24+
definition order in source.
25+
26+
Inference allows forward references: a caller can appear before its callee in the source
27+
file. A single-pass compiler that emits `call` instructions as it encounters calls would
28+
not yet know the index of a callee defined later.
29+
30+
The compiler solves this with a dedicated pre-scan in `lib.rs`:
31+
32+
```rust
33+
fn traverse_t_ast_with_compiler(typed_context: &TypedContext, compiler: &mut Compiler) {
34+
for source_file in &typed_context.source_files() {
35+
let func_defs = source_file.function_definitions();
36+
// Pre-scan: build function name-to-index map so that forward references
37+
// (callee defined after caller in source) resolve correctly at call sites.
38+
compiler.build_func_name_to_idx(&func_defs);
39+
for func_def in func_defs {
40+
compiler.visit_function_definition(&func_def, typed_context);
41+
}
42+
}
43+
}
44+
```
45+
46+
`build_func_name_to_idx` assigns each function its WASM index, following the same ordering used
47+
during `visit_function_definition`. This guarantees that when `lower_function_call` looks
48+
up a callee name, it finds the correct index regardless of whether the callee was already
49+
compiled.
50+
51+
### Diagram
52+
53+
```text
54+
traverse_t_ast_with_compiler
55+
|
56+
+---> build_func_name_to_idx(func_defs)
57+
| |
58+
| | Enumerate funcs in source order
59+
| | func_name_to_idx["foo"] = 0, ["bar"] = 1, ...
60+
| v
61+
| func_name_to_idx populated for ALL functions
62+
|
63+
+---> visit_function_definition(func_defs[0]) // "foo"
64+
+---> visit_function_definition(func_defs[1]) // "bar"
65+
+---> ...
66+
|
67+
| lower_function_call("bar") can look up index 1
68+
| even if called from "foo" (index 0, defined first)
69+
```
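
The pre-scan idea can be sketched in isolation. This is a minimal illustration, not the crate's implementation: `prescan_function_indices` and `resolve_callee` are hypothetical names, and function definitions are reduced to plain name strings.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the pre-scan: assign each function name its
/// index in source (definition) order before any body is compiled.
fn prescan_function_indices(func_names: &[&str]) -> HashMap<String, u32> {
    func_names
        .iter()
        .enumerate()
        .map(|(idx, name)| (name.to_string(), idx as u32))
        .collect()
}

/// Look up a callee by name. `None` models a name the pre-scan never saw.
fn resolve_callee(map: &HashMap<String, u32>, callee: &str) -> Option<u32> {
    map.get(callee).copied()
}
```

Because the map is fully populated before the first body is visited, a lookup of `"bar"` while compiling `"foo"` succeeds even though `"bar"` is defined later in the file.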
70+
71+
## How Parameter Indices Interlock with Local Indices
72+
73+
WebAssembly represents function parameters as the first locals in a function body. A
74+
function with signature `(i32, i64) -> i32` has:
75+
76+
- Local 0: the `i32` parameter
77+
- Local 1: `i64` parameter
78+
- Locals 2, 3, ...: additional locals declared in the body
79+
80+
The compiler implements this by processing parameters first, before `pre_scan_locals`:
81+
82+
```text
83+
visit_function_definition
84+
|
85+
+---> Process parameters: populate locals_map[param_name] = (0..param_count, vt)
86+
| local_idx starts at 0 and increments for each param
87+
|
88+
+---> param_count = local_idx (save watermark)
89+
|
90+
+---> pre_scan_locals(body, locals_map, local_idx)
91+
| local_idx continues from param_count (no reset)
92+
| body locals get indices param_count, param_count+1, ...
93+
|
94+
+---> Function::new(local_declarations)
95+
only declares locals with index >= param_count
96+
(params are implicit from the type signature)
97+
```
98+
99+
This means that within `locals_map`, parameters and body locals share the same namespace
100+
and can be accessed uniformly via `local.get <index>`; the WASM VM handles the
101+
distinction transparently.
102+
103+
### Example
104+
105+
```inference
106+
fn first_of_two(a: i32, b: i32) -> i32 {
107+
let tmp: i32 = a;
108+
return tmp;
109+
}
110+
```
111+
112+
`locals_map` after pre-scan:
113+
114+
- `"a"` → (0, I32)
115+
- `"b"` → (1, I32)
116+
- `"tmp"` → (2, I32)
117+
118+
`Function::new` receives only `[(1, I32)]` for `tmp` (index 2, but declared as count=1 of
119+
that type). `a` and `b` are implicit from the type signature.
120+
121+
Generated body:
122+
123+
```text
124+
local.get 0 ; a
125+
local.set 2 ; tmp = a
126+
local.get 2 ; tmp
127+
return
128+
```
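
The index assignment for this example can be reproduced with a small standalone sketch. The `ValType` enum, `LocalLayout` struct, and `assign_local_indices` function below are assumptions made for illustration; the real compiler works on AST nodes and its own value type.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum ValType {
    I32,
    I64,
}

/// Hypothetical result of the parameter pass followed by the local pre-scan.
struct LocalLayout {
    locals_map: HashMap<String, (u32, ValType)>,
    param_count: u32,
    /// (count, type) entries for body locals only; params are not declared.
    declared: Vec<(u32, ValType)>,
}

fn assign_local_indices(
    params: &[(&str, ValType)],
    body_locals: &[(&str, ValType)],
) -> LocalLayout {
    let mut locals_map = HashMap::new();
    let mut local_idx = 0u32;
    // Parameters take indices 0..param_count.
    for (name, vt) in params {
        locals_map.insert(name.to_string(), (local_idx, *vt));
        local_idx += 1;
    }
    let param_count = local_idx; // watermark; the counter is not reset
    let mut declared = Vec::new();
    // Body locals continue from the watermark.
    for (name, vt) in body_locals {
        locals_map.insert(name.to_string(), (local_idx, *vt));
        declared.push((1, *vt));
        local_idx += 1;
    }
    LocalLayout { locals_map, param_count, declared }
}
```

For `first_of_two`, calling this with params `a`, `b` and body local `tmp` yields the three map entries listed above and a single declared entry for `tmp`.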
129+
130+
## The Call Lowering Pipeline
131+
132+
`lower_function_call` in `compiler.rs` handles the three steps needed to emit a `call`:
133+
134+
```text
135+
lower_function_call(fce, ctx, func, locals_map)
136+
|
137+
1. Check callee kind: only Expression::Identifier accepted
138+
| Other kinds → Err(CodegenError::UnsupportedCalleeKind)
139+
|
140+
2. Lower arguments in positional order
141+
| for (label, expr) in fce.arguments:
142+
| lower_expression(expr, ...) // pushes arg onto WASM stack
143+
| labels are ignored (WASM is purely positional)
144+
|
145+
3. Resolve callee index and emit call
146+
func_idx = func_name_to_idx[fce.name()]
147+
func.instruction(&Instruction::Call(func_idx))
148+
```
149+
150+
Argument labels (if present in source) are discarded at the WASM level because WebAssembly
151+
has no concept of named arguments. The type-checker validates label correctness and
152+
argument count before codegen runs.
153+
154+
### Code Path
155+
156+
```rust
157+
fn lower_function_call(&self, fce, ctx, func, locals_map) -> Result<(), CodegenError> {
158+
let Expression::Identifier(_) = &fce.function else {
159+
return Err(CodegenError::UnsupportedCalleeKind);
160+
};
161+
cov_mark::hit!(wasm_codegen_emit_function_call);
162+
if let Some(arguments) = &fce.arguments {
163+
for (_label, expr_ref) in arguments {
164+
self.lower_expression(&expr_ref.borrow(), ctx, func, locals_map);
165+
}
166+
}
167+
let func_name = fce.name();
168+
let func_idx = self.func_name_to_idx.get(&func_name).copied()
169+
.ok_or(CodegenError::UnknownFunction(func_name))?;
170+
func.instruction(&Instruction::Call(func_idx));
171+
Ok(())
172+
}
173+
```
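
The excerpt above omits parameter types. As a self-contained approximation of the same flow, the sketch below uses toy types: `Instr`, `Arg`, `CallError`, and `lower_call` are hypothetical and stand in for the real instruction encoder, expression AST, and `CodegenError`.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Instr {
    I32Const(i32),
    LocalGet(u32),
    Call(u32),
}

#[derive(Debug, PartialEq)]
enum CallError {
    UnknownFunction(String),
}

/// Toy argument expression: a literal or a reference to a local index.
enum Arg {
    Literal(i32),
    Local(u32),
}

fn lower_call(
    callee: &str,
    args: &[Arg],
    func_name_to_idx: &HashMap<String, u32>,
    out: &mut Vec<Instr>,
) -> Result<(), CallError> {
    // Push arguments in positional order; labels do not exist at this level.
    for arg in args {
        match arg {
            Arg::Literal(v) => out.push(Instr::I32Const(*v)),
            Arg::Local(idx) => out.push(Instr::LocalGet(*idx)),
        }
    }
    // Resolve the callee through the pre-scanned map, then emit the call.
    let func_idx = func_name_to_idx
        .get(callee)
        .copied()
        .ok_or_else(|| CallError::UnknownFunction(callee.to_string()))?;
    out.push(Instr::Call(func_idx));
    Ok(())
}
```

As in the excerpt, arguments are emitted before the callee is resolved, so an unknown callee with arguments would leave partial output behind; in the documented design that case is treated as a type-checker inconsistency rather than a recoverable error.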
174+
175+
## Drop Emission Rules
176+
177+
WebAssembly is a stack machine. A value-returning function call leaves its return value on
178+
the operand stack. When the call appears as a standalone expression statement (rather than
179+
being consumed by `local.set` or another expression), that value must be explicitly dropped
180+
to keep the stack balanced.
181+
182+
The `Statement::Expression` arm in `lower_statement` determines whether to emit `drop`
183+
after evaluating an expression:
184+
185+
```rust
186+
Statement::Expression(expression) => {
187+
self.lower_expression(&expression, ctx, func, locals_map);
188+
let expr_produces_value = ctx.get_node_typeinfo(expression.id())
189+
.is_some_and(|ti| !matches!(ti.kind, TypeInfoKind::Unit));
190+
if expr_produces_value {
191+
let is_block_result = statements_iterator.peek().is_none()
192+
&& parent_blocks_stack.last()
193+
.is_some_and(|b| b.is_non_det() && !b.is_void());
194+
if !is_block_result {
195+
func.instruction(&Instruction::Drop);
196+
}
197+
}
198+
}
199+
```
200+
201+
### Decision Table
202+
203+
| Call return type | Position in block | Drop emitted? | Reason |
204+
|-----------------|-------------------|---------------|--------|
205+
| `unit` (void) | anywhere | No | No value on stack |
206+
| non-void | middle of block | Yes | Value not consumed; stack must be balanced |
207+
| non-void | last stmt of non-det block | No | Value is the block's result, consumed by enclosing context |
208+
| non-void | RHS of `let` | No | `local.set` consumes the value (different code path) |
209+
| non-void | RHS of `return` | No | `return` consumes the value |
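
The first three rows of the table reduce to a small predicate. The sketch below is an illustration under assumptions: `ParentBlock` and `should_emit_drop` are hypothetical names, and the last two rows (`let` and `return`) are not modelled because they go through different code paths.

```rust
/// Hypothetical summary of the enclosing block; the real compiler queries
/// `is_non_det()` and `is_void()` on its own block type.
#[derive(Clone, Copy)]
struct ParentBlock {
    is_non_det: bool,
    is_void: bool,
}

/// Decide whether a standalone expression statement needs a trailing `drop`.
fn should_emit_drop(
    produces_value: bool,
    is_last_statement: bool,
    parent: Option<ParentBlock>,
) -> bool {
    if !produces_value {
        return false; // unit result: nothing was pushed
    }
    // The value is kept only when it is the result of a non-void non-det block.
    let is_block_result =
        is_last_statement && parent.is_some_and(|b| b.is_non_det && !b.is_void);
    !is_block_result
}
```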
210+
211+
## Supported vs Unsupported Callee Kinds
212+
213+
Only plain identifier callees are currently supported:
214+
215+
```inference
216+
// Supported: plain identifier
217+
let x = foo(1, 2);
218+
return bar();
219+
220+
// Not yet supported: method call
221+
obj.method(); // → CodegenError::UnsupportedCalleeKind → todo!()
222+
223+
// Not yet supported: associated function
224+
MyType::func(); // → CodegenError::UnsupportedCalleeKind → todo!()
225+
226+
// Not yet supported: higher-order / function pointer
227+
let f = foo;
228+
f(1); // → CodegenError::UnsupportedCalleeKind → todo!()
229+
```
230+
231+
The `CodegenError` enum in `errors.rs` encodes these distinctions:
232+
233+
```rust
234+
pub(crate) enum CodegenError {
235+
UnsupportedCalleeKind, // → todo!() (planned future work)
236+
UnknownFunction(String), // → panic!() (type-checker inconsistency)
237+
}
238+
```
239+
240+
## Known Limitations
241+
242+
### Recursion
243+
244+
Direct or indirect recursion is explicitly forbidden in Inference (Power of 10, Rule 1).
245+
The analysis pass that detects recursive call graphs has not yet been implemented. At
246+
codegen time, a recursive call is not specially detected; it would generate a valid `call`
247+
instruction. The analysis pass must be added to reject recursive programs before they reach
248+
codegen.
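
One possible shape for such a pass, offered only as a sketch since no design is described here, is a depth-first cycle check over the call graph. The `has_recursion` function and the name-keyed graph representation are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Returns true if any function can reach itself through calls, directly or
/// indirectly. `call_graph` maps a function name to the names it calls.
fn has_recursion<'a>(call_graph: &HashMap<&'a str, Vec<&'a str>>) -> bool {
    enum Mark {
        InProgress, // on the current DFS path
        Done,       // fully explored, no cycle found through it
    }

    fn visit<'a>(
        node: &'a str,
        graph: &HashMap<&'a str, Vec<&'a str>>,
        marks: &mut HashMap<&'a str, Mark>,
    ) -> bool {
        match marks.get(node) {
            Some(Mark::InProgress) => return true, // back edge: cycle
            Some(Mark::Done) => return false,
            None => {}
        }
        marks.insert(node, Mark::InProgress);
        if let Some(callees) = graph.get(node) {
            for &callee in callees {
                if visit(callee, graph, marks) {
                    return true;
                }
            }
        }
        marks.insert(node, Mark::Done);
        false
    }

    let mut marks = HashMap::new();
    call_graph.keys().any(|&f| visit(f, call_graph, &mut marks))
}
```

A graph where `foo` calls `bar` and `bar` calls `foo` is reported as recursive, as is a function that calls itself.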
249+
250+
### Uzumaki Arguments
251+
252+
Passing `@` (uzumaki) as a function argument (e.g., `foo(@)`) triggers a type-checker
253+
panic because the type-checker does not yet propagate the expected parameter type onto the
254+
`@` expression. This is a gap in the type-checker, not in codegen.
255+
256+
### Method and Associated Function Calls
257+
258+
`obj.method()` and `Type::assoc()` call forms require member access resolution and
259+
dispatch logic not yet implemented. They produce `todo!()` via
260+
`CodegenError::UnsupportedCalleeKind`.
261+
262+
### Multi-File Calls
263+
264+
`build_func_name_to_idx` is invoked per source file. Cross-file function calls cannot be
265+
resolved until multi-file compilation is implemented (currently `todo!()` in `codegen()`).
266+
267+
## Coverage Marks
268+
269+
| Mark | Count | Meaning |
270+
|------|-------|---------|
271+
| `wasm_codegen_emit_function_params` | 7 | 7 parameters across all functions in `fn_params.inf` |
272+
| `wasm_codegen_emit_function_call` | 5 | 5 call sites in `fn_calls.inf` |
273+
274+
The `fn_params_test` verifies `wasm_codegen_emit_function_params` fires exactly 7 times
275+
(matching `fn_params.inf`: 1+1+1+2+2 params). The `fn_calls_test` verifies
276+
`wasm_codegen_emit_function_call` fires exactly 5 times.
277+
278+
## Related Files
279+
280+
- `core/wasm-codegen/src/compiler.rs`: `build_func_name_to_idx`, `visit_function_definition`, `lower_function_call`
281+
- `core/wasm-codegen/src/errors.rs`: `CodegenError` enum
282+
- `core/wasm-codegen/src/lib.rs`: `traverse_t_ast_with_compiler` (where pre-scan is called)
283+
- `core/wasm-codegen/README.md`: Crate-level overview and compilation phases
284+
- `core/wasm-codegen/docs/local-variables-lowering.md`: Local variable lowering (prerequisite)
285+
- `tests/test_data/codegen/wasm/base/fn_params/fn_params.inf`: Parameter test fixture
286+
- `tests/test_data/codegen/wasm/base/fn_calls/fn_calls.inf`: Function call test fixture
