
Commit 534f441

refactor: implement self-referential type handling and update related documentation
1 parent f9bdcd9 commit 534f441

8 files changed: +318 -21 lines changed

app/cmd/start.fer

Lines changed: 7 additions & 1 deletion
@@ -49,4 +49,10 @@ let value := 10;
 // };

 let myname := "Ferret";
-let byt := myname[0];
+let byt := myname[0];
+
+type Node struct {
+    value: i32,
+    next: Node
+};
+

app/fer.ret

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ name = "demo-app"
 version = "0.0.2"

 [build]
-entry = "cmd/if_test.fer"
+entry = "test_self_ref.fer"
 output = "bin"

 [cache]

app/test_self_ref.fer

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+// Test self-referential types
+type Node struct {
+    value: i32,
+    next: Node // Self-reference should now work
+};
+
+type NodeB struct {
+    data: str,
+    ref: NodeA // Mutual reference
+};
+
+type NodeA struct {
+    id: i32,
+    ref: NodeB // Mutual reference back
+};
+
+fn main() {
+    let message := "Self-referential types test";
+}

articles/self_ref_issue.md

Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
# The Chicken-Egg Problem in Compilers: Handling Self-Referential Types

When building a compiler, one of the most subtle and classic challenges you’ll face during type checking and semantic analysis is the **chicken-egg problem** - particularly when dealing with **self-referential types**.

Consider this simple definition:

```ferret
// Example in Ferret-like syntax

type Node struct {
    data: i32,
    next: Node
}
```

At first glance, this looks valid - a linked list node referencing another node. But during compilation, the **type checker** cannot resolve the type `Node` when it first encounters `next: Node`, because `Node` itself hasn’t been fully defined yet. The compiler is trapped in a paradox: it must understand `Node` to define `Node`.

This is the **chicken-egg problem** of type resolution.

---

## Why Single-Pass Analysis Fails

A naive compiler performs **a single linear pass**: it parses and immediately tries to resolve each symbol or type it encounters. But in the above case:

1. The compiler starts parsing `Node`.
2. It encounters `next: Node`.
3. It looks up `Node` - but `Node` is not yet in the type table.

At this point, the compiler either throws an error or defers, depending on how sophisticated its resolver is.

To solve this, modern compilers use **multi-phase or lazy resolution** strategies.

---
35+
36+
## Solution 1: Two-Pass Resolution (SIMPLEST)
37+
38+
**Goal:** Separate declaration and definition phases.
39+
40+
### How It Works
41+
42+
1. **Pass 1 – Declaration:** Register all type names (structs, enums, classes, etc.) into the symbol/type table **without** resolving their fields.
43+
2. **Pass 2 – Definition:** Resolve the internal structure of each type by looking up fields and members.
44+
45+
```text
46+
Pass 1 → Collect names: Node
47+
Pass 2 → Resolve contents: next: Node
48+
```
49+
50+
This ensures that even if `Node` is not yet defined when encountered, its **name** already exists in the type environment.
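
The two passes can be sketched in Go as follows. This is a minimal illustration, not the Ferret compiler's code: `TypeDecl`, `FieldDecl`, `TypeInfo`, and `resolveTwoPass` are hypothetical names, and the built-in types `i32` and `str` are assumed to be pre-registered.

```go
package main

import "fmt"

// FieldDecl and TypeDecl are hypothetical stand-ins for parsed AST nodes.
type FieldDecl struct{ Name, TypeName string }

type TypeDecl struct {
	Name   string
	Fields []FieldDecl
}

// TypeInfo is a type-table entry; Fields is filled in during pass 2.
type TypeInfo struct {
	Name   string
	Fields map[string]*TypeInfo
}

// resolveTwoPass registers every type name first, then resolves field types.
func resolveTwoPass(decls []TypeDecl) (map[string]*TypeInfo, error) {
	// Assumption: built-in types are registered before user types.
	table := map[string]*TypeInfo{"i32": {Name: "i32"}, "str": {Name: "str"}}

	// Pass 1: declare names only, so later lookups can find them.
	for _, d := range decls {
		if _, dup := table[d.Name]; dup {
			return nil, fmt.Errorf("type %q redeclared", d.Name)
		}
		table[d.Name] = &TypeInfo{Name: d.Name, Fields: map[string]*TypeInfo{}}
	}

	// Pass 2: resolve each field against the now-complete name table.
	for _, d := range decls {
		for _, f := range d.Fields {
			target, ok := table[f.TypeName]
			if !ok {
				return nil, fmt.Errorf("%s.%s: unknown type %q", d.Name, f.Name, f.TypeName)
			}
			table[d.Name].Fields[f.Name] = target
		}
	}
	return table, nil
}

func main() {
	decls := []TypeDecl{
		{Name: "Node", Fields: []FieldDecl{{"value", "i32"}, {"next", "Node"}}},
		{Name: "NodeB", Fields: []FieldDecl{{"data", "str"}, {"ref", "NodeA"}}},
		{Name: "NodeA", Fields: []FieldDecl{{"id", "i32"}, {"ref", "NodeB"}}},
	}
	table, err := resolveTwoPass(decls)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The self-reference points back at the same table entry.
	fmt.Println(table["Node"].Fields["next"] == table["Node"]) // true
	fmt.Println(table["NodeB"].Fields["ref"].Name)             // NodeA
}
```

Because pass 1 finishes before any field is looked up, declaration order does not matter here: `NodeB` can refer to `NodeA` even though `NodeA` is declared later.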

### Pros
- Minimal code changes
- Simple to understand
- Handles most real-world cases

### Cons
- Cannot resolve references (including circular ones) that span modules or files unless pass 1 runs over every file before pass 2 starts
- Limited when dependencies are deeply nested

---

## Solution 2: Deferred Resolution Queue (RECOMMENDED)

**Goal:** Allow the compiler to defer unresolved references for later.

### How It Works

1. During type resolution, if a reference (e.g., `next: Node`) cannot be resolved yet, **enqueue** it into a global or context-based deferred resolution list.
2. After the main pass, iterate over this queue and attempt to resolve all pending references.
3. Repeat until the queue is empty or a full sweep resolves nothing; whatever remains is either an undefined type or a **circular dependency**.

```text
Unresolved → Add to deferred queue
Queue processed → Resolved in later pass
```

### Implementation Example (in pseudo-Go):

```go
if !typeExists(typeName) {
    deferQueue.Add(currentContext, field)
} else {
    resolve(field)
}
```

Then, later:

```go
// Sweep the queue until it is empty or a full sweep resolves nothing.
for progress := true; progress && !deferQueue.Empty(); {
    progress = false
    for _, item := range deferQueue.Items() {
        if tryResolve(item) { // assumed to report whether it succeeded
            deferQueue.Remove(item)
            progress = true
        }
    }
}
```
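
For a version that compiles, the sketch below models the queue as a plain slice and resolves field types by name. All names here (`pendingField`, `resolveDeferred`) are hypothetical, and the model is deliberately small: a type counts as known as soon as its declaration is seen.

```go
package main

import "fmt"

// pendingField is a hypothetical queue entry: a field whose type name
// was not yet known when the field was first visited.
type pendingField struct{ Owner, Field, TypeName string }

// resolveDeferred visits types in source order, defers fields whose type
// is not yet known, and then sweeps the queue until no progress is made.
// decls maps a type name to its fields as {fieldName, typeName} pairs.
func resolveDeferred(order []string, decls map[string][][2]string) (map[string]map[string]string, []pendingField) {
	known := map[string]bool{"i32": true, "str": true} // assumed built-ins
	resolved := map[string]map[string]string{}
	var queue []pendingField

	// Main pass: resolve what is already known, defer the rest.
	for _, name := range order {
		known[name] = true // a type's own name is visible to its fields
		resolved[name] = map[string]string{}
		for _, f := range decls[name] {
			if known[f[1]] {
				resolved[name][f[0]] = f[1]
			} else {
				queue = append(queue, pendingField{name, f[0], f[1]})
			}
		}
	}

	// Retry sweeps: stop when the queue is empty or nothing changed.
	for len(queue) > 0 {
		var still []pendingField
		for _, p := range queue {
			if known[p.TypeName] {
				resolved[p.Owner][p.Field] = p.TypeName
			} else {
				still = append(still, p)
			}
		}
		if len(still) == len(queue) {
			break // no progress: the remaining entries cannot be resolved
		}
		queue = still
	}
	return resolved, queue
}

func main() {
	order := []string{"Node", "NodeB", "NodeA"}
	decls := map[string][][2]string{
		"Node":  {{"value", "i32"}, {"next", "Node"}},
		"NodeB": {{"data", "str"}, {"ref", "NodeA"}},
		"NodeA": {{"id", "i32"}, {"ref", "NodeB"}, {"owner", "Missing"}},
	}
	resolved, leftover := resolveDeferred(order, decls)
	fmt.Println(resolved["NodeB"]["ref"]) // NodeA
	fmt.Println(leftover)                 // [{NodeA owner Missing}]
}
```

In this toy model a single retry sweep is enough, because every declared name is known by the end of the main pass. The repeated sweep matters in a fuller resolver where resolving one entry can unblock another, and the no-progress check is what keeps the loop from spinning on entries such as `Missing`.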

### Pros
- Handles complex dependency graphs
- Extensible for multi-file or modular builds

### Cons
- More complex state management
- Requires good cycle detection logic

---

## Solution 3: Lazy Resolution with Completion States (MOST ROBUST)

**Goal:** Resolve types only when actually needed, tracking their state.

### How It Works

Each type has a **resolution state**:

- `Unresolved`
- `Resolving`
- `Resolved`

When resolving a type, if a field refers to another type:
- If that type is `Unresolved`, begin resolving it recursively.
- If it’s `Resolving`, a circular reference is detected.
- If it’s `Resolved`, reuse its type information.

This approach naturally supports **recursive and cross-module resolution**.

### Example:

```go
// resolveType takes a pointer so that state changes persist across calls.
func resolveType(t *Type) error {
    if t.state == Resolved {
        return nil
    }
    if t.state == Resolving {
        return ErrCircular // cycle detected; the caller decides how to handle it
    }

    t.state = Resolving
    for _, field := range t.fields {
        if err := resolveType(field.typ); err != nil { // "type" is a Go keyword
            return err
        }
    }
    t.state = Resolved
    return nil
}
```
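
A self-contained variant of the same idea is sketched below. It resolves types by name on demand and keeps the chain of types currently being resolved, so that a cycle can be reported with its full path. The names `TypeNode` and `Resolver` are hypothetical, and whether a cycle is an error or is allowed depends on the language's field semantics; this sketch only reports it.

```go
package main

import (
	"fmt"
	"strings"
)

type State int

const (
	Unresolved State = iota
	Resolving
	Resolved
)

// TypeNode is a hypothetical type-table entry. Fields holds the names of
// the types its members refer to.
type TypeNode struct {
	Name   string
	Fields []string
	State  State
}

// Resolver resolves types on demand and tracks the active chain of types
// so a cycle can be reported with its path.
type Resolver struct {
	Types map[string]*TypeNode
	stack []string
}

// Resolve walks the dependencies of name depth-first.
// On error the resolver is left mid-flight; a real implementation would
// reset the states it touched before continuing.
func (r *Resolver) Resolve(name string) error {
	t, ok := r.Types[name]
	if !ok {
		return fmt.Errorf("unknown type %q", name)
	}
	switch t.State {
	case Resolved:
		return nil
	case Resolving:
		return fmt.Errorf("circular reference: %s -> %s", strings.Join(r.stack, " -> "), name)
	}

	t.State = Resolving
	r.stack = append(r.stack, name)
	for _, dep := range t.Fields {
		if err := r.Resolve(dep); err != nil {
			return err
		}
	}
	r.stack = r.stack[:len(r.stack)-1]
	t.State = Resolved
	return nil
}

func main() {
	r := &Resolver{Types: map[string]*TypeNode{
		"i32":   {Name: "i32"},
		"Pair":  {Name: "Pair", Fields: []string{"i32", "i32"}},
		"List":  {Name: "List", Fields: []string{"Pair"}},
		"NodeA": {Name: "NodeA", Fields: []string{"NodeB"}},
		"NodeB": {Name: "NodeB", Fields: []string{"NodeA"}},
	}}
	fmt.Println(r.Resolve("List"))  // <nil>
	fmt.Println(r.Resolve("NodeA")) // circular reference: NodeA -> NodeB -> NodeA
}
```

Only the types reachable from the requested name are touched, which is what makes the approach lazy: resolving `List` never visits `NodeA` or `NodeB`.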

### Pros
- Most robust and elegant
- Handles cycles gracefully
- Integrates well with dependency graphs and lazy loading

### Cons
- Highest complexity
- Requires careful bookkeeping of state transitions

---

## Comparison Table

| Method | Complexity | Handles Cross-Refs | Cycle Detection | Ease of Implementation |
|--------|------------|--------------------|-----------------|------------------------|
| Two-Pass | Low | | | ✅✅✅ |
| Deferred Queue | Medium | | | ✅✅ |
| Lazy Resolution | High | ✅✅✅ | ✅✅✅ | |

---

## Final Recommendation

For most modern compilers - especially those supporting modules, generics, or interfaces - the **Deferred Resolution Queue** method provides the best balance between simplicity and flexibility.

It avoids premature complexity but still handles multi-type and multi-module dependencies elegantly.

If you’re designing a new language (like **Ferret**), where clarity and beginner-friendliness are priorities, start with **Deferred Resolution**, and evolve to **Lazy Resolution** when the compiler architecture matures.

---

### Summary
- The chicken-egg problem arises from **self-referential or mutually-dependent types**.
- **Single-pass** analysis fails because types are used before being fully defined.
- **Multi-phase or lazy resolution** strategies solve this by separating declaration and definition, or deferring until information becomes available.

---

> **Best Practical Approach:** Deferred Resolution Queue - simple, scalable, and future-proof.

compiler/internal/semantic/collector/declaration.go

Lines changed: 10 additions & 2 deletions
@@ -5,6 +5,7 @@ import (
 	"compiler/internal/frontend/ast"
 	"compiler/internal/modules"
 	"compiler/internal/semantic/analyzer"
+	"compiler/internal/semantic/stype"
 	"compiler/internal/symbol"
 	"compiler/report"
 )
@@ -46,8 +47,15 @@ func collectTypeSymbol(c *analyzer.AnalyzerNode, decl *ast.TypeDeclStmt, cm *mod
 		return
 	}

-	// Declare the type symbol with placeholder type
-	typeSymbol := symbol.NewSymbolWithLocation(aliasName, symbol.SymbolType, nil, decl.Alias.Loc())
+	// Create a placeholder UserType with NotStarted state
+	placeholderType := &stype.UserType{
+		Name:       aliasName,
+		Definition: nil,
+		State:      stype.TypeNotStarted,
+	}
+
+	// Declare the type symbol with placeholder UserType
+	typeSymbol := symbol.NewSymbolWithLocation(aliasName, symbol.SymbolType, placeholderType, decl.Alias.Loc())

 	err := cm.SymbolTable.Declare(aliasName, typeSymbol)
 	if err != nil {

compiler/internal/semantic/resolver/declarations.go

Lines changed: 13 additions & 7 deletions
@@ -226,19 +226,25 @@ func resolveTypeDeclaration(r *analyzer.AnalyzerNode, decl *ast.TypeDeclStmt, cm
 		return
 	}

+	// Create placeholder UserType with TypeResolving state to handle self-references
+	symbolType := &stype.UserType{
+		Name:  aliasName,
+		State: stype.TypeResolving, // Set to resolving to prevent infinite recursion
+	}
+
+	// Update the symbol's type with the placeholder first
+	symbol.Type = symbolType
+
+	// Now resolve the actual definition (this may create self-references)
 	typeToDeclare, err := semantic.DeriveSemanticType(decl.BaseType, cm)
 	if err != nil {
 		r.Ctx.Reports.AddSemanticError(r.Program.FullPath, decl.BaseType.Loc(), "invalid base type for type declaration: "+err.Error(), report.RESOLVER_PHASE)
 		return
 	}

-	symbolType := &stype.UserType{
-		Name:       aliasName,
-		Definition: typeToDeclare,
-	}
-
-	// Update the symbol's type
-	symbol.Type = symbolType
+	// Update the definition and mark as complete
+	symbolType.Definition = typeToDeclare
+	symbolType.State = stype.TypeComplete

 	if r.Debug {
 		colors.ORANGE.Printf("resolved type alias '%v', Def: %v at %s\n", symbol.Type, symbol.Type.(*stype.UserType).Definition, decl.Alias.Loc())
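
The placeholder-first pattern in this resolver change can be illustrated in isolation. The sketch below uses simplified stand-ins (`UserType`, `StructType`, a map named `symbols`, and the helpers `declare` and `resolve`), not the compiler's real `stype`, `symbol`, or `semantic` packages. One deliberate difference, stated as an assumption: the sketch fills in the placeholder registered at declaration time rather than assigning a fresh `UserType`, so references taken before resolution stay valid. Whether the hunk above has the same property depends on how `DeriveSemanticType` stores references, which this diff does not show.

```go
package main

import "fmt"

type TypeState int

const (
	TypeNotStarted TypeState = iota
	TypeResolving
	TypeComplete
)

// Type, UserType and StructType are simplified stand-ins for illustration.
type Type interface{ String() string }

type UserType struct {
	Name       string
	Definition Type
	State      TypeState
}

func (u *UserType) String() string { return u.Name }

type StructType struct{ Fields map[string]Type }

func (s *StructType) String() string { return "struct" }

// symbols is a toy symbol table mapping type names to their UserType.
var symbols = map[string]*UserType{}

// declare registers a placeholder before any definition is resolved.
func declare(name string) {
	symbols[name] = &UserType{Name: name, State: TypeNotStarted}
}

// resolve marks the placeholder as resolving, builds the definition (field
// lookups return placeholders, including the type's own), then completes it.
func resolve(name string, fields map[string]string) error {
	u := symbols[name]
	u.State = TypeResolving
	def := &StructType{Fields: map[string]Type{}}
	for fieldName, typeName := range fields {
		target, ok := symbols[typeName]
		if !ok {
			return fmt.Errorf("unknown type %q", typeName)
		}
		def.Fields[fieldName] = target
	}
	u.Definition = def
	u.State = TypeComplete
	return nil
}

func main() {
	// Built-ins are modelled as already complete user types for brevity.
	symbols["i32"] = &UserType{Name: "i32", State: TypeComplete}
	declare("Node")
	if err := resolve("Node", map[string]string{"value": "i32", "next": "Node"}); err != nil {
		fmt.Println("error:", err)
		return
	}
	node := symbols["Node"]
	next := node.Definition.(*StructType).Fields["next"]
	fmt.Println(next == node, node.State == TypeComplete) // true true
}
```

Because the field stores a pointer to the placeholder, the self-reference needs no recursion: `next` already points at the entry that is later marked complete.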

compiler/internal/semantic/stype/types.go

Lines changed: 10 additions & 0 deletions
@@ -21,11 +21,21 @@ func (p *PrimitiveType) String() string {
 	return p.TypeName.String()
 }

+// TypeState represents the resolution state of a user-defined type
+type TypeState int
+
+const (
+	TypeNotStarted TypeState = iota // Type declared but not resolved
+	TypeResolving                   // Type is currently being resolved (prevents infinite recursion)
+	TypeComplete                    // Type is fully resolved
+)
+
 // UserType represents user-defined types and type aliases
 type UserType struct {
 	Name       string
 	Definition Type                     // For type aliases, this is the underlying type
 	Methods    map[string]*FunctionType // Methods associated with this type
+	State      TypeState                // Resolution state to handle self-references
 }

 func (u *UserType) String() string {
