# The Chicken-Egg Problem in Compilers: Handling Self-Referential Types

When building a compiler, one of the subtlest classic challenges you’ll face during type checking and semantic analysis is the **chicken-egg problem**, particularly when dealing with **self-referential types**.

Consider this simple definition:

```ferret
// Example in Ferret-like syntax

type Node struct {
    data: i32,
    next: Node
}
```

At first glance, this looks valid: a linked list node referencing another node. But during compilation, a type checker that resolves names as it reads them cannot resolve `Node` when it first encounters `next: Node`, because `Node` itself hasn’t been fully defined yet. The compiler is trapped in a paradox: it must understand `Node` to define `Node`.
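
For comparison, here is a sketch of the same shape in Go, whose compiler resolves the self-reference without trouble. Go requires the indirection to be explicit (`*Node`), because a struct that contained itself by value would have no finite size. The field names mirror the Ferret-like example above; the `length` helper is hypothetical and exists only to exercise the type.

```go
package main

import "fmt"

// Node refers to its own type through a pointer, which Go accepts.
// Writing `next Node` (by value) is rejected as an invalid recursive type.
type Node struct {
	data int32
	next *Node
}

// length walks the list and counts its nodes (illustrative helper).
func length(n *Node) int {
	count := 0
	for ; n != nil; n = n.next {
		count++
	}
	return count
}

func main() {
	list := &Node{data: 1, next: &Node{data: 2, next: &Node{data: 3}}}
	fmt.Println(length(list)) // prints 3
}
```

The rest of this article is about how a compiler can make a declaration like this resolvable in the first place.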

This circular dependency between using a type and defining it is the **chicken-egg problem** of type resolution.

---

## Why Single-Pass Analysis Fails

A naive compiler performs **a single linear pass**: it parses and immediately tries to resolve each symbol or type it encounters. In the example above:

1. The compiler starts parsing `Node`.
2. It encounters `next: Node`.
3. It looks up `Node`, but `Node` is not yet in the type table.

At this point, the compiler either reports an error or defers the lookup, depending on how sophisticated its resolver is.
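
The three steps above can be reproduced with a small Go sketch. It is not taken from any real compiler: `Field`, `TypeDecl`, and `resolveSinglePass` are hypothetical names, and the "type table" is just a set of known names. The resolver registers a type only after all of its fields have resolved, which is what makes the self-reference fail.

```go
package main

import "fmt"

// Field and TypeDecl are a hypothetical, minimal AST for struct declarations.
type Field struct{ Name, TypeName string }

type TypeDecl struct {
	Name   string
	Fields []Field
}

// resolveSinglePass checks each field as soon as it is seen and registers
// a type name only once the whole declaration has been processed.
func resolveSinglePass(decls []TypeDecl) error {
	known := map[string]bool{"i32": true} // built-in types
	for _, d := range decls {
		for _, f := range d.Fields {
			if !known[f.TypeName] {
				return fmt.Errorf("%s.%s: unknown type %q", d.Name, f.Name, f.TypeName)
			}
		}
		known[d.Name] = true // too late for a field that refers to d itself
	}
	return nil
}

func main() {
	node := TypeDecl{Name: "Node", Fields: []Field{{"data", "i32"}, {"next", "Node"}}}
	fmt.Println(resolveSinglePass([]TypeDecl{node}))
	// prints: Node.next: unknown type "Node"
}
```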

To solve this, modern compilers use **multi-phase or lazy resolution** strategies.

---

## Solution 1: Two-Pass Resolution (SIMPLEST)

**Goal:** Separate declaration and definition phases.

### How It Works

1. **Pass 1 – Declaration:** Register all type names (structs, enums, classes, etc.) into the symbol/type table **without** resolving their fields.
2. **Pass 2 – Definition:** Resolve the internal structure of each type by looking up fields and members.

```text
Pass 1 → Collect names: Node
Pass 2 → Resolve contents: next: Node
```

This ensures that even if `Node` is not yet defined when encountered, its **name** already exists in the type environment.
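
A minimal Go sketch of the two passes, under the same assumptions as a name-only type table: `Field`, `TypeDecl`, and `resolveTwoPass` are hypothetical names, and "resolving" a field here means only checking that its type name is declared somewhere.

```go
package main

import "fmt"

// Field and TypeDecl are a hypothetical, minimal AST for struct declarations.
type Field struct{ Name, TypeName string }

type TypeDecl struct {
	Name   string
	Fields []Field
}

// resolveTwoPass returns one error per field whose type name is never declared.
func resolveTwoPass(decls []TypeDecl) []error {
	known := map[string]bool{"i32": true} // built-in types

	// Pass 1: register every declared name without looking at fields.
	for _, d := range decls {
		known[d.Name] = true
	}

	// Pass 2: every field lookup now sees the complete set of names.
	var errs []error
	for _, d := range decls {
		for _, f := range d.Fields {
			if !known[f.TypeName] {
				errs = append(errs, fmt.Errorf("%s.%s: unknown type %q", d.Name, f.Name, f.TypeName))
			}
		}
	}
	return errs
}

func main() {
	decls := []TypeDecl{
		{Name: "List", Fields: []Field{{"head", "Node"}}},                  // forward reference
		{Name: "Node", Fields: []Field{{"data", "i32"}, {"next", "Node"}}}, // self-reference
	}
	fmt.Println(len(resolveTwoPass(decls))) // prints 0
}
```

Because pass 1 sees every declaration before pass 2 starts, both the forward reference and the self-reference resolve; only a name that is never declared is reported.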

### Pros
- Minimal code changes
- Simple to understand
- Handles most real-world cases

### Cons
- Applied one file at a time, it cannot handle cross-module or circular references that span multiple files; pass 1 would have to cover every file before pass 2 starts
- Guarantees only that names exist, so anything that needs a fully resolved type (such as its size or layout) still depends on resolution order, which is limiting when dependencies are deeply nested

---

## Solution 2: Deferred Resolution Queue (RECOMMENDED)

**Goal:** Allow the compiler to defer unresolved references for later.

### How It Works

1. During type resolution, if a reference (e.g., `next: Node`) cannot be resolved yet, **enqueue** it into a global or context-based deferred resolution list.
2. After the main pass, iterate over this queue and attempt to resolve all pending references.
3. Repeat until the queue stabilizes: either it is empty, or a full sweep resolves nothing. Anything still queued at that point is an undefined type or part of a **circular dependency**.

```text
Unresolved → Add to deferred queue
Queue processed → Resolved in later pass
```

### Implementation Example (in pseudo-Go):

```go
// During the main pass: resolve the field now, or park it for later.
if !typeExists(typeName) {
	deferQueue.Add(currentContext, field)
} else {
	resolve(field)
}
```

Then, later, sweep the queue until it is empty or a sweep makes no progress:

```go
for progressed := true; progressed && !deferQueue.Empty(); {
	progressed = false
	for _, item := range deferQueue.TakeAll() { // TakeAll (hypothetical) empties the queue and returns its items
		if tryResolve(item) { // assumed to report whether the item resolved
			progressed = true
		} else {
			deferQueue.Add(item.Context, item.Field) // retry on the next sweep
		}
	}
}
```
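
The fragments above can be put together into a runnable sketch. Everything here is an assumption made for illustration: `Resolver`, `pending`, and `Run` are hypothetical names, a type name becomes known only when its declaration is reached (so forward references really are deferred), and resolving a field means only finding its type name.

```go
package main

import "fmt"

// Field and TypeDecl are a hypothetical, minimal AST for struct declarations.
type Field struct{ Name, TypeName string }

type TypeDecl struct {
	Name   string
	Fields []Field
}

// pending is one deferred reference: a field whose type was unknown when first seen.
type pending struct {
	owner string
	field Field
}

// Resolver holds the type table and the deferred queue.
type Resolver struct {
	known map[string]bool
	queue []pending
}

// tryResolve reports whether the field's type name is known by now.
func (r *Resolver) tryResolve(p pending) bool { return r.known[p.field.TypeName] }

// Run does the main pass, then sweeps the queue until it is empty or a sweep
// makes no progress. It returns the references that never resolved.
func (r *Resolver) Run(decls []TypeDecl) []pending {
	for _, d := range decls {
		r.known[d.Name] = true // the name is usable once its declaration is reached
		for _, f := range d.Fields {
			p := pending{d.Name, f}
			if !r.tryResolve(p) {
				r.queue = append(r.queue, p) // defer instead of failing
			}
		}
	}
	for progressed := true; progressed && len(r.queue) > 0; {
		progressed = false
		var still []pending
		for _, p := range r.queue {
			if r.tryResolve(p) {
				progressed = true
			} else {
				still = append(still, p)
			}
		}
		r.queue = still
	}
	return r.queue
}

func main() {
	r := &Resolver{known: map[string]bool{"i32": true}}
	left := r.Run([]TypeDecl{
		{Name: "List", Fields: []Field{{"head", "Node"}, {"size", "u99"}}},
		{Name: "Node", Fields: []Field{{"data", "i32"}, {"next", "Node"}}},
	})
	for _, p := range left {
		fmt.Printf("%s.%s: unknown type %q\n", p.owner, p.field.Name, p.field.TypeName)
	}
	// prints: List.size: unknown type "u99"
}
```

In this name-only model one sweep is enough, since nothing resolved during a sweep unblocks anything else. The progress flag matters once resolving one item can make another resolvable, and it is also what guarantees the loop terminates when an item can never resolve.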

### Pros
- Handles complex dependency graphs
- Extensible for multi-file or modular builds

### Cons
- More complex state management
- Requires good cycle detection logic

---

## Solution 3: Lazy Resolution with Completion States (MOST ROBUST)

**Goal:** Resolve types only when actually needed, tracking their state.

### How It Works

Each type has a **resolution state**:

- `Unresolved`
- `Resolving`
- `Resolved`

When resolving a type, if a field refers to another type:
- If that type is `Unresolved`, begin resolving it recursively.
- If it’s `Resolving`, a circular reference is detected. This is an error only when the cycle runs through by-value fields (such a type would have no finite size); a field that holds a reference can use the in-progress type as is.
- If it’s `Resolved`, reuse its type information.

This approach naturally supports **recursive and cross-module resolution**.

### Example:

```go
// resolveType resolves t and, depth-first, the type of each of its fields.
// Sketch: treats every field as by-value; assumes `import "fmt"`.
func resolveType(t *Type) error {
	switch t.state {
	case Resolved:
		return nil
	case Resolving:
		return fmt.Errorf("circular reference through %s", t.name)
	}

	t.state = Resolving
	for _, field := range t.fields {
		// field.typ is a *Type ("type" itself is a Go keyword).
		if err := resolveType(field.typ); err != nil {
			return err
		}
	}
	t.state = Resolved
	return nil
}
```

### Pros
- Most robust and elegant
- Handles cycles gracefully
- Integrates well with dependency graphs and lazy loading

### Cons
- Highest complexity
- Requires careful bookkeeping of state transitions

---

## Comparison Table

| Method | Complexity | Handles Cross-Refs | Cycle Detection | Ease of Implementation |
|--------|------------|--------------------|-----------------|------------------------|
| Two-Pass | Low | ❌ | ⚪ | ✅✅✅ |
| Deferred Queue | Medium | ✅ | ⚪ | ✅✅ |
| Lazy Resolution | High | ✅✅✅ | ✅✅✅ | ✅ |

More ✅ marks mean stronger support (or, in the last column, an easier implementation); ❌ means unsupported; ⚪ means the capability has to be added separately.

---

## Final Recommendation

For most modern compilers, especially those supporting modules, generics, or interfaces, the **Deferred Resolution Queue** method provides the best balance between simplicity and flexibility.

It avoids premature complexity but still handles multi-type and multi-module dependencies.

If you’re designing a new language (like **Ferret**), where clarity and beginner-friendliness are priorities, start with **Deferred Resolution**, and evolve to **Lazy Resolution** when the compiler architecture matures.

---

### Summary
- The chicken-egg problem arises from **self-referential or mutually dependent types**.
- **Single-pass** analysis fails because types are used before being fully defined.
- **Multi-phase or lazy resolution** strategies solve this by separating declaration and definition, or deferring until information becomes available.

---

> **Best Practical Approach:** Deferred Resolution Queue: simple to start with, scalable, and a natural stepping stone to lazy resolution.