refactor: implement self-referential type handling and update related documentation

itsfuad · itsfuad · commit 534f4412c995 · 2025-10-13T23:42:53.000+06:00
diff --git a/app/cmd/start.fer b/app/cmd/start.fer
@@ -49,4 +49,10 @@ let value := 10;
 // };
 
 let myname := "Ferret";
-let byt := myname[0];
+let byt := myname[0];
+
+type Node struct {
+    value: i32,
+    next: Node
+};
+
diff --git a/app/fer.ret b/app/fer.ret
@@ -4,7 +4,7 @@ name = "demo-app"
 version = "0.0.2"
 
 [build]
-entry = "cmd/if_test.fer"
+entry = "test_self_ref.fer"
 output = "bin"
 
 [cache]
diff --git a/app/test_self_ref.fer b/app/test_self_ref.fer
@@ -0,0 +1,19 @@
+// Test self-referential types
+type Node struct {
+    value: i32,
+    next: Node  // Self-reference should now work
+};
+
+type NodeB struct {
+    data: str,
+    ref: NodeA  // Mutual reference
+};
+
+type NodeA struct {
+    id: i32,
+    ref: NodeB  // Mutual reference back
+};
+
+fn main() {
+    let message := "Self-referential types test";
+}
diff --git a/articles/self_ref_issue.md b/articles/self_ref_issue.md
@@ -0,0 +1,181 @@
+# The Chicken-Egg Problem in Compilers: Handling Self-Referential Types
+
+When building a compiler, one of the most subtle and classic challenges you’ll face during type checking and semantic analysis is the **chicken-egg problem** - particularly when dealing with **self-referential types**.
+
+Consider this simple definition:
+
+```ferret
+// Example in Ferret-like syntax
+
+type Node struct {
+    data: i32,
+    next: Node
+};
+```
+
+At first glance, this looks valid - a linked list node referencing another node. But during compilation, the **type checker** cannot resolve the type `Node` when it first encounters `next: Node`, because `Node` itself hasn’t been fully defined yet. The compiler is trapped in a paradox: it must understand `Node` to define `Node`.
+
+This is the **chicken-egg problem** of type resolution.
+
+---
+
+## Why Single-Pass Analysis Fails
+
+A naive compiler performs **a single linear pass**: it parses and immediately tries to resolve each symbol or type it encounters. But in the above case:
+
+1. The compiler starts parsing `Node`.
+2. It encounters `next: Node`.
+3. It looks up `Node` - but `Node` is not yet in the type table.
+
+At this point, the compiler either throws an error or defers, depending on how sophisticated its resolver is.
+
+To solve this, modern compilers use **multi-phase or lazy resolution** strategies.
+
+---
+
+## Solution 1: Two-Pass Resolution (SIMPLEST)
+
+**Goal:** Separate declaration and definition phases.
+
+### How It Works
+
+1. **Pass 1 – Declaration:** Register all type names (structs, enums, classes, etc.) into the symbol/type table **without** resolving their fields.
+2. **Pass 2 – Definition:** Resolve the internal structure of each type by looking up fields and members.
+
+```text
+Pass 1 → Collect names: Node
+Pass 2 → Resolve contents: next: Node
+```
+
+This ensures that even if `Node` is not yet defined when encountered, its **name** already exists in the type environment.
+
+### Pros
+- Minimal code changes
+- Simple to understand
+- Handles most real-world cases
+
+### Cons
+- Cannot handle cross-module or circular references that span multiple files
+- Limited when dependencies are deeply nested
+
+---
+
+## Solution 2: Deferred Resolution Queue (RECOMMENDED)
+
+**Goal:** Allow the compiler to defer unresolved references for later.
+
+### How It Works
+
+1. During type resolution, if a reference (e.g., `next: Node`) cannot be resolved yet, **enqueue** it into a global or context-based deferred resolution list.
+2. After the main pass, iterate over this queue and attempt to resolve all pending references.
+3. Repeat until the queue is empty (no unresolved types remain) or a full pass resolves nothing, which signals a **circular dependency** or an undefined type.
+
+```text
+Unresolved → Add to deferred queue
+Queue processed → Resolved in later pass
+```
+
+### Implementation Example (in pseudo-Go):
+
+```go
+if !typeExists(typeName) {
+    deferQueue.Add(currentContext, field)
+} else {
+    resolve(field)
+}
+```
+
+Then, later:
+
+```go
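+for progress := true; progress && !deferQueue.Empty(); {
+    progress = false // stop once a full pass resolves nothing (cycle or undefined type)
+    for _, item := range deferQueue.Drain() { // Drain: hypothetical, empties the queue
+        progress = tryResolve(item) || progress // tryResolve re-enqueues on failure
+    }
+}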
+
+### Pros
+- Handles complex dependency graphs
+- Extensible for multi-file or modular builds
+
+### Cons
+- More complex state management
+- Requires good cycle detection logic
+
+---
+
+## Solution 3: Lazy Resolution with Completion States (MOST ROBUST)
+
+**Goal:** Resolve types only when actually needed, tracking their state.
+
+### How It Works
+
+Each type has a **resolution state**:
+
+- `Unresolved`
+- `Resolving`
+- `Resolved`
+
+When resolving a type, if a field refers to another type:
+- If that type is `Unresolved`, begin resolving it recursively.
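+- If it’s `Resolving`, a circular reference is detected: reuse the in-progress placeholder instead of recursing again, or report an error if the language forbids the cycle.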
+- If it’s `Resolved`, reuse its type information.
+
+This approach naturally supports **recursive and cross-module resolution**.
+
+### Example:
+
+```go
+func resolveType(t *Type) {
+    if t.state == Resolved { return }
+    if t.state == Resolving { return } // cycle: reuse the in-progress placeholder
+
+    t.state = Resolving
+    for _, field := range t.fields {
+        resolveType(field.typ) // `type` is a Go keyword, so the field is named typ
+    }
+    t.state = Resolved
+}
+```
+
+### Pros
+- Most robust and elegant
+- Handles cycles gracefully
+- Integrates well with dependency graphs and lazy loading
+
+### Cons
+- Highest complexity
+- Requires careful bookkeeping of state transitions
+
+---
+
+## Comparison Table
+
+| Method | Complexity | Handles Cross-Module Refs | Cycle Detection | Ease of Implementation |
+|--------|-------------|--------------------|-----------------|------------------------|
+| Two-Pass | Low | ❌ | ⚪ | ✅✅✅ |
+| Deferred Queue | Medium | ✅ | ⚪ | ✅✅ |
+| Lazy Resolution | High | ✅✅✅ | ✅✅✅ | ✅ |
+
+---
+
+## Final Recommendation
+
+For most modern compilers - especially those supporting modules, generics, or interfaces - the **Deferred Resolution Queue** method provides the best balance between simplicity and flexibility.
+
+It avoids premature complexity but still handles multi-type and multi-module dependencies elegantly.
+
+If you’re designing a new language (like **Ferret**), where clarity and beginner-friendliness are priorities, start with **Deferred Resolution**, and evolve to **Lazy Resolution** when the compiler architecture matures.
+
+---
+
+### Summary
+- The chicken-egg problem arises from **self-referential or mutually-dependent types**.
+- **Single-pass** analysis fails because types are used before being fully defined.
+- **Multi-phase or lazy resolution** strategies solve this by separating declaration and definition, or deferring until information becomes available.
+
+---
+
+> **Best Practical Approach:** Deferred Resolution Queue - moderate complexity, and it extends to multi-file and modular builds.
+
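The three-state approach from Solution 3 of the article, which the `TypeState` field introduced in this commit resembles, can be sketched as a small standalone Go program. This is a minimal sketch under stated assumptions, not the compiler's actual code: `Resolver`, `Decl`, and the simplified `UserType` below are hypothetical stand-ins, and fields are limited to user-defined types so that no primitive lookup is needed.

```go
package main

import "fmt"

// TypeState mirrors the three states named in the commit's stype.TypeState.
type TypeState int

const (
	TypeNotStarted TypeState = iota // declared, fields not resolved yet
	TypeResolving                   // resolution in progress
	TypeComplete                    // all fields resolved
)

// UserType is a simplified, hypothetical stand-in for stype.UserType.
type UserType struct {
	Name   string
	Fields map[string]*UserType // field name -> resolved field type
	State  TypeState
}

// Decl is a hypothetical declaration: field name -> type name.
type Decl map[string]string

// Resolver holds the declarations and one placeholder per declared name.
type Resolver struct {
	decls map[string]Decl
	types map[string]*UserType
}

// NewResolver registers a TypeNotStarted placeholder for every declared name.
func NewResolver(decls map[string]Decl) *Resolver {
	r := &Resolver{decls: decls, types: make(map[string]*UserType)}
	for name := range decls {
		r.types[name] = &UserType{Name: name, Fields: map[string]*UserType{}}
	}
	return r
}

// Resolve fills in the fields of the named type. A reference that leads back
// to a type already in progress gets the same placeholder pointer, so the
// recursion terminates.
func (r *Resolver) Resolve(name string) (*UserType, error) {
	ut, ok := r.types[name]
	if !ok {
		return nil, fmt.Errorf("undefined type %q", name)
	}
	if ut.State != TypeNotStarted {
		return ut, nil // complete, or in progress (cycle): reuse the placeholder
	}
	ut.State = TypeResolving
	for field, typeName := range r.decls[name] {
		ft, err := r.Resolve(typeName)
		if err != nil {
			return nil, fmt.Errorf("%s.%s: %w", name, field, err)
		}
		ut.Fields[field] = ft
	}
	ut.State = TypeComplete
	return ut, nil
}

func main() {
	r := NewResolver(map[string]Decl{
		"Node":  {"next": "Node"},
		"NodeA": {"ref": "NodeB"},
		"NodeB": {"ref": "NodeA"},
	})
	for _, name := range []string{"Node", "NodeA", "NodeB"} {
		ut, err := r.Resolve(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(name, "complete:", ut.State == TypeComplete)
	}
}
```

Because every name gets its placeholder before any field is resolved, both the self-reference in `Node` and the mutual reference between `NodeA` and `NodeB` resolve to pointers at the same `UserType` values. A language that wants to reject by-value cycles could return an error in the `TypeResolving` case instead.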
diff --git a/compiler/internal/semantic/collector/declaration.go b/compiler/internal/semantic/collector/declaration.go
@@ -5,6 +5,7 @@ import (
 	"compiler/internal/frontend/ast"
 	"compiler/internal/modules"
 	"compiler/internal/semantic/analyzer"
+	"compiler/internal/semantic/stype"
 	"compiler/internal/symbol"
 	"compiler/report"
 )
@@ -46,8 +47,15 @@ func collectTypeSymbol(c *analyzer.AnalyzerNode, decl *ast.TypeDeclStmt, cm *mod
 		return
 	}
 
-	// Declare the type symbol with placeholder type
-	typeSymbol := symbol.NewSymbolWithLocation(aliasName, symbol.SymbolType, nil, decl.Alias.Loc())
+	// Create a placeholder UserType with NotStarted state
+	placeholderType := &stype.UserType{
+		Name:       aliasName,
+		Definition: nil,
+		State:      stype.TypeNotStarted,
+	}
+
+	// Declare the type symbol with placeholder UserType
+	typeSymbol := symbol.NewSymbolWithLocation(aliasName, symbol.SymbolType, placeholderType, decl.Alias.Loc())
 
 	err := cm.SymbolTable.Declare(aliasName, typeSymbol)
 	if err != nil {
diff --git a/compiler/internal/semantic/resolver/declarations.go b/compiler/internal/semantic/resolver/declarations.go
@@ -226,19 +226,25 @@ func resolveTypeDeclaration(r *analyzer.AnalyzerNode, decl *ast.TypeDeclStmt, cm
 		return
 	}
 
+	// Create placeholder UserType with TypeResolving state to handle self-references
+	symbolType := &stype.UserType{
+		Name:  aliasName,
+		State: stype.TypeResolving, // Set to resolving to prevent infinite recursion
+	}
+
+	// Update the symbol's type with the placeholder first
+	symbol.Type = symbolType
+
+	// Now resolve the actual definition (this may create self-references)
 	typeToDeclare, err := semantic.DeriveSemanticType(decl.BaseType, cm)
 	if err != nil {
 		r.Ctx.Reports.AddSemanticError(r.Program.FullPath, decl.BaseType.Loc(), "invalid base type for type declaration: "+err.Error(), report.RESOLVER_PHASE)
 		return
 	}
 
-	symbolType := &stype.UserType{
-		Name:       aliasName,
-		Definition: typeToDeclare,
-	}
-
-	// Update the symbol's type
-	symbol.Type = symbolType
+	// Update the definition and mark as complete
+	symbolType.Definition = typeToDeclare
+	symbolType.State = stype.TypeComplete
 
 	if r.Debug {
 		colors.ORANGE.Printf("resolved type alias '%v', Def: %v at %s\n", symbol.Type, symbol.Type.(*stype.UserType).Definition, decl.Alias.Loc())
diff --git a/compiler/internal/semantic/stype/types.go b/compiler/internal/semantic/stype/types.go
@@ -21,11 +21,21 @@ func (p *PrimitiveType) String() string {
 	return p.TypeName.String()
 }
 
+// TypeState represents the resolution state of a user-defined type
+type TypeState int
+
+const (
+	TypeNotStarted TypeState = iota // Type declared but not resolved
+	TypeResolving                   // Type is currently being resolved (prevents infinite recursion)
+	TypeComplete                    // Type is fully resolved
+)
+
 // UserType represents user-defined types and type aliases
 type UserType struct {
 	Name       string
 	Definition Type                     // For type aliases, this is the underlying type
 	Methods    map[string]*FunctionType // Methods associated with this type
+	State      TypeState                // Resolution state to handle self-references
 }
 
 func (u *UserType) String() string {
diff --git a/compiler/internal/semantic/type_resolver.go b/compiler/internal/semantic/type_resolver.go