The Dana compiler follows a multi-pass architecture:
Source Code (.dana)
│
▼
┌───────────────┐
│ Lexer │ Tokenization with layout handling
└───────┬───────┘
│ Token Stream
▼
┌───────────────┐
│ Parser │ Bison-based LALR(1) parser
└───────┬───────┘
│ Abstract Syntax Tree
▼
┌───────────────┐
│ Semantic │ Type checking, symbol resolution
│ Analysis │ Control flow validation
└───────┬───────┘
│ Annotated AST
▼
┌───────────────┐
│ Code │ LLVM IR generation
│ Generation │
└───────┬───────┘
│ LLVM IR
▼
┌───────────────┐
│ Optimizer │ LLVM optimization passes
└───────┬───────┘
│
▼
┌───────────────┐
│ LLVM Backend │ Assembly/Object generation
└───────┬───────┘
│
▼
Executable (a.out)
src/
├── main.cpp # Compiler driver
├── Makefile # Build configuration
│
├── frontend/
│ ├── lexer/
│ │ └── lexer.l # Flex lexer with layout rules
│ │
│ ├── parser/
│ │ ├── parser.y # Bison grammar
│ │ └── parser.tab.* # Generated parser
│ │
│ ├── ast/
│ │ ├── ast.hpp # AST node definitions
│ │ ├── ast.cpp # AST implementation
│ │ ├── ast_visitor.hpp # Visitor interface
│ │ ├── ast_print.cpp # Pretty printer
│ │ └── operators.hpp # Operator enums
│ │
│ ├── symbol/
│ │ ├── symbol.hpp # Symbol hierarchy
│ │ ├── symbol_table.hpp # Symbol table
│ │ ├── scope.hpp # Scope management
│ │ └── sematype.hpp # Type representations
│ │
│ ├── semantic/
│ │ ├── semantic.hpp # Semantic pass entry point
│ │ ├── semantic_pass.* # Type checking pass
│ │ ├── control_flow.* # Control flow validation
│ │ ├── sema_context.* # Analysis context
│ │ └── builtins.* # Built-in function setup
│ │
│ └── common/
│ └── diagnostics.* # Error reporting
│
├── backend/
│ ├── codegen/
│ │ ├── codegen.hpp # Code generation interface
│ │ ├── codegen_*.cpp # Code gen implementations
│ │ └── codegen_context.* # LLVM context management
│ │
│ └── optimizer/
│ └── optimizer.* # LLVM optimization wrapper
│
└── runtime/
  ├── lib.c # Runtime library source
  ├── lib_bitcode.hpp # Embedded runtime bitcode
  └── danalib.* # Built-in function codegen
The lexer is implemented using Flex and handles:
- Token Recognition: Keywords, identifiers, literals, operators
- Layout Management: Indentation-based block inference using a guide stack
- Comment Handling: Both -- line comments and (* *) nested block comments
- Escape Sequences: In strings and character literals
Key features:
- Guide Stack: Tracks indentation levels for layout-sensitive parsing
- Auto-End Tokens: Automatically inserted T_AUTO_END tokens when dedenting
- Location Tracking: Line and column numbers for error reporting
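The guide stack can be pictured with a minimal sketch. It assumes a simplified layout rule: each open block remembers the column of its opening line, and a new line starting at that column or further left closes the block. The class name `GuideStack` and its methods are hypothetical; only `T_AUTO_END` comes from the lexer described above, and the actual rules in `lexer.l` may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: tracks the guide column of every open layout block.
class GuideStack {
public:
    // Record a block opened by a line that starts at column `col`.
    void open(std::size_t col) { guides_.push_back(col); }

    // Called for the first token of a new line at column `col`.
    // Pops every block whose guide column is >= col and returns how many
    // T_AUTO_END tokens the lexer would have to emit (assumed rule).
    std::size_t closeBlocksFor(std::size_t col) {
        std::size_t autoEnds = 0;
        while (!guides_.empty() && guides_.back() >= col) {
            guides_.pop_back();
            ++autoEnds;
        }
        return autoEnds;
    }

    // Number of blocks that are still open.
    std::size_t depth() const { return guides_.size(); }

private:
    std::vector<std::size_t> guides_;
};
```

With blocks open at columns 0 and 4, a line at column 8 closes nothing, a line at column 4 closes the inner block, and a line at column 0 closes both.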
The parser is implemented using Bison with the C++ skeleton and handles:
- Grammar: LALR(1) grammar for Dana
- AST Construction: Builds typed AST nodes
- Error Handling: Reports syntax errors with location
The AST uses a class hierarchy with the visitor pattern:
AstNode (base)
├── Type
├── FParType
├── Program
├── Def
│ ├── VarDef
│ ├── FuncDecl
│ └── FuncDef
├── Header
├── FParDef
├── Stmt
│ ├── SkipStmt
│ ├── ExitStmt
│ ├── BreakStmt
│ ├── ContinueStmt
│ ├── AssignStmt
│ ├── ReturnStmt
│ ├── ProcCall
│ ├── IfStmt
│ └── LoopStmt
├── Block
├── Lval
│ ├── IdLVal
│ ├── StringLiteralLVal
│ └── IndexLVal
├── Expr
│ ├── IntConst
│ ├── CharConst
│ ├── TrueConst
│ ├── FalseConst
│ ├── LValueExpr
│ ├── ParenExpr
│ ├── FuncCall
│ ├── UnaryExpr
│ └── BinaryExpr
└── Cond
  ├── ExprCond
  ├── ParenCond
  ├── NotCond
  ├── BinaryCond
  └── RelCond
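As an illustration of how a visitor walks this hierarchy, here is a minimal sketch restricted to two node kinds. The node names `AstNode`, `Expr`, `IntConst`, and `BinaryExpr` come from the tree above; the `AstVisitor` interface, the member names, the `char` operator field, and the `PrintVisitor` pass are assumptions made for the example and are not taken from `ast.hpp` or `ast_visitor.hpp`.

```cpp
#include <memory>
#include <string>
#include <utility>

struct IntConst;
struct BinaryExpr;

// Hypothetical visitor interface: one overload per concrete node type.
struct AstVisitor {
    virtual ~AstVisitor() = default;
    virtual void visit(const IntConst& node) = 0;
    virtual void visit(const BinaryExpr& node) = 0;
};

struct AstNode {
    virtual ~AstNode() = default;
    // Double dispatch: each node calls back the overload for its own type.
    virtual void accept(AstVisitor& v) const = 0;
};

struct Expr : AstNode {};

struct IntConst : Expr {
    int value;
    explicit IntConst(int v) : value(v) {}
    void accept(AstVisitor& v) const override { v.visit(*this); }
};

struct BinaryExpr : Expr {
    char op;  // simplified; the real AST uses operator enums
    std::unique_ptr<Expr> lhs, rhs;
    BinaryExpr(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    void accept(AstVisitor& v) const override { v.visit(*this); }
};

// One pass corresponds to one visitor; this one renders an expression as text.
struct PrintVisitor : AstVisitor {
    std::string out;
    void visit(const IntConst& node) override {
        out += std::to_string(node.value);
    }
    void visit(const BinaryExpr& node) override {
        out += '(';
        node.lhs->accept(*this);
        out += ' ';
        out += node.op;
        out += ' ';
        node.rhs->accept(*this);
        out += ')';
    }
};

// Convenience wrapper: run the printer over an expression tree.
std::string render(const Expr& e) {
    PrintVisitor printer;
    e.accept(printer);
    return printer.out;
}
```

The benefit of this layout is that a new pass (printing, type checking, code generation) is a new visitor class, and the node classes stay unchanged.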
Symbol (base)
├── VarSymbol # Local variables
├── ParamSymbol # Function parameters
└── FuncSymbol # Functions and procedures
Each symbol stores:
- Name and location
- Type information
- Defining function (for closure analysis)
SemaType (base)
├── IntType # int
├── ByteType # byte
├── VoidType # procedure return
├── ArrayType # T[N] or T[]
└── FuncType # (params) -> return
Types are interned (shared) for efficient comparison.
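A minimal sketch of interning, assuming a factory object owns one canonical instance of every type so that type equality reduces to pointer comparison. `SemaType`, `IntType`, `ByteType`, and `ArrayType` are the names listed above; `TypeContext`, its methods, and the convention that size 0 means an unsized `T[]` are hypothetical and may not match `sematype.hpp`.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct SemaType {
    virtual ~SemaType() = default;
};
struct IntType : SemaType {};
struct ByteType : SemaType {};
struct ArrayType : SemaType {
    const SemaType* elem;
    std::size_t size;  // 0 stands for an unsized T[] in this sketch
    ArrayType(const SemaType* e, std::size_t n) : elem(e), size(n) {}
};

// Hypothetical factory that hands out exactly one object per distinct type.
class TypeContext {
public:
    const SemaType* getInt() const { return &int_; }
    const SemaType* getByte() const { return &byte_; }

    // Returns the existing array type if one with the same element type and
    // size was requested before; otherwise creates and remembers a new one.
    const SemaType* getArray(const SemaType* elem, std::size_t size) {
        for (const auto& t : arrays_) {
            if (t->elem == elem && t->size == size) return t.get();
        }
        arrays_.push_back(std::make_unique<ArrayType>(elem, size));
        return arrays_.back().get();
    }

private:
    IntType int_;
    ByteType byte_;
    std::vector<std::unique_ptr<ArrayType>> arrays_;
};
```

Because element types are themselves interned, two structurally equal array types always resolve to the same pointer, so a type check can use `a == b` without walking the type structure. The linear search keeps the sketch short; a hash map would be the usual choice for larger programs.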
The symbol table:
- Manages nested scopes
- Handles symbol lookup with shadowing
- Supports forward declarations
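Nested scopes with shadowing can be sketched as a stack of maps, searched from the innermost scope outward. This is an illustration only: `ScopedTable` and its methods are hypothetical names, symbols are reduced to a type name stored as a string, and forward declarations are left out.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical scoped table: each scope maps a name to a type name.
class ScopedTable {
public:
    void enterScope() { scopes_.emplace_back(); }

    void exitScope() {
        assert(!scopes_.empty() && "exitScope without matching enterScope");
        scopes_.pop_back();
    }

    // Declares `name` in the innermost scope.
    // Returns false if the name is already declared in that same scope.
    bool declare(const std::string& name, const std::string& type) {
        assert(!scopes_.empty() && "declare requires an open scope");
        return scopes_.back().emplace(name, type).second;
    }

    // Searches from the innermost scope outward, so an inner declaration
    // shadows an outer one. Returns nullptr if the name is not visible.
    const std::string* lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return &found->second;
        }
        return nullptr;
    }

private:
    std::vector<std::unordered_map<std::string, std::string>> scopes_;
};
```

Redeclaring a name in the same scope is rejected, while declaring it again in a nested scope succeeds and hides the outer symbol until that scope is exited.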
Semantic analysis runs in two passes.
The semantic pass performs type checking and symbol resolution:
- Type Resolution: Converts AST types to semantic types
- Symbol Declaration: Registers variables, functions
- Type Checking: Validates operations, assignments, calls
- Forward Declaration Handling: Matches declarations with definitions
The control flow pass validates control flow:
- Return/Exit Validation: Ensures proper placement
- Break/Continue Validation: Checks loop context
- Reachability Analysis: Ensures functions return values
- Loop Label Validation: Checks labeled break/continue
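The break/continue and loop label checks can be sketched with a stack of enclosing loop labels. The sketch assumes that an unlabeled jump needs any enclosing loop and a labeled jump needs an enclosing loop with the same label; `LoopContext` and its methods are hypothetical names, not the interface of `control_flow.*`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical tracker for the loops that enclose the current statement.
class LoopContext {
public:
    // An empty label stands for an unlabeled loop.
    void enterLoop(const std::string& label = "") { labels_.push_back(label); }
    void exitLoop() { labels_.pop_back(); }

    // Checks whether a break/continue is valid at the current position.
    // Unlabeled: valid inside any loop.
    // Labeled: valid only if some enclosing loop carries that label.
    bool isValidJump(const std::string& label = "") const {
        if (label.empty()) return !labels_.empty();
        return std::find(labels_.begin(), labels_.end(), label) != labels_.end();
    }

private:
    std::vector<std::string> labels_;
};
```

A pass would call `enterLoop` and `exitLoop` around each loop body and report a diagnostic whenever `isValidJump` returns false for a break or continue statement.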
Code generation is LLVM-based and split into modules:
- codegen_decl.cpp: Variable and function declarations
- codegen_stmt.cpp: Statement code generation
- codegen_expr.cpp: Expression code generation
- codegen_cond.cpp: Condition code generation
- codegen_call.cpp: Function/procedure calls
- codegen_loop.cpp: Loop constructs
The CodegenContext manages:
- LLVM context, module, and IR builder
- Symbol-to-value mapping
- Current function context
- Loop break/continue targets
The optimizer uses LLVM's optimization pipeline:
- -O0: No optimization
- -O1: Basic optimizations
- -O2: Standard optimizations
- -O3: Aggressive optimizations
The runtime library is written in C, compiled to LLVM bitcode, and embedded in the compiler:
- I/O functions: writeInteger, readString, etc.
- String functions: strlen, strcmp, etc.
- Type conversion: extend, shrink
- Lexing: Source → Tokens (with layout handling)
- Parsing: Tokens → AST
- Semantic Pass: Type checking, symbol resolution
- Control Flow Pass: Validate control structures
- Code Generation: AST → LLVM IR
- Optimization: Apply LLVM passes
- Linking: Link with runtime library
- Backend: LLVM IR → Assembly → Object → Executable
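The stages listed above run strictly in sequence. A minimal sketch of such a driver loop follows, assuming each stage reports success or failure and that compilation stops at the first failing stage. `PipelineStep` and `runPipeline` are hypothetical names and do not describe the actual code in `main.cpp`.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one pipeline step: a name plus a callback
// that returns true on success.
struct PipelineStep {
    std::string name;
    std::function<bool()> run;
};

// Runs the steps in order and stops at the first failure.
// Returns the name of the failing step, or an empty string if all succeed.
std::string runPipeline(const std::vector<PipelineStep>& steps) {
    for (const auto& step : steps) {
        if (!step.run()) return step.name;
    }
    return "";
}
```

Stopping early means later stages never see invalid input: code generation, for example, can assume an AST that already passed both semantic passes.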
The Diagnostics class handles error reporting. Each diagnostic records the error location in the source file, the compilation stage where the error occurred, and its severity (Note, Warning, or Error).
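A minimal sketch of a diagnostics collector carrying these three pieces of information. Only the class name `Diagnostics` and the severity levels come from the description above; the `Stage` values, the `SourceLoc` struct, the method names, and the output format are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class Severity { Note, Warning, Error };
enum class Stage { Lexer, Parser, Semantic, ControlFlow, Codegen };  // assumed set

struct SourceLoc {
    std::string file;
    int line;
    int column;
};

struct Diagnostic {
    Severity severity;
    Stage stage;
    SourceLoc loc;
    std::string message;
};

class Diagnostics {
public:
    // Stores the diagnostic and counts errors so the driver can stop later.
    void report(Severity s, Stage st, SourceLoc loc, std::string msg) {
        if (s == Severity::Error) ++errors_;
        entries_.push_back({s, st, std::move(loc), std::move(msg)});
    }

    bool hasErrors() const { return errors_ > 0; }
    const std::vector<Diagnostic>& entries() const { return entries_; }

    // Renders "file:line:column: severity [stage]: message" (assumed format).
    static std::string format(const Diagnostic& d) {
        return d.loc.file + ":" + std::to_string(d.loc.line) + ":" +
               std::to_string(d.loc.column) + ": " + severityName(d.severity) +
               " [" + stageName(d.stage) + "]: " + d.message;
    }

private:
    static const char* severityName(Severity s) {
        switch (s) {
            case Severity::Note: return "note";
            case Severity::Warning: return "warning";
            case Severity::Error: return "error";
        }
        return "unknown";
    }

    static const char* stageName(Stage st) {
        switch (st) {
            case Stage::Lexer: return "lexer";
            case Stage::Parser: return "parser";
            case Stage::Semantic: return "semantic";
            case Stage::ControlFlow: return "control-flow";
            case Stage::Codegen: return "codegen";
        }
        return "unknown";
    }

    std::vector<Diagnostic> entries_;
    std::size_t errors_ = 0;
};
```

Collecting diagnostics instead of aborting on the first one lets a pass report several problems in one run, while `hasErrors()` gives the driver a single check before moving to the next stage.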