STATUS: partial — Pipeline, Cranelift backend, type system, contracts, effects, Query API, LSP are implemented. LLVM backend split, modular runtime, capability/arena memory model, and tiered contracts are planned (Epics #299, #298, #296, #297).
This document describes the internals of the Gradient compiler and toolchain. It is intended for AI agents working on or integrating with Gradient.
Source (.gr) -> Lexer -> Parser -> AST -> Type Checker (+ Contracts, Generics, Effects, FFI) -> IR (SSA) -> CodegenBackend (Cranelift or LLVM) -> Linker -> Binary
The full pipeline is wired end-to-end and produces native executables from Gradient source files.
Hand-written tokenizer that emits a flat token stream. The lexer tracks indentation depth and synthesizes INDENT and DEDENT tokens to represent indentation-based block structure (similar to Python, but with stricter rules).
Token categories: keywords (17), identifiers, literals (int, float, string, bool), operators (including % for modulo), delimiters (including [ and ] for generics), INDENT, DEDENT, NEWLINE, and error tokens for recovery.
All tokens are ASCII-only. No Unicode operators are permitted.
Location: codebase/compiler/src/lexer/ -- 94 tests
Recursive descent, LL(1). Implements the PEG grammar defined in resources/grammar.peg. Produces a fully typed AST where every node carries source location information (Span with file id, start position, end position).
Error recovery is implemented. The parser uses synchronization tokens (newlines, dedents, keywords) to resync and continue parsing, collecting all diagnostics for a single pass.
Key grammar features:
- Module declarations (
mod path) and use imports (use path,use path.{items}) - Function definitions with colon-delimited blocks:
fn name(params) -> RetType: - Generic function definitions:
fn identity[T](x: T) -> T: - Generic enum type declarations:
type Option[T] = Some(T) | None - Extern function declarations (
@extern,@extern("libm")) with no body - Export function annotations (
@export) for C-compatible linkage - Let bindings with optional type annotations; mutable bindings (
let mut) - Assignment statements (
name = expr) for mutable bindings - If/else if/else expressions (with
:before each branch block) - For loops (
for var in expr:) - While loops (
while condition:) - Match expressions with integer, boolean, enum variant, and wildcard patterns
- Enum type declarations (
type Color = Red | Green | Blue) - All arithmetic, comparison, and logical operators with correct precedence
- Typed holes (
?name) - Annotations (
@name(args)) including@cap,@budget,@extern, and@export - Field access (
expr.field)
Location: codebase/compiler/src/parser/ -- 128 tests
Typed AST nodes follow the Rust enum + struct pattern. Every node is wrapped in Spanned<T> which carries a Span value encoding its source location:
Span { file_id: u32, start: Position, end: Position }
Position { line: u32, col: u32, offset: u32 }
The AST is organized into modules:
module.rs--Module,ModuleDecl,UseDeclitem.rs--FnDef,ExternFnDecl,Param,Annotation,ItemKindstmt.rs--StmtKind::Let,StmtKind::Assign,StmtKind::Ret,StmtKind::Exprexpr.rs--ExprKindvariants (literals, binary/unary ops, calls, if, for, typed holes, field access, etc.)types.rs--TypeExpr(Named, Unit, Fn),EffectSetblock.rs--Block(a spanned list of statements)span.rs--Span,Position,Spanned<T>
Location: codebase/compiler/src/ast/
Static type checker with inference for let bindings. Key properties:
- Five built-in types:
Int,Float,String,Bool,() - Generics: type parameters on functions (
fn identity[T](x: T) -> T) and enums (type Option[T] = Some(T) | None) with bidirectional type inference via unification at call sites. - Enum types: user-defined algebraic data types with variant matching.
- Forward references: all function signatures are registered before any bodies are checked, allowing mutual recursion.
- Type inference: let bindings without explicit type annotations have their types inferred from the right-hand side.
- Mutable bindings:
let mutbindings are tracked; assignment to immutable bindings is rejected. - Effect validation: calling a function with
!{IO}from a pure function is a type error. The effect system is enforced -- all 5 canonical effects (IO, Net, FS, Mut, Time) are recognized and unknown effects are rejected. Module-level@capceilings are checked. - Effect polymorphism: lowercase effect variables (
!{e}) resolve at call sites (pure callbacks -> empty, effectful callbacks -> concrete effects).is_effect_polymorphicexposed in the query API. - Design-by-contract:
@requires/@ensuresannotations are validated during type checking. Theresultkeyword is recognized in postconditions. Contract conditions must be boolean expressions. - Budget annotations:
@budget(cpu: 5s, mem: 100mb)on functions. The compiler checks budget containment -- a callee's budget must not exceed the caller's budget. - FFI type validation:
@externand@exportfunctions are checked to ensure only FFI-compatible types (Int,Float,Bool,String,()) appear in their signatures. - Lexical scoping: scope stack with push/pop for blocks.
- Error recovery:
Ty::Errorsentinel suppresses cascading diagnostics. - String concatenation:
+onStringoperands is type-checked as concatenation. - Builtin function registry:
print,println,print_int,print_float,print_bool,abs,min,max,mod_int,to_string,int_to_string,rangeare all pre-loaded.
The type checker does not modify the AST. It reads the AST and produces a list of TypeErrors.
Sub-modules:
checker.rs-- Main type-checking logiceffects.rs-- Effect system: canonical effects, validation, and@capenforcementenv.rs-- Type environment and scope managementerror.rs-- Diagnostic types for type errorstypes.rs-- Internal type representations
Location: codebase/compiler/src/typechecker/ -- 371 tests
SSA (Static Single Assignment) form. The IR builder translates the typed AST into an intermediate representation suitable for code generation.
Instruction set:
| Instruction | Description |
|---|---|
Const |
Load a constant value |
Call |
Function call |
Ret |
Return from function |
Add |
Integer/float addition |
Sub |
Integer/float subtraction |
Mul |
Integer/float multiplication |
Div |
Integer/float division |
Mod |
Integer modulo |
Cmp |
Comparison (returns bool) |
Branch |
Conditional branch |
Jump |
Unconditional branch |
Phi |
SSA phi node (merge point) |
Alloca |
Stack allocation |
Load |
Load from memory |
Store |
Store to memory |
Neg |
Unary negation |
Not |
Logical negation |
StringConcat |
String concatenation |
Location: codebase/compiler/src/ir/ -- 29 tests (in builder/tests.rs)
The codegen layer is abstracted behind the CodegenBackend trait. Both backends consume the same SSA IR and produce object files.
CodegenBackend trait:
- Defines the interface that all codegen backends must implement.
- The build system selects the backend based on the
--releaseflag.
Cranelift (debug backend, default):
- Translates IR to native machine code via Cranelift.
- Emits an object file (
.o) that is then linked by the system'sccto produce an executable. - Fast compilation, suitable for development and iteration.
- Working and producing real binaries.
LLVM (release backend, feature-gated):
- Behind the
llvmcargo feature flag. - Selected when
--releaseis passed and the feature is compiled in. - Stub implementation until LLVM libraries are available on the build host.
FFI linkage:
@externfunctions produceLinkage::Importin the IR, resolved at link time.@exportfunctions produceLinkage::Exportin the IR, making symbols visible to C callers.- FFI type validation ensures only compatible types (
Int,Float,Bool,String,()) cross the boundary.
Location: codebase/compiler/src/codegen/
Multi-file module resolution. Handles resolving use declarations to source files on disk, parsing dependent modules, and building a combined type environment. Detects circular imports.
use mathresolves tomath.grin the same directory.use a.bresolves toa/b.grrelative to the source root.
Location: codebase/compiler/src/resolve.rs -- 8 tests
Canonical code formatter (AST pretty-printer). Parses the source, walks the AST, and emits canonically formatted text with consistent 4-space indentation, operator spacing, and normalized line breaks. Guarantees one canonical form for every program.
Limitation: Comments are not preserved (they are stripped during lexing).
Location: codebase/compiler/src/fmt.rs -- 25 tests
Interactive Read-Eval-Print Loop. Operates in check mode: each input is type-checked (not compiled) and the inferred type or errors are reported immediately. Supports both interactive (TTY with prompt) and non-interactive (piped stdin) modes. Handles expressions, let bindings, and function definitions.
Location: codebase/compiler/src/repl.rs -- 30 tests
Structured query API that turns the compiler into a queryable service. Agents call Session::from_source and query for structured, JSON-serializable data (diagnostics, module contracts, symbol tables, contracts, completion context, context budgets). Contract annotations (@requires/@ensures) are included in symbol entries and module contracts. FFI metadata (@extern/@export, linkage) is included in symbol entries and module contracts. Completion context provides type-directed suggestions at any cursor position. Context budget tooling returns relevance-ranked items within a token budget. Project index provides a structural overview.
Key methods:
session.check()-- type-check and return diagnosticssession.symbols()-- symbol table with types, effects, contractssession.module_contract()-- public API surface including call graph and budgetssession.completion_context(line, col)-- type-directed completion candidatessession.context_budget(fn_name, budget)-- relevance-ranked context within token budgetsession.project_index()-- structural overview (modules, signatures, types)session.rename(old, new)-- compiler-verified renamesession.callees(fn_name)-- call graph query
Location: codebase/compiler/src/query.rs -- 74 tests
Gradient/
├── assets/ # Logo, banner
├── codebase/
│ ├── build-system/ # `gradient` CLI (Rust, clap)
│ │ └── src/
│ │ ├── main.rs # CLI entry point
│ │ ├── commands/ # build, run, check, new, init, fmt, test, repl, query, add, update
│ │ ├── manifest.rs # gradient.toml parsing (including [dependencies])
│ │ ├── lockfile.rs # gradient.lock generation and parsing (SHA-256 checksums)
│ │ ├── resolver.rs # Dependency resolver (cycle detection, diamond dedup, topo sort)
│ │ └── project.rs # Project discovery and paths
│ ├── compiler/ # Compiler pipeline (Rust, Cranelift)
│ │ └── src/
│ │ ├── main.rs # Compiler driver
│ │ ├── lib.rs # Library crate root
│ │ ├── lexer/ # Tokenizer with INDENT/DEDENT
│ │ ├── parser/ # Recursive descent parser
│ │ ├── ast/ # AST node definitions
│ │ ├── typechecker/ # Type checker, effects, and environment
│ │ ├── ir/ # SSA IR and builder
│ │ ├── codegen/ # CodegenBackend trait, Cranelift + LLVM backends
│ │ ├── resolve.rs # Multi-file module resolution
│ │ ├── fmt.rs # Canonical code formatter
│ │ ├── repl.rs # Interactive REPL (check mode)
│ │ └── query.rs # Structured query API for agents
│ ├── devtools/
│ │ └── lsp/ # LSP server (tower-lsp, tokio)
│ │ └── src/
│ │ ├── main.rs # Server entry point
│ │ ├── backend.rs # LanguageServer implementation
│ │ └── diagnostics.rs # Compiler integration
│ ├── test-framework/ # Golden test framework
│ └── ... # runtime, stdlib, etc. (planned)
├── docs/ # Documentation (this file lives here)
└── resources/ # Grammar, language reference, examples
gradient is the unified entry point for all toolchain operations. Built in Rust with clap.
Working commands:
gradient new <name>-- creates a project directory withgradient.tomlandsrc/main.grgradient init-- initializes a project in the current directorygradient build-- finds the project root, resolves dependencies, invokes the compiler onsrc/main.gr, links withcc, outputs binary totarget/debug/<name>gradient build --release-- same as above but selects the LLVM backend (when compiled with thellvmfeature) and outputs totarget/release/<name>gradient run-- builds then executes the binary, forwarding the exit codegradient check-- invokes the compiler for type checking, discards the object filegradient fmt-- canonically formats Gradient source filesgradient test-- runs the project's test suitegradient repl-- starts an interactive REPL (check mode)gradient add <path>-- adds a path-based dependency togradient.tomland updatesgradient.lockgradient update-- re-resolves all dependencies and regeneratesgradient.lock
The build system finds the project root by searching upward for gradient.toml. It locates the compiler binary relative to its own path.
The package system manages project dependencies through gradient.toml and gradient.lock.
gradient.toml [dependencies] section:
- Path-based dependencies:
my-lib = { path = "../my-lib" } - Each dependency must have its own
gradient.toml
gradient.lock lockfile:
- SHA-256 content-addressed checksums for all resolved packages
- Generated automatically on build, add, and update
- Should be committed to version control
Dependency resolver:
- Cycle detection: rejects circular dependency graphs
- Diamond dedup: shared transitive dependencies are resolved once
- Topological ordering: dependencies are built in correct order
- Build integration:
gradient buildresolves dependencies before compilation
gradient.toml defines a project:
[package]
name = "my-project"
version = "0.1.0"
[dependencies]
my-lib = { path = "../my-lib" }my-project/
├── gradient.toml # Manifest with [dependencies]
├── gradient.lock # Lockfile (SHA-256 checksums)
├── src/
│ └── main.gr
└── target/
├── debug/ # Cranelift backend output
└── release/ # LLVM backend output
The LSP server (codebase/devtools/lsp/) provides IDE-like features for Gradient. It communicates over stdio via JSON-RPC using tower-lsp.
- Text document sync: full document sync on open, change, and save.
- Diagnostics: runs the full compiler pipeline (lex, parse, typecheck) on every change and publishes diagnostics.
- Hover: shows type/signature information for builtins, keywords, and user-defined functions (by parsing the AST).
- Completion: offers keywords and builtin function names with their signatures.
- Custom
gradient/batchDiagnostics: sends all diagnostics for a file in one notification with per-phase error counts (lex_errors,parse_errors,type_errors).
The LSP server uses the same compiler library crate as the CLI compiler -- there is no separate parser or approximate analysis.
Every design decision optimizes for AI agent consumption:
- Token efficiency: the language syntax minimizes token count for common patterns.
- Machine-readable output is the default; human-readable output is a formatting layer on top.
- The LSP
gradient/batchDiagnosticsnotification provides the complete diagnostic picture in one round trip.
Every token in Gradient source code is a printable ASCII character. No Unicode operators, no special symbols. This eliminates encoding issues and ensures every agent and tool can process Gradient source without character set complications.
No Hidden Magic
- Effects are declared in function signatures and tracked by the type checker.
- All function signatures are explicit (parameter types and return types required).
- There is no implicit allocation, no implicit effect propagation, and no implicit capability granting.
No syntactic sugar is introduced unless it carries distinct semantics. If two syntactic forms would compile to identical IR, only one is permitted. This makes code written by different agents (or humans) structurally consistent and eliminates style-based ambiguity.