Nilan is a programming language I am currently developing for fun π, implemented in Go. My goal is to learn more about how programming languages work under the hood and to explore the different pipelines involved β from taking source code as input to making the CPU execute instructions π€.
β Write the press release
β
Arithmetic expressions: +
, -
, *
, /
β
Comparison operators: >
, >=
, <
, <=
, ==
, !=
β
Boolean literals: true
, false
β
null
literal
β Parenthesized expressions
β Variable identifiers and names
β
Assignment statements (e.g., a = 2
)
β
Unary operations: logical not !
, negation -
β REPL (Read-Eval-Print Loop) for interactive testing
Currently, Nilan supports only very primitive expressions and literals. The following are not supported yet:
π΄ String literals and operations
π΄ Functions and function calls
π΄ Arrays or other complex data structures
π΄ Control flow constructs (e.g., if
, loops)
π΄ Exponentiation or other advanced operators
π΄ Tree-Walk interpreter
To start a REPL and try out Nilan expressions:
go run .
Start typing Nilan expressions in the interactive prompt.
Nilanβs grammar is defined using ISO Extended BackusβNaur Form (ISO EBNF), conforming to ISO/IEC 14977.
program = statement, EOF ;
statement = expression | printStmt ;
expression = assignment ;
assignment = IDENTIFIER, "=", assignment
| equality ;
equality = comparison, { ("!=", "=="), comparison } ;
comparison = term, { (">" | ">=" | "<" | "<="), term } ;
term = factor, { ("+" | "-"), factor } ;
factor = unary, { ("*" | "/"), unary } ;
unary = ("!" | "-"), unary
| primary ;
primary = FLOAT
| INT
| IDENTIFIER
| "true"
| "false"
| "null"
| "(", expression, ")" ;
π‘ Currently, the grammar and parser support only basic constructs such as logical, arithmetic, unary operations, literals, and parenthesized expressions.
This grammar is not left-recursive because none of the non-terminals start their production with themselves on the left side. Each rule begins with a different non-terminal or terminal before any recursion happens. For example, equality
starts with comparison
,comparison
starts with term
, etc...
Each line defines a production rule in the form:
nonterminal = definition ;
- A nonterminal (e.g.,
term
,factor
) is a named syntactic category made of other rules. - A definition consists of terminal symbols (token literals), other nonterminals, and notation operators.
Type | Example | Description |
---|---|---|
Nonterminal | term |
Named construct that expands into other rules |
Terminal | '+' , 'true' |
Fixed token literals enclosed in single quotes |
π‘ Note: Tokens like
'INT'
and'FLOAT'
are token types returned by the lexer, not literal characters.
Symbol | Meaning | Example |
---|---|---|
= |
Rule definition | `term = factor , { ('+' |
; |
End of rule | Every rule ends in a semicolon |
, |
Sequence | a , b means a followed by b |
` | ` | Alternatives |
{ ... } |
Zero or more repetitions | { a } means repeat a zero or more times |
( ... ) |
Grouping | Used to group alternatives or sequences |
Example rule:
term = factor , { ( '+' | '-' ) , factor } ;
Means:
- A
term
consists of:- A
factor
, followed by - Zero or more repetitions of:
- Either
'+'
or'-'
, and - Another
factor
- Either
- A
3
3 + 5
3 - 4 + 2
Precedence from lowest to highest is encoded in the grammar structure itself:
Precedence Level | Operators | Grammar Rule |
---|---|---|
Lowest | Equality: == , != |
equality |
Comparison: > , < , etc |
comparison |
|
Additive: + , - |
term |
|
Multiplicative: * , / |
factor |
|
Unary: - , ! |
unary |
|
Highest | Parentheses, literals | primary |
π‘ Lower-precedence rules contain (as components) higher-precedence expressions. This structure ensures operators like
*
bind more tightly than+
. For example, the expression5 * 5 + 10 + 2
is parsed as(5 * 5) + 10 + 2
.
Handles addition and subtraction:
+ , -
Example:
3 + 5 - 2
Handles multiplication and division:
* , /
Example:
4 * 2 / 8
Handles unary operations like logical not and negation, with recursive chaining:
! , -
Examples:
--5
!(-3)
Handles literals and parenthesized expressions:
(FLOAT | INT | true | false | null | '(' expression ')')
Examples:
(5 + 3)
true
Parsing order according to precedence:
- Multiplication
*
byfactor
rule - Addition
+
byterm
rule
Result:
- Multiply
2 * 3
first - Add
1 + (2 * 3)
+
/ \
1 *
/ \
2 3
Expressed as:
Binary(
Left=Literal(1),
Op='+',
Right=Binary(
Left=Literal(2),
Op='*',
Right=Literal(3)
)
)
Expression | Grammar Rule |
---|---|
1 |
primary β INT |
2 * 3 |
factor (multiplication) |
1 + (...) |
term (addition) |
These examples will not parse correctly with the current grammar:
foo + 3 # Variables/identifiers not supported yet
"abc" + "def" # String literals not supported
2 ** 3 # Exponentiation operator not supported
1 + # Trailing operator causes parse error
For example, too add new features like variables or function calls:
- Update the Lexer Add recognition of new token types such as identifiers.
- Extend the Grammar
For variables, extend the grammar to include an
IDENTIFIER
token inprimary
:
primary = 'IDENTIFIER' | ( 'FLOAT' | 'INT' | 'true' | 'false' | 'null' ) | '(' , expression , ')' ;
- Implement Semantics TODO: Add section when interpreter is implemented
go run .
Run tests for a specific package, e.g., lexer:
go test ./lexer
Run all unit tests recursively:
go test ./...
Format a particular package:
go fmt ./lexer
Format all Go files:
go fmt ./...