Dana Lexer

A standalone, layout-aware lexer for the Dana language. It recognizes tokens, inserts implicit layout tokens (e.g., AUTOEND), and reports errors precisely with line/column and a caret indicator.

Features

Accurate error messages with line/column and caret display.
Layout handling: indentation → automatic block end markers.
String, char, byte, and int literals with escape support.
Configurable debug mode for tracing the lexing process.
Stable exit codes (0 on success, non-zero on error).

Prerequisites

flex (or lex)
A C++ compiler (the generated scanner is compiled as C++)
make
A POSIX shell (Linux / macOS)

Build

From this directory:

```
make            # build lexer
make clean      # remove generated .cpp/.o files
make distclean  # remove all generated files, including the lexer binary
```

The build produces the `./lexer` binary.

Usage

```
Usage: ./lexer [OPTIONS] [input_file]

Options:
  -d, --debug     Enable detailed debug output
  -h, --help      Show this help and exit
```

Examples (when input_file is omitted, the lexer reads from stdin):

```
./lexer program.dana
./lexer -d program.dana
cat program.dana | ./lexer -d
./lexer < program.dana
```

Options

| Option | Description |
|---|---|
| `-d`, `--debug` | Enable detailed debug output from the lexer. |
| `-h`, `--help` | Show short usage help and exit. |

Exit Codes

0 — success
non-zero — lexing error (diagnostic printed to stderr)
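Because the exit code is the success signal, a script can check it directly instead of parsing stderr. The following is a minimal Python sketch that assumes only what is stated above (tokens on stdout, diagnostics on stderr, non-zero exit on error); `run_lexer` is a hypothetical helper, and `./lexer` must already be built.

```python
import subprocess
from typing import Sequence


def run_lexer(path: str, cmd: Sequence[str] = ("./lexer",)) -> str:
    """Run the lexer on `path` and return its stdout; raise on a non-zero exit code."""
    result = subprocess.run([*cmd, path], capture_output=True, text=True)
    if result.returncode != 0:
        # On failure the diagnostic (line/column and caret) is on stderr.
        raise RuntimeError(f"lexing failed (exit {result.returncode}):\n{result.stderr}")
    return result.stdout
```

Passing `cmd=("./lexer", "-d")` would capture the debug trace instead of the plain token list.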

Examples

Input

```
def hello
    writeString: "Hello world!\n"
```

Output (default)

```
token = 1007, lexeme = def
token = 1024, lexeme = hello
token = 1024, lexeme = writeString
token = 58,   lexeme = :
token = 1027, lexeme = "Hello world!\n"
token = 1,    lexeme = <AUTOEND>
token = 0,    lexeme = <EOF>
```

Output (`--debug`)

```
lexeme 'def' (len=3)
[layout] Opened 'def' block → pushed indent=0
token = 1007, lexeme = def
WS len=1
lexeme 'hello' (len=5)
token = 1024, lexeme = hello
newline
==================== Line: 2 ====================
WS len=3
[layout] Line 2: BOL whitespace len=3
[layout] Line 2: indent=4 → check layout stack
lexeme 'writeString' (len=11)
token = 1024, lexeme = writeString
lexeme ':' (len=1)
token = 58, lexeme = :
WS len=1
lexeme '"Hello world!\n"' (len=16)
token = 1027, lexeme = "Hello world!\n"
newline
==================== Line: 3 ====================
[layout] EOF: pop driver='def' (indent=0)
token = 1, lexeme = <AUTOEND>
token = 0, lexeme = <EOF>
```

Debug Output Explained

WS len=N: Whitespace bytes (spaces/tabs) consumed since the last token. At the beginning of a line (after the banner), it shows the indentation length.
newline: A line break was consumed. The following banner marks the start of the new line.
[layout] …: Indentation-based block handling (push, pop, or check the layout stack).
Opened 'def' block: Entered a new layout block whose driver line starts at column 0 (indent=0 is pushed).
indent=4 → check layout stack: The current line is indented 4 spaces; this is compared against the stack to decide whether the current block continues or closes.
EOF: pop driver='def': At end of file, every block still open is popped and closed with an AUTOEND token.
token = …, lexeme = …: Token code and recognized lexeme.
<AUTOEND>: Automatic layout token (similar to Python’s implicit DEDENT).
<EOF>: End of input.
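The push/pop behavior described above can be modeled with a stack of indentation levels. This is a minimal Python sketch under stated assumptions, not the lexer's actual implementation: indentation is counted in spaces only, `def` is the only block opener confirmed by the trace (the `openers` parameter is hypothetical, and the real lexer may recognize more keywords), explicit `begin ... end` blocks and comments are not modeled, and each source line is reported as a placeholder `line N` instead of real tokens.

```python
from typing import Iterable, List, Sequence


def layout_tokens(lines: Iterable[str], openers: Sequence[str] = ("def",)) -> List[str]:
    """Return layout events: 'line N' per non-blank line, '<AUTOEND>' per closed block, then '<EOF>'."""
    stack: List[int] = []  # indentation (in spaces) of each open block's driver line
    out: List[str] = []
    for lineno, line in enumerate(lines, start=1):
        body = line.rstrip("\r\n")
        stripped = body.lstrip(" ")
        if not stripped.strip():
            continue  # blank lines never open or close blocks
        indent = len(body) - len(stripped)
        # A line at or to the left of a driver's column closes that driver's block.
        while stack and indent <= stack[-1]:
            stack.pop()
            out.append("<AUTOEND>")
        out.append(f"line {lineno}")
        if stripped.split()[0] in openers:
            stack.append(indent)  # like "Opened 'def' block -> pushed indent=0"
    # At EOF, every block still open is closed with its own AUTOEND.
    while stack:
        stack.pop()
        out.append("<AUTOEND>")
    out.append("<EOF>")
    return out
```

For the Hello World input, `layout_tokens(["def hello", '    writeString: "Hello world!\\n"'])` returns `['line 1', 'line 2', '<AUTOEND>', '<EOF>']`, mirroring the single AUTOEND emitted at EOF in the debug trace. Using `<=` rather than `<` means a sibling `def` at the same column closes the previous block before opening its own.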

Error Reporting

On an illegal character or other lexing error, the lexer prints a diagnostic to stderr:

Input

```
# The $ at the end of the string should trigger a lexer error
def main
    begin
        writeString: "Hello world!"$
    end
```

Output

```
Lexer error at line 4, column 37
4:         writeString: "Hello world!"$
                                      ^
Illegal character encountered: $
```

The caret (^) points to the offending column.
The process exits with a non-zero code (see Exit Codes).
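The diagnostic layout (header, numbered source line, caret line, message) can be reproduced in a few lines of Python. This is an illustrative sketch, not the lexer's code: `locate` and `format_diagnostic` are hypothetical names, and it assumes 1-based line and column numbers derived from a character offset into the source (for ASCII input, character and byte offsets coincide).

```python
from typing import Tuple


def locate(source: str, offset: int) -> Tuple[int, int]:
    """Convert a 0-based offset into a 1-based (line, column) pair."""
    line = source.count("\n", 0, offset) + 1
    line_start = source.rfind("\n", 0, offset) + 1  # 0 when the offset is on line 1
    return line, offset - line_start + 1


def format_diagnostic(source: str, offset: int, message: str) -> str:
    """Render the header, the offending source line, a caret line, and the message."""
    line, column = locate(source, offset)
    text = source.splitlines()[line - 1]
    prefix = f"{line}: "
    # Shift the caret right by the prefix width so it lands under the bad character.
    caret = " " * (len(prefix) + column - 1) + "^"
    return "\n".join([f"Lexer error at line {line}, column {column}", prefix + text, caret, message])
```

A caller would write the result to stderr (for example with `print(..., file=sys.stderr)`) and then exit with a non-zero status, as described under Exit Codes.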

Testing

Use the repository’s Python test harness:

```
cd ../testing
python3 test_lexer.py
```

The script:

Runs each Dana program through the lexer.
Compares the printed lexemes against the expected files in testing/lexer/output.
Ignores numeric token codes (only lexemes are checked).

Expected Output File Format (`hello.output`)

```
lexeme = def
lexeme = hello
lexeme = writeString
lexeme = :
lexeme = "Hello world!\n"
lexeme = <AUTOEND>
lexeme = <EOF>
```
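The lexeme-only comparison can be approximated as follows. This is a hedged sketch, not the contents of test_lexer.py: `to_expected_format` and `matches_expected` are hypothetical names, and the code assumes the default output format shown earlier (`token = N, lexeme = X`, with variable spacing after the comma).

```python
import re
from typing import List

# Matches both default and debug token lines; the numeric code is discarded.
TOKEN_LINE = re.compile(r"^token = \d+,\s*lexeme = (.*)$")


def to_expected_format(lexer_stdout: str) -> List[str]:
    """Drop token codes (and any debug lines), keeping 'lexeme = X' entries in order."""
    result = []
    for line in lexer_stdout.splitlines():
        match = TOKEN_LINE.match(line)
        if match:
            result.append(f"lexeme = {match.group(1)}")
    return result


def matches_expected(lexer_stdout: str, expected_text: str) -> bool:
    """Compare lexer output against the contents of an expected .output file."""
    expected = [line for line in expected_text.splitlines() if line.strip()]
    return to_expected_format(lexer_stdout) == expected
```

Since debug lines such as `[layout] ...` or `WS len=1` never start with `token = `, the same check works on `--debug` output.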

Tips & Troubleshooting

Install Flex if missing (Debian/Ubuntu):
```
sudo apt-get install flex
```
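Ensure files are UTF-8 encoded. Lexer positions are computed per byte, so on lines containing multi-byte characters the reported column can be larger than the visible character position.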
Prefer LF (\n) line endings; convert Windows (CRLF) endings if needed:
```
dos2unix program.dana
```
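Reading from stdin: piped and redirected input are both supported, and in debug mode layout events are still shown line by line. When typing input interactively, press Ctrl+D to send end-of-file; if the current line is not empty, press it twice (the first press submits the pending text, the second signals end-of-file).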
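Because columns are counted in bytes, a diagnostic on a line with characters such as `é` can point further right than an editor's cursor column. The hypothetical helper below translates a reported column into a character column, assuming the lexer's columns are 1-based byte offsets within the line; it is a sketch for working with diagnostics, not part of the lexer.

```python
def byte_col_to_char_col(line: str, byte_col: int) -> int:
    """Map a 1-based byte column in a UTF-8 line to a 1-based character column."""
    prefix = line.encode("utf-8")[: byte_col - 1]
    # errors="ignore" drops a partial multi-byte sequence if byte_col falls mid-character.
    return len(prefix.decode("utf-8", errors="ignore")) + 1
```

For the line `writeString: "héllo"$`, the `$` sits at byte column 22 but character column 21, since `é` occupies two bytes; on pure ASCII lines the two numbers are equal.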
