Gen Z Lang - Internals and Architecture

Version: 0.1.0
Audience: AI systems extending or modifying Gen Z Lang

Table of Contents

  1. Architecture Overview
  2. Lexer Implementation
  3. Parser Implementation
  4. AST Design
  5. Interpreter Implementation
  6. Environment and Scoping
  7. Error Handling
  8. Extending the Language
  9. Performance Considerations
  10. Future Enhancements
  11. Implementation Guidelines
  12. Advanced Topics

1. Architecture Overview

1.1 Compilation Pipeline

Source Code (.genz file)
    ↓
[LEXER] - Lexical Analysis
    ↓
Token Stream
    ↓
[PARSER] - Syntax Analysis
    ↓
Abstract Syntax Tree (AST)
    ↓
[INTERPRETER] - Execution
    ↓
Output / Side Effects

1.2 Component Responsibilities

Component     File              Responsibility
Lexer         lexer.py          Convert source text to tokens
Parser        parser.py         Build AST from tokens
Interpreter   interpreter.py    Execute AST nodes
Environment   environment.py    Manage variable scopes
Tokens        tokens.py         Token type definitions
AST Nodes     ast_nodes.py      AST node class definitions
Errors        errors.py         Exception classes and error helpers
Built-ins     builtins.py       Standard library functions
CLI           genz              Command-line interface

1.3 Data Flow

User writes .genz file
    ↓
CLI reads file content
    ↓
Lexer.tokenize() → List[Token]
    ↓
Parser.parse() → Program (AST root)
    ↓
Interpreter.interpret() → Execution
    ↓
print() / input() / side effects

2. Lexer Implementation

2.1 Lexer Class Structure

File: genzlang/lexer.py

class Lexer:
    def __init__(self, source: str):
        self.source = source          # Source code
        self.pos = 0                  # Current position
        self.line = 1                 # Current line number
        self.column = 1               # Current column
        self.tokens = []              # Output tokens
        self.indent_stack = [0]       # Indentation levels
        self.at_line_start = True     # Line start flag

2.2 Tokenization Process

Key Methods:

  • tokenize() - Main entry point, returns token list
  • current_char() - Get character at current position
  • peek(offset) - Look ahead without consuming
  • advance() - Move to next character
  • skip_whitespace() - Skip spaces/tabs
  • handle_indentation() - Generate INDENT/DEDENT tokens
  • read_number() - Parse numeric literals
  • read_string() - Parse string literals
  • read_identifier_or_keyword() - Parse identifiers/keywords
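For illustration, here is a self-contained sketch of the kind of scanning read_number() performs. The real method reads self.source at self.pos; the standalone (source, pos) signature here is an assumption made so the sketch runs on its own:

```python
def read_number(source: str, pos: int):
    """Scan a numeric literal starting at pos; return (value, new_pos).

    Accepts digits with at most one decimal point, mirroring the
    int/float split the lexer needs.
    """
    start = pos
    while pos < len(source) and source[pos].isdigit():
        pos += 1
    # A '.' only continues the number if a digit follows it
    if (pos < len(source) and source[pos] == '.'
            and pos + 1 < len(source) and source[pos + 1].isdigit()):
        pos += 1
        while pos < len(source) and source[pos].isdigit():
            pos += 1
    text = source[start:pos]
    return (float(text) if '.' in text else int(text)), pos
```

Given the source "42 3.14", scanning from position 0 yields the int 42 and scanning from position 3 yields the float 3.14.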

2.3 Indentation Tracking

Algorithm:

  1. At line start, count leading spaces
  2. Compare to current indentation level (top of stack)
  3. If increased: Push new level, emit INDENT token
  4. If decreased: Pop levels until matching, emit DEDENT tokens
  5. If inconsistent: Raise LexerError

Example:

vibe? x > 5:      # Indent stack: [0]
  yeet "yes"      # +2 spaces → INDENT, stack: [0, 2]
  yeet "done"     # Same level, no token
dead:             # Back to 0 → DEDENT, stack: [0]
  yeet "no"       # +2 spaces → INDENT, stack: [0, 2]
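The push/pop logic can be isolated from the lexer as a small helper. This is a sketch of the algorithm above, not the real handle_indentation(); token names are plain strings for illustration:

```python
def indent_tokens(indent: int, stack: list):
    """Emit INDENT/DEDENT pseudo-tokens for a new line whose leading-space
    count is `indent`, mutating the indentation stack in place."""
    tokens = []
    if indent > stack[-1]:
        # Increased: push new level, emit INDENT
        stack.append(indent)
        tokens.append('INDENT')
    else:
        # Decreased: pop levels until matching, emit DEDENTs
        while indent < stack[-1]:
            stack.pop()
            tokens.append('DEDENT')
        # Inconsistent: no stack level matches
        if indent != stack[-1]:
            raise IndentationError(f'inconsistent indent: {indent}')
    return tokens
```

Feeding the indents 2, 2, 0 in sequence yields INDENT, nothing, DEDENT, matching the example above.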

2.4 Multi-word Keywords

Handling:

  1. Read identifier: vibe
  2. Check if next identifier forms multi-word: check
  3. If match found: Emit multi-word token(s)
  4. Otherwise: Treat as single keyword

Implementation Detail:

Multi-word keywords are handled by peek_word() which looks ahead without consuming characters.
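A standalone sketch of that lookahead (the real peek_word() works on the lexer's own position state rather than taking source and pos as arguments):

```python
def peek_word(source: str, pos: int) -> str:
    """Return the next whitespace-delimited identifier at or after pos
    without consuming any characters."""
    # Skip spaces/tabs between the current word and the next
    while pos < len(source) and source[pos] in ' \t':
        pos += 1
    start = pos
    while pos < len(source) and (source[pos].isalnum() or source[pos] == '_'):
        pos += 1
    return source[start:pos]
```

After reading the identifier "vibe" from "vibe check x:", peeking from position 4 sees "check", so the lexer can emit the multi-word keyword.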

2.5 Comment Handling

Single-line (ngl):

  • Skip from ngl to end of line
  • No token emitted

Multi-line (tea: ... :tea):

  • Skip from tea: to closing :tea
  • Supports nesting (future enhancement)

3. Parser Implementation

3.1 Parser Class Structure

File: genzlang/parser.py

class Parser:
    def __init__(self, tokens: List[Token]):
        self.tokens = tokens           # Input tokens
        self.pos = 0                   # Current position
        self.in_loop = False           # Loop context flag
        self.in_function = False       # Function context flag

3.2 Recursive Descent Parsing

Strategy:

Each grammar rule maps to a parsing method:

expression → parse_expression()
statement → parse_statement()
primary → parse_primary()

Precedence Hierarchy (low to high):

  1. parse_expression() - Entry point
  2. parse_logical() - AND, OR
  3. parse_comparison() - ==, !=, <, >, <=, >=
  4. parse_term() - +, -
  5. parse_factor() - *, /, %
  6. parse_unary() - nah, -
  7. parse_power() - ^
  8. parse_primary() - Literals, identifiers, parentheses
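One way to see two adjacent precedence levels at work is a self-contained pair that consumes a flat token list of numbers and operator strings. The real methods consume Token objects and build BinaryOp nodes; this sketch evaluates inline for brevity:

```python
def parse_term(tokens, pos):
    """Level 4: + and -. Defers to parse_factor, so * / % bind tighter."""
    left, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ('+', '-'):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        left = left + right if op == '+' else left - right
    return left, pos

def parse_factor(tokens, pos):
    """Level 5: * and /. The primary here is just a bare number."""
    left = tokens[pos]
    pos += 1
    while pos < len(tokens) and tokens[pos] in ('*', '/'):
        op = tokens[pos]
        right = tokens[pos + 1]
        pos += 2
        left = left * right if op == '*' else left / right
    return left, pos
```

Because parse_term calls parse_factor for each operand, the token list for 2 + 3 * 4 evaluates to 14, not 20.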

3.3 Statement Parsing

Method: parse_statement()

Decision tree:

Current token type:
├─ RIZZ → parse_var_declaration()
├─ YEET → parse_yeet_statement()
├─ FR → parse_fr_fr_statement()
├─ VIBE_QUESTION → parse_if_statement()
├─ KEEP → parse_while_loop()
├─ GRIND → parse_for_loop()
├─ MAIN → parse_function_def()
├─ SEND → parse_return_statement()
├─ DIP → parse_break_statement()
├─ SKIP → parse_continue_statement()
├─ IDENTIFIER → parse_assignment() or parse_expression_statement()
└─ else → Error

3.4 Block Parsing

Method: parse_block()

def parse_block(self) -> List[Statement]:
    self.expect(TokenType.INDENT)
    statements = []

    while not self.match(TokenType.DEDENT, TokenType.EOF):
        self.skip_newlines()
        if self.match(TokenType.DEDENT, TokenType.EOF):
            break  # trailing newlines before the dedent
        stmt = self.parse_statement()
        if stmt:
            statements.append(stmt)

    self.expect(TokenType.DEDENT)
    return statements

3.5 Context Tracking

Purpose: Prevent invalid constructs

Flags:

  • in_loop: Allows dip (break) and skip it (continue)
  • in_function: Allows send it (return)

Example:

def parse_return_statement(self):
    if not self.in_function:
        raise parse_error('return_outside_function', ...)
    # ... parse return

3.6 Error Recovery

Strategy:

  1. On parse error, synchronize to statement boundary
  2. Skip tokens until NEWLINE or keyword that starts statement
  3. Continue parsing next statement

Not currently implemented - errors halt parsing immediately


4. AST Design

4.1 Node Hierarchy

File: genzlang/ast_nodes.py

ASTNode (base)
├─ Program
├─ Statement (base)
│  ├─ VarDeclaration
│  ├─ Assignment
│  ├─ YeetStatement
│  ├─ FrFrStatement
│  ├─ IfStatement
│  ├─ WhileLoop
│  ├─ ForLoop
│  ├─ FunctionDef
│  ├─ ReturnStatement
│  ├─ BreakStatement
│  ├─ ContinueStatement
│  └─ ExpressionStatement
└─ Expression (base)
   ├─ BinaryOp
   ├─ UnaryOp
   ├─ Comparison
   ├─ Logical
   ├─ FunctionCall
   ├─ Identifier
   ├─ Literal
   └─ VibeCheck

4.2 Node Design Pattern

Dataclass Pattern:

@dataclass
class IfStatement(Statement):
    condition: Expression
    then_block: List[Statement]
    elif_blocks: List[tuple[Expression, List[Statement]]]
    else_block: Optional[List[Statement]]

Benefits:

  • Can be made immutable with frozen=True
  • Automatic __init__, __repr__
  • Type hints for clarity

4.3 Source Location Tracking

All nodes store line and column for error reporting:

@dataclass
class ASTNode:
    line: int
    column: int

5. Interpreter Implementation

5.1 Interpreter Class Structure

File: genzlang/interpreter.py

class Interpreter:
    def __init__(self):
        self.global_env = Environment()
        self.current_env = self.global_env
        # Register built-in functions

5.2 Execution Strategy

Tree-Walking Interpreter:

  • Recursively visit AST nodes
  • Execute statements for side effects
  • Evaluate expressions for values

Visitor Pattern (implicit):

def execute_statement(self, stmt: Statement):
    if isinstance(stmt, VarDeclaration):
        self.execute_var_declaration(stmt)
    elif isinstance(stmt, Assignment):
        self.execute_assignment(stmt)
    # ... etc

5.3 Expression Evaluation

Method: evaluate_expression(expr: Expression) -> Any

Returns:

  • Numbers (int, float)
  • Strings (str)
  • Booleans (bool)
  • Functions (Function objects)
  • None (from functions without return)

Example:

def evaluate_expression(self, expr):
    if isinstance(expr, Literal):
        return expr.value
    elif isinstance(expr, BinaryOp):
        return self.evaluate_binary_op(expr)
    # ... etc

5.4 Control Flow via Exceptions

Special Exceptions:

class ReturnValue(Exception):
    def __init__(self, value):
        self.value = value

class BreakException(Exception):
    pass

class ContinueException(Exception):
    pass

Usage:

# In function
if return_stmt:
    raise ReturnValue(result)

# In loop
try:
    execute_loop_body()
except BreakException:
    break
except ContinueException:
    continue
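Put together, a loop iteration driven this way looks like the following self-contained trace. demo_loop and the literal trigger values are illustrative; in the interpreter the raises happen inside execute_statement when a dip or skip it node is reached:

```python
# Exceptions redefined here so the sketch runs standalone
class BreakException(Exception):
    pass

class ContinueException(Exception):
    pass

def demo_loop():
    """The driving while-loop translates the body's exceptions into
    Python's own break/continue."""
    out = []
    i = 0
    while True:
        i += 1
        try:
            if i == 2:
                raise ContinueException()   # 'skip it' in the body
            if i == 4:
                raise BreakException()      # 'dip' in the body
            out.append(i)
        except BreakException:
            break
        except ContinueException:
            continue
    return out
```

Running demo_loop() collects 1 and 3: iteration 2 is skipped, iteration 4 breaks out.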

5.5 Function Calls

Process:

  1. Lookup function in environment
  2. Check if callable
  3. Check argument count matches parameters
  4. Create new environment with closure as parent
  5. Bind parameters to arguments
  6. Execute function body
  7. Catch ReturnValue exception
  8. Return value (or None)

Function Object:

class Function:
    def __init__(self, name, params, body, closure):
        self.name = name
        self.params = params        # List of parameter names
        self.body = body            # List of statements
        self.closure = closure      # Enclosing environment
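The eight steps above can be sketched with a plain dict standing in for Environment and a callable standing in for the statement list. Names and shapes here are illustrative, not the real API:

```python
class ReturnValue(Exception):
    """Carries a 'send it' value to the call site (redefined standalone)."""
    def __init__(self, value):
        self.value = value

def call_function(params, body, args, closure):
    # 3. Check argument count matches parameters
    if len(args) != len(params):
        raise TypeError(f'expected {len(params)} args, got {len(args)}')
    # 4-5. Fresh scope chained to the closure, parameters bound to args
    env = {'__parent__': closure}
    env.update(zip(params, args))
    # 6-8. Run the body; unwrap an early return, otherwise return None
    try:
        body(env)
    except ReturnValue as ret:
        return ret.value
    return None
```

A body that raises ReturnValue(env['x'] * 2) called with args [5] returns 10; a body that falls off the end returns None.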

6. Environment and Scoping

6.1 Environment Class

File: genzlang/environment.py

class Environment:
    def __init__(self, parent=None):
        self.parent = parent            # Enclosing scope
        self.variables = {}             # Variable storage

    def define(self, name, value):
        # Create a new variable in the current scope
        self.variables[name] = value

    def get(self, name):
        # Look the variable up the scope chain
        if name in self.variables:
            return self.variables[name]
        if self.parent is not None:
            return self.parent.get(name)
        raise RuntimeError(f"Undefined variable '{name}'")

    def set(self, name, value):
        # Update an existing variable in whichever scope defines it
        if name in self.variables:
            self.variables[name] = value
        elif self.parent is not None:
            self.parent.set(name, value)
        else:
            raise RuntimeError(f"Undefined variable '{name}'")

6.2 Scope Chain

Structure:

Global Environment
    ├─ Built-in functions
    ├─ Top-level variables
    └─ Parent: None

Function Environment
    ├─ Parameters
    ├─ Local variables
    └─ Parent: Closure environment

Nested Function Environment
    ├─ Parameters
    ├─ Local variables
    └─ Parent: Outer function environment

6.3 Variable Lookup

Algorithm:

  1. Check current environment's variables dict
  2. If not found and parent exists, recursively check parent
  3. If not found at global level, raise RuntimeError

Example:

rizz global_var = 100

main character outer():
  rizz outer_var = 50

  main character inner():
    rizz inner_var = 25
    yeet global_var   # Lookup: inner → outer → global ✓

6.4 Closures

Implementation:

When function is defined, capture current environment:

def execute_function_def(self, stmt: FunctionDef):
    func = Function(
        name=stmt.name,
        params=stmt.params,
        body=stmt.body,
        closure=self.current_env  # Capture closure
    )
    self.current_env.define(stmt.name, func)

When function is called, use closure as parent:

def evaluate_function_call(self, expr: FunctionCall):
    func = self.current_env.get(expr.name)
    func_env = Environment(parent=func.closure)  # Use closure
    # ... bind parameters and execute

7. Error Handling

7.1 Error Hierarchy

File: genzlang/errors.py

GenZError (base)
├─ LexerError
├─ ParseError
├─ RuntimeError
└─ TypeError

7.2 Error Context

All errors include:

  • Message (with Gen Z slang)
  • Line number (optional)
  • Column number (optional)

Example:

raise runtime_error(
    'undefined_variable',
    line=expr.line,
    column=expr.column,
    name=variable_name
)

7.3 Error Message Templates

File: genzlang/errors.py

ERROR_MESSAGES = {
    'undefined_variable': "No cap, variable '{name}' doesn't exist fam",
    'division_by_zero': "That's not the vibe - can't divide by zero",
    # ... etc
}
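A minimal sketch of the runtime_error helper those call sites use. A trimmed copy of ERROR_MESSAGES keeps it standalone, and the plain Exception return is a stand-in for the RuntimeError subclass in errors.py:

```python
ERROR_MESSAGES = {
    'undefined_variable': "No cap, variable '{name}' doesn't exist fam",
    'division_by_zero': "That's not the vibe - can't divide by zero",
}

def runtime_error(key, line=None, column=None, **fields):
    """Build an exception from a message template key."""
    # Fill the slang template, then tack on the source location if known
    message = ERROR_MESSAGES[key].format(**fields)
    if line is not None:
        message += f' (line {line}, column {column})'
    return Exception(message)
```

So raise runtime_error('undefined_variable', line=3, column=7, name='x') produces a message naming 'x' and the location.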

7.4 Error Propagation

Current behavior:

  1. Error raised at point of failure
  2. Propagates up call stack
  3. Caught by CLI (genz script)
  4. Error message printed
  5. Program exits with code 1

No try-catch mechanism - errors are fatal


8. Extending the Language

8.1 Adding New Keywords

Steps:

  1. Add token type in tokens.py:

    class TokenType(Enum):
        NEW_KEYWORD = auto()
  2. Add to keyword mapping:

    KEYWORDS = {
        'new_keyword': TokenType.NEW_KEYWORD,
    }
  3. Handle in parser (parser.py):

    def parse_statement(self):
        if self.match(TokenType.NEW_KEYWORD):
            return self.parse_new_statement()
  4. Create AST node (ast_nodes.py):

    @dataclass
    class NewStatement(Statement):
        # ... fields
  5. Implement execution (interpreter.py):

    def execute_statement(self, stmt):
        if isinstance(stmt, NewStatement):
            self.execute_new_statement(stmt)

8.2 Adding Built-in Functions

File: genzlang/builtins.py

def my_new_function(arg1, arg2):
    # Implementation
    return result

BUILTINS = {
    'my_new_function': my_new_function,
    # ... existing functions
}

Example: Add round() function:

def round_number(x, digits=0):
    """Round number to given decimal places"""
    try:
        return round(float(x), int(digits))
    except (ValueError, TypeError):
        raise ValueError(f"Can't round {x}")

BUILTINS = {
    'round_number': round_number,
    # ... etc
}

8.3 Adding New Operators

Steps:

  1. Add token (tokens.py):

    FLOOR_DIVIDE = auto()  # //
  2. Recognize in lexer (lexer.py):

    if char == '/' and self.peek(1) == '/':
        self.tokens.append(Token(TokenType.FLOOR_DIVIDE, '//', ...))
        self.advance()
        self.advance()
  3. Parse in expression (parser.py):

    def parse_factor(self):
        left = self.parse_unary()
        while self.match(TokenType.FLOOR_DIVIDE, ...):
            op = self.advance()
            right = self.parse_unary()
            left = BinaryOp(left, op.value, right, ...)
  4. Evaluate (interpreter.py):

    def evaluate_binary_op(self, expr):
        left = self.evaluate_expression(expr.left)
        right = self.evaluate_expression(expr.right)
        if expr.operator == '//':
            return int(left // right)

8.4 Adding Data Types

Example: Add List Type

  1. Create list literal syntax:

    rizz numbers = squad(1, 2, 3, 4, 5)
    
  2. Parse as function call:

    • Already supported! squad(...) is a function call
  3. Implement squad() built-in:

    def squad(*args):
        """Create a list"""
        return list(args)
    
    BUILTINS['squad'] = squad
  4. Add list methods:

    def add_to_squad(lst, item):
        """Append to list"""
        if not isinstance(lst, list):
            raise TypeError("First arg must be a list")
        lst.append(item)
        return lst
    
    def get_from_squad(lst, index):
        """Get list item"""
        return lst[index]

8.5 Adding Language Constructs

Example: Add Switch/Match Statement

  1. Design syntax:

    vibe check x:
      case 1:
        yeet "one"
      case 2:
        yeet "two"
      default:
        yeet "other"
    
  2. Add tokens:

    CASE = auto()
    DEFAULT = auto()
  3. Create AST node:

    @dataclass
    class SwitchStatement(Statement):
        value: Expression
        cases: List[tuple[Expression, List[Statement]]]
        default: Optional[List[Statement]]
  4. Parse:

    def parse_switch_statement(self):
        self.expect(TokenType.VIBE)
        self.expect(TokenType.CHECK)
        value = self.parse_expression()
        self.expect(TokenType.COLON)
        # ... parse cases
  5. Interpret:

    def execute_switch(self, stmt):
        value = self.evaluate_expression(stmt.value)
        for case_value, case_body in stmt.cases:
            if self.evaluate_expression(case_value) == value:
                for s in case_body:
                    self.execute_statement(s)
                return
        # Execute default

9. Performance Considerations

9.1 Current Limitations

Slow operations:

  • Recursive function calls (no tail call optimization)
  • String concatenation in loops
  • Deep variable lookups in nested scopes
  • No caching of parsed AST

9.2 Optimization Opportunities

1. Bytecode Compilation:

Instead of interpreting AST directly, compile to bytecode:

Source → AST → Bytecode → VM Execution

Benefits:

  • Faster execution
  • Smaller memory footprint
  • JIT compilation possible

2. Constant Folding:

Evaluate constant expressions at parse time:

2 + 3 * 4  # → 14, computed at parse time

3. Variable Resolution Caching:

Cache variable lookup depths:

{
    'local_var': 0,     # current scope
    'outer_var': 1,     # 1 scope up
    'global_var': 2,    # 2 scopes up
}
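A resolver pass computing such depths could look like this sketch, with the scope chain given innermost-first as sets of declared names (the function name and calling convention are assumptions):

```python
def resolve(name, scopes):
    """Return how many scopes up `name` lives (0 = current scope),
    or None if it is undefined anywhere on the chain.

    Recording this once per identifier lets Environment.get jump
    straight to the right scope instead of walking the chain on
    every lookup."""
    for depth, scope in enumerate(scopes):
        if name in scope:
            return depth
    return None
```

For the nested-function example from section 6.3, global_var resolves to depth 2 from inside inner().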

4. String Interning:

Reuse string objects:

# Instead of creating new strings
string_pool = {}  # Map string → interned object
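A minimal pool, relying on dict.setdefault returning the first object stored for a given key:

```python
string_pool = {}

def intern(s):
    """Return the pooled object for any string equal to s,
    storing s itself on first sight."""
    return string_pool.setdefault(s, s)
```

Two equal strings built separately then share one object, so identity checks and repeated hashing get cheaper.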

9.3 Memory Management

Current:

  • Python's garbage collection
  • No explicit memory management
  • Closures keep references to parent scopes

Future:

  • Reference counting for early cleanup
  • Weak references for circular dependencies
  • Memory limits and quotas

10. Future Enhancements

10.1 Planned Features

1. Native Data Structures:

ngl Lists
rizz numbers = [1, 2, 3, 4, 5]
rizz first = numbers[0]
numbers[2] = 10

ngl Dictionaries
rizz person = {"name": "Alice", "age": 25}
rizz name = person["name"]

Implementation:

  • Add [ and ] tokens
  • Parse list/dict literals
  • Add indexing expressions
  • Implement indexing operations

2. String Interpolation:

rizz name = "Alice"
yeet "Hello, {name}!"  # "Hello, Alice!"

Implementation:

  • Parse string literals for {expr} patterns
  • Create string concatenation AST
  • Evaluate interpolated expressions
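The first step, scanning a literal for {expr} placeholders, might look like this sketch (nested braces and escape sequences are deliberately ignored):

```python
import re

def split_interpolation(s):
    """Split a string literal into ('text', ...) chunks and
    ('expr', ...) placeholders, in order of appearance."""
    parts = []
    # The capturing group makes re.split keep the placeholder contents
    # at every odd index of the result
    for i, piece in enumerate(re.split(r'\{([^}]*)\}', s)):
        kind = 'expr' if i % 2 else 'text'
        if piece:
            parts.append((kind, piece))
    return parts
```

Each 'expr' part would then be lexed and parsed as an Expression, and the whole literal compiled to a concatenation.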

3. Exception Handling:

catch it:
  rizz result = risky_operation()
oops Error as e:
  yeet "Error occurred: " + e
slay:
  yeet "Cleanup code"

Implementation:

  • Add catch, oops, slay keywords
  • Create TryStatement AST node
  • Implement exception catching in interpreter
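Once those pieces exist, TryStatement execution could follow this shape. Since this is a planned feature, everything here is hypothetical: plain Exception stands in for GenZError, and `run` stands in for executing a statement list:

```python
def execute_try(try_body, handler, finally_body, run):
    """Run the 'catch it' body; hand any error to the 'oops' handler;
    always run the 'slay' block."""
    try:
        run(try_body)
    except Exception as e:
        if handler is not None:
            handler(e)
        else:
            raise
    finally:
        if finally_body is not None:
            run(finally_body)
```

With a body that raises, the handler fires and the cleanup block still runs; with a clean body, only the cleanup block runs after it.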

4. Modules and Imports:

cop "math_utils"  # Import module

rizz result = math_utils.calculate(10)

Implementation:

  • Add cop keyword (import)
  • Implement module loader
  • Module search path
  • Namespace isolation

5. Classes and OOP:

main character Person:
  main character vibe_check(name, age):
    it.name = name
    it.age = age

  main character greet():
    yeet "Hello, I'm " + it.name

rizz person = Person("Alice", 25)
person.greet()

Implementation:

  • Add it keyword (self/this)
  • Parse class definitions
  • Implement object system
  • Method binding

6. Decorators:

@enhance
main character my_function():
  yeet "Hello"

7. Lambda Expressions:

rizz double = lambda(x) -> x * 2
yeet double(5)  # 10

8. Comprehensions:

rizz squares = [x * x for x in range(10)]

9. Pattern Matching:

vibe check value:
  case [1, 2, x]:
    yeet "Got 1, 2, and " + x
  case {"name": n}:
    yeet "Name is " + n

10. Async/Await:

async main character fetch_data():
  rizz result = await http_get("url")
  send it result

10.2 Tooling Enhancements

1. REPL (Read-Eval-Print Loop):

$ genz
Gen Z Lang v0.1.0
>>> rizz x = 10
>>> yeet x
10

2. Debugger:

$ genz --debug program.genz
(genz-db) break 10
(genz-db) run
(genz-db) print x

3. Profiler:

$ genz --profile program.genz
Function calls: 1000
Total time: 0.5s
Hotspots:
  - fibonacci: 0.3s (60%)
  - loop: 0.2s (40%)

4. Language Server (LSP):

  • Syntax highlighting
  • Autocomplete
  • Jump to definition
  • Error checking
  • Refactoring

5. Package Manager:

$ genz-pkg install http-client
$ genz-pkg publish my-library

10.3 Standard Library Expansion

Planned modules:

  • math - Mathematical functions
  • string - String operations
  • array - Array/list utilities
  • file - File I/O
  • http - HTTP client/server
  • json - JSON parsing
  • regex - Regular expressions
  • datetime - Date/time operations
  • random - Random number generation
  • crypto - Cryptography
  • testing - Unit testing framework

11. Implementation Guidelines

11.1 Code Style

Python code:

  • Follow PEP 8
  • Type hints for function signatures
  • Docstrings for public APIs
  • Clear variable names

Gen Z Lang code:

  • snake_case for identifiers
  • 2 spaces for indentation (as used in this guide's examples)
  • Comments explain why, not what
  • Keep functions focused

11.2 Testing

Unit tests:

def test_lexer_numbers():
    lexer = Lexer("42 3.14")
    tokens = lexer.tokenize()
    assert tokens[0].type == TokenType.NUMBER
    assert tokens[0].value == 42
    assert tokens[1].type == TokenType.NUMBER
    assert tokens[1].value == 3.14

Integration tests:

def test_full_program():
    source = """
rizz x = 10
yeet x
"""
    lexer = Lexer(source)
    tokens = lexer.tokenize()
    parser = Parser(tokens)
    ast = parser.parse()
    interpreter = Interpreter()
    interpreter.interpret(ast)
    # Check output

11.3 Documentation

For each feature:

  1. Language spec update
  2. API documentation
  3. Example programs
  4. Migration guide (if breaking change)

11.4 Backward Compatibility

Guidelines:

  • Don't break existing code unless absolutely necessary
  • Deprecate features before removing
  • Provide migration path
  • Version the language spec

12. Advanced Topics

12.1 Implementing Optimizations

Constant Propagation:

def optimize_constant_propagation(node):
    if isinstance(node, BinaryOp):
        left = optimize(node.left)
        right = optimize(node.right)

        if isinstance(left, Literal) and isinstance(right, Literal):
            # Evaluate at compile time
            result = evaluate_binary(left.value, node.operator, right.value)
            return Literal(value=result, line=node.line, column=node.column)

    return node

Dead Code Elimination:

def optimize_dead_code(statements):
    result = []
    reachable = True

    for stmt in statements:
        if not reachable:
            break  # Skip unreachable code

        result.append(stmt)

        if isinstance(stmt, ReturnStatement):
            reachable = False

    return result

12.2 Static Analysis

Type Inference:

def infer_type(expr, env):
    if isinstance(expr, Literal):
        return type(expr.value)
    elif isinstance(expr, BinaryOp):
        left_type = infer_type(expr.left, env)
        right_type = infer_type(expr.right, env)
        # Determine result type
    # ... etc

Linting:

def lint_unused_variables(ast):
    declared = set()
    used = set()

    # Traverse AST
    for node in walk(ast):
        if isinstance(node, VarDeclaration):
            declared.add(node.name)
        elif isinstance(node, Identifier):
            used.add(node.name)

    unused = declared - used
    for var in unused:
        warn(f"Variable '{var}' declared but never used")

12.3 Debugging Support

Stack Trace:

class CallFrame:
    def __init__(self, function_name, line):
        self.function_name = function_name
        self.line = line

class Interpreter:
    def __init__(self):
        self.call_stack = []

    def execute_function_call(self, expr):
        self.call_stack.append(CallFrame(expr.name, expr.line))
        try:
            # Execute function
            pass
        finally:
            self.call_stack.pop()

    def print_stack_trace(self):
        for frame in reversed(self.call_stack):
            print(f"  at {frame.function_name} (line {frame.line})")

Conclusion

This internals guide provides complete documentation for understanding and extending Gen Z Lang:

  1. Architecture: Clear separation of lexer, parser, interpreter
  2. Implementation: Tree-walking interpreter with environment-based scoping
  3. Extensibility: Well-defined extension points for new features
  4. Performance: Opportunities for optimization identified
  5. Future: Roadmap for language evolution

For AI systems working with Gen Z Lang:

  • Follow the architecture patterns established
  • Maintain backward compatibility when extending
  • Document new features thoroughly
  • Write tests for all new functionality
  • Consider performance implications

The language is designed to be simple yet extensible, making it an ideal platform for experimentation and learning about language implementation.