Gen Z Lang - Internals and Architecture

Version: 0.1.0
Audience: AI systems extending or modifying Gen Z Lang

Table of Contents

  1. Architecture Overview
  2. Lexer Implementation
  3. Parser Implementation
  4. AST Design
  5. Interpreter Implementation
  6. Environment and Scoping
  7. Error Handling
  8. Extending the Language
  9. Performance Considerations
  10. Future Enhancements
  11. Implementation Guidelines
  12. Advanced Topics

1. Architecture Overview

1.1 Compilation Pipeline

Source Code (.genz file)
    ↓
[LEXER] - Lexical Analysis
    ↓
Token Stream
    ↓
[PARSER] - Syntax Analysis
    ↓
Abstract Syntax Tree (AST)
    ↓
[INTERPRETER] - Execution
    ↓
Output / Side Effects

1.2 Component Responsibilities

Component     File              Responsibility
Lexer         lexer.py          Convert source text to tokens
Parser        parser.py         Build AST from tokens
Interpreter   interpreter.py    Execute AST nodes
Environment   environment.py    Manage variable scopes
Tokens        tokens.py         Token type definitions
AST Nodes     ast_nodes.py      AST node class definitions
Errors        errors.py         Exception classes and error helpers
Built-ins     builtins.py       Standard library functions
CLI           genz              Command-line interface

1.3 Data Flow

User writes .genz file
    ↓
CLI reads file content
    ↓
Lexer.tokenize() → List[Token]
    ↓
Parser.parse() → Program (AST root)
    ↓
Interpreter.interpret() → Execution
    ↓
print() / input() / side effects

2. Lexer Implementation

2.1 Lexer Class Structure

File: genzlang/lexer.py

class Lexer:
    def __init__(self, source: str):
        self.source = source          # Source code
        self.pos = 0                  # Current position
        self.line = 1                 # Current line number
        self.column = 1               # Current column
        self.tokens = []              # Output tokens
        self.indent_stack = [0]       # Indentation levels
        self.at_line_start = True     # Line start flag

2.2 Tokenization Process

Key Methods:

  • tokenize() - Main entry point, returns token list
  • current_char() - Get character at current position
  • peek(offset) - Look ahead without consuming
  • advance() - Move to next character
  • skip_whitespace() - Skip spaces/tabs
  • handle_indentation() - Generate INDENT/DEDENT tokens
  • read_number() - Parse numeric literals
  • read_string() - Parse string literals
  • read_identifier_or_keyword() - Parse identifiers/keywords
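For illustration, here is a self-contained sketch of the kind of scanning read_number() performs. The real method reads self.source at self.pos; the standalone (source, pos) signature here is an assumption made so the sketch runs on its own:

```python
def read_number(source: str, pos: int):
    """Scan a numeric literal starting at pos; return (value, new_pos).

    Accepts digits with at most one decimal point, mirroring the
    int/float split the lexer needs.
    """
    start = pos
    while pos < len(source) and source[pos].isdigit():
        pos += 1
    # A '.' only continues the number if a digit follows it
    if (pos < len(source) and source[pos] == '.'
            and pos + 1 < len(source) and source[pos + 1].isdigit()):
        pos += 1
        while pos < len(source) and source[pos].isdigit():
            pos += 1
    text = source[start:pos]
    return (float(text) if '.' in text else int(text)), pos
```

Given the source "42 3.14", scanning from position 0 yields the int 42 and scanning from position 3 yields the float 3.14.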

2.3 Indentation Tracking

Algorithm:

  1. At line start, count leading spaces
  2. Compare to current indentation level (top of stack)
  3. If increased: Push new level, emit INDENT token
  4. If decreased: Pop levels until matching, emit DEDENT tokens
  5. If inconsistent: Raise LexerError

Example:

vibe? x > 5:      # Indent stack: [0]
  yeet "yes"      # +2 spaces → INDENT, stack: [0, 2]
  yeet "done"     # Same level, no token
dead:             # Back to 0 → DEDENT, stack: [0]
  yeet "no"       # +2 spaces → INDENT, stack: [0, 2]
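The push/pop logic can be isolated from the lexer as a small helper. This is a sketch of the algorithm above, not the real handle_indentation(); token names are plain strings for illustration:

```python
def indent_tokens(indent: int, stack: list):
    """Emit INDENT/DEDENT pseudo-tokens for a new line whose leading-space
    count is `indent`, mutating the indentation stack in place."""
    tokens = []
    if indent > stack[-1]:
        # Increased: push new level, emit INDENT
        stack.append(indent)
        tokens.append('INDENT')
    else:
        # Decreased: pop levels until matching, emit DEDENTs
        while indent < stack[-1]:
            stack.pop()
            tokens.append('DEDENT')
        # Inconsistent: no stack level matches
        if indent != stack[-1]:
            raise IndentationError(f'inconsistent indent: {indent}')
    return tokens
```

Feeding the indents 2, 2, 0 in sequence yields INDENT, nothing, DEDENT, matching the example above.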

2.4 Multi-word Keywords

Handling:

  1. Read identifier: vibe
  2. Check if next identifier forms multi-word: check
  3. If match found: Emit multi-word token(s)
  4. Otherwise: Treat as single keyword

Implementation Detail:

Multi-word keywords are handled by peek_word() which looks ahead without consuming characters.
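A standalone sketch of that lookahead (the real peek_word() works on the lexer's own position state rather than taking source and pos as arguments):

```python
def peek_word(source: str, pos: int) -> str:
    """Return the next whitespace-delimited identifier at or after pos
    without consuming any characters."""
    # Skip spaces/tabs between the current word and the next
    while pos < len(source) and source[pos] in ' \t':
        pos += 1
    start = pos
    while pos < len(source) and (source[pos].isalnum() or source[pos] == '_'):
        pos += 1
    return source[start:pos]
```

After reading the identifier "vibe" from "vibe check x:", peeking from position 4 sees "check", so the lexer can emit the multi-word keyword.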

2.5 Comment Handling

Single-line (ngl):

  • Skip from ngl to end of line
  • No token emitted

Multi-line (tea: ... :tea):

  • Skip from tea: to closing :tea
  • Supports nesting (future enhancement)

3. Parser Implementation

3.1 Parser Class Structure

File: genzlang/parser.py

class Parser:
    def __init__(self, tokens: List[Token]):
        self.tokens = tokens           # Input tokens
        self.pos = 0                   # Current position
        self.in_loop = False           # Loop context flag
        self.in_function = False       # Function context flag

3.2 Recursive Descent Parsing

Strategy:

Each grammar rule maps to a parsing method:

expression → parse_expression()
statement → parse_statement()
primary → parse_primary()

Precedence Hierarchy (low to high):

  1. parse_expression() - Entry point
  2. parse_logical() - AND, OR
  3. parse_comparison() - ==, !=, <, >, <=, >=
  4. parse_term() - +, -
  5. parse_factor() - *, /, %
  6. parse_unary() - nah, -
  7. parse_power() - ^
  8. parse_primary() - Literals, identifiers, parentheses
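One way to see two adjacent precedence levels at work is a self-contained pair that consumes a flat token list of numbers and operator strings. The real methods consume Token objects and build BinaryOp nodes; this sketch evaluates inline for brevity:

```python
def parse_term(tokens, pos):
    """Level 4: + and -. Defers to parse_factor, so * / % bind tighter."""
    left, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ('+', '-'):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        left = left + right if op == '+' else left - right
    return left, pos

def parse_factor(tokens, pos):
    """Level 5: * and /. The primary here is just a bare number."""
    left = tokens[pos]
    pos += 1
    while pos < len(tokens) and tokens[pos] in ('*', '/'):
        op = tokens[pos]
        right = tokens[pos + 1]
        pos += 2
        left = left * right if op == '*' else left / right
    return left, pos
```

Because parse_term calls parse_factor for each operand, the token list for 2 + 3 * 4 evaluates to 14, not 20.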

3.3 Statement Parsing

Method: parse_statement()

Decision tree:

Current token type:
├─ RIZZ → parse_var_declaration()
├─ YEET → parse_yeet_statement()
├─ FR → parse_fr_fr_statement()
├─ VIBE_QUESTION → parse_if_statement()
├─ KEEP → parse_while_loop()
├─ GRIND → parse_for_loop()
├─ MAIN → parse_function_def()
├─ SEND → parse_return_statement()
├─ DIP → parse_break_statement()
├─ SKIP → parse_continue_statement()
├─ IDENTIFIER → parse_assignment() or parse_expression_statement()
└─ else → Error

3.4 Block Parsing

Method: parse_block()

def parse_block(self) -> List[Statement]:
    self.expect(TokenType.INDENT)
    statements = []

    while not self.match(TokenType.DEDENT, TokenType.EOF):
        self.skip_newlines()
        if self.match(TokenType.DEDENT, TokenType.EOF):
            break  # trailing newlines before the dedent
        stmt = self.parse_statement()
        if stmt:
            statements.append(stmt)

    self.expect(TokenType.DEDENT)
    return statements

3.5 Context Tracking

Purpose: Prevent invalid constructs

Flags:

  • in_loop: Allows dip (break) and skip it (continue)
  • in_function: Allows send it (return)

Example:

def parse_return_statement(self):
    if not self.in_function:
        raise parse_error('return_outside_function', ...)
    # ... parse return

3.6 Error Recovery

Strategy:

  1. On parse error, synchronize to statement boundary
  2. Skip tokens until NEWLINE or keyword that starts statement
  3. Continue parsing next statement

Not currently implemented - errors halt parsing immediately


4. AST Design

4.1 Node Hierarchy

File: genzlang/ast_nodes.py

ASTNode (base)
├─ Program
├─ Statement (base)
│  ├─ VarDeclaration
│  ├─ Assignment
│  ├─ YeetStatement
│  ├─ FrFrStatement
│  ├─ IfStatement
│  ├─ WhileLoop
│  ├─ ForLoop
│  ├─ FunctionDef
│  ├─ ReturnStatement
│  ├─ BreakStatement
│  ├─ ContinueStatement
│  └─ ExpressionStatement
└─ Expression (base)
   ├─ BinaryOp
   ├─ UnaryOp
   ├─ Comparison
   ├─ Logical
   ├─ FunctionCall
   ├─ Identifier
   ├─ Literal
   └─ VibeCheck

4.2 Node Design Pattern

Dataclass Pattern:

@dataclass
class IfStatement(Statement):
    condition: Expression
    then_block: List[Statement]
    elif_blocks: List[tuple[Expression, List[Statement]]]
    else_block: Optional[List[Statement]]

Benefits:

  • Can be made immutable with frozen=True
  • Automatic __init__, __repr__
  • Type hints for clarity

4.3 Source Location Tracking

All nodes store line and column for error reporting:

@dataclass
class ASTNode:
    line: int
    column: int

5. Interpreter Implementation

5.1 Interpreter Class Structure

File: genzlang/interpreter.py

class Interpreter:
    def __init__(self):
        self.global_env = Environment()
        self.current_env = self.global_env
        # Register built-in functions

5.2 Execution Strategy

Tree-Walking Interpreter:

  • Recursively visit AST nodes
  • Execute statements for side effects
  • Evaluate expressions for values

Visitor Pattern (implicit):

def execute_statement(self, stmt: Statement):
    if isinstance(stmt, VarDeclaration):
        self.execute_var_declaration(stmt)
    elif isinstance(stmt, Assignment):
        self.execute_assignment(stmt)
    # ... etc

5.3 Expression Evaluation

Method: evaluate_expression(expr: Expression) -> Any

Returns:

  • Numbers (int, float)
  • Strings (str)
  • Booleans (bool)
  • Functions (Function objects)
  • None (from functions without return)

Example:

def evaluate_expression(self, expr):
    if isinstance(expr, Literal):
        return expr.value
    elif isinstance(expr, BinaryOp):
        return self.evaluate_binary_op(expr)
    # ... etc

5.4 Control Flow via Exceptions

Special Exceptions:

class ReturnValue(Exception):
    def __init__(self, value):
        self.value = value

class BreakException(Exception):
    pass

class ContinueException(Exception):
    pass

Usage:

# In function
if return_stmt:
    raise ReturnValue(result)

# In loop
try:
    execute_loop_body()
except BreakException:
    break
except ContinueException:
    continue
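Put together, a loop iteration driven this way looks like the following self-contained trace. demo_loop and the literal trigger values are illustrative; in the interpreter the raises happen inside execute_statement when a dip or skip it node is reached:

```python
# Exceptions redefined here so the sketch runs standalone
class BreakException(Exception):
    pass

class ContinueException(Exception):
    pass

def demo_loop():
    """The driving while-loop translates the body's exceptions into
    Python's own break/continue."""
    out = []
    i = 0
    while True:
        i += 1
        try:
            if i == 2:
                raise ContinueException()   # 'skip it' in the body
            if i == 4:
                raise BreakException()      # 'dip' in the body
            out.append(i)
        except BreakException:
            break
        except ContinueException:
            continue
    return out
```

Running demo_loop() collects 1 and 3: iteration 2 is skipped, iteration 4 breaks out.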

5.5 Function Calls

Process:

  1. Lookup function in environment
  2. Check if callable
  3. Check argument count matches parameters
  4. Create new environment with closure as parent
  5. Bind parameters to arguments
  6. Execute function body
  7. Catch ReturnValue exception
  8. Return value (or None)

Function Object:

class Function:
    def __init__(self, name, params, body, closure):
        self.name = name
        self.params = params        # List of parameter names
        self.body = body            # List of statements
        self.closure = closure      # Enclosing environment
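The eight steps above can be sketched with a plain dict standing in for Environment and a callable standing in for the statement list. Names and shapes here are illustrative, not the real API:

```python
class ReturnValue(Exception):
    """Carries a 'send it' value to the call site (redefined standalone)."""
    def __init__(self, value):
        self.value = value

def call_function(params, body, args, closure):
    # 3. Check argument count matches parameters
    if len(args) != len(params):
        raise TypeError(f'expected {len(params)} args, got {len(args)}')
    # 4-5. Fresh scope chained to the closure, parameters bound to args
    env = {'__parent__': closure}
    env.update(zip(params, args))
    # 6-8. Run the body; unwrap an early return, otherwise return None
    try:
        body(env)
    except ReturnValue as ret:
        return ret.value
    return None
```

A body that raises ReturnValue(env['x'] * 2) called with args [5] returns 10; a body that falls off the end returns None.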

6. Environment and Scoping

6.1 Environment Class

File: genzlang/environment.py

class Environment:
    def __init__(self, parent=None):
        self.parent = parent            # Enclosing scope
        self.variables = {}             # Variable storage

    def define(self, name, value):
        # Create a new variable in the current scope
        self.variables[name] = value

    def get(self, name):
        # Look the variable up the scope chain
        if name in self.variables:
            return self.variables[name]
        if self.parent is not None:
            return self.parent.get(name)
        raise RuntimeError(f"Undefined variable '{name}'")

    def set(self, name, value):
        # Update an existing variable in whichever scope defines it
        if name in self.variables:
            self.variables[name] = value
        elif self.parent is not None:
            self.parent.set(name, value)
        else:
            raise RuntimeError(f"Undefined variable '{name}'")

6.2 Scope Chain

Structure:

Global Environment
    ├─ Built-in functions
    ├─ Top-level variables
    └─ Parent: None

Function Environment
    ├─ Parameters
    ├─ Local variables
    └─ Parent: Closure environment

Nested Function Environment
    ├─ Parameters
    ├─ Local variables
    └─ Parent: Outer function environment

6.3 Variable Lookup

Algorithm:

  1. Check current environment's variables dict
  2. If not found and parent exists, recursively check parent
  3. If not found at global level, raise RuntimeError

Example:

rizz global_var = 100

main character outer():
  rizz outer_var = 50

  main character inner():
    rizz inner_var = 25
    yeet global_var   # Lookup: inner → outer → global ✓

6.4 Closures

Implementation:

When function is defined, capture current environment:

def execute_function_def(self, stmt: FunctionDef):
    func = Function(
        name=stmt.name,
        params=stmt.params,
        body=stmt.body,
        closure=self.current_env  # Capture closure
    )
    self.current_env.define(stmt.name, func)

When function is called, use closure as parent:

def evaluate_function_call(self, expr: FunctionCall):
    func = self.current_env.get(expr.name)
    func_env = Environment(parent=func.closure)  # Use closure
    # ... bind parameters and execute

7. Error Handling

7.1 Error Hierarchy

File: genzlang/errors.py

GenZError (base)
├─ LexerError
├─ ParseError
├─ RuntimeError
└─ TypeError

7.2 Error Context

All errors include:

  • Message (with Gen Z slang)
  • Line number (optional)
  • Column number (optional)

Example:

raise runtime_error(
    'undefined_variable',
    line=expr.line,
    column=expr.column,
    name=variable_name
)

7.3 Error Message Templates

File: genzlang/errors.py

ERROR_MESSAGES = {
    'undefined_variable': "No cap, variable '{name}' doesn't exist fam",
    'division_by_zero': "That's not the vibe - can't divide by zero",
    # ... etc
}
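A minimal sketch of the runtime_error helper those call sites use. A trimmed copy of ERROR_MESSAGES keeps it standalone, and the plain Exception return is a stand-in for the RuntimeError subclass in errors.py:

```python
ERROR_MESSAGES = {
    'undefined_variable': "No cap, variable '{name}' doesn't exist fam",
    'division_by_zero': "That's not the vibe - can't divide by zero",
}

def runtime_error(key, line=None, column=None, **fields):
    """Build an exception from a message template key."""
    # Fill the slang template, then tack on the source location if known
    message = ERROR_MESSAGES[key].format(**fields)
    if line is not None:
        message += f' (line {line}, column {column})'
    return Exception(message)
```

So raise runtime_error('undefined_variable', line=3, column=7, name='x') produces a message naming 'x' and the location.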

7.4 Error Propagation

Current behavior:

  1. Error raised at point of failure
  2. Propagates up call stack
  3. Caught by CLI (genz script)
  4. Error message printed
  5. Program exits with code 1

No try-catch mechanism - errors are fatal


8. Extending the Language

8.1 Adding New Keywords

Steps:

  1. Add token type in tokens.py:

    class TokenType(Enum):
        NEW_KEYWORD = auto()
  2. Add to keyword mapping:

    KEYWORDS = {
        'new_keyword': TokenType.NEW_KEYWORD,
    }
  3. Handle in parser (parser.py):

    def parse_statement(self):
        if self.match(TokenType.NEW_KEYWORD):
            return self.parse_new_statement()
  4. Create AST node (ast_nodes.py):

    @dataclass
    class NewStatement(Statement):
        # ... fields
  5. Implement execution (interpreter.py):

    def execute_statement(self, stmt):
        if isinstance(stmt, NewStatement):
            self.execute_new_statement(stmt)

8.2 Adding Built-in Functions

File: genzlang/builtins.py

def my_new_function(arg1, arg2):
    # Implementation
    return result

BUILTINS = {
    'my_new_function': my_new_function,
    # ... existing functions
}

Example: Add round() function:

def round_number(x, digits=0):
    """Round number to given decimal places"""
    try:
        return round(float(x), int(digits))
    except (ValueError, TypeError):
        raise ValueError(f"Can't round {x}")

BUILTINS = {
    'round_number': round_number,
    # ... etc
}

8.3 Adding New Operators

Steps:

  1. Add token (tokens.py):

    FLOOR_DIVIDE = auto()  # //
  2. Recognize in lexer (lexer.py):

    if char == '/' and self.peek(1) == '/':
        self.tokens.append(Token(TokenType.FLOOR_DIVIDE, '//', ...))
        self.advance()
        self.advance()
  3. Parse in expression (parser.py):

    def parse_factor(self):
        left = self.parse_unary()
        while self.match(TokenType.FLOOR_DIVIDE, ...):
            op = self.advance()
            right = self.parse_unary()
            left = BinaryOp(left, op.value, right, ...)
  4. Evaluate (interpreter.py):

    def evaluate_binary_op(self, expr):
        left = self.evaluate_expression(expr.left)
        right = self.evaluate_expression(expr.right)
        if expr.operator == '//':
            return int(left // right)

8.4 Adding Data Types

Example: Add List Type

  1. Create list literal syntax:

    rizz numbers = squad(1, 2, 3, 4, 5)
    
  2. Parse as function call:

    • Already supported! squad(...) is a function call
  3. Implement squad() built-in:

    def squad(*args):
        """Create a list"""
        return list(args)
    
    BUILTINS['squad'] = squad
  4. Add list methods:

    def add_to_squad(lst, item):
        """Append to list"""
        if not isinstance(lst, list):
            raise TypeError("First arg must be a list")
        lst.append(item)
        return lst
    
    def get_from_squad(lst, index):
        """Get list item"""
        return lst[index]

8.5 Adding Language Constructs

Example: Add Switch/Match Statement

  1. Design syntax:

    vibe check x:
      case 1:
        yeet "one"
      case 2:
        yeet "two"
      default:
        yeet "other"
    
  2. Add tokens:

    CASE = auto()
    DEFAULT = auto()
  3. Create AST node:

    @dataclass
    class SwitchStatement(Statement):
        value: Expression
        cases: List[tuple[Expression, List[Statement]]]
        default: Optional[List[Statement]]
  4. Parse:

    def parse_switch_statement(self):
        self.expect(TokenType.VIBE)
        self.expect(TokenType.CHECK)
        value = self.parse_expression()
        self.expect(TokenType.COLON)
        # ... parse cases
  5. Interpret:

    def execute_switch(self, stmt):
        value = self.evaluate_expression(stmt.value)
        for case_value, case_body in stmt.cases:
            if self.evaluate_expression(case_value) == value:
                for s in case_body:
                    self.execute_statement(s)
                return
        # Execute default

9. Performance Considerations

9.1 Current Limitations

Slow operations:

  • Recursive function calls (no tail call optimization)
  • String concatenation in loops
  • Deep variable lookups in nested scopes
  • No caching of parsed AST

9.2 Optimization Opportunities

1. Bytecode Compilation:

Instead of interpreting AST directly, compile to bytecode:

Source → AST → Bytecode → VM Execution

Benefits:

  • Faster execution
  • Smaller memory footprint
  • JIT compilation possible

2. Constant Folding:

Evaluate constant expressions at parse time:

2 + 3 * 4  # → 14, computed at parse time

3. Variable Resolution Caching:

Cache variable lookup depths:

{
    'local_var': 0,     # current scope
    'outer_var': 1,     # 1 scope up
    'global_var': 2,    # 2 scopes up
}
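A resolver pass computing such depths could look like this sketch, with the scope chain given innermost-first as sets of declared names (the function name and calling convention are assumptions):

```python
def resolve(name, scopes):
    """Return how many scopes up `name` lives (0 = current scope),
    or None if it is undefined anywhere on the chain.

    Recording this once per identifier lets Environment.get jump
    straight to the right scope instead of walking the chain on
    every lookup."""
    for depth, scope in enumerate(scopes):
        if name in scope:
            return depth
    return None
```

For the nested-function example from section 6.3, global_var resolves to depth 2 from inside inner().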

4. String Interning:

Reuse string objects:

# Instead of creating new strings
string_pool = {}  # Map string → interned object
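A minimal pool, relying on dict.setdefault returning the first object stored for a given key:

```python
string_pool = {}

def intern(s):
    """Return the pooled object for any string equal to s,
    storing s itself on first sight."""
    return string_pool.setdefault(s, s)
```

Two equal strings built separately then share one object, so identity checks and repeated hashing get cheaper.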

9.3 Memory Management

Current:

  • Python's garbage collection
  • No explicit memory management
  • Closures keep references to parent scopes

Future:

  • Reference counting for early cleanup
  • Weak references for circular dependencies
  • Memory limits and quotas

10. Future Enhancements

10.1 Planned Features

1. Native Data Structures:

ngl Lists
rizz numbers = [1, 2, 3, 4, 5]
rizz first = numbers[0]
numbers[2] = 10

ngl Dictionaries
rizz person = {"name": "Alice", "age": 25}
rizz name = person["name"]

Implementation:

  • Add [ and ] tokens
  • Parse list/dict literals
  • Add indexing expressions
  • Implement indexing operations

2. String Interpolation:

rizz name = "Alice"
yeet "Hello, {name}!"  # "Hello, Alice!"

Implementation:

  • Parse string literals for {expr} patterns
  • Create string concatenation AST
  • Evaluate interpolated expressions
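The first step, scanning a literal for {expr} placeholders, might look like this sketch (nested braces and escape sequences are deliberately ignored):

```python
import re

def split_interpolation(s):
    """Split a string literal into ('text', ...) chunks and
    ('expr', ...) placeholders, in order of appearance."""
    parts = []
    # The capturing group makes re.split keep the placeholder contents
    # at every odd index of the result
    for i, piece in enumerate(re.split(r'\{([^}]*)\}', s)):
        kind = 'expr' if i % 2 else 'text'
        if piece:
            parts.append((kind, piece))
    return parts
```

Each 'expr' part would then be lexed and parsed as an Expression, and the whole literal compiled to a concatenation.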

3. Exception Handling:

catch it:
  rizz result = risky_operation()
oops Error as e:
  yeet "Error occurred: " + e
slay:
  yeet "Cleanup code"

Implementation:

  • Add catch, oops, slay keywords
  • Create TryStatement AST node
  • Implement exception catching in interpreter
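Once those pieces exist, TryStatement execution could follow this shape. Since this is a planned feature, everything here is hypothetical: plain Exception stands in for GenZError, and `run` stands in for executing a statement list:

```python
def execute_try(try_body, handler, finally_body, run):
    """Run the 'catch it' body; hand any error to the 'oops' handler;
    always run the 'slay' block."""
    try:
        run(try_body)
    except Exception as e:
        if handler is not None:
            handler(e)
        else:
            raise
    finally:
        if finally_body is not None:
            run(finally_body)
```

With a body that raises, the handler fires and the cleanup block still runs; with a clean body, only the cleanup block runs after it.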

4. Modules and Imports:

cop "math_utils"  # Import module

rizz result = math_utils.calculate(10)

Implementation:

  • Add cop keyword (import)
  • Implement module loader
  • Module search path
  • Namespace isolation

5. Classes and OOP:

main character Person:
  main character vibe_check(name, age):
    it.name = name
    it.age = age

  main character greet():
    yeet "Hello, I'm " + it.name

rizz person = Person("Alice", 25)
person.greet()

Implementation:

  • Add it keyword (self/this)
  • Parse class definitions
  • Implement object system
  • Method binding

6. Decorators:

@enhance
main character my_function():
  yeet "Hello"

7. Lambda Expressions:

rizz double = lambda(x) -> x * 2
yeet double(5)  # 10

8. Comprehensions:

rizz squares = [x * x for x in range(10)]

9. Pattern Matching:

vibe check value:
  case [1, 2, x]:
    yeet "Got 1, 2, and " + x
  case {"name": n}:
    yeet "Name is " + n

10. Async/Await:

async main character fetch_data():
  rizz result = await http_get("url")
  send it result

10.2 Tooling Enhancements

1. REPL (Read-Eval-Print Loop):

$ genz
Gen Z Lang v0.1.0
>>> rizz x = 10
>>> yeet x
10

2. Debugger:

$ genz --debug program.genz
(genz-db) break 10
(genz-db) run
(genz-db) print x

3. Profiler:

$ genz --profile program.genz
Function calls: 1000
Total time: 0.5s
Hotspots:
  - fibonacci: 0.3s (60%)
  - loop: 0.2s (40%)

4. Language Server (LSP):

  • Syntax highlighting
  • Autocomplete
  • Jump to definition
  • Error checking
  • Refactoring

5. Package Manager:

$ genz-pkg install http-client
$ genz-pkg publish my-library

10.3 Standard Library Expansion

Planned modules:

  • math - Mathematical functions
  • string - String operations
  • array - Array/list utilities
  • file - File I/O
  • http - HTTP client/server
  • json - JSON parsing
  • regex - Regular expressions
  • datetime - Date/time operations
  • random - Random number generation
  • crypto - Cryptography
  • testing - Unit testing framework

11. Implementation Guidelines

11.1 Code Style

Python code:

  • Follow PEP 8
  • Type hints for function signatures
  • Docstrings for public APIs
  • Clear variable names

Gen Z Lang code:

  • snake_case for identifiers
  • 2 spaces for indentation (as used in this guide's examples)
  • Comments explain why, not what
  • Keep functions focused

11.2 Testing

Unit tests:

def test_lexer_numbers():
    lexer = Lexer("42 3.14")
    tokens = lexer.tokenize()
    assert tokens[0].type == TokenType.NUMBER
    assert tokens[0].value == 42
    assert tokens[1].type == TokenType.NUMBER
    assert tokens[1].value == 3.14

Integration tests:

def test_full_program():
    source = """
rizz x = 10
yeet x
"""
    lexer = Lexer(source)
    tokens = lexer.tokenize()
    parser = Parser(tokens)
    ast = parser.parse()
    interpreter = Interpreter()
    interpreter.interpret(ast)
    # Check output

11.3 Documentation

For each feature:

  1. Language spec update
  2. API documentation
  3. Example programs
  4. Migration guide (if breaking change)

11.4 Backward Compatibility

Guidelines:

  • Don't break existing code unless absolutely necessary
  • Deprecate features before removing
  • Provide migration path
  • Version the language spec

12. Advanced Topics

12.1 Implementing Optimizations

Constant Propagation:

def optimize_constant_propagation(node):
    if isinstance(node, BinaryOp):
        left = optimize(node.left)
        right = optimize(node.right)

        if isinstance(left, Literal) and isinstance(right, Literal):
            # Evaluate at compile time
            result = evaluate_binary(left.value, node.operator, right.value)
            return Literal(value=result, line=node.line, column=node.column)

    return node

Dead Code Elimination:

def optimize_dead_code(statements):
    result = []
    reachable = True

    for stmt in statements:
        if not reachable:
            break  # Skip unreachable code

        result.append(stmt)

        if isinstance(stmt, ReturnStatement):
            reachable = False

    return result

12.2 Static Analysis

Type Inference:

def infer_type(expr, env):
    if isinstance(expr, Literal):
        return type(expr.value)
    elif isinstance(expr, BinaryOp):
        left_type = infer_type(expr.left, env)
        right_type = infer_type(expr.right, env)
        # Determine result type
    # ... etc

Linting:

def lint_unused_variables(ast):
    declared = set()
    used = set()

    # Traverse AST
    for node in walk(ast):
        if isinstance(node, VarDeclaration):
            declared.add(node.name)
        elif isinstance(node, Identifier):
            used.add(node.name)

    unused = declared - used
    for var in unused:
        warn(f"Variable '{var}' declared but never used")

12.3 Debugging Support

Stack Trace:

class CallFrame:
    def __init__(self, function_name, line):
        self.function_name = function_name
        self.line = line

class Interpreter:
    def __init__(self):
        self.call_stack = []

    def execute_function_call(self, expr):
        self.call_stack.append(CallFrame(expr.name, expr.line))
        try:
            # Execute function
            pass
        finally:
            self.call_stack.pop()

    def print_stack_trace(self):
        for frame in reversed(self.call_stack):
            print(f"  at {frame.function_name} (line {frame.line})")

Conclusion

This internals guide provides complete documentation for understanding and extending Gen Z Lang:

  1. Architecture: Clear separation of lexer, parser, interpreter
  2. Implementation: Tree-walking interpreter with environment-based scoping
  3. Extensibility: Well-defined extension points for new features
  4. Performance: Opportunities for optimization identified
  5. Future: Roadmap for language evolution

For AI systems working with Gen Z Lang:

  • Follow the architecture patterns established
  • Maintain backward compatibility when extending
  • Document new features thoroughly
  • Write tests for all new functionality
  • Consider performance implications

The language is designed to be simple yet extensible, making it an ideal platform for experimentation and learning about language implementation.