Version: 0.1.0
Audience: AI systems extending or modifying Gen Z Lang
- Architecture Overview
- Lexer Implementation
- Parser Implementation
- AST Design
- Interpreter Implementation
- Environment and Scoping
- Error Handling
- Extending the Language
- Performance Considerations
- Future Enhancements
Source Code (.genz file)
↓
[LEXER] - Lexical Analysis
↓
Token Stream
↓
[PARSER] - Syntax Analysis
↓
Abstract Syntax Tree (AST)
↓
[INTERPRETER] - Execution
↓
Output / Side Effects
| Component | File | Responsibility |
|---|---|---|
| Lexer | lexer.py | Convert source text to tokens |
| Parser | parser.py | Build AST from tokens |
| Interpreter | interpreter.py | Execute AST nodes |
| Environment | environment.py | Manage variable scopes |
| Tokens | tokens.py | Token type definitions |
| AST Nodes | ast_nodes.py | AST node class definitions |
| Errors | errors.py | Exception classes and error helpers |
| Built-ins | builtins.py | Standard library functions |
| CLI | genz | Command-line interface |
User writes .genz file
↓
CLI reads file content
↓
Lexer.tokenize() → List[Token]
↓
Parser.parse() → Program (AST root)
↓
Interpreter.interpret() → Execution
↓
print() / input() / side effects
File: genzlang/lexer.py
class Lexer:
    def __init__(self, source: str):
        self.source = source        # Source code
        self.pos = 0                # Current position
        self.line = 1               # Current line number
        self.column = 1             # Current column
        self.tokens = []            # Output tokens
        self.indent_stack = [0]     # Indentation levels
        self.at_line_start = True   # Line start flag

Key Methods:
- tokenize() - Main entry point, returns the token list
- current_char() - Get the character at the current position
- peek(offset) - Look ahead without consuming
- advance() - Move to the next character
- skip_whitespace() - Skip spaces/tabs
- handle_indentation() - Generate INDENT/DEDENT tokens
- read_number() - Parse numeric literals
- read_string() - Parse string literals
- read_identifier_or_keyword() - Parse identifiers/keywords
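The cursor helpers can be filled in roughly as follows. This is a sketch on a standalone `Cursor` class rather than the real `Lexer`, but the behavior mirrors the methods described above:

```python
class Cursor:
    """Sketch of the lexer's position-tracking helpers."""

    def __init__(self, source: str):
        self.source = source
        self.pos = 0
        self.line = 1
        self.column = 1

    def current_char(self):
        """Character at the current position, or None at end of input."""
        return self.source[self.pos] if self.pos < len(self.source) else None

    def peek(self, offset=1):
        """Look ahead without consuming."""
        i = self.pos + offset
        return self.source[i] if i < len(self.source) else None

    def advance(self):
        """Consume one character, keeping line/column bookkeeping in sync."""
        ch = self.current_char()
        if ch == '\n':
            self.line += 1
            self.column = 1
        else:
            self.column += 1
        self.pos += 1
        return ch
```

Keeping line/column updates inside `advance()` means every token can be stamped with an accurate source location for free.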
Algorithm:
- At line start, count leading spaces
- Compare to current indentation level (top of stack)
- If increased: Push new level, emit INDENT token
- If decreased: Pop levels until matching, emit DEDENT tokens
- If inconsistent: Raise LexerError
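The stack-based algorithm above can be sketched independently of the `Lexer` class; token names here are plain strings for illustration:

```python
def indent_tokens(indent_levels):
    """Given leading-space counts per logical line, emit INDENT/DEDENT events."""
    stack = [0]  # indentation levels, outermost first
    tokens = []
    for width in indent_levels:
        if width > stack[-1]:
            stack.append(width)        # deeper nesting: one INDENT
            tokens.append("INDENT")
        else:
            while width < stack[-1]:
                stack.pop()            # one DEDENT per popped level
                tokens.append("DEDENT")
            if width != stack[-1]:
                raise SyntaxError("inconsistent indentation")
    while stack[-1] > 0:               # close any blocks still open at EOF
        stack.pop()
        tokens.append("DEDENT")
    return tokens
```

Note that a single line can trigger several DEDENTs at once when it closes multiple nested blocks.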
Example:
vibe? x > 5: # Indent stack: [0]
yeet "yes" # +4 spaces → INDENT, stack: [0, 4]
yeet "done" # Same level, no token
dead: # Back to 0 → DEDENT, stack: [0]
yeet "no" # +2 spaces → INDENT, stack: [0, 2]
Handling:
- Read the first identifier: vibe
- Check whether the next identifier completes a multi-word keyword: check
- If a match is found: emit the multi-word token(s)
- Otherwise: treat it as a single keyword
Implementation Detail:
Multi-word keywords are handled by peek_word(), which looks ahead without consuming characters.
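A minimal peek_word() might look like this (a sketch; the real signature in lexer.py may differ):

```python
def peek_word(source: str, pos: int) -> str:
    """Return the next identifier at or after pos, without consuming input."""
    i = pos
    while i < len(source) and source[i] in ' \t':  # skip horizontal whitespace
        i += 1
    start = i
    while i < len(source) and (source[i].isalnum() or source[i] == '_'):
        i += 1
    return source[start:i]
```

Because the function only returns a slice and never mutates `pos`, the lexer can safely call it speculatively while deciding between `vibe` alone and `vibe check`.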
Single-line (ngl):
- Skip from ngl to the end of the line
- No token emitted
Multi-line (tea: ... :tea):
- Skip from tea: to the closing :tea
- Supports nesting (future enhancement)
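The two comment forms can be skipped with logic along these lines. This is a simplified sketch: it is not string-literal- or word-boundary-aware, and it does not handle the planned nesting:

```python
def strip_comments(source: str) -> str:
    """Drop 'ngl' line comments and 'tea: ... :tea' block comments (sketch)."""
    out = []
    i = 0
    while i < len(source):
        if source.startswith("ngl", i):
            while i < len(source) and source[i] != '\n':
                i += 1                 # skip to end of line; keep the newline
        elif source.startswith("tea:", i):
            end = source.find(":tea", i + 4)
            i = len(source) if end == -1 else end + 4
        else:
            out.append(source[i])
            i += 1
    return "".join(out)
```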
File: genzlang/parser.py
class Parser:
    def __init__(self, tokens: List[Token]):
        self.tokens = tokens       # Input tokens
        self.pos = 0               # Current position
        self.in_loop = False       # Loop context flag
        self.in_function = False   # Function context flag

Strategy:
Each grammar rule maps to a parsing method:
expression → parse_expression()
statement → parse_statement()
primary → parse_primary()
Precedence Hierarchy (low to high):
- parse_expression() - Entry point
- parse_logical() - AND, OR
- parse_comparison() - ==, !=, <, >, <=, >=
- parse_term() - +, -
- parse_factor() - *, /, %
- parse_unary() - nah, -
- parse_power() - ^
- parse_primary() - Literals, identifiers, parentheses
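Each level parses its operands by calling the next-higher level, then loops over its own operators. A compressed two-level illustration of that shape (evaluating directly rather than building AST nodes, and covering only `+ - * /`):

```python
class MiniParser:
    """Two precedence levels of the recursive-descent pattern described above."""

    def __init__(self, tokens):
        self.tokens = tokens  # mix of numbers and operator strings
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def parse_term(self):                  # +, - (binds loosest here)
        left = self.parse_factor()
        while self.peek() in ('+', '-'):
            op = self.tokens[self.pos]; self.pos += 1
            right = self.parse_factor()    # operands come from the tighter level
            left = left + right if op == '+' else left - right
        return left

    def parse_factor(self):                # *, / bind tighter than +, -
        left = self.parse_primary()
        while self.peek() in ('*', '/'):
            op = self.tokens[self.pos]; self.pos += 1
            right = self.parse_primary()
            left = left * right if op == '*' else left / right
        return left

    def parse_primary(self):               # numbers only, in this sketch
        value = self.tokens[self.pos]; self.pos += 1
        return value
```

Because `parse_term` fetches its operands from `parse_factor`, multiplication is grouped before addition without any explicit precedence table.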
Method: parse_statement()
Decision tree:
Current token type:
├─ RIZZ → parse_var_declaration()
├─ YEET → parse_yeet_statement()
├─ FR → parse_fr_fr_statement()
├─ VIBE_QUESTION → parse_if_statement()
├─ KEEP → parse_while_loop()
├─ GRIND → parse_for_loop()
├─ MAIN → parse_function_def()
├─ SEND → parse_return_statement()
├─ DIP → parse_break_statement()
├─ SKIP → parse_continue_statement()
├─ IDENTIFIER → parse_assignment() or parse_expression_statement()
└─ else → Error
Method: parse_block()
def parse_block(self) -> List[Statement]:
    self.expect(TokenType.INDENT)
    statements = []
    while not self.match(TokenType.DEDENT, TokenType.EOF):
        self.skip_newlines()
        stmt = self.parse_statement()
        if stmt:
            statements.append(stmt)
    self.expect(TokenType.DEDENT)
    return statements

Purpose: Prevent invalid constructs
Flags:
- in_loop: Allows dip (break) and skip it (continue)
- in_function: Allows send it (return)
Example:
def parse_return_statement(self):
    if not self.in_function:
        raise parse_error('return_outside_function', ...)
    # ... parse return

Strategy:
- On parse error, synchronize to statement boundary
- Skip tokens until NEWLINE or keyword that starts statement
- Continue parsing next statement
Not currently implemented - errors halt parsing immediately
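If it were implemented, synchronize() could look like this. This is hypothetical code, not part of the current codebase, and token names are illustrative strings:

```python
# Keywords that can begin a statement; resuming at one of these gives the
# parser a clean restart point after an error.
STATEMENT_STARTERS = {"RIZZ", "YEET", "VIBE_QUESTION", "KEEP", "GRIND", "MAIN"}

def synchronize(tokens, pos):
    """Skip past the bad token until a NEWLINE or statement-starting keyword."""
    pos += 1  # discard the token that caused the error
    while pos < len(tokens):
        if tokens[pos] == "NEWLINE":
            return pos + 1             # resume just after the line break
        if tokens[pos] in STATEMENT_STARTERS:
            return pos                 # resume at the keyword itself
        pos += 1
    return pos                         # hit EOF: nothing left to recover
```

With this in place, the parser could collect multiple errors per run instead of halting on the first one.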
File: genzlang/ast_nodes.py
ASTNode (base)
├─ Program
├─ Statement (base)
│ ├─ VarDeclaration
│ ├─ Assignment
│ ├─ YeetStatement
│ ├─ FrFrStatement
│ ├─ IfStatement
│ ├─ WhileLoop
│ ├─ ForLoop
│ ├─ FunctionDef
│ ├─ ReturnStatement
│ ├─ BreakStatement
│ ├─ ContinueStatement
│ └─ ExpressionStatement
└─ Expression (base)
├─ BinaryOp
├─ UnaryOp
├─ Comparison
├─ Logical
├─ FunctionCall
├─ Identifier
├─ Literal
└─ VibeCheck
Dataclass Pattern:
@dataclass
class IfStatement(Statement):
    condition: Expression
    then_block: List[Statement]
    elif_blocks: List[tuple[Expression, List[Statement]]]
    else_block: Optional[List[Statement]]

Benefits:
- Optionally immutable (when declared with frozen=True)
- Automatic __init__ and __repr__
- Type hints for clarity

All nodes store line and column for error reporting:
@dataclass
class ASTNode:
    line: int
    column: int

File: genzlang/interpreter.py
class Interpreter:
    def __init__(self):
        self.global_env = Environment()
        self.current_env = self.global_env
        # Register built-in functions

Tree-Walking Interpreter:
- Recursively visit AST nodes
- Execute statements for side effects
- Evaluate expressions for values
Visitor Pattern (implicit):
def execute_statement(self, stmt: Statement):
    if isinstance(stmt, VarDeclaration):
        self.execute_var_declaration(stmt)
    elif isinstance(stmt, Assignment):
        self.execute_assignment(stmt)
    # ... etc

Method: evaluate_expression(expr: Expression) -> Any
Returns:
- Numbers (int, float)
- Strings (str)
- Booleans (bool)
- Functions (Function objects)
- None (from functions without return)
Example:
def evaluate_expression(self, expr):
    if isinstance(expr, Literal):
        return expr.value
    elif isinstance(expr, BinaryOp):
        return self.evaluate_binary_op(expr)
    # ... etc

Special Exceptions:
class ReturnValue(Exception):
    def __init__(self, value):
        self.value = value

class BreakException(Exception):
    pass

class ContinueException(Exception):
    pass

Usage:
# In function
if return_stmt:
    raise ReturnValue(result)

# In loop
try:
    execute_loop_body()
except BreakException:
    break
except ContinueException:
    continue

Process:
- Lookup function in environment
- Check if callable
- Check argument count matches parameters
- Create new environment with closure as parent
- Bind parameters to arguments
- Execute function body
- Catch ReturnValue exception
- Return value (or None)
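The eight steps above can be condensed into a sketch. Plain dicts stand in for the real Environment and Function classes, and body statements are modeled as callables, so the shapes here are illustrative only:

```python
class ReturnValue(Exception):
    """Control-flow exception carrying the function's return value."""
    def __init__(self, value):
        self.value = value

def call_function(func, args):
    """Run a function 'object' through the call sequence described above."""
    if len(args) != len(func["params"]):              # step 3: arity check
        raise RuntimeError("wrong argument count")
    env = {"parent": func["closure"],                 # step 4: closure as parent
           "vars": dict(zip(func["params"], args))}   # step 5: bind parameters
    try:
        for stmt in func["body"]:                     # step 6: execute the body
            stmt(env)
    except ReturnValue as ret:                        # step 7: catch ReturnValue
        return ret.value
    return None                                       # step 8: implicit None
```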
Function Object:
class Function:
    def __init__(self, name, params, body, closure):
        self.name = name
        self.params = params    # List of parameter names
        self.body = body        # List of statements
        self.closure = closure  # Enclosing environment

File: genzlang/environment.py
class Environment:
    def __init__(self, parent=None):
        self.parent = parent    # Enclosing scope
        self.variables = {}     # Variable storage

    def define(self, name, value):
        ...  # Create a new variable in the current scope

    def get(self, name):
        ...  # Look up a variable along the scope chain

    def set(self, name, value):
        ...  # Update an existing variable

Structure:
Global Environment
├─ Built-in functions
├─ Top-level variables
└─ Parent: None
Function Environment
├─ Parameters
├─ Local variables
└─ Parent: Closure environment
Nested Function Environment
├─ Parameters
├─ Local variables
└─ Parent: Outer function environment
Algorithm:
- Check the current environment's variables dict
- If not found and a parent exists, recursively check the parent
- If not found at the global level, raise RuntimeError
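Filled in, the lookup reads as follows (a sketch on a stand-in `Env` class; the real error helper and slang message live in errors.py):

```python
class Env:
    """Minimal environment demonstrating the scope-chain lookup above."""

    def __init__(self, parent=None):
        self.parent = parent
        self.variables = {}

    def get(self, name):
        if name in self.variables:      # 1. current scope first
            return self.variables[name]
        if self.parent is not None:     # 2. then walk outward, scope by scope
            return self.parent.get(name)
        # 3. reached the global scope without a hit
        raise NameError(f"No cap, variable '{name}' doesn't exist fam")
```

Shadowing falls out of this automatically: an inner `variables` entry wins because the current scope is checked before any parent.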
Example:
rizz global_var = 100
main character outer():
rizz outer_var = 50
main character inner():
rizz inner_var = 25
yeet global_var # Lookup: inner → outer → global ✓
Implementation:
When function is defined, capture current environment:
def execute_function_def(self, stmt: FunctionDef):
    func = Function(
        name=stmt.name,
        params=stmt.params,
        body=stmt.body,
        closure=self.current_env  # Capture closure
    )
    self.current_env.define(stmt.name, func)

When the function is called, use the closure as the parent scope:
def evaluate_function_call(self, expr: FunctionCall):
    func = self.current_env.get(expr.name)
    func_env = Environment(parent=func.closure)  # Use closure
    # ... bind parameters and execute

File: genzlang/errors.py
GenZError (base)
├─ LexerError
├─ ParseError
├─ RuntimeError
└─ TypeError
All errors include:
- Message (with Gen Z slang)
- Line number (optional)
- Column number (optional)
Example:
raise runtime_error(
    'undefined_variable',
    line=expr.line,
    column=expr.column,
    name=variable_name
)

File: genzlang/errors.py
ERROR_MESSAGES = {
    'undefined_variable': "No cap, variable '{name}' doesn't exist fam",
    'division_by_zero': "That's not the vibe - can't divide by zero",
    # ... etc
}

Current behavior:
- Error raised at the point of failure
- Propagates up the call stack
- Caught by the CLI (genz script)
- Error message printed
- Program exits with code 1
No try-catch mechanism - errors are fatal
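That catch-print-exit behavior can be sketched like this; `GenZError` here is a stand-in for the base class in errors.py, and `run_or_die` is an illustrative name, not the real CLI entry point:

```python
import sys

class GenZError(Exception):
    """Stand-in for the base error class, carrying optional location info."""
    def __init__(self, message, line=None, column=None):
        super().__init__(message)
        self.line, self.column = line, column

def run_or_die(run, source):
    """Run a program; on any language error, print it and exit with code 1."""
    try:
        run(source)                            # lex + parse + interpret
    except GenZError as err:
        location = f" (line {err.line})" if err.line else ""
        print(f"{err}{location}", file=sys.stderr)
        sys.exit(1)                            # fatal: the language has no try-catch
```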
Steps:
1. Add a token type in tokens.py:
   class TokenType(Enum):
       NEW_KEYWORD = auto()
2. Add it to the keyword mapping:
   KEYWORDS = {
       'new_keyword': TokenType.NEW_KEYWORD,
   }
3. Handle it in the parser (parser.py):
   def parse_statement(self):
       if self.match(TokenType.NEW_KEYWORD):
           return self.parse_new_statement()
4. Create an AST node (ast_nodes.py):
   @dataclass
   class NewStatement(Statement):
       # ... fields
5. Implement execution (interpreter.py):
   def execute_statement(self, stmt):
       if isinstance(stmt, NewStatement):
           self.execute_new_statement(stmt)
File: genzlang/builtins.py
def my_new_function(arg1, arg2):
    # Implementation
    return result

BUILTINS = {
    'my_new_function': my_new_function,
    # ... existing functions
}

Example: Add a round() function:
def round_number(x, digits=0):
    """Round number to given decimal places"""
    try:
        return round(float(x), int(digits))
    except (ValueError, TypeError):
        raise ValueError(f"Can't round {x}")

BUILTINS = {
    'round_number': round_number,
    # ... etc
}

Steps:
1. Add a token (tokens.py):
   FLOOR_DIVIDE = auto()  # //
2. Recognize it in the lexer (lexer.py):
   if char == '/' and self.peek(1) == '/':
       self.tokens.append(Token(TokenType.FLOOR_DIVIDE, '//', ...))
       self.advance()
       self.advance()
3. Parse it in an expression (parser.py):
   def parse_factor(self):
       left = self.parse_unary()
       while self.match(TokenType.FLOOR_DIVIDE, ...):
           op = self.advance()
           right = self.parse_unary()
           left = BinaryOp(left, op.value, right, ...)
4. Evaluate it (interpreter.py):
   def evaluate_binary_op(self, expr):
       if expr.operator == '//':
           return int(left // right)
Example: Add a List Type
1. Create a list literal syntax:
   rizz numbers = squad(1, 2, 3, 4, 5)
2. Parse it as a function call - already supported! squad(...) is an ordinary function call.
3. Implement the squad() built-in:
   def squad(*args):
       """Create a list"""
       return list(args)

   BUILTINS['squad'] = squad
4. Add list methods:
   def add_to_squad(lst, item):
       """Append to list"""
       if not isinstance(lst, list):
           raise TypeError("First arg must be a list")
       lst.append(item)
       return lst

   def get_from_squad(lst, index):
       """Get list item"""
       return lst[index]
Example: Add a Switch/Match Statement
1. Design the syntax:
   vibe check x:
       case 1:
           yeet "one"
       case 2:
           yeet "two"
       default:
           yeet "other"
2. Add tokens:
   CASE = auto()
   DEFAULT = auto()
3. Create an AST node:
   @dataclass
   class SwitchStatement(Statement):
       value: Expression
       cases: List[tuple[Expression, List[Statement]]]
       default: Optional[List[Statement]]
4. Parse it:
   def parse_switch_statement(self):
       self.expect(TokenType.VIBE)
       self.expect(TokenType.CHECK)
       value = self.parse_expression()
       self.expect(TokenType.COLON)
       # ... parse cases
5. Interpret it:
   def execute_switch(self, stmt):
       value = self.evaluate_expression(stmt.value)
       for case_value, case_body in stmt.cases:
           if self.evaluate_expression(case_value) == value:
               for s in case_body:
                   self.execute_statement(s)
               return
       # Execute the default block
Slow operations:
- Recursive function calls (no tail call optimization)
- String concatenation in loops
- Deep variable lookups in nested scopes
- No caching of parsed AST
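The string-concatenation point is worth illustrating: repeated `+` inside a loop re-copies the accumulated string on every iteration, while a single join builds the result in one linear pass:

```python
def concat_loop(parts):
    """Quadratic pattern: each '+' copies everything accumulated so far."""
    out = ""
    for p in parts:
        out = out + p
    return out

def concat_join(parts):
    """Linear pattern: one allocation sized from the total length."""
    return "".join(parts)
```

Both return the same string; only the cost differs, and the gap grows with input size.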
1. Bytecode Compilation:
Instead of interpreting AST directly, compile to bytecode:
Source → AST → Bytecode → VM Execution
Benefits:
- Faster execution
- Smaller memory footprint
- JIT compilation possible
2. Constant Folding:
Evaluate constant expressions at parse time:
2 + 3 * 4 → 14  # Computed at parse time

3. Variable Resolution Caching:
Cache variable lookup depths:
{
    'global_var': 0,   # 0 scopes up
    'outer_var': 1,    # 1 scope up
    'local_var': None  # Current scope
}

4. String Interning:
Reuse string objects:
# Instead of creating new strings
string_pool = {}  # Map string → interned object

Current:
- Python's garbage collection
- No explicit memory management
- Closures keep references to parent scopes
Future:
- Reference counting for early cleanup
- Weak references for circular dependencies
- Memory limits and quotas
1. Native Data Structures:
ngl Lists
rizz numbers = [1, 2, 3, 4, 5]
rizz first = numbers[0]
numbers[2] = 10
ngl Dictionaries
rizz person = {"name": "Alice", "age": 25}
rizz name = person["name"]
Implementation:
- Add [ and ] tokens
- Parse list/dict literals
- Add indexing expressions
- Implement indexing operations
2. String Interpolation:
rizz name = "Alice"
yeet "Hello, {name}!" # "Hello, Alice!"
Implementation:
- Parse string literals for {expr} patterns
- Create string concatenation AST nodes
- Evaluate interpolated expressions
3. Exception Handling:
catch it:
rizz result = risky_operation()
oops Error as e:
yeet "Error occurred: " + e
slay:
yeet "Cleanup code"
Implementation:
- Add catch, oops, slay keywords
- Create a TryStatement AST node
- Implement exception catching in the interpreter
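Once those pieces exist, the interpreter side could be as simple as the following. This is hypothetical: the field layout of the would-be TryStatement is an assumption, and body statements are modeled as opaque values passed to an execute callback:

```python
def execute_try(try_body, oops_body, slay_body, execute_statement):
    """Run a hypothetical TryStatement: 'oops' catches, 'slay' always runs."""
    try:
        for stmt in try_body:
            execute_statement(stmt)
    except Exception:              # the 'oops' handler block
        for stmt in oops_body:
            execute_statement(stmt)
    finally:                       # the 'slay' block always runs
        for stmt in slay_body:
            execute_statement(stmt)
```

Piggybacking on Python's own try/except/finally keeps the semantics consistent with the host language at near-zero implementation cost.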
4. Modules and Imports:
cop "math_utils" # Import module
rizz result = math_utils.calculate(10)
Implementation:
- Add cop keyword (import)
- Implement a module loader
- Module search path
- Namespace isolation
5. Classes and OOP:
main character Person:
main character vibe_check(name, age):
it.name = name
it.age = age
main character greet():
yeet "Hello, I'm " + it.name
rizz person = Person("Alice", 25)
person.greet()
Implementation:
- Add it keyword (self/this)
- Parse class definitions
- Implement an object system
- Method binding
6. Decorators:
@enhance
main character my_function():
yeet "Hello"
7. Lambda Expressions:
rizz double = lambda(x) -> x * 2
yeet double(5) # 10
8. Comprehensions:
rizz squares = [x * x for x in range(10)]
9. Pattern Matching:
vibe check value:
case [1, 2, x]:
yeet "Got 1, 2, and " + x
case {"name": n}:
yeet "Name is " + n
10. Async/Await:
async main character fetch_data():
rizz result = await http_get("url")
send it result
1. REPL (Read-Eval-Print Loop):
$ genz
Gen Z Lang v0.1.0
>>> rizz x = 10
>>> yeet x
10

2. Debugger:
$ genz --debug program.genz
(genz-db) break 10
(genz-db) run
(genz-db) print x

3. Profiler:
$ genz --profile program.genz
Function calls: 1000
Total time: 0.5s
Hotspots:
- fibonacci: 0.3s (60%)
- loop: 0.2s (40%)

4. Language Server (LSP):
- Syntax highlighting
- Autocomplete
- Jump to definition
- Error checking
- Refactoring

5. Package Manager:
$ genz-pkg install http-client
$ genz-pkg publish my-library

Planned modules:
- math - Mathematical functions
- string - String operations
- array - Array/list utilities
- file - File I/O
- http - HTTP client/server
- json - JSON parsing
- regex - Regular expressions
- datetime - Date/time operations
- random - Random number generation
- crypto - Cryptography
- testing - Unit testing framework
Python code:
- Follow PEP 8
- Type hints for function signatures
- Docstrings for public APIs
- Clear variable names
Gen Z Lang code:
- Snake_case for identifiers
- 4 spaces for indentation
- Comments explain why, not what
- Keep functions focused
Unit tests:
def test_lexer_numbers():
    lexer = Lexer("42 3.14")
    tokens = lexer.tokenize()
    assert tokens[0].type == TokenType.NUMBER
    assert tokens[0].value == 42
    assert tokens[1].type == TokenType.NUMBER
    assert tokens[1].value == 3.14

Integration tests:
def test_full_program():
    source = """
rizz x = 10
yeet x
"""
    lexer = Lexer(source)
    tokens = lexer.tokenize()
    parser = Parser(tokens)
    ast = parser.parse()
    interpreter = Interpreter()
    interpreter.interpret(ast)
    # Check output

For each feature:
- Language spec update
- API documentation
- Example programs
- Migration guide (if breaking change)
Guidelines:
- Don't break existing code unless absolutely necessary
- Deprecate features before removing
- Provide migration path
- Version the language spec
Constant Propagation:
def optimize_constant_propagation(node):
    if isinstance(node, BinaryOp):
        left = optimize(node.left)
        right = optimize(node.right)
        if isinstance(left, Literal) and isinstance(right, Literal):
            # Evaluate at compile time
            result = evaluate_binary(left.value, node.operator, right.value)
            return Literal(value=result, line=node.line, column=node.column)
    return node

Dead Code Elimination:
def optimize_dead_code(statements):
    result = []
    reachable = True
    for stmt in statements:
        if not reachable:
            break  # Skip unreachable code
        result.append(stmt)
        if isinstance(stmt, ReturnStatement):
            reachable = False
    return result

Type Inference:
def infer_type(expr, env):
    if isinstance(expr, Literal):
        return type(expr.value)
    elif isinstance(expr, BinaryOp):
        left_type = infer_type(expr.left, env)
        right_type = infer_type(expr.right, env)
        # Determine result type
    # ... etc

Linting:
def lint_unused_variables(ast):
    declared = set()
    used = set()
    # Traverse AST
    for node in walk(ast):
        if isinstance(node, VarDeclaration):
            declared.add(node.name)
        elif isinstance(node, Identifier):
            used.add(node.name)
    unused = declared - used
    for var in unused:
        warn(f"Variable '{var}' declared but never used")

Stack Trace:
class CallFrame:
    def __init__(self, function_name, line):
        self.function_name = function_name
        self.line = line

class Interpreter:
    def __init__(self):
        self.call_stack = []

    def execute_function_call(self, expr):
        self.call_stack.append(CallFrame(expr.name, expr.line))
        try:
            # Execute function
            pass
        finally:
            self.call_stack.pop()

    def print_stack_trace(self):
        for frame in reversed(self.call_stack):
            print(f"  at {frame.function_name} (line {frame.line})")

This internals guide provides complete documentation for understanding and extending Gen Z Lang:
- Architecture: Clear separation of lexer, parser, interpreter
- Implementation: Tree-walking interpreter with environment-based scoping
- Extensibility: Well-defined extension points for new features
- Performance: Opportunities for optimization identified
- Future: Roadmap for language evolution
For AI systems working with Gen Z Lang:
- Follow the architecture patterns established
- Maintain backward compatibility when extending
- Document new features thoroughly
- Write tests for all new functionality
- Consider performance implications
The language is designed to be simple yet extensible, making it an ideal platform for experimentation and learning about language implementation.