This document describes the internal architecture of the ARM2 Emulator.
arm_emulator/
├── main.go # Entry point and CLI
├── vm/ # Virtual machine core
│ ├── cpu.go # CPU state and registers
│ ├── memory.go # Memory management
│ ├── executor.go # Fetch-decode-execute cycle
│ ├── flags.go # CPSR flag operations
│ └── syscall.go # System call handler
├── parser/ # Assembly parser
│ ├── lexer.go # Tokenization
│ ├── parser.go # Syntax analysis
│ ├── symbols.go # Symbol table
│ ├── preprocessor.go # Includes and conditionals
│ └── macros.go # Macro expansion
├── instructions/ # Instruction implementations
│ ├── data_processing.go # MOV, ADD, SUB, etc.
│ ├── memory.go # LDR, STR
│ ├── memory_multi.go # LDM, STM
│ ├── branch.go # B, BL, BX
│ └── multiply.go # MUL, MLA
├── debugger/ # Debugging support
│ ├── debugger.go # Main debugger logic
│ ├── commands.go # Command interpreter
│ ├── breakpoints.go # Breakpoint management
│ ├── watchpoints.go # Watchpoint management
│ ├── expressions.go # Expression evaluator
│ ├── history.go # Command history
│ └── tui.go # Text UI
├── tools/ # Development tools
│ ├── lint.go # Assembly linter
│ ├── format.go # Code formatter
│ └── xref.go # Cross-reference generator
├── tests/ # Test files
└── examples/ # Example programs
The VM module implements the ARM2 processor and memory system.
Core Components:
- 16 general-purpose registers (R0-R15)
- CPSR (Current Program Status Register) with N, Z, C, V flags
- Cycle counter for performance analysis
Key Types:
type VM struct {
R [16]uint32 // Registers
CPSR CPSRFlags // Status flags
PC uint32 // Program counter (alias for R[15])
Mem *Memory // Memory subsystem
Cycles uint64 // Cycle counter
}
type CPSRFlags struct {
N bool // Negative
Z bool // Zero
C bool // Carry
V bool // Overflow
}
Memory Architecture:
- 4GB address space (32-bit)
- Segmented memory model:
- Code segment (read-only)
- Data segment (read-write)
- Heap segment (dynamic)
- Stack segment (grows downward)
Features:
- Alignment checking
- Permission enforcement
- Bounds checking
- Little-endian byte order
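The sketch below shows how these checks could fit together when reading and writing 32-bit words from a sparse byte map. It is a minimal sketch, not the emulator's memory.go. The Permissions bit layout, the exclusive End bound, and the ReadWord/WriteWord/check method names are assumptions, and page allocation is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// Permissions is a bitmask; this bit layout is an assumption.
type Permissions uint8

const (
	PermRead Permissions = 1 << iota
	PermWrite
	PermExec
)

// MemorySegment mirrors the struct below; End is treated as exclusive (assumption).
type MemorySegment struct {
	Start, End uint32
	Name       string
	Perms      Permissions
}

// Memory stores bytes sparsely: addresses never written read as zero.
type Memory struct {
	Data     map[uint32]byte
	Segments []MemorySegment
}

var ErrUnaligned = errors.New("unaligned word access")

// findSegment returns the segment containing [addr, addr+size), or a bounds error.
func (m *Memory) findSegment(addr, size uint32) (*MemorySegment, error) {
	for i := range m.Segments {
		s := &m.Segments[i]
		if addr >= s.Start && uint64(addr)+uint64(size) <= uint64(s.End) {
			return s, nil
		}
	}
	return nil, fmt.Errorf("address 0x%08X out of bounds", addr)
}

// check applies alignment, bounds, and permission checks in that order.
func (m *Memory) check(addr uint32, need Permissions) error {
	if addr%4 != 0 {
		return ErrUnaligned
	}
	seg, err := m.findSegment(addr, 4)
	if err != nil {
		return err
	}
	if seg.Perms&need == 0 {
		return fmt.Errorf("permission denied in %s segment at 0x%08X", seg.Name, addr)
	}
	return nil
}

// ReadWord assembles four bytes in little-endian order (lowest address = LSB).
func (m *Memory) ReadWord(addr uint32) (uint32, error) {
	if err := m.check(addr, PermRead); err != nil {
		return 0, err
	}
	var v uint32
	for i := uint32(0); i < 4; i++ {
		v |= uint32(m.Data[addr+i]) << (8 * i)
	}
	return v, nil
}

// WriteWord stores the least significant byte at the lowest address.
func (m *Memory) WriteWord(addr, value uint32) error {
	if err := m.check(addr, PermWrite); err != nil {
		return err
	}
	for i := uint32(0); i < 4; i++ {
		m.Data[addr+i] = byte(value >> (8 * i))
	}
	return nil
}
```

Checking alignment before bounds means a misaligned access is reported as such even when it also straddles a segment edge.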
Key Types:
type Memory struct {
Data map[uint32]byte // Sparse memory
Segments []MemorySegment // Segment definitions
PageSize uint32 // Page size for allocation
}
type MemorySegment struct {
Start uint32
End uint32
Name string
Perms Permissions // Read, Write, Execute
}
Fetch-Decode-Execute Cycle:
- Fetch instruction from memory at PC
- Decode opcode and operands
- Execute instruction
- Update PC
- Increment cycle counter
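The cycle above can be sketched as a Step method that fetches the word at PC, decodes it, executes it, advances PC by 4, and counts a cycle. This is a hypothetical miniature: it assumes word-addressed memory and decodes only two encodings, MOV Rd, #imm and SWI (treated here as "halt", which is an assumption). Condition codes other than AL, flag updates, branches, and the PC+8 pipeline offset are not modeled.

```go
package main

import (
	"fmt"
	"math/bits"
)

// VM is trimmed to registers, word-addressed memory, and a cycle counter.
type VM struct {
	R      [16]uint32
	Mem    map[uint32]uint32 // word-aligned address -> 32-bit word
	Cycles uint64
	Halted bool
}

// Step performs one fetch-decode-execute iteration.
func (vm *VM) Step() error {
	pc := vm.R[15]
	// Fetch: read the instruction word at PC.
	word, ok := vm.Mem[pc]
	if !ok {
		return fmt.Errorf("no instruction at 0x%08X", pc)
	}
	// Decode: the condition field is bits 31-28; only AL (0xE) is handled here.
	if cond := word >> 28; cond != 0xE {
		return fmt.Errorf("condition 0x%X not handled in this sketch", cond)
	}
	// Execute.
	switch {
	case word&0x0F000000 == 0x0F000000: // SWI: treated as halt in this sketch
		vm.Halted = true
	case word&0x0FF00000 == 0x03A00000: // MOV Rd, #imm (I=1, opcode 1101, S=0)
		rd := (word >> 12) & 0xF
		imm := word & 0xFF
		rot := (word >> 8) & 0xF
		// The 8-bit immediate is rotated right by twice the rotate field.
		vm.R[rd] = bits.RotateLeft32(imm, -int(2*rot))
	default:
		return fmt.Errorf("unimplemented encoding 0x%08X", word)
	}
	// Update PC for sequential flow, then count the cycle.
	vm.R[15] = pc + 4
	vm.Cycles++
	return nil
}

// Run steps until the program halts or an error occurs.
func (vm *VM) Run() error {
	for !vm.Halted {
		if err := vm.Step(); err != nil {
			return err
		}
	}
	return nil
}
```

A real executor would also consult breakpoints and watchpoints between the fetch and execute stages, as the execution flow later in this document shows.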
Execution Modes:
- Run: Execute until termination or breakpoint
- Step: Execute single instruction (step into)
- Next: Execute single instruction (step over)
- Finish: Execute until function return
Flag Operations:
- Calculation helpers for all arithmetic/logical operations
- Condition code evaluation (all 16 ARM condition codes)
- Shift operations (LSL, LSR, ASR, ROR, RRX)
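A minimal sketch of the flag calculations and condition evaluation follows. The function names match the Key Functions listed in this section, but the bodies are an assumed implementation. Carry uses a 64-bit sum and ignores carry-in, so it covers ADD but not ADC. Overflow is set when both operands share a sign that the result does not. The ConditionCode constants follow the ARM condition-field encoding (0 = EQ through 15 = NV).

```go
package main

// CPSRFlags mirrors the struct shown earlier in this document.
type CPSRFlags struct{ N, Z, C, V bool }

// VM is trimmed to what the flag helpers need.
type VM struct {
	R    [16]uint32
	CPSR CPSRFlags
}

// ConditionCode is the 4-bit condition field (bits 31-28), in ARM encoding order.
type ConditionCode uint32

const (
	EQ ConditionCode = iota // Z set
	NE                      // Z clear
	CS                      // C set (unsigned >=)
	CC                      // C clear (unsigned <)
	MI                      // N set
	PL                      // N clear
	VS                      // V set
	VC                      // V clear
	HI                      // C set and Z clear (unsigned >)
	LS                      // C clear or Z set (unsigned <=)
	GE                      // N == V (signed >=)
	LT                      // N != V (signed <)
	GT                      // Z clear and N == V (signed >)
	LE                      // Z set or N != V (signed <=)
	AL                      // always
	NV                      // never
)

// UpdateNZ sets N from bit 31 of the result and Z if the result is zero.
func UpdateNZ(vm *VM, result uint32) {
	vm.CPSR.N = result>>31 == 1
	vm.CPSR.Z = result == 0
}

// UpdateCarryAdd sets C on unsigned carry-out of a+b (no carry-in; result unused).
func UpdateCarryAdd(vm *VM, a, b, result uint32) {
	vm.CPSR.C = uint64(a)+uint64(b) > 0xFFFFFFFF
}

// UpdateOverflowAdd sets V when a and b share a sign and the result's sign differs.
func UpdateOverflowAdd(vm *VM, a, b, result uint32) {
	vm.CPSR.V = ((a^result)&(b^result))>>31 == 1
}

// EvaluateCondition reports whether an instruction with this condition executes.
func EvaluateCondition(vm *VM, cond ConditionCode) bool {
	f := vm.CPSR
	switch cond {
	case EQ:
		return f.Z
	case NE:
		return !f.Z
	case CS:
		return f.C
	case CC:
		return !f.C
	case MI:
		return f.N
	case PL:
		return !f.N
	case VS:
		return f.V
	case VC:
		return !f.V
	case HI:
		return f.C && !f.Z
	case LS:
		return !f.C || f.Z
	case GE:
		return f.N == f.V
	case LT:
		return f.N != f.V
	case GT:
		return !f.Z && f.N == f.V
	case LE:
		return f.Z || f.N != f.V
	case AL:
		return true
	default: // NV
		return false
	}
}
```

For example, 0x7FFFFFFF + 1 sets both N and V, so GE still holds: the signed conditions stay correct across overflow because they compare N with V rather than reading N alone.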
Key Functions:
func UpdateNZ(vm *VM, result uint32)
func UpdateCarryAdd(vm *VM, a, b, result uint32)
func UpdateOverflowAdd(vm *VM, a, b, result uint32)
func EvaluateCondition(vm *VM, cond ConditionCode) bool
The parser converts ARM assembly source code into executable form.
Responsibilities:
- Tokenize source code
- Handle comments (;, //, /* */)
- Recognize keywords, registers, labels, directives
- Track line and column positions for error reporting
Token Types:
- Instructions (MOV, ADD, etc.)
- Registers (R0-R15, SP, LR, PC)
- Literals (numbers, strings)
- Operators (',', '[', ']', '#', etc.)
- Directives (.org, .word, etc.)
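A cut-down tokenizer showing comment skipping and line/column tracking is sketched below. The TokenKind names, the Lex function, and the rule that identifiers may contain '.' and '_' are assumptions for illustration; string literals and negative numbers are not handled.

```go
package main

import (
	"strconv"
	"strings"
	"unicode"
)

// TokenKind values are illustrative; the real lexer may classify differently.
type TokenKind int

const (
	Ident     TokenKind = iota // mnemonics and labels
	Register                   // R0-R15, SP, LR, PC
	Number                     // decimal or 0x-prefixed hex
	Directive                  // .org, .word, ...
	Punct                      // , [ ] # ! :
)

type Token struct {
	Kind      TokenKind
	Text      string
	Line, Col int // 1-based position of the token's first character
}

// isRegister reports whether s names an ARM register (case-insensitive).
func isRegister(s string) bool {
	u := strings.ToUpper(s)
	if u == "SP" || u == "LR" || u == "PC" {
		return true
	}
	if len(u) < 2 || len(u) > 3 || u[0] != 'R' {
		return false
	}
	n, err := strconv.Atoi(u[1:])
	return err == nil && n >= 0 && n <= 15
}

// Lex splits src into tokens, dropping whitespace and all three comment styles.
func Lex(src string) []Token {
	rs := []rune(src)
	var toks []Token
	i, line, col := 0, 1, 1
	// advance consumes n runes, keeping line and column in sync.
	advance := func(n int) {
		for ; n > 0 && i < len(rs); n-- {
			if rs[i] == '\n' {
				line, col = line+1, 1
			} else {
				col++
			}
			i++
		}
	}
	isWord := func(r rune) bool {
		return unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_' || r == '.'
	}
	for i < len(rs) {
		c := rs[i]
		next := rune(0)
		if i+1 < len(rs) {
			next = rs[i+1]
		}
		switch {
		case unicode.IsSpace(c):
			advance(1)
		case c == ';' || (c == '/' && next == '/'): // line comment
			for i < len(rs) && rs[i] != '\n' {
				advance(1)
			}
		case c == '/' && next == '*': // block comment, may span lines
			advance(2)
			for i < len(rs) && !(rs[i] == '*' && i+1 < len(rs) && rs[i+1] == '/') {
				advance(1)
			}
			advance(2)
		case strings.ContainsRune(",[]#!:", c):
			toks = append(toks, Token{Punct, string(c), line, col})
			advance(1)
		case isWord(c):
			start, sl, sc := i, line, col
			for i < len(rs) && isWord(rs[i]) {
				advance(1)
			}
			text := string(rs[start:i])
			kind := Ident
			switch {
			case text[0] == '.':
				kind = Directive
			case unicode.IsDigit(c):
				kind = Number
			case isRegister(text):
				kind = Register
			}
			toks = append(toks, Token{kind, text, sl, sc})
		default:
			advance(1) // unexpected character: a real lexer would report an error
		}
	}
	return toks
}
```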
Two-Pass Assembly:
Pass 1: Symbol Collection
- Collect all labels and their addresses
- Expand macros
- Process .equ/.set directives
- Build symbol table
Pass 2: Code Generation
- Resolve label references
- Generate instruction encodings
- Process data directives
- Create relocations
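Two passes are needed because of forward references: a branch to a label defined later cannot be encoded until every label has an address. The sketch below assumes pre-parsed lines, 4 bytes per instruction, and only B/BL; the Line type and the pass1/pass2 names are hypothetical, and macros and .equ are omitted. The branch encoding itself (condition AL, 24-bit signed word offset relative to PC+8) follows the ARM instruction format.

```go
package main

import "fmt"

// Line is a pre-parsed source line (hypothetical shape for this sketch).
type Line struct {
	Label    string // "" if none
	Mnemonic string // "" for label-only lines
	Operand  string // a single label operand, enough for B/BL
}

// pass1 assigns an address to every label, assuming 4 bytes per instruction.
func pass1(lines []Line, origin uint32) (map[string]uint32, error) {
	symbols := map[string]uint32{}
	addr := origin
	for _, l := range lines {
		if l.Label != "" {
			if _, dup := symbols[l.Label]; dup {
				return nil, fmt.Errorf("duplicate label %q", l.Label)
			}
			symbols[l.Label] = addr
		}
		if l.Mnemonic != "" {
			addr += 4
		}
	}
	return symbols, nil
}

// pass2 encodes B/BL, resolving both forward and backward label references.
func pass2(lines []Line, origin uint32, symbols map[string]uint32) ([]uint32, error) {
	var out []uint32
	addr := origin
	for _, l := range lines {
		if l.Mnemonic == "" {
			continue
		}
		var base uint32
		switch l.Mnemonic {
		case "B":
			base = 0xEA000000 // cond=AL, 101, L=0
		case "BL":
			base = 0xEB000000 // cond=AL, 101, L=1
		default:
			return nil, fmt.Errorf("sketch only encodes B/BL, got %s", l.Mnemonic)
		}
		target, ok := symbols[l.Operand]
		if !ok {
			return nil, fmt.Errorf("undefined label %q", l.Operand)
		}
		// Offset in words, relative to PC+8 because of the ARM pipeline.
		offset := (int32(target) - int32(addr+8)) >> 2
		out = append(out, base|(uint32(offset)&0x00FFFFFF))
		addr += 4
	}
	return out, nil
}
```

In the test program, the first B targets a label that appears later in the source; it can still be encoded because pass 1 has already recorded that label's address.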
Key Types:
type Instruction struct {
Address uint32
Opcode uint32
Mnemonic string
Operands []string
LineNumber int
}
type Directive struct {
Type DirectiveType
Args []string
Line int
}
Symbol Table Features:
- Forward reference resolution
- Duplicate detection
- Scope management (global vs. local labels)
- Constant definitions (.equ)
Symbol Types:
- Code labels
- Data labels
- Constants (.equ)
- External symbols (.extern)
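Duplicate detection and global/local scoping can be sketched with a single map keyed by a qualified name. This sketch assumes local labels begin with '.' and belong to the most recent global code or data label (the convention NASM uses; the emulator's actual rule may differ). SymbolTable, Define, and Lookup are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

type SymbolKind int

const (
	CodeLabel SymbolKind = iota
	DataLabel
	Constant // .equ
	External // .extern: value unknown inside this file
)

type Symbol struct {
	Name    string
	Kind    SymbolKind
	Value   uint32
	Defined bool
}

// SymbolTable scopes local labels under the current global label.
type SymbolTable struct {
	syms   map[string]*Symbol
	global string // most recent global code/data label
}

func NewSymbolTable() *SymbolTable {
	return &SymbolTable{syms: map[string]*Symbol{}}
}

// qualify turns ".loop" into "main.loop" while the current scope is main.
func (t *SymbolTable) qualify(name string) string {
	if strings.HasPrefix(name, ".") {
		return t.global + name
	}
	return name
}

// Define adds a symbol, rejecting a second definition in the same scope.
func (t *SymbolTable) Define(name string, kind SymbolKind, value uint32) error {
	full := t.qualify(name)
	if s, ok := t.syms[full]; ok && s.Defined {
		return fmt.Errorf("duplicate symbol %q", full)
	}
	t.syms[full] = &Symbol{Name: full, Kind: kind, Value: value, Defined: kind != External}
	// Only global code/data labels open a new scope; constants do not.
	if !strings.HasPrefix(name, ".") && (kind == CodeLabel || kind == DataLabel) {
		t.global = name
	}
	return nil
}

// Lookup resolves a name relative to the current scope.
func (t *SymbolTable) Lookup(name string) (*Symbol, bool) {
	s, ok := t.syms[t.qualify(name)]
	return s, ok
}
```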
Each instruction category has its own module.
Implemented:
- Move: MOV, MVN
- Arithmetic: ADD, ADC, SUB, SBC, RSB, RSC
- Logical: AND, ORR, EOR, BIC
- Compare: CMP, CMN, TST, TEQ
Common Pattern:
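func ExecuteADD(vm *VM, cond, s, rd, rn, op2 uint32) {
	// Skip the instruction if its condition field does not pass.
	if !EvaluateCondition(vm, ConditionCode(cond)) {
		return
	}
	a := vm.R[rn]     // rn is a register index; read the operand value
	result := a + op2 // op2 is the already-evaluated second operand
	vm.R[rd] = result
	if s != 0 {
		// S bit set: update N, Z, C and V from the operands and result.
		UpdateNZ(vm, result)
		UpdateCarryAdd(vm, a, op2, result)
		UpdateOverflowAdd(vm, a, op2, result)
	}
	vm.Cycles++
}
Addressing Modes: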
- Offset: [Rn, #offset]
- Pre-indexed: [Rn, #offset]!
- Post-indexed: [Rn], #offset
- Register offset: [Rn, Rm]
- Scaled: [Rn, Rm, LSL #n]
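The indexing forms differ only in which address is accessed and whether the base register is updated. The sketch below assumes the offset has already been computed (immediate, Rm, or Rm LSL #n); negative offsets are passed as two's-complement values. EffectiveAddress and ScaledOffset are hypothetical names.

```go
package main

// AddrMode enumerates the single-register indexing forms listed above.
type AddrMode int

const (
	Offset      AddrMode = iota // [Rn, #off]   access Rn+off, Rn unchanged
	PreIndexed                  // [Rn, #off]!  access Rn+off, then Rn = Rn+off
	PostIndexed                 // [Rn], #off   access Rn,     then Rn = Rn+off
)

// EffectiveAddress returns the address to access and the (possibly updated) base.
func EffectiveAddress(mode AddrMode, rn, off uint32) (addr, newBase uint32) {
	switch mode {
	case PreIndexed:
		return rn + off, rn + off
	case PostIndexed:
		return rn, rn + off
	default:
		return rn + off, rn
	}
}

// ScaledOffset computes Rm LSL #n for the scaled register form.
func ScaledOffset(rm uint32, n uint) uint32 { return rm << n }
```

Post-indexing is what makes loops like LDR R0, [R1], #4 walk an array without a separate ADD.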
Load/Store Multiple:
- Increment After (IA)
- Increment Before (IB)
- Decrement After (DA)
- Decrement Before (DB)
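For LDM/STM, the four modes determine the start address and the written-back base; registers always occupy ascending addresses in ascending register order. The sketch below (TransferAddresses is a hypothetical name) computes both for a 16-bit register list, where bit i selects Ri.

```go
package main

import "math/bits"

type MultiMode int

const (
	IA MultiMode = iota // increment after
	IB                  // increment before
	DA                  // decrement after
	DB                  // decrement before
)

// TransferAddresses maps each listed register to its memory address and
// returns the base value that writeback (Rn!) would store.
func TransferAddresses(mode MultiMode, base uint32, regList uint16) (map[int]uint32, uint32) {
	n := uint32(bits.OnesCount16(regList))
	var start, wb uint32
	switch mode {
	case IA:
		start, wb = base, base+4*n
	case IB:
		start, wb = base+4, base+4*n
	case DA:
		start, wb = base-4*n+4, base-4*n
	case DB:
		start, wb = base-4*n, base-4*n
	}
	addrs := make(map[int]uint32, n)
	a := start
	// Lowest-numbered register goes to the lowest address in every mode.
	for r := 0; r < 16; r++ {
		if regList&(1<<r) != 0 {
			addrs[r] = a
			a += 4
		}
	}
	return addrs, wb
}
```

Pairing STMDB SP!, {...} with LDMIA SP!, {...} gives a full-descending stack, which matches a stack segment that grows downward.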
The debugger provides program analysis and control.
User Input → Command Parser → Debugger Core → VM
↓
Breakpoints
Watchpoints
Expressions
Types:
- Address breakpoints
- Label breakpoints
- Conditional breakpoints
- Temporary breakpoints
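One way to check these breakpoint types before each instruction is sketched below, using a trimmed version of the Breakpoint struct in this section. Assumptions: one breakpoint per address, label breakpoints are resolved to an address before Add is called, and Condition is a Go predicate standing in for the expression string the debugger would evaluate. BreakpointManager, Add, and ShouldStop are hypothetical names.

```go
package main

// Breakpoint follows the struct in this section, except Condition is a
// predicate here instead of an expression string (assumption).
type Breakpoint struct {
	ID        int
	Address   uint32
	Condition func() bool // nil means unconditional
	Enabled   bool
	HitCount  int
	Temporary bool
}

type BreakpointManager struct {
	nextID int
	byAddr map[uint32]*Breakpoint
}

func NewBreakpointManager() *BreakpointManager {
	return &BreakpointManager{nextID: 1, byAddr: map[uint32]*Breakpoint{}}
}

// Add registers a breakpoint at an already-resolved address.
func (m *BreakpointManager) Add(addr uint32, temporary bool, cond func() bool) *Breakpoint {
	bp := &Breakpoint{ID: m.nextID, Address: addr, Condition: cond, Enabled: true, Temporary: temporary}
	m.nextID++
	m.byAddr[addr] = bp
	return bp
}

// ShouldStop is called before executing the instruction at pc.
func (m *BreakpointManager) ShouldStop(pc uint32) bool {
	bp, ok := m.byAddr[pc]
	if !ok || !bp.Enabled {
		return false
	}
	if bp.Condition != nil && !bp.Condition() {
		return false // conditional breakpoint whose condition is false
	}
	bp.HitCount++
	if bp.Temporary {
		delete(m.byAddr, pc) // temporary breakpoints fire only once
	}
	return true
}
```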
Data Structure:
type Breakpoint struct {
ID int
Address uint32
Condition string // Optional
Enabled bool
HitCount int
Temporary bool
}
Watch Types:
- Write watchpoint (break on modification)
- Read watchpoint (break on access)
- Access watchpoint (break on read OR write)
Targets:
- Registers
- Memory addresses
- Expressions
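Memory watchpoints can be checked from a hook in the memory layer, while register (and, similarly, expression) watchpoints are easier to detect by comparing against a snapshot after each instruction. The sketch below assumes exact-address matching and a hypothetical hook that reports every access; MemWatchpoint, RegWatchpoint, and NewRegWatchpoint are illustrative names.

```go
package main

// WatchKind selects which accesses trigger a memory watchpoint.
type WatchKind int

const (
	WatchWrite  WatchKind = iota // break on modification
	WatchRead                    // break on read
	WatchAccess                  // break on read or write
)

// MemWatchpoint watches a single address (exact match, an assumption).
type MemWatchpoint struct {
	Kind    WatchKind
	Address uint32
}

// Triggered is meant to be called from a (hypothetical) memory-access hook.
func (w MemWatchpoint) Triggered(addr uint32, isWrite bool) bool {
	if addr != w.Address {
		return false
	}
	switch w.Kind {
	case WatchWrite:
		return isWrite
	case WatchRead:
		return !isWrite
	default:
		return true
	}
}

// RegWatchpoint detects register changes by comparing against a snapshot
// taken after the previous instruction.
type RegWatchpoint struct {
	Reg  int
	last uint32
}

func NewRegWatchpoint(reg int, regs *[16]uint32) *RegWatchpoint {
	return &RegWatchpoint{Reg: reg, last: regs[reg]}
}

// Check returns the old and new values and whether the register changed.
func (w *RegWatchpoint) Check(regs *[16]uint32) (oldVal, newVal uint32, changed bool) {
	newVal = regs[w.Reg]
	oldVal, changed = w.last, newVal != w.last
	w.last = newVal
	return oldVal, newVal, changed
}
```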
The TUI is built with:
- github.com/rivo/tview - UI components
- github.com/gdamore/tcell - Terminal handling
Panels:
- Source view (assembly listing)
- Register view (R0-R15, CPSR)
- Memory view (hex dump)
- Stack view (SP region)
- Disassembly view (decoded instructions)
- Command input (debugger commands)
- Output console (results)
- Breakpoints/Watchpoints list
Linter checks:
- Syntax errors (via parser)
- Undefined labels
- Duplicate labels
- Unused labels (with exceptions)
- Unreachable code
- Register restrictions (MUL, PC usage)
- Best practices
Algorithm:
1. Parse program
2. Build symbol table
3. Check each instruction:
- Validate operands
- Check register usage
- Detect unreachable code
4. Check symbols:
- Find undefined references
- Find unused definitions
5. Generate report
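Steps 3 and 4 can be sketched as two scans over a simplified statement list. The Stmt type, the entry set used for unused-label exceptions, and the rule that only an unconditional B ends a reachable block are assumptions; duplicate labels are left to the symbol table, and returns such as MOV PC, LR are not recognized here.

```go
package main

import (
	"fmt"
	"sort"
)

// Stmt is a simplified parsed line (hypothetical shape for this sketch).
type Stmt struct {
	Line     int
	Label    string
	Mnemonic string
	Target   string // label operand, if any
}

// Lint reports unreachable code, undefined labels, and unused labels.
// entry lists labels exempt from the unused check (e.g. the entry point).
func Lint(prog []Stmt, entry map[string]bool) []string {
	defined := map[string]int{}
	used := map[string]bool{}
	for _, s := range prog {
		if s.Label != "" {
			defined[s.Label] = s.Line
		}
		if s.Target != "" {
			used[s.Target] = true
		}
	}
	var issues []string
	afterJump := false
	for _, s := range prog {
		if s.Label != "" {
			afterJump = false // a label may be a branch target
		}
		if afterJump && s.Mnemonic != "" {
			issues = append(issues, fmt.Sprintf("line %d: unreachable code", s.Line))
		}
		if s.Mnemonic == "B" {
			afterJump = true // unconditional branch: fall-through is dead
		}
		if s.Target != "" {
			if _, ok := defined[s.Target]; !ok {
				issues = append(issues, fmt.Sprintf("line %d: undefined label %q", s.Line, s.Target))
			}
		}
	}
	var unused []string
	for name := range defined {
		if !used[name] && !entry[name] {
			unused = append(unused, name)
		}
	}
	sort.Strings(unused) // deterministic report order
	for _, name := range unused {
		issues = append(issues, fmt.Sprintf("line %d: unused label %q", defined[name], name))
	}
	return issues
}
```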
Formatting:
- Consistent indentation (tabs vs. spaces)
- Column alignment for:
- Labels
- Mnemonics
- Operands
- Comments
- Multiple styles (default, compact, expanded)
Algorithm:
1. Parse program
2. Calculate column widths
3. For each line:
- Format label
- Format instruction
- Align operands
- Align comments
4. Output formatted code
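The column-width step amounts to finding the widest entry in each column and padding with fmt's %-*s verb, which takes the width as an argument. A minimal sketch assuming pre-split lines, space padding, and ';' comments; SrcLine and Format are hypothetical names, and only a single style is produced.

```go
package main

import (
	"fmt"
	"strings"
)

// SrcLine is a pre-split source line (hypothetical shape for this sketch).
type SrcLine struct {
	Label, Mnemonic, Operands, Comment string
}

// Format aligns labels, mnemonics, operands, and comments into columns.
func Format(lines []SrcLine) string {
	// Pass 1: compute the width of each column.
	wl, wm, wo := 0, 0, 0
	for _, l := range lines {
		if l.Label != "" && len(l.Label)+1 > wl {
			wl = len(l.Label) + 1 // +1 for the colon
		}
		if len(l.Mnemonic) > wm {
			wm = len(l.Mnemonic)
		}
		if len(l.Operands) > wo {
			wo = len(l.Operands)
		}
	}
	// Pass 2: pad every field to its column width.
	var b strings.Builder
	for _, l := range lines {
		label := ""
		if l.Label != "" {
			label = l.Label + ":"
		}
		row := fmt.Sprintf("%-*s %-*s %-*s", wl, label, wm, l.Mnemonic, wo, l.Operands)
		if l.Comment != "" {
			row += " ; " + l.Comment
		}
		b.WriteString(strings.TrimRight(row, " ")) // no trailing whitespace
		b.WriteByte('\n')
	}
	return b.String()
}
```

Computing widths over the whole file before emitting anything is why the algorithm needs two passes over the lines.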
Cross-reference analysis:
- Symbol definitions
- Symbol uses
- Reference types (call, branch, load, store, data)
- Call graph (function relationships)
Output:
Symbol: main
Defined at: line 10
References:
line 5: BL main (call)
line 15: B main (branch)
Symbol: process
Defined at: line 20
References:
line 12: BL process (call)
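The report above can be produced by grouping uses by symbol and classifying each use by its mnemonic. Assumptions: uses have already been extracted by the parser, mnemonics carry no condition suffixes, and symbols are listed in definition order. Use, Ref, refKind, and Xref are hypothetical names, and the call graph is not built.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Use is one reference to a symbol found by the parser.
type Use struct {
	Line             int
	Mnemonic, Symbol string
}

type Ref struct {
	Line int
	Text string // e.g. "BL main"
	Kind string // call, branch, load, store, data
}

// refKind classifies a reference by the mnemonic that uses the symbol.
func refKind(mnemonic string) string {
	switch strings.ToUpper(mnemonic) {
	case "BL":
		return "call"
	case "B":
		return "branch"
	case "LDR":
		return "load"
	case "STR":
		return "store"
	default:
		return "data" // e.g. .word label
	}
}

// Xref renders definitions and references in the report format shown above.
func Xref(defs map[string]int, uses []Use) string {
	refs := map[string][]Ref{}
	for _, u := range uses {
		refs[u.Symbol] = append(refs[u.Symbol], Ref{u.Line, u.Mnemonic + " " + u.Symbol, refKind(u.Mnemonic)})
	}
	names := make([]string, 0, len(defs))
	for n := range defs {
		names = append(names, n)
	}
	sort.Slice(names, func(i, j int) bool { return defs[names[i]] < defs[names[j]] })
	var b strings.Builder
	for _, n := range names {
		fmt.Fprintf(&b, "Symbol: %s\n  Defined at: line %d\n  References:\n", n, defs[n])
		for _, r := range refs[n] {
			fmt.Fprintf(&b, "    line %d: %s (%s)\n", r.Line, r.Text, r.Kind)
		}
	}
	return b.String()
}
```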
Source File (.s)
↓
Lexer (tokenize)
↓
Parser (pass 1: collect symbols)
↓
Parser (pass 2: generate code)
↓
VM Memory (load code and data)
↓
VM Registers (initialize)
↓
Ready for execution
VM.Run()
↓
While not terminated:
↓
Fetch instruction at PC
↓
Decode opcode and operands
↓
Check breakpoints/watchpoints
↓
Execute instruction
↓
Update PC
↓
Increment cycles
User Command
↓
Command Parser
↓
Debugger Core
↓
VM Control (step, continue, etc.)
↓
Update TUI/Display results
Used for instruction execution:
type InstructionExecutor interface {
Execute(vm *VM, encoding uint32) error
}
Used for watchpoints:
type Watchpoint struct {
Expression string
OnTrigger func()
}
Used for debugger commands:
type Command interface {
Execute(debugger *Debugger, args []string) error
}
Used for creating instructions from opcodes:
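func InstructionFactory(opcode uint32) InstructionExecutor {
	// Return the executor responsible for this opcode class.
	switch opcode {
	case 0b0000:
		return &DataProcessing{}
	case 0b0001:
		return &MemoryAccess{}
	// ...
	default:
		return nil // unknown opcode: the caller reports an undefined instruction
	}
}
- Sparse arrays for memory (map-based)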
- Only allocate pages as needed
- Segment-based permissions reduce checks
- Direct function calls (not reflection)
- Minimal allocations in execute loop
- Inline condition checks
- Instruction caching: Cache decoded instructions
- JIT compilation: Translate ARM to native code
- Profile-guided optimization: Optimize hot paths
- Memory pooling: Reuse allocations
- Each instruction tested individually
- Flag calculations tested exhaustively
- Memory operations tested for alignment, permissions
- Parser tested with valid and invalid inputs
- Complete programs tested end-to-end
- Example programs verify correctness
- Regression tests prevent bugs from recurring
- Instructions: 95%+
- VM core: 90%+
- Parser: 85%+
- Overall: 85%+
- Define opcode in instructions/ module
- Implement execution function
- Add to instruction decoder
- Add tests
- Update documentation
- Define syscall number in vm/syscall.go
- Implement handler function
- Add to syscall dispatcher
- Document in syscall reference
- Add tests
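A table-driven dispatcher keeps the "define number, implement handler, register it" steps small. The sketch below dispatches on the 24-bit comment field of an SWI instruction. The syscall numbers, the R0 argument convention, and the Out buffer are hypothetical, not the values defined in vm/syscall.go.

```go
package main

import "fmt"

// VM is trimmed to what the handlers below touch.
type VM struct {
	R      [16]uint32
	Halted bool
	Out    []byte // captured output, for this sketch only
}

// SyscallHandler reads arguments from and writes results to registers.
type SyscallHandler func(vm *VM) error

// Hypothetical syscall numbers for illustration.
const (
	SysExit      = 0x00
	SysWriteChar = 0x01
)

// syscalls is the dispatcher table: adding a syscall means adding an entry.
var syscalls = map[uint32]SyscallHandler{
	SysExit: func(vm *VM) error {
		vm.Halted = true
		return nil
	},
	SysWriteChar: func(vm *VM) error {
		vm.Out = append(vm.Out, byte(vm.R[0])) // character passed in R0 (assumption)
		return nil
	},
}

// HandleSWI extracts the 24-bit comment field and dispatches on it.
func HandleSWI(vm *VM, instr uint32) error {
	num := instr & 0x00FFFFFF
	h, ok := syscalls[num]
	if !ok {
		return fmt.Errorf("unknown syscall 0x%X", num)
	}
	return h(vm)
}
```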
- Add command in debugger/commands.go
- Implement handler function
- Update help text
- Add tests
- tview: TUI components
- tcell: Terminal handling
- cobra: CLI framework
- toml: Configuration files
- testify: Testing assertions
- Go standard library (testing, benchmark)
- JIT Compilation: Translate ARM to native code for speed
- Remote Debugging: GDB protocol support
- Profiling: Performance analysis tools
- Disassembler: Binary to assembly conversion
- ARM3/ARM6: Extended instruction sets
- Plugin System: Load external instruction sets
- Scripting: Lua/JavaScript for automation
- Network: Remote execution and debugging
- Visualization: Call graphs, memory maps