Technical Design Document
Overview
We aim to build a RISC-V→(AMD64 or ARM64) dynamic recompiler that executes untrusted code in a separate process for security. Our design targets:
- Single-Threaded Guest: The guest program does not support multithreading.
- Only these RISC-V registers are used:
- RA, SP, T0, T1, T2, S0, S1, A0, A1, A2, A3, A4, A5.
- 32-Bit Addressing: The guest sees a 32-bit address space.
- Separate Processes: The “Controller” (host) is written in Swift (with some C for low-level codegen). The “Guest” runs in a child process.
- Shared Memory: For transferring code/data from the Controller to the Guest.
- Pipes: For control signals (run commands, exit, etc.) between the two processes.
This setup ensures that if the guest code crashes or is compromised, it cannot directly harm the Controller or the rest of the system, thanks to OS-level process boundaries.
High-Level Architecture
- Controller (Swift)
  - Loads the RISC-V program (adhering to the 13 specified registers).
  - Decodes instructions (we only care about the subset used by RA, SP, T0, T1, T2, S0, S1, A0–A5).
  - Generates AMD64 or ARM64 machine code.
  - Writes the compiled code into a shared memory region.
  - Coordinates execution with the child Guest process via pipes.
- Guest Process
  - Maps the shared memory region for code + data.
  - Receives run commands or instructions via pipe from the Controller.
  - Executes the compiled machine code in its own memory space.
  - Sends status or completion messages back to the Controller.
Since the guest is single-threaded, we do not handle concurrency or locking inside the guest’s instruction translation.
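The Controller/Guest split above can be sketched as a single round trip. This is a minimal illustration, not the project's implementation: it assumes an anonymous `MAP_SHARED` mapping created before `fork()` (one possible equivalent of `shm_open` + `mmap`), hypothetical one-byte `CMD_RUN` / `REPLY_DONE` messages, and a plain addition standing in for generated machine code. Error paths skip cleanup for brevity.

```c
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* MAP_ANON is the BSD/macOS spelling of MAP_ANONYMOUS. */
#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Hypothetical one-byte commands; the document does not fix a wire format. */
enum { CMD_RUN = 'R', REPLY_DONE = 'D' };

/* Runs one Controller/Guest round trip. Returns the value the child left
 * in shared memory, or -1 on any failure. */
int run_round_trip(uint32_t a0, uint32_t a1) {
    /* Shared mapping created before fork() so both processes see it. */
    uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (regs == MAP_FAILED) return -1;
    regs[0] = a0;
    regs[1] = a1;

    int to_child[2], to_parent[2];
    if (pipe(to_child) != 0 || pipe(to_parent) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        /* Guest: wait for RUN, do the work, report DONE. */
        char cmd = 0;
        close(to_child[1]);
        close(to_parent[0]);
        if (read(to_child[0], &cmd, 1) == 1 && cmd == CMD_RUN) {
            regs[0] += regs[1];   /* stand-in for generated code */
            char reply = REPLY_DONE;
            if (write(to_parent[1], &reply, 1) != 1) _exit(1);
        }
        _exit(0);
    }

    /* Controller: send RUN, then block until the Guest answers. */
    close(to_child[0]);
    close(to_parent[1]);
    char cmd = CMD_RUN, reply = 0;
    int ok = write(to_child[1], &cmd, 1) == 1 &&
             read(to_parent[0], &reply, 1) == 1 && reply == REPLY_DONE;
    close(to_child[1]);
    close(to_parent[0]);
    waitpid(pid, NULL, 0);

    int result = ok ? (int)regs[0] : -1;
    munmap(regs, 4096);
    return result;
}
```

Calling `run_round_trip(2, 3)` has the child add the two values in shared memory and the parent read back the sum after the reply arrives. Because each side has exactly one thread, the blocking `read` calls are the only synchronization needed.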
Memory and Communication
Shared Memory Layout
- Code Segment:
  - Contains native machine code generated for the subset of RV32 instructions that reference only RA, SP, T0–T2, S0–S1, A0–A5.
  - Marked read + exec (no write) once generation is complete (for security).
- Data Segment:
  - Holds guest data sections (stack, global data, etc.).
  - Marked read + write (no exec).
  - 32-bit address space (the guest’s addresses range up to `0xFFFFFFFF`).
Pipe Communications
- Controller → Guest:
  - Commands: “RUN at PC=X,” “EXIT,” etc.
- Guest → Controller:
  - Responses: “DONE,” or e.g. “SYSCALL needed,” or “EXCEPTION.”
No concurrency means we don’t need advanced scheduling. Each process has a single thread; the Controller can block waiting for the Guest’s response.
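One way to frame these commands and responses is a small fixed-size message. The sketch below is an assumption for illustration: the document names the message kinds but not a wire format, so the `MsgKind` values, the `Message` struct, and the 5-byte little-endian layout are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical message kinds mirroring the commands and responses above. */
typedef enum {
    MSG_RUN = 1, MSG_EXIT = 2, MSG_DONE = 3, MSG_SYSCALL = 4, MSG_EXCEPTION = 5
} MsgKind;

typedef struct {
    MsgKind kind;
    uint32_t arg;   /* guest PC for RUN; otherwise kind-specific */
} Message;

enum { MSG_WIRE_SIZE = 5 };

/* Serialize into 5 bytes: the kind, then arg in little-endian order. */
void msg_encode(const Message *m, uint8_t out[MSG_WIRE_SIZE]) {
    out[0] = (uint8_t)m->kind;
    for (int i = 0; i < 4; i++) out[1 + i] = (uint8_t)(m->arg >> (8 * i));
}

/* Returns 0 on success, -1 if the kind byte is not recognized. */
int msg_decode(const uint8_t in[MSG_WIRE_SIZE], Message *m) {
    if (in[0] < MSG_RUN || in[0] > MSG_EXCEPTION) return -1;
    m->kind = (MsgKind)in[0];
    m->arg = 0;
    for (int i = 0; i < 4; i++) m->arg |= (uint32_t)in[1 + i] << (8 * i);
    return 0;
}
```

A fixed size keeps the reader simple: each side reads exactly `MSG_WIRE_SIZE` bytes per message, and the decoder rejects unknown kinds rather than trusting what arrives from the other process.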
Recompiler Design
- Input: A RISC-V 32-bit binary (using only 13 registers).
- Translation: Directly decode instructions to short native code sequences (AMD64/ARM64).
- Block-Based Approach (Optional for PoC):
  - For better performance, compile instructions in “blocks” until control flow changes or a limit is reached.
  - Cache the compiled block in shared memory, link branches if possible.
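As an illustration of the decode step, the sketch below extracts the fields of an RV32 R-type instruction (the format used by register-register ALU operations such as `add`) and recognizes `ADD`. The field positions follow the standard RISC-V encoding; the `RType` struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of an RV32 R-type instruction (register-register ALU ops). */
typedef struct {
    uint8_t opcode;  /* bits 6:0   */
    uint8_t rd;      /* bits 11:7  */
    uint8_t funct3;  /* bits 14:12 */
    uint8_t rs1;     /* bits 19:15 */
    uint8_t rs2;     /* bits 24:20 */
    uint8_t funct7;  /* bits 31:25 */
} RType;

RType decode_rtype(uint32_t insn) {
    RType r;
    r.opcode = insn & 0x7F;
    r.rd     = (insn >> 7)  & 0x1F;
    r.funct3 = (insn >> 12) & 0x07;
    r.rs1    = (insn >> 15) & 0x1F;
    r.rs2    = (insn >> 20) & 0x1F;
    r.funct7 = (insn >> 25) & 0x7F;
    return r;
}

/* ADD is opcode 0x33 (OP) with funct3 = 0 and funct7 = 0. */
int is_add(RType r) {
    return r.opcode == 0x33 && r.funct3 == 0 && r.funct7 == 0;
}
```

For example, the word `0x00B50533` decodes to `add a0, a0, a1` (rd = x10, rs1 = x10, rs2 = x11), while `0x40B50533` has funct7 = 0x20 and is `sub` instead.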
Register Mapping
Since only 13 registers are used, we can map them to host registers easily. For example, on x86_64:
- RBX, RBP, and R12–R15 (the callee-saved general-purpose registers in the System V AMD64 ABI) might store some guest registers.
- The rest can be spilled or placed in memory if needed.
We track:
- RA (return address),
- SP (stack pointer),
- T0–T2 (temporary),
- S0–S1 (saved),
- A0–A5 (argument registers).
In typical RISC-V calling convention, RA is x1, SP is x2, etc. We only store and restore these as they are used. No concurrency or multi-thread context switching is required.
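The register subset can be captured as a compact slot table. In the sketch below the x-register numbers are the standard RISC-V ABI assignments (x1 = ra, x2 = sp, x5–x7 = t0–t2, x8 = s0, x9 = s1, x10–x15 = a0–a5); the `GuestReg` enum, its ordering, and `guest_slot` are hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* Slot indices for the 13 supported guest registers (hypothetical ordering). */
typedef enum {
    REG_RA, REG_SP, REG_T0, REG_T1, REG_T2, REG_S0, REG_S1,
    REG_A0, REG_A1, REG_A2, REG_A3, REG_A4, REG_A5,
    REG_COUNT
} GuestReg;

/* Map a RISC-V x-register number to its slot, or -1 if unsupported. */
int guest_slot(unsigned xreg) {
    if (xreg == 1 || xreg == 2) return (int)xreg - 1;   /* ra, sp -> 0, 1  */
    if (xreg >= 5 && xreg <= 15) return (int)xreg - 3;  /* t0..a5 -> 2..12 */
    return -1;  /* x0 (zero), x3 (gp), x4 (tp), and x16-x31 are outside the subset */
}
```

A decoder can call `guest_slot` on each `rd`, `rs1`, and `rs2` field and reject the instruction when the result is -1. Note that x0, the hardwired zero register, is not in the list above, so a translator following this table would have to decide separately how to treat instructions that name it.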
32-Bit Addressing
- All addresses are assumed to fit in 32 bits.
- The translator ensures load/store instructions do not exceed the allocated data region in shared memory.
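A bounds check of this kind has to cover the whole access, not just its first byte, and must not wrap around in 32-bit arithmetic. The following is a minimal sketch under assumed names (`DataRegion`, `guest_to_host`); the document does not specify how the region is described.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the guest data region inside the shared mapping. */
typedef struct {
    uint8_t *host_base;   /* host address of the region's first byte */
    uint32_t guest_base;  /* guest address that maps to host_base */
    uint32_t size;        /* region length in bytes */
} DataRegion;

/* Translate a guest access of `len` bytes to a host pointer, or NULL if any
 * byte of the access falls outside the region. Written so that no
 * intermediate value can wrap around 32 bits. */
uint8_t *guest_to_host(const DataRegion *r, uint32_t addr, uint32_t len) {
    if (addr < r->guest_base) return NULL;
    uint32_t offset = addr - r->guest_base;
    if (offset > r->size || len > r->size - offset) return NULL;
    return r->host_base + offset;
}
```

Comparing `len` against `size - offset` instead of computing `addr + len` avoids the overflow that would otherwise let an access near `0xFFFFFFFF` pass the check.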
Execution Model
- Controller:
  - Decodes an instruction referencing e.g. `A0, A1, T0…`.
  - Emits a short snippet that loads host registers, performs the operation, and writes back to the shared memory if needed.
- Guest:
  - Receives a command to RUN from a certain offset.
  - Jumps into that code block.
  - On completion, returns to a small stub that notifies the parent process.
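The "short snippet" the Controller emits might look like the following for a guest `add` on an x86-64 host. This is a sketch under stated assumptions: guest registers live in memory as an array of `uint32_t`, the array's address arrives in RDI (the first argument register in the System V calling convention), and the snippet ends with `ret`. The function name and slot scheme are hypothetical; an ARM64 host would need a different emitter.

```c
#include <stddef.h>
#include <stdint.h>

/* Emit x86-64 code for `dst = dst + src` on 32-bit guest registers stored as
 * an array of uint32_t whose base address is in RDI. Slots are array indices;
 * they are limited to 0..31 so each byte offset fits in a signed 8-bit
 * displacement. Returns bytes written, or 0 on error. */
size_t emit_add_x86_64(uint8_t *buf, size_t cap,
                       unsigned dst_slot, unsigned src_slot) {
    if (cap < 10 || dst_slot > 31 || src_slot > 31) return 0;
    uint8_t dst_off = (uint8_t)(dst_slot * 4);
    uint8_t src_off = (uint8_t)(src_slot * 4);
    size_t n = 0;
    buf[n++] = 0x8B; buf[n++] = 0x47; buf[n++] = dst_off;  /* mov eax, [rdi+dst_off] */
    buf[n++] = 0x03; buf[n++] = 0x47; buf[n++] = src_off;  /* add eax, [rdi+src_off] */
    buf[n++] = 0x89; buf[n++] = 0x47; buf[n++] = dst_off;  /* mov [rdi+dst_off], eax */
    buf[n++] = 0xC3;                                       /* ret */
    return n;
}
```

On the Guest side, the emitted bytes would be called through a pointer of type `void (*)(uint32_t *)` once the code segment is mapped executable. Keeping every guest register in memory costs a load and a store per operand, which is the overhead that host-register mapping is meant to reduce.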
Security and Sandboxing
- Separate Processes: The guest cannot directly access the Controller’s memory.
- Memory Protections:
  - Shared memory for code is set to R+X.
  - Data region is R+W only.
- No multithreading concerns.
- Bounds Checking: We ensure loads/stores from the guest do not exceed the 32-bit address range allocated in shared memory.
- Syscalls: If the guest does an ecall or system call, the child can either handle them in-process or signal the Controller (depending on design needs).
Implementation Plan & Task Breakdown
Below is a step-by-step guide, organized into four phases: PoC, MVP, Beta, and Production. It shows how to gradually implement and refine the recompiler under the constraints described above.
Phase 1: Proof of Concept (PoC)
Goals
- Demonstrate a minimal “translate and run” flow for a single instruction or small snippet referencing only a couple of the 13 registers.
- Confirm separate process + shared memory + pipe approach works on macOS and Linux.
Tasks
- Set Up Swift + C Project
  - Swift code for the Controller logic.
  - C code for machine code emission.
  - Bridging header for Swift↔C interop.
- Shared Memory + Pipes
  - Use `shm_open` + `mmap` (or an equivalent) for cross-platform support.
  - Create two pipes: one for `Controller→Child` and one for `Child→Controller`.
  - `fork()` the child. The child also `mmap`s the shared memory region.
- Minimal Decoder
  - Decode a single RISC-V instruction referencing, say, `A0` and `A1`.
  - Hardcode an example: `A0 = A0 + A1`.
- Codegen
  - In C, emit the short host machine code that loads from `A0`, `A1` (in shared memory), adds them, and stores to `A0`.
  - Write the code into the shared memory code segment.
- Running in Child
  - Child receives a “RUN” command from the pipe, casts the code pointer to a function, calls it.
  - Child sends “DONE” back.
- Verification
  - Print out the updated value of `A0` in shared memory from the Controller.
  - Confirm it matches the expected sum.
Deliverables
- A working PoC that can do a single RISC-V-like operation in the separate child process.
- Verified on both macOS and Linux.
Phase 2: Minimum Viable Product (MVP)
Goals
- Add support for the full set of 13 registers (RA, SP, T0–T2, S0–S1, A0–A5).
- Translate multiple RISC-V instructions.
- Handle a small user-space program that uses those registers and the 32-bit memory space.
- Confirm basic security measures (code memory R+X, data memory R+W).
Tasks
- Decoder & Translator
  - Implement decoding for common RV32 instructions referencing the 13 registers (loads/stores, ALU ops, branches).
  - Maintain a small “dispatcher” approach for multiple instructions.
  - Store register states in shared memory (or map them to host registers if feasible).
- Memory Layout (32-bit)
  - Provide a contiguous region for the guest’s stack, data, and code references up to 4 GB in the child.
  - Basic bounds checking (if `address >= dataBase && address < dataEnd`).
- Basic Control Flow
  - If an instruction is an unconditional branch, compile a short jump to the next block or revert to a dispatcher loop.
  - For a conditional branch, handle in a minimal way (jump or fallthrough).
  - End blocks at branches or a maximum instruction count.
- Pipes for Coordination
  - The child requests new block addresses from the Controller if it encounters an uncompiled region.
  - The Controller compiles the block, writes to code segment, signals the child to re-run.
- No Threading
  - Simplify: no concurrency or locks. The child runs a single thread.
  - The parent only blocks/waits for pipe messages.
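The rule "end blocks at branches or a maximum instruction count" from the Basic Control Flow task can be sketched as a scan over the instruction stream. The opcode values are the standard RV32I major opcodes; the function names are hypothetical, and the sketch assumes only 32-bit instructions (no compressed extension).

```c
#include <stddef.h>
#include <stdint.h>

/* True for RV32I major opcodes that transfer control: conditional branches
 * (0x63), JALR (0x67), JAL (0x6F), and SYSTEM (0x73, which includes ecall
 * and ebreak). Ending a block on every SYSTEM instruction is conservative. */
int ends_block(uint32_t insn) {
    switch (insn & 0x7F) {
    case 0x63: case 0x67: case 0x6F: case 0x73: return 1;
    default: return 0;
    }
}

/* Count the instructions in the block starting at code[0]. The terminating
 * instruction is included; the scan also stops at max_insns or end of input. */
size_t block_length(const uint32_t *code, size_t avail, size_t max_insns) {
    size_t n = 0;
    while (n < avail && n < max_insns) {
        if (ends_block(code[n++])) break;
    }
    return n;
}
```

The Controller could call `block_length` at the requested guest PC, translate that many instructions, and record the resulting code offset so later requests for the same PC skip translation.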
Deliverables
- An MVP recompiler that can run a small RISC-V user program referencing the 13 registers in single-thread mode.
- The child process can do typical operations (arithmetic, branches, loads/stores).
- Verified on sample programs.
Phase 3: Beta
Goals
- Improve performance with block linking (avoid too many pipe round-trips).
- Add more robust error handling (e.g., out-of-bounds access).
- Provide partial or minimal syscall support if needed (e.g., ecall).
Tasks
- Block Linking
  - If a branch target is known and already compiled, emit a direct jump to that code offset.
  - Reduce returning to the Controller for each branch.
- Exception Handling
  - Catch segmentation faults or invalid memory in the child, convert them to RISC-V-like exceptions or simply exit.
  - Provide debug output or logs for deeper introspection.
- Syscalls / ecall (Optional)
  - If the guest code does an `ecall`, either handle a minimal set of syscalls in the child or forward them to the parent via pipe.
  - Limit the child’s OS calls (e.g., seccomp on Linux, macOS sandbox) to mitigate risk.
- Testing & Profiling
  - Test multiple scenarios, measure block translation overhead vs. runtime.
  - Optimize code emission if necessary (fewer instructions, register usage, etc.).
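For the Block Linking task, a direct jump between two compiled blocks on an x86-64 host can be written as a 5-byte `jmp rel32`. The sketch below assumes both blocks live in the same code segment and are addressed by byte offset; `link_jump_x86_64` is a hypothetical helper, and ARM64 would use a different branch encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Write an x86-64 `jmp rel32` (opcode E9) at code[site] that lands on
 * code[target]. Offsets are relative to the start of the code segment.
 * Returns 0 on success, -1 if the 5-byte instruction does not fit, the
 * target is out of range, or the displacement does not fit in 32 bits. */
int link_jump_x86_64(uint8_t *code, size_t code_size, size_t site, size_t target) {
    if (site > code_size || code_size - site < 5 || target > code_size) return -1;
    /* The displacement is measured from the end of the jump instruction. */
    int64_t rel = (int64_t)target - (int64_t)(site + 5);
    if (rel < INT32_MIN || rel > INT32_MAX) return -1;
    uint32_t bits = (uint32_t)(int32_t)rel;
    code[site] = 0xE9;
    for (int i = 0; i < 4; i++) code[site + 1 + i] = (uint8_t)(bits >> (8 * i));
    return 0;
}
```

Patching requires a writable view of the code segment, so under the R+X policy described earlier this would run in the Controller (or before the segment is sealed) rather than in the Guest's executable mapping.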
Deliverables
- A Beta that runs more complex single-threaded programs referencing the 13 registers.
- Faster block execution due to direct linking.
- Additional security and stability checks.
Phase 4: Production
Goals
- Finalize the recompiler for reliability, performance, and maintainability.
- Thoroughly document and test all features.
- Ensure it handles real-world usage within the 32-bit address constraint and the 13 register subset.
Tasks
- Full Test & Fuzzing
  - Stress test with random instructions that only use these 13 registers.
  - Confirm no memory leaks, handle all corner cases.
- Performance Tuning
  - Evaluate whether direct vs. cached register usage is sufficient.
  - Possibly refine code generation to reduce overhead.
- Security Audit
  - Verify child cannot break out of the 4 GB memory region.
  - Ensure no leftover debugging or reflection capabilities can be exploited.
- Documentation
  - Detailed explanations of each component.
  - Clear build instructions for macOS and Linux.
- Release
  - Tag a final stable version.
  - Provide a usage guide for others to compile and run RISC-V code using the recompiler.
Deliverables
- A production-ready system with robust translation of single-threaded RISC-V code (only 13 registers), strong sandboxing, and 32-bit memory model.
- Comprehensive documentation and test suite.