QVM
A QVM file is a mod file designed to run in the Quake Virtual Machine.
It is a binary format produced by the id-provided q3asm assembler, which assembles the bytecode that the lcc compiler generates from the mod's source.
A QVM mod can only access memory inside the virtual machine, and can only utilize external functions that are specifically provided by the game engine, and as such, QVM mods are considered "safe" to run.
I have written a small basic disassembler for QVM files called qvmops, which can be found here.
Currently, the only QVM-enabled engines that QMM supports are:
- Quake 3 Arena
- Jedi Knight 2: Jedi Outcast (Multiplayer)
- Star Trek Voyager: Elite Force (Multiplayer)
- Soldier of Fortune II: Double Helix (Multiplayer)
Other games that use QVM that are not supported by QMM:
- I don't know
The format of the QVM file itself is relatively simple. It starts with a header that looks like:
| Field | Size (bytes) | Meaning |
|---|---|---|
| magic | 4 | 0x44 0x14 0x72 0x12 |
| instructioncount | 4 | number of instructions in code segment |
| codeoffset | 4 | offset into file where code segment starts |
| codelength | 4 | length of code segment in file |
| dataoffset | 4 | offset into file where data segment starts |
| datalen | 4 | length of the initialized-data part of the data segment |
| litlen | 4 | length of the literal-data part of the data segment (this is for string literals) |
| bsslen | 4 | amount of uninitialized-data space required when the QVM is loaded into memory, including the stack (this is not stored in the file) |
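To make the header table concrete, here is a minimal sketch of reading it from a file buffer. It assumes every field is a little-endian 32-bit value (which is how the magic bytes 0x44 0x14 0x72 0x12 become 0x12721444). The names `qvm_header_t`, `read_le32`, and `qvm_parse_header` are hypothetical and are not necessarily what QMM uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the header table; field names follow the table. */
typedef struct {
    uint32_t magic;
    uint32_t instructioncount;
    uint32_t codeoffset;
    uint32_t codelength;
    uint32_t dataoffset;
    uint32_t datalen;
    uint32_t litlen;
    uint32_t bsslen;
} qvm_header_t;

#define QVM_HEADER_SIZE 32u         /* 8 fields * 4 bytes */
#define QVM_MAGIC       0x12721444u /* bytes 0x44 0x14 0x72 0x12, little-endian */

/* Read a 32-bit little-endian value regardless of host byte order. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 on success, 0 if the file is too small, the magic is wrong,
 * or a segment would run past the end of the file. */
static int qvm_parse_header(const uint8_t *file, size_t filesize, qvm_header_t *out) {
    uint64_t datasize;
    if (filesize < QVM_HEADER_SIZE) return 0;
    out->magic            = read_le32(file + 0);
    out->instructioncount = read_le32(file + 4);
    out->codeoffset       = read_le32(file + 8);
    out->codelength       = read_le32(file + 12);
    out->dataoffset       = read_le32(file + 16);
    out->datalen          = read_le32(file + 20);
    out->litlen           = read_le32(file + 24);
    out->bsslen           = read_le32(file + 28);
    if (out->magic != QVM_MAGIC) return 0;
    if (out->codeoffset > filesize || out->codelength > filesize - out->codeoffset) return 0;
    datasize = (uint64_t)out->datalen + out->litlen; /* bss is not stored in the file */
    if (out->dataoffset > filesize || datasize > filesize - out->dataoffset) return 0;
    return 1;
}
```

The segment checks are an extra precaution in this sketch so that later copies cannot read past the end of the file.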
The code segment contains sequential 1-byte opcodes and their associated parameters, with no padding. The following opcodes contain parameters, which are the bytes immediately following the opcode:
| Opcode | Parameter size in bytes |
|---|---|
| OP_ENTER | 4 |
| OP_LEAVE | 4 |
| OP_CONST | 4 |
| OP_LOCAL | 4 |
| OP_EQ | 4 |
| OP_NE | 4 |
| OP_LTI | 4 |
| OP_LEI | 4 |
| OP_GTI | 4 |
| OP_GEI | 4 |
| OP_LTU | 4 |
| OP_LEU | 4 |
| OP_GTU | 4 |
| OP_GEU | 4 |
| OP_EQF | 4 |
| OP_NEF | 4 |
| OP_LTF | 4 |
| OP_LEF | 4 |
| OP_GTF | 4 |
| OP_GEF | 4 |
| OP_BLOCK_COPY | 4 |
| OP_ARG | 1 |
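The table above maps naturally onto a small lookup function. The sketch below assumes the opcode numbering follows the enum order in the Quake 3 source (OP_UNDEF = 0, OP_ENTER = 3, and so on), and `qvm_param_size` is a hypothetical name. Opcodes after OP_BLOCK_COPY are left out because none of them take a parameter.

```c
/* Opcode numbering assumed to follow the enum order in the Quake 3 source.
 * Only the opcodes up to OP_BLOCK_COPY are listed; the rest take no parameter. */
typedef enum {
    OP_UNDEF, OP_IGNORE, OP_BREAK,
    OP_ENTER, OP_LEAVE, OP_CALL, OP_PUSH, OP_POP,
    OP_CONST, OP_LOCAL, OP_JUMP,
    OP_EQ, OP_NE, OP_LTI, OP_LEI, OP_GTI, OP_GEI,
    OP_LTU, OP_LEU, OP_GTU, OP_GEU,
    OP_EQF, OP_NEF, OP_LTF, OP_LEF, OP_GTF, OP_GEF,
    OP_LOAD1, OP_LOAD2, OP_LOAD4, OP_STORE1, OP_STORE2, OP_STORE4,
    OP_ARG, OP_BLOCK_COPY
} qvm_opcode_t;

/* Number of parameter bytes that follow the opcode byte in the file. */
static int qvm_param_size(int op) {
    switch (op) {
    case OP_ENTER:
    case OP_LEAVE:
    case OP_CONST:
    case OP_LOCAL:
    case OP_BLOCK_COPY:
        return 4;
    case OP_ARG:
        return 1;
    default:
        /* all comparison/branch opcodes carry a 4-byte jump target */
        if (op >= OP_EQ && op <= OP_GEF) return 4;
        return 0;
    }
}
```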
Loading the QVM file into memory is relatively straightforward:
- Calculate total size of memory needed
- Allocate memory
- Copy code segment into memory
- Copy data segment (including initialized and literal sections) into memory after code segment
The total size will generally be determined from the following calculation:
header->instructioncount * sizeof(qvmop_t) + header->datalen + header->litlen + header->bsslen
Since the OP_CALL, OP_JUMP, etc. instructions operate by moving to a specific instruction index in the code segment (rather than a specific address or byte offset), the simplest way to handle the code segment is to have each instruction take the same space, even if the instruction doesn't actually use a param. So in our case, we treat an instruction in memory as a simple struct of opcode and param that is always the same size:
struct qvmop_t {
    int op;
    int param;
};
The code segment is then accessed using a qvmop_t*, so pointer arithmetic makes jumping and incrementing simple.
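As a rough illustration of the size calculation and of why fixed-size instructions make jumps easy, here is a small sketch. The helper names `qvm_total_size` and `qvm_jump` are hypothetical, and the typedef is only there so the snippet compiles as plain C.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct qvmop_t {
    int op;
    int param;
} qvmop_t;

/* Total bytes to allocate: expanded code segment + data + literals + bss. */
static size_t qvm_total_size(uint32_t instructioncount, uint32_t datalen,
                             uint32_t litlen, uint32_t bsslen) {
    return (size_t)instructioncount * sizeof(qvmop_t) + datalen + litlen + bsslen;
}

/* Jump to an instruction index: because every instruction has the same size,
 * this is plain pointer arithmetic. Returns NULL if the index is out of range. */
static qvmop_t *qvm_jump(qvmop_t *code, uint32_t instructioncount, uint32_t index) {
    return index < instructioncount ? code + index : NULL;
}
```

The trade-off is memory: every instruction costs `sizeof(qvmop_t)` bytes even when the opcode has no parameter.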
Note: The Quake 3 engine loads instructions into memory exactly as they are in the file (no padding, no space for unused param, etc). It then goes through and makes a separate array to track instruction index vs code segment offset. This allows them to easily jump to a specific instruction while still keeping the instructions at variable length. For ease of code and understanding, I opted for having just a single view into the "enlarged" code segment.
Also of note, the Quake 3 VM assembler adds 0x10000 bytes (64 KiB) to the end of the bss segment for the stack, while the Quake 3 VM interpreter assumes a stack size of 0x20000 bytes (128 KiB). This mismatch is generally harmless, because the Quake 3 engine rounds the data segment size up to the next power of 2, and the extra space gained from rounding generally covers the missing 0x10000 bytes. ioQuake3 fixes the mismatch by using 0x10000 bytes on both sides. QMM also uses 0x10000.
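To see why the rounding "generally" covers the mismatch but is not guaranteed to, here is a small sketch of the arithmetic. The function and macro names are hypothetical, and `datasize` is assumed to be datalen + litlen + bsslen (with the assembler's 0x10000 stack already included in bsslen).

```c
#include <stdint.h>

#define ASM_STACK_SIZE    0x10000u /* added to bss by the assembler */
#define INTERP_STACK_SIZE 0x20000u /* assumed by the Quake 3 interpreter */

/* Round n up to the next power of 2 (n itself if already one, 0 on overflow). */
static uint32_t round_up_pow2(uint32_t n) {
    uint32_t p = 1;
    if (n > 0x80000000u) return 0;
    while (p < n) p <<= 1;
    return p;
}

/* Returns 1 if the slack gained by rounding is at least the 0x10000 bytes
 * that the interpreter expects beyond what the assembler reserved. */
static int stack_mismatch_covered(uint32_t datasize) {
    uint32_t rounded = round_up_pow2(datasize);
    if (rounded == 0) return 0;
    return rounded - datasize >= INTERP_STACK_SIZE - ASM_STACK_SIZE;
}
```

For example, a data segment of 0x28000 bytes rounds up to 0x40000 and leaves 0x18000 bytes of slack, which is enough. One of 0x3F000 bytes leaves only 0x1000.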
Now with that out of the way, you can load the code segment into memory by walking the code segment in the file one instruction at a time: copy the 1-byte opcode into qvmop_t->op, then, if the opcode has a param, copy the 4-byte or 1-byte param that follows it (based on the above Parameter Size table) into the associated qvmop_t->param.
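The loop described above could look something like the following sketch. It is not QMM's actual qvm_load code. The opcode numbers are assumed to follow the Quake 3 source's enum order, params are assumed to be little-endian, and `param_size` and `qvm_load_code` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct qvmop_t {
    int op;
    int param;
} qvmop_t;

/* Opcode values assumed from the Quake 3 source's enum order. */
#define OP_ENTER      3
#define OP_LEAVE      4
#define OP_CONST      8
#define OP_LOCAL      9
#define OP_EQ         11
#define OP_GEF        26
#define OP_ARG        33
#define OP_BLOCK_COPY 34

static int param_size(int op) {
    if (op == OP_ENTER || op == OP_LEAVE || op == OP_CONST ||
        op == OP_LOCAL || op == OP_BLOCK_COPY) return 4;
    if (op >= OP_EQ && op <= OP_GEF) return 4;
    if (op == OP_ARG) return 1;
    return 0;
}

/* Expand the packed code segment into fixed-size qvmop_t entries.
 * Returns 1 on success, 0 if the segment ends before all instructions are read. */
static int qvm_load_code(const uint8_t *code, size_t codelength,
                         qvmop_t *out, size_t instructioncount) {
    size_t pos = 0;
    for (size_t i = 0; i < instructioncount; i++) {
        uint32_t param = 0;
        int op, n;
        if (pos >= codelength) return 0;
        op = code[pos++];
        n = param_size(op);
        if ((size_t)n > codelength - pos) return 0;
        for (int b = 0; b < n; b++)            /* little-endian param bytes */
            param |= (uint32_t)code[pos + b] << (8 * b);
        pos += (size_t)n;
        out[i].op = op;
        out[i].param = (int)param;             /* stays 0 for opcodes with no param */
    }
    return 1;
}
```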
The data segment is copied directly from the file's data segment into memory, for a total length of header->datalen + header->litlen (the bss segment is not actually in the file since it is all uninitialized anyway).
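A minimal sketch of that copy follows. The name `qvm_load_data` is hypothetical, and zero-filling the bss area is an assumption of this sketch so that "uninitialized" globals start at a known value.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the initialized and literal data from the file, then clear the bss.
 * `data` must point at a buffer of at least datalen + litlen + bsslen bytes. */
static void qvm_load_data(uint8_t *data, const uint8_t *file, uint32_t dataoffset,
                          uint32_t datalen, uint32_t litlen, uint32_t bsslen) {
    size_t filelen = (size_t)datalen + litlen;   /* the part stored in the file */
    memcpy(data, file + dataoffset, filelen);
    memset(data + filelen, 0, bsslen);           /* bss (and stack) is not in the file */
}
```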
Before we look at execution, we should go over once again how the loaded QVM is laid out in memory:
| Segment | Size | Notes |
|---|---|---|
| Code | header->instructioncount * sizeof(qvmop_t) | Never changes once loaded |
| Data (initialized) | header->datalen | All initialized global/static variables |
| Data (literal) | header->litlen | All string literals from source |
| Data (bss) | header->bsslen | All uninitialized global/static variables, ending with the pre-allocated program stack |
The program stack is used for passing arguments to functions, as well as for storing function-local variables. It grows and shrinks in bytes but is generally accessed as ints. It is managed primarily by OP_ENTER and OP_LEAVE, the first and last instructions inside a function. OP_ENTER creates a new stack frame (size=param bytes). The new frame is large enough to store the function's local variables, the maximum number of arguments needed when calling other functions, and 2 additional scratch values (one of which is where OP_CALL stores return instruction pointers). OP_LEAVE removes the stack frame (size=param bytes), and then jumps to the return instruction index stored at the top of the caller's stack frame.
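Here is a rough sketch of how the three instructions cooperate. It assumes the program stack grows downward from the end of the data segment (as it does in the Quake 3 interpreter) and that `ip` already points at the instruction after OP_CALL when the call is executed. The `vm_t` struct and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t *data; /* data segment; the program stack sits at its end */
    uint32_t sp;   /* byte offset of the top of the current stack frame */
    int32_t  ip;   /* instruction index */
} vm_t;

static void store_int(vm_t *vm, uint32_t addr, int32_t v) {
    memcpy(vm->data + addr, &v, sizeof v);
}

static int32_t load_int(const vm_t *vm, uint32_t addr) {
    int32_t v;
    memcpy(&v, vm->data + addr, sizeof v);
    return v;
}

/* OP_ENTER: reserve param bytes for the new frame (stack grows downward). */
static void op_enter(vm_t *vm, uint32_t framesize) {
    vm->sp -= framesize;
}

/* OP_CALL: save the return index at the top of the caller's frame, then jump. */
static void op_call(vm_t *vm, int32_t target) {
    store_int(vm, vm->sp, vm->ip);
    vm->ip = target;
}

/* OP_LEAVE: release the frame, then return to the index the caller saved. */
static void op_leave(vm_t *vm, uint32_t framesize) {
    vm->sp += framesize;
    vm->ip = load_int(vm, vm->sp);
}
```

Note that OP_ENTER and OP_LEAVE must use the same frame size, which is why both carry it as a param.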
There also exists, during VM execution, an opstack. This is a separate smaller stack (not in VM memory) that is used by most instructions for doing normal operations like math, comparison (for branching), storing/loading from memory, etc. This is used for temporaries and operands in a manner like registers would be in a traditional system.
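A tiny sketch of the opstack idea: OP_CONST pushes its param, and an arithmetic instruction such as OP_ADD pops two operands and pushes the result. The 256-slot depth, the `opstack_t` type, and the function names are assumptions made for illustration only.

```c
#include <stdint.h>

#define OPSTACK_SIZE 256 /* assumed depth, for illustration only */

typedef struct {
    int32_t slots[OPSTACK_SIZE];
    int top; /* number of values currently on the opstack */
} opstack_t;

static int opstack_push(opstack_t *s, int32_t v) {
    if (s->top >= OPSTACK_SIZE) return 0; /* overflow */
    s->slots[s->top++] = v;
    return 1;
}

static int opstack_pop(opstack_t *s, int32_t *v) {
    if (s->top <= 0) return 0;            /* underflow */
    *v = s->slots[--s->top];
    return 1;
}

/* OP_CONST: push the instruction's param. */
static int exec_const(opstack_t *s, int32_t param) {
    return opstack_push(s, param);
}

/* OP_ADD: pop two operands, push their sum (wrapping like 32-bit hardware). */
static int exec_add(opstack_t *s) {
    int32_t a, b;
    if (s->top < 2) return 0;
    opstack_pop(s, &b);
    opstack_pop(s, &a);
    return opstack_push(s, (int32_t)((uint32_t)a + (uint32_t)b));
}
```

So the bytecode sequence OP_CONST 2, OP_CONST 3, OP_ADD leaves a single value, 5, on the opstack.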
Before execution begins, a new program stack frame is created and the arguments to pass to vmMain are placed inside, along with a sentinel return instruction offset.
Execution begins at the start of the code segment, which should be vmMain's OP_ENTER if compiled correctly. This creates a new stack frame for vmMain. When another function needs to be called, its instruction offset is placed onto the opstack, and then the OP_CALL instruction is used. This stores the current instruction pointer into the current stack frame, then jumps to the instruction offset taken from the top of the opstack.
When the mod needs to call the engine's syscall, a negative instruction offset (specific to each syscall operation) is loaded onto the opstack with OP_CONST and then followed by OP_CALL. OP_CALL handles negative instruction pointers specially, and knows to call syscall.
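The dispatch inside OP_CALL can be sketched roughly as below. Mapping the negative offset to a syscall number with `-1 - target` mirrors what the Quake 3 interpreter does, but treat it as an assumption here. `demo_syscall` is a made-up stand-in for the engine's handler, and saving the return pointer is left out to keep the sketch short.

```c
#include <stdint.h>

typedef int32_t (*qvm_syscall_t)(int32_t num);

/* Made-up stand-in for the engine's syscall handler. */
static int32_t demo_syscall(int32_t num) {
    return num * 100;
}

/* Handle OP_CALL for a target taken from the opstack.
 * Returns 1 if the call was a syscall (*ret holds the value to push back
 * onto the opstack), or 0 if it was a normal call and *ip was changed. */
static int op_call(int32_t target, int32_t *ip, qvm_syscall_t handler, int32_t *ret) {
    if (target < 0) {
        *ret = handler(-1 - target); /* assumed mapping: -1 -> 0, -2 -> 1, ... */
        return 1;
    }
    *ip = target;                    /* normal call: jump to the instruction index */
    return 0;
}
```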
Once vmMain's OP_LEAVE is executed, the sentinel instruction pointer is loaded from the initial stack frame, and the interpreter knows to end execution.
QMM adds some safety checks when executing code, including making sure the code pointer is within the code segment, and making sure any data read/write is within the data segment (including the program stack). It also makes sure neither stack grows outside of its proper space.
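Checks of this kind can be sketched as follows. This is not QMM's actual code, and the function names are hypothetical. The data check is written so that `addr + len` can never overflow.

```c
#include <stdint.h>

/* The code pointer is valid only if it refers to an existing instruction. */
static int code_index_ok(uint32_t index, uint32_t instructioncount) {
    return index < instructioncount;
}

/* A read/write of `len` bytes at `addr` must lie entirely inside the data
 * segment. Comparing against datasize - len avoids overflow in addr + len. */
static int data_access_ok(uint32_t addr, uint32_t len, uint32_t datasize) {
    return len <= datasize && addr <= datasize - len;
}

/* The opstack depth must stay within [0, capacity] after a push or pop. */
static int opstack_depth_ok(int depth, int capacity) {
    return depth >= 0 && depth <= capacity;
}
```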
Note: The data segment check unfortunately has to be disabled in Soldier of Fortune II, as that engine provides syscalls that return pointers to the engine's memory. Also, SoF2 has another "gametype" QVM that runs alongside the game, and pointers from that are sometimes passed into the game QVM. So in the course of normal operation, pointers referring to memory outside the QVM are accessed.
You can view src/qvm.c and include/qvm.h to see how QMM implements the QVM; both are fairly heavily commented. The file is loaded into memory in qvm_load, and the execution is in qvm_exec_ex. If you have any questions, please feel free to reach out via email or open a Discussion.