SVML Specification
This page serves as a repository for the preliminary specification of a virtual machine code (bytecode) format for virtual machine implementations of Source.
A frame consists of a
The SVM recognises these distinct types:
- undefined, a singleton
- null, also a singleton
- boolean, either true or false
- number
- string
- array
- function
Implementations should follow the IEEE 754 double-precision floating point specification for number semantics where possible. As an allowance for platforms on which this would be too expensive, implementations may follow the single-precision floating point specification instead.
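The practical effect of the single-precision allowance can be illustrated with a short Python sketch. It is not part of the specification: the helper name to_f32 is hypothetical, and it simply rounds a Python float (a double) to the nearest single-precision value to show where the two number models may disagree.

```python
import struct

def to_f32(x: float) -> float:
    """Round a double to the nearest single-precision value (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# 0.5 is exactly representable in both precisions.
print(to_f32(0.5) == 0.5)      # True

# 0.1 is rounded differently in single and double precision.
print(to_f32(0.1) == 0.1)      # False

# Integers above 2**24 are no longer all representable in single precision.
print(to_f32(16777217.0))      # 16777216.0
```

A program that relies on exact integer arithmetic above 2**24 may therefore behave differently on a single-precision implementation.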
Strings are arbitrary sequences of bytes (including the zero byte). SVM defines no operations on strings other than concatenation, so character encoding does not affect SVM string semantics.
Arrays are maps from non-negative integer numbers (indexes) to any value. Loading an unassigned index results in undefined.
Arrays have a length, accessed through the primitive function array_length, equal to one plus the highest index that has been assigned to, or 0 if no index has been assigned to.
Note: assigning undefined to an array index is indistinguishable from leaving that index unassigned, except for the effect of the assignment on the array's length.
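The array rules above can be sketched in Python as follows. This is a minimal model under stated assumptions, not a prescribed implementation: the class SvmArray, its method names store and load, and the UNDEFINED sentinel are all hypothetical; only array_length is a name taken from the text.

```python
UNDEFINED = object()  # hypothetical stand-in for the SVM undefined singleton

class SvmArray:
    """Sketch of an SVM array: a map from non-negative integer indexes to values."""

    def __init__(self):
        self._slots = {}   # index -> value, only for indexes that were assigned
        self._length = 0   # one plus the highest index assigned so far

    def store(self, index, value):
        if not isinstance(index, int) or index < 0:
            raise IndexError("index must be a non-negative integer")
        self._slots[index] = value
        # Any assignment, even of undefined, can grow the length.
        self._length = max(self._length, index + 1)

    def load(self, index):
        # Loading an unassigned index results in undefined.
        return self._slots.get(index, UNDEFINED)

    def array_length(self):
        return self._length

arr = SvmArray()
print(arr.array_length())        # 0
arr.store(3, UNDEFINED)
print(arr.load(3) is UNDEFINED)  # True, same as an unassigned index
print(arr.array_length())        # 4, the only observable effect
```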
TODO
(click on the link)
There are two standard representations of an SVML program. VM implementations are free to accept whichever representation works best for them.
TODO
The assembly code consists of an array of arrays. Each element array represents one instruction. Each instruction has the opcode in position 0, followed by the arguments, which might include numbers, boolean values or strings, depending on the instruction.
TODO
Header:
- Magic word: 0x5005ACAD
- Version number: 2 bytes for minor, 2 bytes for major
- Constant pool count
- Constant pool
- Code
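As a rough illustration, the first two header fields could be read as shown below. Several details are assumptions, because this page does not fix them: the magic word is taken to be stored as a 4-byte unsigned integer, the minor version is taken to precede the major version (the order in which they are listed), and a little-endian target is assumed. The function name read_version is hypothetical. The width of the constant pool count is not stated here, so the sketch stops before it.

```python
import struct

MAGIC = 0x5005ACAD  # magic word from the header description

def read_version(data: bytes):
    """Check the magic word and return (major, minor). Layout is an assumption."""
    if len(data) < 8:
        raise ValueError("header too short")
    # "<IHH": little-endian u32 magic, u16 minor, u16 major (assumed order).
    magic, minor, major = struct.unpack_from("<IHH", data, 0)
    if magic != MAGIC:
        raise ValueError(f"bad magic word: {magic:#010x}")
    return major, minor

header = struct.pack("<IHH", MAGIC, 2, 1)
print(read_version(header))  # (1, 2)
```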
The code is a sequence of bytes, with segments of length 1 to 3 representing individual instructions. The first byte is the opcode, and the following bytes are the arguments.
- Instructions are byte-aligned.
- All instruction opcodes are one byte long.
- All operands are in target device endianness.
- We use the integer and float type names from Rust to denote operand types in instruction entries.
  - E.g. u8 refers to an 8-bit unsigned integer; i32 refers to a 32-bit signed integer; f32 refers to a 32-bit (single-precision) floating-point number.
- An address is a 32-bit unsigned integer (u32) that refers to an offset from the start of the program.
- An offset is a 32-bit signed integer (i32) that refers to an offset from the start of the next instruction.
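The difference between an address and an offset can be sketched as a small calculation. The function name resolve_offset and the example instruction sizes are hypothetical; the only rule taken from the text is that an offset is signed and is measured from the start of the next instruction, while an address is unsigned and measured from the start of the program.

```python
def resolve_offset(instruction_start: int, instruction_length: int, offset: int) -> int:
    """Turn a relative offset into an absolute address (hypothetical helper)."""
    # Offsets are relative to the start of the NEXT instruction.
    next_instruction = instruction_start + instruction_length
    target = next_instruction + offset
    # Addresses are unsigned 32-bit offsets from the start of the program.
    if not 0 <= target <= 0xFFFFFFFF:
        raise ValueError("offset resolves outside the address range")
    return target

# A hypothetical 5-byte instruction located at address 10:
print(resolve_offset(10, 5, 0))    # 15, the instruction that follows
print(resolve_offset(10, 5, -15))  # 0, back to the start of the program
```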
Instructions and their operands should be concatenated with no padding between them. Operands should be encoded in target device endianness. For example, the following instructions
ldc.i 123
pop
should result in the following (hex) bytes when targeting a little-endian device:
01 7B 00 00 00 08
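The encoding of this example can be reproduced with a short Python sketch. The opcode values (0x01 for ldc.i, 0x08 for pop) and the i32 operand width are inferred from the example bytes above rather than taken from an opcode table, and the assemble function and its list-of-lists input are hypothetical.

```python
import struct

# Opcode values inferred from the example bytes above (an assumption).
OPCODES = {"ldc.i": 0x01, "pop": 0x08}

def assemble(instructions, byte_order="<"):
    """Concatenate opcodes and operands with no padding (illustrative sketch)."""
    out = bytearray()
    for name, *args in instructions:
        out.append(OPCODES[name])  # every opcode is one byte
        if name == "ldc.i":
            # One i32 operand, encoded in the target's byte order.
            out += struct.pack(byte_order + "i", args[0])
    return bytes(out)

program = [["ldc.i", 123], ["pop"]]
print(assemble(program).hex(" ").upper())       # 01 7B 00 00 00 08
print(assemble(program, ">").hex(" ").upper())  # 01 00 00 00 7B 08
```

Passing ">" as the byte order shows what the same program would look like for a big-endian target: only the operand bytes change, not the opcodes.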
See here for comparison: https://en.wikipedia.org/wiki/Java_class_file
Each constant pool entry has:
- 1 byte: Type of constant pool entry
- 2 bytes: Length of constant pool entry in bytes (including the 1 type byte and the 2 length bytes)
- remaining bytes: data of constant pool entry (for example the string, in Unicode)
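The entry layout above can be read with a sketch like the following. It assumes a little-endian target for the 2-byte length field and treats the data as opaque bytes; the function name read_constant_pool_entry and the type tag 1 used in the example are hypothetical, since this page does not list the type values.

```python
import struct

def read_constant_pool_entry(data: bytes, pos: int = 0):
    """Return (entry_type, payload, next_pos) for the entry at pos (sketch)."""
    if pos + 3 > len(data):
        raise ValueError("truncated constant pool entry header")
    # 1 byte type, then 2 bytes length (little-endian assumed).
    entry_type, length = struct.unpack_from("<BH", data, pos)
    # The length counts the 3 header bytes as well as the data.
    if length < 3 or pos + length > len(data):
        raise ValueError("invalid constant pool entry length")
    payload = data[pos + 3 : pos + length]
    return entry_type, payload, pos + length

# Hypothetical entry: type tag 1, two data bytes, so the length field is 3 + 2 = 5.
entry = bytes([1]) + struct.pack("<H", 5) + b"hi"
print(read_constant_pool_entry(entry))  # (1, b'hi', 5)
```

Returning the position of the next entry lets a caller walk the whole pool by feeding next_pos back in as pos.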