Note that LBVM is not designed with memory security or safety in mind, hence some of the design choices.
LBVM is little endian.
LBVM has 16 registers of 64 bits, encoded as 0~15.
All registers are callee-saved.
| Name | Encoding | Special function? |
|---|---|---|
| r0 | 0 | No |
| r1 | 1 | No |
| r2 | 2 | No |
| r3 | 3 | No |
| r4 | 4 | No |
| r5 | 5 | No |
| r6 | 6 | No |
| r7 | 7 | No |
| r8 | 8 | No |
| r9 | 9 | No |
| r10 | 10 | No |
| r11 | 11 | No |
| r12 | 12 | No |
| r13 | 13 | No |
| status | 14 | Status register |
| sp | 15 | Stack pointer |
Small Instruction (4 bytes):
byte0: [opcode:6][oplen:2]
byte1: [reg1:4][reg0:4]
byte2: [reg3:4][reg2:4]
byte3: [flags:8]
Jump/Branch Instruction (4 bytes):
byte0: [opcode:6][-]
byte1~2: [offset:16]
byte3: [flags:8]
Big Instruction (12 bytes):
byte0: [opcode:6][oplen:2]
byte1: [reg1:4][reg0:4]
byte2: [reg3:4][reg2:4]
byte3: [flags:8]
byte4~11: [data:64]
The first byte of instructions consists of a 6 bit opcode and a 2 bit oplen.
Oplen is the length associated with an operation:
| Name | Size | Encoding |
|---|---|---|
| qword | 8 | 0b00 |
| dword | 4 | 0b01 |
| word | 2 | 0b10 |
| byte | 1 | 0b11 |
Oplen is irrelevant for some operations; for those, any oplen is allowed.
Floating-point arithmetic only allows qword and dword; using it with word or byte halts the machine.
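The small instruction layout and the oplen table above can be sketched as a pair of pack/unpack helpers. This is a minimal sketch, not the reference implementation: it assumes the bracketed fields are listed most significant bits first (so opcode occupies the upper 6 bits of byte0 and reg1 the upper nibble of byte1), and the names `encode_small`, `decode_small`, and `OPLEN_SIZES` are hypothetical.

```python
# Oplen encoding -> operand size in bytes, taken from the oplen table.
OPLEN_SIZES = {0b00: 8, 0b01: 4, 0b10: 2, 0b11: 1}


def encode_small(opcode, oplen, reg0=0, reg1=0, reg2=0, reg3=0, flags=0):
    """Pack a small instruction into 4 bytes.

    Assumption: fields are written most significant bits first, so the
    opcode sits in the upper 6 bits of byte0 and oplen in the lower 2 bits.
    """
    assert 0 <= opcode < 64 and 0 <= oplen < 4
    assert all(0 <= r < 16 for r in (reg0, reg1, reg2, reg3))
    return bytes([
        (opcode << 2) | oplen,  # byte0: [opcode:6][oplen:2]
        (reg1 << 4) | reg0,     # byte1: [reg1:4][reg0:4]
        (reg3 << 4) | reg2,     # byte2: [reg3:4][reg2:4]
        flags & 0xFF,           # byte3: [flags:8]
    ])


def decode_small(raw):
    """Unpack the first 4 bytes of an instruction into its fields."""
    b0, b1, b2, b3 = raw[:4]
    return {
        "opcode": b0 >> 2,
        "oplen": b0 & 0b11,
        "size": OPLEN_SIZES[b0 & 0b11],
        "reg0": b1 & 0xF,
        "reg1": b1 >> 4,
        "reg2": b2 & 0xF,
        "reg3": b2 >> 4,
        "flags": b3,
    }
```

For instance, `encode_small(15, 0b01, reg0=1, reg1=2, reg2=3)` would describe a dword `add` with r1 as destination under these assumptions. A big instruction would append 8 little-endian data bytes to the same 4-byte prefix.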
LBVM has 7 status flags, stored in the 7 least significant bits of the status register.
| Abbreviation | Meaning | Encoding (Binary) |
|---|---|---|
| N | Negative | 0b00000001 |
| Z | Zero | 0b00000010 |
| C | Carry | 0b00000100 |
| V | Overflow | 0b00001000 |
| E | Equal | 0b00010000 |
| G | Greater | 0b00100000 |
| L | Less-than | 0b01000000 |
Despite this, the status register is 64 bits wide for convenience.
Any instruction that sets a status flag clears all other irrelevant bits to zero.
Instructions that use the status flags (e.g. b, csel) interpret the 8 bits of their flags byte as a condition on the status flags.
The 7 least significant bits can be masked together for boolean OR logic, while the most significant bit is used as negation.
For example, the byte 0b10110000 (0b10000000 | G | E) encodes the condition !(greater | equal).
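The condition encoding described above can be sketched as a small predicate. This is an illustrative sketch only: `condition_holds` and `NEGATE` are hypothetical names, and the behaviour for a condition byte with no flag bits set is simply what falls out of the OR rule, not something the text specifies.

```python
# Status flag masks from the table above (bit 0 = N ... bit 6 = L).
N, Z, C, V, E, G, L = (1 << i for i in range(7))
NEGATE = 0b10000000  # most significant bit of the condition byte


def condition_holds(status, cond):
    """Return True if the condition byte `cond` is satisfied by `status`.

    The low 7 bits of `cond` are OR-ed together: the condition matches if
    any selected flag is set in `status`. The top bit inverts the result.
    """
    matched = (status & cond & 0x7F) != 0
    return (not matched) if (cond & NEGATE) else matched


# !(greater | equal): holds when only the less-than flag is set.
cond = NEGATE | G | E
print(condition_holds(L, cond))
print(condition_holds(G, cond))
```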
The status register can be addressed directly by its encoding of 14, in the same way as the other registers.
However, when a status-affecting instruction writes to the status register (e.g. load_imm of a value into the status register),
the written value is immediately overwritten by the flags set within the same instruction.
LBVM has a virtual memory vmem of 192kB, with 3 segments of 64kB.
The first segment (vmem address 0x00000 ~ 0x0FFFF) is used as the stack.
The second segment (vmem address 0x10000 ~ 0x1FFFF) is used as the text segment.
The third segment (vmem address 0x20000 ~ 0x2FFFF) is used as the data segment.
There is no way for the machine to execute code outside of the text segment.
Memory of the host machine can also be accessed, depending on the vmem flag.
The load/store instructions take a vmem flag as the least significant bit of their flags byte.
The vmem flag determines whether the virtual or the host memory is accessed (0 for vmem, 1 for real memory).
Pointers have sizes of 64 bits.
Accessing vmem with an out-of-bounds pointer halts the machine.
Stack overflow/underflow is checked.
PC overflow/underflow is checked.
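The segment layout and the bounds rule can be sketched as follows. This is a minimal sketch: `segment_of` is a hypothetical helper, and raising `MemoryError` merely stands in for halting the machine. Whether a single access may straddle two segments is not stated here, so the sketch only checks the overall vmem bounds.

```python
SEGMENT_SIZE = 0x10000                    # 64kB per segment
SEGMENTS = ("stack", "text", "data")      # in address order
VMEM_SIZE = SEGMENT_SIZE * len(SEGMENTS)  # 192kB in total


def segment_of(addr, size=1):
    """Return the name of the segment containing `addr`.

    Raises MemoryError (a stand-in for halting the machine) when the
    access of `size` bytes starting at `addr` falls outside vmem.
    """
    if addr < 0 or size < 1 or addr + size > VMEM_SIZE:
        raise MemoryError(f"vmem access out of bounds: {addr:#x}+{size}")
    return SEGMENTS[addr // SEGMENT_SIZE]


print(segment_of(0x0FFFF))  # last byte of the stack segment
print(segment_of(0x10000))  # first byte of the text segment
```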
LBVM has three addressing modes for load/store instructions.
- imm: Immediate
- dir: Direct
- ind: Indirect (loads the value at address reg + offset, where offset is a 64-bit immediate)
Note that for store instructions the addressing mode refers to the destination address, whereas for load instructions it determines the location of the source value.
For this reason store_dir and load_dir are small instructions while the other load/store instructions are big instructions.
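A rough sketch of how the three modes differ for a load from vmem is given below. Several details are assumptions made for illustration rather than behaviour stated above: the immediate is truncated to the operand size, loaded values are zero-extended, `reg + offset` wraps at 64 bits, and the helper names `read_vmem` and `load` are hypothetical. Status flag updates and the host-memory path are left out.

```python
MASK64 = (1 << 64) - 1


def read_vmem(vmem, addr, size):
    """Read `size` bytes at `addr` as a little-endian unsigned integer."""
    if addr < 0 or addr + size > len(vmem):
        raise MemoryError("out-of-bounds vmem access halts the machine")
    return int.from_bytes(vmem[addr:addr + size], "little")


def load(mode, vmem, regs, dest, size, addr_reg=0, imm=0):
    """Load a value of `size` bytes into register `dest`.

    imm: the value is the immediate itself (assumed truncated to `size`).
    dir: the value is read from the address held in `addr_reg`.
    ind: the value is read from `addr_reg + imm` (assumed to wrap at 64 bits).
    """
    if mode == "imm":
        value = imm & ((1 << (8 * size)) - 1)
    elif mode == "dir":
        value = read_vmem(vmem, regs[addr_reg], size)
    elif mode == "ind":
        value = read_vmem(vmem, (regs[addr_reg] + imm) & MASK64, size)
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    regs[dest] = value
```

Only the imm and ind modes carry a 64-bit immediate, which matches load_dir being the one small load instruction.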
| Name | Opcode | Oplen relevant? | Status affected | Encoding Format | Encoding (without first byte) |
|---|---|---|---|---|---|
| brk | 0 | No | - | Small | [-][-][-][-][-] |
| cbrk | 1 | No | - | Small | [-][-][-][-][cond] |
| nop | 2 | No | - | Small | [-][-][-][-][-] |
| load_imm | 3 | Yes | NZ | Big | [dest][-][-][vmem][data] |
| load_dir | 4 | Yes | NZ | Small | [dest][addr][-][-][vmem] |
| load_ind | 5 | Yes | NZ | Big | [dest][addr][-][vmem][offset] |
| store_imm | 6 | Yes | NZ | Big | [-][src][-][-][vmem][addr] |
| store_dir | 7 | Yes | NZ | Small | [addr][src][-][-][vmem] |
| store_ind | 8 | Yes | NZ | Big | [addr][src][-][-][vmem][offset] |
| mov | 9 | Yes | NZ | Small | [dest][src][-][-][-] |
| cmp | 10 | Yes | NZCVEGL | Small | [lhs][rhs][-][-][-] |
| fcmp | 11 | Yes | NZCVEGL | Small | [lhs][rhs][-][-][-] |
| csel | 12 | Yes | - | Small | [dest][lhs][rhs][-][cond] |
| b | 13 | No | - | Jump/Branch | [offset][cond] |
| j | 14 | No | - | Jump/Branch | [offset][-] |
| add | 15 | Yes | NZCV | Small | [dest][lhs][rhs][-][-] |
| sub | 16 | Yes | NZCV | Small | [dest][lhs][rhs][-][-] |
| mul | 17 | Yes | NZV | Small | [dest][lhs][rhs][-][-] |
| div | 18 | Yes | NZ | Small | [dest][lhs][rhs][-][-] |
| mod | 19 | Yes | NZ | Small | [dest][lhs][rhs][-][-] |
| iadd | 20 | Yes | NZCV | Small | [dest][lhs][rhs][-][-] |
| isub | 21 | Yes | NZCV | Small | [dest][lhs][rhs][-][-] |
| imul | 22 | Yes | NZV | Small | [dest][lhs][rhs][-][-] |
| idiv | 23 | Yes | NZ | Small | [dest][lhs][rhs][-][-] |
| imod | 24 | Yes | NZ | Small | [dest][lhs][rhs][-][-] |
| fadd | 25 | Only qword/dword | NZ | Small | [dest][lhs][rhs][-][-] |
| fsub | 26 | Only qword/dword | NZ | Small | [dest][lhs][rhs][-][-] |
| fmul | 27 | Only qword/dword | NZ | Small | [dest][lhs][rhs][-][-] |
| fdiv | 28 | Only qword/dword | NZ | Small | [dest][lhs][rhs][-][-] |
| fmod | 29 | Only qword/dword | NZ | Small | [dest][lhs][rhs][-][-] |
| ineg | 30 | Yes | NZ | Small | [dest][lhs][-][-][-] |
| fneg | 31 | Only qword/dword | NZ | Small | [dest][lhs][-][-][-] |
| shl | 32 | No | NZ | Small | [dest][lhs][rhs][-][-] |
| shr | 33 | No | NZ | Small | [dest][lhs][rhs][-][-] |
| and | 34 | Yes | NZ | Small | [dest][lhs][rhs][-][-] |
| or | 35 | Yes | NZ | Small | [dest][lhs][rhs][-][-] |
| xor | 36 | Yes | NZ | Small | [dest][lhs][rhs][-][-] |
| not | 37 | Yes | NZ | Small | [dest][lhs][-][-][-] |
| muladd | 38 | Yes | NZV | Small | [dest][lhs][rhs][rhs2][-] |
| call | 39 | No | - | Jump/Branch | [offset][-] |
| ccall | 40 | No | - | Jump/Branch | [offset][cond] |
| ret | 41 | No | - | Small | [-][-][-][-][-] |
| push | 42 | Yes | - | Small | [src][-][-][-][-] |
| pop | 43 | Yes | NZ | Small | [dest][-][-][-][-] |
| libc_call | 44 | No | - | Small | [-][-][-][-][libc_callcode] |
| native_call | 45 | No | - | Big | TODO |
| vtoreal | 46 | No | - | Small | [dest][src][-][-][-] |
| breakpoint | 63 | No | - | Small | [-][-][-][-][-] |
Note that because all registers are callee-saved, the value of the status register might change after call, ccall, libc_call, or native_call, even though the instruction itself does not touch the status register.
LBVM uses an 8-bit callcode for calling libc functions. It does not cover every libc function, only the more common ones.
| Name | Callcode |
|---|---|
| exit | 255 |
| malloc | 1 |
| realloc | 2 |
| free | 3 |
| fwrite | 4 |
| fread | 5 |
| printf | 6 |
| fprintf | 7 |
| scanf | 8 |
| fscanf | 9 |
| puts | 10 |
| fputs | 11 |
| snprintf | 12 |
| fopen | 13 |
| fclose | 14 |
| memcpy | 15 |
| memmove | 16 |
| memset | 17 |
| bzero | 18 |
| strlen | 19 |
| strcpy | 20 |
| strcat | 21 |
| strcmp | 22 |
An implementation of LBVM may be able to load a bytecode program from a file in the following program file format, which is essentially a snapshot of the machine's memory.
The file bytestream must start with the byte sequence:
4C 42 56 4D 50 72 6F 67 72 61 6D
This is LBVMProgram encoded in ASCII.
The rest of the bytestream must consist of consecutive blocks, each in the following format:
+-------------------+====================+=============+==========+
| Magic number 0xAA | Start address: u32 | Length: u16 | Data ... |
+-------------------+====================+=============+==========+
Note that Start address and Length are in little endian.
Note that a block is not allowed to span through different memory segments.
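A loader for this format might look like the sketch below. It is illustrative only: `load_program` is a hypothetical name, the three header fields are assumed to be packed back to back with no padding, and rejecting malformed input with `ValueError` is a choice made for this example rather than specified behaviour.

```python
import struct

HEADER = b"LBVMProgram"       # 4C 42 56 4D 50 72 6F 67 72 61 6D
BLOCK_MAGIC = 0xAA
SEGMENT_SIZE = 0x10000
VMEM_SIZE = 3 * SEGMENT_SIZE
BLOCK_HEADER = struct.Struct("<BIH")  # magic: u8, start: u32, length: u16


def load_program(stream):
    """Parse a program bytestream and return the resulting vmem image."""
    if not stream.startswith(HEADER):
        raise ValueError("missing LBVMProgram header")
    vmem = bytearray(VMEM_SIZE)
    pos = len(HEADER)
    while pos < len(stream):
        if len(stream) - pos < BLOCK_HEADER.size:
            raise ValueError("truncated block header")
        magic, start, length = BLOCK_HEADER.unpack_from(stream, pos)
        if magic != BLOCK_MAGIC:
            raise ValueError(f"bad block magic at offset {pos}")
        pos += BLOCK_HEADER.size
        data = stream[pos:pos + length]
        if len(data) != length:
            raise ValueError("truncated block data")
        if start + length > VMEM_SIZE:
            raise ValueError("block lies outside vmem")
        # A block must stay within a single 64kB segment.
        if length and start // SEGMENT_SIZE != (start + length - 1) // SEGMENT_SIZE:
            raise ValueError("block spans multiple segments")
        vmem[start:start + length] = data
        pos += length
    return vmem
```

A file holding a single block that places four bytes at the start of the text segment would be `HEADER + BLOCK_HEADER.pack(0xAA, 0x10000, 4)` followed by the four data bytes.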