| 1 | +snakePU (Snake Processing Unit) Rough Initial Design Doc |
| 2 | +Coming Soon(TM) To A Datacenter Near You |
| 3 | + |
| 4 | + |
| 5 | +The basic idea is to implement a Python bytecode interpreter in hardware ("snakePU") to |
| 6 | + serve as a mega-fast Python coprocessor chip. |
| 7 | + |
| 8 | +More specifically, we create a stack-based RISC microarchitecture ("microsnake") so |
| 9 | + that we can implement the complex behaviors specified by each Python opcode as a series |
| 10 | + of elementary microinstructions across multiple clock cycles. |
| 11 | + |
| 12 | + |
| 13 | +snakePU MAIN REGISTERS: |
| 14 | + -- control |
| 15 | + PC - Program counter (64bit) |
| 16 | + IR - Instruction register |
| 17 | + |
| 18 | + -- data |
| 19 | + AR - Argument register - second byte of the instruction, i.e. the opcode's argument (64bit) |
| 20 | + |
| 21 | +INSTRUCTION PHASES: |
| 22 | + 1. Fetch instruction: opcode byte -> IR, argument byte -> AR |
| 23 | + 2. Decode opcode -> load ROM address of microroutine (opcode << 4) into MPC |
| 24 | + 3. Execute microroutine starting at [MPC] |
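
A rough software model of the fetch and decode phases above, as a sketch only. It assumes each instruction is two bytes (opcode, then argument), as the AR description implies; the function names and the sample byte values are made up for illustration and do not refer to any particular Python opcode.

```python
# Hypothetical model of phases 1 and 2; not a description of real hardware.
MPC_MASK = 0xFFF  # MPC is 12 bits wide (16 * 256 microinstruction slots)


def fetch(bytecode, pc):
    """Phase 1: opcode byte goes to IR, the following byte goes to AR."""
    ir = bytecode[pc]
    ar = bytecode[pc + 1]
    return ir, ar


def decode(ir):
    """Phase 2: the microroutine for an opcode starts at (opcode << 4)."""
    return (ir << 4) & MPC_MASK


# Arbitrary sample bytes, chosen only to exercise the arithmetic.
code = bytes([0x64, 0x01, 0x53, 0x00])
ir, ar = fetch(code, 0)
mpc = decode(ir)
```

Phase 3 would then step through at most 16 microinstructions starting at `mpc`.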
| 25 | + |
| 26 | + |
| 27 | +== MICROCODE ("microsnake") == |
| 28 | + |
| 29 | +The Python opcode is just an index into a list of microroutines stored in (3 * 16 * 256) = 12288 bytes of ROM |
| 30 | + (3 bytes per microinstruction; microroutines can be at most 16 instructions long, and there are 256 Python opcodes max (cuz it's BYTEcode)) |
| 31 | + |
| 32 | +These microroutines enable us to implement the complex behavior specified by the Python opcode across multiple clock cycles |
| 33 | + |
| 34 | +Each microinstruction is 24 bits, and each bit (or small bit field) directly represents a signal that controls some part of the processor |
| 35 | + (e.g. store destinations, ALU ops) |
| 36 | + |
| 37 | +A key benefit of this approach is that we don't need to change the hardware as the opcode values change between |
| 38 | + Python versions. We can just update the layout of the microroutine list in ROM to put each microroutine at the |
| 39 | + correct index, and this can happen automatically when we're flashing the chip. |
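
The relocation idea above can be sketched as a small ROM builder. This is only an illustration: `build_rom`, the routine name `"demo"`, the opcode values, and the little-endian byte order within each 3-byte word are all assumptions, not part of the design.

```python
# Hypothetical ROM image builder; sizes come from the doc, the rest is assumed.
WORD_BYTES = 3   # one 24-bit microinstruction
SLOTS = 16       # max microinstructions per microroutine
OPCODES = 256    # max Python opcodes
ROM_SIZE = WORD_BYTES * SLOTS * OPCODES  # 12288 bytes


def build_rom(routines, opcode_map):
    """Place each microroutine at the slot selected by its opcode value.

    routines:   name -> list of 24-bit microinstruction words
    opcode_map: name -> opcode value for the targeted Python version
    """
    rom = bytearray(ROM_SIZE)  # unused slots stay all-zero
    for name, opcode in opcode_map.items():
        words = routines[name]
        if len(words) > SLOTS:
            raise ValueError(f"{name}: microroutine longer than {SLOTS} words")
        base = (opcode << 4) * WORD_BYTES  # same shift the decoder uses
        for i, word in enumerate(words):
            start = base + i * WORD_BYTES
            # Byte order inside a word is an assumption (little-endian here).
            rom[start:start + WORD_BYTES] = word.to_bytes(WORD_BYTES, "little")
    return bytes(rom)


routines = {"demo": [0x000200, 0x000000]}
rom_a = build_rom(routines, {"demo": 0x10})  # one Python version's numbering
rom_b = build_rom(routines, {"demo": 0x20})  # same routine, renumbered opcode
```

Only the opcode map changes between the two images; the microroutine words themselves are untouched.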
| 40 | + |
| 41 | + |
| 42 | +MICROSNAKE REGISTERS: |
| 43 | + -- control |
| 44 | + MPC - Microprogram counter (12bit) (16 * 256 total possible instructions) |
| 45 | + |
| 46 | + -- data |
| 47 | + SP - Stack pointer (64bit) |
| 48 | + TMP[0-3] - temporary registers (64bit) |
| 49 | + |
| 50 | + |
| 51 | +*** microsnake instruction set *** |
| 52 | + |
| 53 | +-- stack operations |
| 54 | +PUSH src |
| 55 | +POKE src |
| 56 | +POP dst |
| 57 | +PEEK dst |
| 58 | + |
| 59 | +-- control operations |
| 60 | +RET - unconditionally exit microroutine (opcode 0x0000) |
| 61 | +SZ - skip next instruction if STACK_TOP is zero (opcode 0x0001) |
| 62 | +SNZ - skip next instruction if STACK_TOP is nonzero (opcode 0x0002) |
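
A toy interpreter for the stack and control operations listed so far, as a sketch only. The doc does not spell out POKE and PEEK, so this assumes POKE overwrites the top of the stack without moving SP and PEEK reads it without popping; the `run` function and the tuple encoding of instructions are hypothetical.

```python
# Hypothetical software model of PUSH/POKE/POP/PEEK/RET/SZ/SNZ.
MASK64 = (1 << 64) - 1  # registers and stack cells are 64 bits wide


def run(routine, regs, stack):
    """Execute a list of (mnemonic, register_name) pairs."""
    i = 0
    while i < len(routine):
        op, reg = routine[i]
        i += 1
        if op == "RET":
            break                          # exit the microroutine
        elif op == "PUSH":
            stack.append(regs[reg] & MASK64)
        elif op == "POKE":
            stack[-1] = regs[reg] & MASK64  # assumed: overwrite top in place
        elif op == "POP":
            regs[reg] = stack.pop()
        elif op == "PEEK":
            regs[reg] = stack[-1]           # assumed: read top, keep it
        elif op == "SZ":
            if stack[-1] == 0:
                i += 1                      # skip next if STACK_TOP is zero
        elif op == "SNZ":
            if stack[-1] != 0:
                i += 1                      # skip next if STACK_TOP is nonzero
        else:
            raise ValueError(f"unknown microinstruction {op}")
    return regs, stack


routine = [("PUSH", "TMP0"), ("SZ", None), ("POKE", "TMP1"),
           ("POP", "TMP2"), ("RET", None)]
regs_zero, _ = run(routine, {"TMP0": 0, "TMP1": 7, "TMP2": 99}, [])
regs_five, _ = run(routine, {"TMP0": 5, "TMP1": 7, "TMP2": 99}, [])
```

With TMP0 equal to zero the SZ skips the POKE, so TMP2 ends up 0; with TMP0 nonzero the POKE runs and TMP2 ends up 7.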
| 63 | + |
| 64 | +-- load/store operations |
| 65 | +LGLOBAL (load/store in global table) |
| 66 | +SGLOBAL |
| 67 | +LLOCAL (load/store in local frame) |
| 68 | +SLOCAL |
| 69 | +LCONST (load from constant table) |
| 70 | + |
| 71 | +-- arithmetic operations |
| 72 | +ADD src |
| 73 | +MUL src |
| 74 | +[TODO...] |
| 75 | + |
| 76 | +-- logical operations |
| 77 | +NOT |
| 78 | +AND src |
| 79 | +XOR src |
| 80 | +OR src |
| 81 | +SHL src (shift left - where src is the number of bits to shift) |
| 82 | +SHR src (shift right - same as above) |
| 83 | +[TODO...] |
| 84 | + |
| 85 | + |
| 86 | + |
| 87 | +SRC/DST registers |
| 88 | +PC |
| 89 | +AR |
| 90 | +TMP0 |
| 91 | +TMP1 |
| 92 | +TMP2 |
| 93 | +TMP3 |
| 94 | +RAM - treat the value at the top of the stack as a pointer into RAM |
| 95 | + |
| 96 | + |
| 97 | +CONTROL SIGNALS (microinstruction layout, LSB to MSB): |
| 98 | + 1 - SNT |
| 99 | + 1 - SNF |
| 100 | + 1 - W_PC |
| 101 | + 1 - W_AR |
| 102 | + 1 - W_RAM |
| 103 | + 1 - W_TMP0 |
| 104 | + 1 - W_TMP1 |
| 105 | + 1 - W_TMP2 |
| 106 | + 1 - W_TMP3 |
| 107 | + 1 - S_PUSH |
| 108 | + 1 - S_POP |
| 109 | + 2 - D_SEL - select STACK_IN data source (ALU_OUT, GLOBALS[AR], LOCALS[AR], CONST[AR]) |
| 110 | + 5 - ALU_OP (decoded into ALU op inputs) |
| 111 | + 6 - RESERVED |
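
The layout above can be checked with a small pack/unpack helper. This is a sketch under the stated LSB-to-MSB ordering; `pack`, `unpack`, and the `FIELDS` table are illustrative names, not part of the design.

```python
# Hypothetical encoder for the 24-bit microinstruction word, LSB first.
FIELDS = [
    ("SNT", 1), ("SNF", 1), ("W_PC", 1), ("W_AR", 1), ("W_RAM", 1),
    ("W_TMP0", 1), ("W_TMP1", 1), ("W_TMP2", 1), ("W_TMP3", 1),
    ("S_PUSH", 1), ("S_POP", 1), ("D_SEL", 2), ("ALU_OP", 5), ("RESERVED", 6),
]
WORD_BITS = sum(width for _, width in FIELDS)  # adds up to 24


def pack(**signals):
    """Build a microinstruction word from named signal values."""
    unknown = set(signals) - {name for name, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    word, shift = 0, 0
    for name, width in FIELDS:
        value = signals.get(name, 0)
        if value < 0 or value >> width:
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        word |= value << shift
        shift += width
    return word


def unpack(word):
    """Split a microinstruction word back into its named fields."""
    out, shift = {}, 0
    for name, width in FIELDS:
        out[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return out


# Example: push with the fourth D_SEL source selected.
word = pack(S_PUSH=1, D_SEL=3)
```

Here S_PUSH lands on bit 9 and D_SEL on bits 11-12, so `word` is 0x1A00.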