CPU opcode dispatch model
This is the largest architectural change between Peanut-GB and Walnut-CGB. The CPU dispatch model has been altered to prefetch the next opcode or operand together with the current opcode. This reduces memory map decoding, and it also reduces memory bandwidth on an MCU that can make use of 16-bit reads from flash or PSRAM. (On the ESP32 such reads must be aligned, but both 16-bit and 32-bit flash memory reads are supported.)
The following diagram is a simplified high-level overview of how the dual-fetch chained execution model works.
The best scenario above is when two 8-bit operations can be chained together immediately with no load in between, or when the first operation consumes the prefetched byte as one of its operands. These two cases make up the majority of all opcodes, which gives a boost to execution speed, especially on MCUs that can read 16 bits from memory in one access (even if an alignment check with an 8-bit fallback must be used, such as with the internal ESP32 flash memory).
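The alignment check with an 8-bit fallback mentioned above could look roughly like the following. This is a minimal sketch and not the Walnut-CGB source: the name `rom_read16` is hypothetical, and it assumes a little-endian host (as on the ESP32) so that the fast path and the fallback return the same value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from Walnut-CGB): return the byte at `addr` in the
 * low half and the byte at `addr + 1` in the high half of the result.
 * Assumes a little-endian host. The caller must ensure addr + 1 is in range. */
static uint16_t rom_read16(const uint8_t *rom, uint32_t addr)
{
    const uint8_t *p;

    assert(rom != NULL);
    p = rom + addr;

    if (((uintptr_t)p & 1u) == 0) {
        /* Aligned: one 16-bit access. memcpy avoids strict-aliasing issues
         * and is normally compiled down to a single halfword load. */
        uint16_t v;
        memcpy(&v, p, sizeof v);
        return v;
    }

    /* Unaligned: fall back to two 8-bit reads. */
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}
```

With a buffer holding the bytes `11 22 33 44`, reading at offset 0 and at offset 1 gives `0x2211` and `0x3322` respectively, whichever of the two paths each read happens to take.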
In the worst case for a given MCU, where 8-bit paths must be used inside the 16-bit (or 32-bit) read functions, the result is still an overall increase in execution speed, because the memory map is navigated one less time.
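To illustrate why the number of trips through the memory map drops even when the 16-bit read is built from two 8-bit reads, here is a toy model of chained dispatch. It is a sketch under stated assumptions, not the actual Walnut-CGB dispatcher: `toy_cpu`, `fetch16`, `op_len`, `exec` and `run` are hypothetical names, only five opcodes are handled, and the `fetches` counter stands in for one full memory map decode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy CPU state; all names here are illustrative only. */
struct toy_cpu {
    const uint8_t *mem;   /* flat memory, must extend one byte past HALT */
    uint16_t pc;
    uint8_t a;
    unsigned fetches;     /* counts trips through the "memory map" */
};

/* One memory map decode returns the opcode (low byte) and the byte after it
 * (high byte). Built from 8-bit reads here, i.e. the worst case. */
static uint16_t fetch16(struct toy_cpu *cpu)
{
    cpu->fetches++;
    return (uint16_t)(cpu->mem[cpu->pc] |
                      (cpu->mem[(uint16_t)(cpu->pc + 1)] << 8));
}

/* Instruction length in bytes for the handled opcodes. */
static int op_len(uint8_t op)
{
    return (op == 0x3E || op == 0xC6) ? 2 : 1;
}

/* Execute one opcode; returns 0 to stop, 1 to keep running. */
static int exec(struct toy_cpu *cpu, uint8_t op, uint8_t operand)
{
    switch (op) {
    case 0x00: break;                                       /* NOP      */
    case 0x3C: cpu->a = (uint8_t)(cpu->a + 1); break;       /* INC A    */
    case 0x3E: cpu->a = operand; break;                     /* LD A,d8  */
    case 0xC6: cpu->a = (uint8_t)(cpu->a + operand); break; /* ADD A,d8 */
    default:   return 0;                  /* HALT (0x76) or unhandled   */
    }
    cpu->pc = (uint16_t)(cpu->pc + op_len(op));
    return 1;
}

static void run(struct toy_cpu *cpu)
{
    assert(cpu != NULL && cpu->mem != NULL);
    for (;;) {
        uint16_t pair = fetch16(cpu);
        uint8_t op = (uint8_t)(pair & 0xFF);
        uint8_t next = (uint8_t)(pair >> 8);

        if (!exec(cpu, op, next))
            return;
        if (op_len(op) == 2)
            continue;            /* prefetched byte was used as the operand */
        if (op_len(next) == 2)
            continue;            /* next opcode needs its own operand fetch */
        if (!exec(cpu, next, 0)) /* chain: prefetched byte is the next opcode */
            return;
    }
}
```

Running this on the eight-byte program `3E 05 3C 3C C6 10 00 76` (followed by one padding byte) takes four calls to `fetch16`, where a byte-at-a-time dispatcher would navigate the memory map eight times.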