This repository comprises a number of RISC-V experiments, including a pair of minimalist RV32EC CPU cores.
Alfa was my first attempt at designing and implementing a RISC-V core. It is relatively straightforward:
- Support for all RV32E base instructions except ECALL/EBREAK/FENCE
- Optional RV32EC compressed instruction support
- Unpipelined, single-cycle execution for all instructions
- Conventional 15x32 bit register file for RV32E registers x1-x15
- No support for signaling exceptions or interrupts, and no support for trap handling
- No support for unaligned memory accesses
- Separate instruction and data address spaces and CPU ports
Alfa was intended to allow implementing moderately complex sequential logic in C, instead of hand-coding the corresponding state machine in your HDL of choice. Writing C is simpler, quicker, and less error-prone, and being much more of a software person than a hardware/HDL person, I generally prefer it.
Each instruction executes in a single cycle. To be able to execute LOAD instructions in a single cycle as well, a bit of forwarding logic is used to forward the load data coming from memory to any register source operands that use that load data in the subsequent cycle before it has been written back to the register file.
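The forwarding described above can be modeled in a few lines of Rust. This is only a behavioral sketch and is not taken from Alfa's Verilog: the names PendingLoad and read_operand are made up for illustration, and the register file is modeled as a 16-entry array with slot 0 unused for simplicity.

```rust
/// A load issued in the previous cycle whose data arrives from memory this cycle
/// and has not yet been written back to the register file. (Hypothetical name.)
struct PendingLoad {
    rd: u8,
    data: u32,
}

/// Read a source operand, preferring in-flight load data over the register file,
/// which still holds the old value of the load's destination register.
fn read_operand(regs: &[u32; 16], pending: Option<&PendingLoad>, rs: u8) -> u32 {
    if rs == 0 {
        return 0; // x0 always reads as zero
    }
    match pending {
        Some(p) if p.rd == rs => p.data, // forward the load data
        _ => regs[usize::from(rs)],      // normal register file read
    }
}
```

An instruction that immediately follows a load and reads the load's destination register sees the freshly loaded value, while every other operand read is unaffected.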
Alfa can be paired up with a single 32-bit wide instruction memory block when used in RV32E mode, but needs a pair of 16-bit wide memory blocks in RV32EC mode, because of the RV32EC requirement to support 16-bit aligned 32-bit instruction fetches and because of Alfa's single-cycle design. For data, a single 32-bit wide memory block can be used, as Alfa does not support unaligned accesses.
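To illustrate why two 16-bit wide blocks are enough for single-cycle fetches at any 16-bit aligned address, here is a minimal Rust sketch. It assumes that even halfwords live in one bank and odd halfwords in the other; that layout, and the names BankedImem and fetch32, are assumptions for illustration rather than a description of Alfa's actual wrapper logic.

```rust
/// Hypothetical model of an instruction memory split into two 16-bit banks.
/// Halfword index h (byte address / 2) lives in bank h % 2 at row h / 2.
struct BankedImem {
    even: Vec<u16>, // halfwords 0, 2, 4, ...
    odd: Vec<u16>,  // halfwords 1, 3, 5, ...
}

impl BankedImem {
    /// Split a flat image of halfwords across the two banks.
    fn from_halfwords(image: &[u16]) -> Self {
        let even: Vec<u16> = image.iter().copied().step_by(2).collect();
        let odd: Vec<u16> = image.iter().copied().skip(1).step_by(2).collect();
        BankedImem { even, odd }
    }

    /// Fetch 32 bits starting at a 16-bit aligned byte address.
    /// Each bank is read exactly once, so this maps to a single memory cycle.
    /// Panics if the fetch runs past the end of the image.
    fn fetch32(&self, byte_addr: u32) -> u32 {
        let h = (byte_addr >> 1) as usize; // halfword index
        let row = h >> 1;
        let (lo, hi) = if h & 1 == 0 {
            // 32-bit aligned: both halves sit in the same row.
            (self.even[row], self.odd[row])
        } else {
            // Only 16-bit aligned: the high half is in the next row of the even bank.
            (self.odd[row], self.even[row + 1])
        };
        (u32::from(hi) << 16) | u32::from(lo)
    }
}
```

With a single 32-bit wide block, the misaligned case would need two reads from the same memory, which does not fit a single-cycle design.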
RV32(E)C compressed instructions are (optionally) handled by expanding them to RV32[EI] instructions and then running them through the RV32I decoder.
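As a rough illustration of what such an expansion looks like, the Rust sketch below handles just two compressed encodings, C.ADDI (which includes C.NOP) and C.MV, using the mappings given in the RISC-V C extension. The function name expand is hypothetical, and this is not a transcription of the expander in alfa/.

```rust
/// Expand a few RV32C encodings to their RV32I equivalents.
/// Returns None for everything this sketch does not handle.
fn expand(insn: u16) -> Option<u32> {
    let rd = u32::from((insn >> 7) & 0x1f);
    let rs2 = u32::from((insn >> 2) & 0x1f);
    match (insn & 0b11, insn >> 13) {
        // C.ADDI (C.NOP when rd = 0 and imm = 0): addi rd, rd, sext(imm[5:0])
        (0b01, 0b000) => {
            let imm6 = u32::from(((insn >> 7) & 0x20) | ((insn >> 2) & 0x1f));
            // Sign-extend the 6-bit immediate to the 12-bit I-type immediate.
            let imm12 = if imm6 & 0x20 != 0 { imm6 | 0xfc0 } else { imm6 };
            Some((imm12 << 20) | (rd << 15) | (rd << 7) | 0x13)
        }
        // C.MV (bit 12 clear, rs2 != 0): add rd, x0, rs2
        (0b10, 0b100) if insn & 0x1000 == 0 && rs2 != 0 => {
            Some((rs2 << 20) | (rd << 7) | 0x33)
        }
        _ => None,
    }
}
```

For example, c.nop (0x0001) expands to addi x0, x0, 0 (0x00000013), and c.mv a5, a1 (0x87ae) expands to add a5, x0, a1 (0x00b007b3).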
In the alfa/ directory you can find Alfa's core logic, the optional RV32C-to-RV32I instruction expander, and some wrapper logic to allow simulating the execution of some of the example C programs in prog/ with Icarus Verilog.
One of the intended targets for Alfa was the TinyFPGA BX FPGA board, which sports an ICE40LP8K FPGA. ICE40 FPGAs feature 4096-bit Embedded Block RAM blocks with a maximum configurable width of 16 bits, which gives an organisation of 256 words x 16 bits. Combined with Alfa's need for separate 32-bit wide instruction and data memories, this means that a minimal implementation of Alfa on an ICE40 FPGA requires four such RAM blocks (two for instructions and two for data), which is a significant fraction of the RAM blocks available on lower-end devices.
Another Alfa feature that I was ultimately somewhat unhappy with, in light of Alfa's intended use case, is the effective need for 15 * 32 = 480 bits of distributed RAM for the register file, plus an array of 15-input multiplexers, one for each of the 64 bits of rs1 and rs2, to select the source operands for each instruction.
What started as a thought experiment to attempt to address these two issues resulted in Bravo.
Bravo was my second attempt at a RISC-V core.
Compared to Alfa, Bravo uses a unified instruction/data address space, and this address space is accessed using 16-bit memory accesses. This means that 32-bit memory accesses now take two cycles to execute, but this allows the use of a single 16-bit wide RAM block for all instructions and data, where Alfa needs at least four such blocks.
Also, Bravo does away with the traditional register file, and instead of storing x1-x15 in dedicated registers, Bravo stores these in RAM, along with all of the executing program's text and data segments. This design decision means that the number of cycles Bravo needs to execute an instruction now depends on the number of (non-x0) source and destination register operands an instruction has, as illustrated by the following examples:
- c.nop (0x0001) takes 1 clock cycle to execute: 1 cycle to fetch the instruction, and then no additional cycles, since no registers are read or written by this instruction.
- c.mv a5, a1 (0x87ae) takes 5 clock cycles to execute: 1 cycle to fetch the instruction, 2 cycles to read a1 from RAM, and 2 cycles to write a5 to RAM.
- addi a1, a5, 1 (0x00178593) takes 6 clock cycles to execute: 2 cycles to fetch the instruction, 2 cycles to read a5 from RAM, and 2 cycles to write a1 to RAM. Adding 1 to a5 happens in the first store cycle for a1.
- beq a1, a2, +6 (0x00c58363) takes 6 clock cycles to execute: 2 cycles to fetch the instruction, 2 cycles to read a1 from RAM, and 2 cycles to read a2 from RAM. Comparing a1 with a2 and updating pc accordingly happens in the last fetch cycle.
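The cycle counts in these examples follow a simple rule: one cycle per 16 bits of instruction fetched, plus two cycles per non-x0 register operand. The Rust sketch below captures that rule for instructions that do not access data memory; loads and stores need additional memory cycles that it does not model. The function names are made up for illustration.

```rust
/// 1 if the operand is present and is not x0, else 0.
fn counts(reg: Option<u8>) -> u32 {
    match reg {
        Some(n) if n != 0 => 1,
        _ => 0,
    }
}

/// Cycle count for an instruction that does not access data memory.
/// `None` means the instruction has no such operand at all.
fn cycles(compressed: bool, rs1: Option<u8>, rs2: Option<u8>, rd: Option<u8>) -> u32 {
    let fetch = if compressed { 1 } else { 2 }; // one cycle per 16-bit halfword
    fetch + 2 * (counts(rs1) + counts(rs2) + counts(rd))
}
```

Calling cycles(true, None, Some(11), Some(15)) for c.mv a5, a1 gives 5, and cycles(false, Some(15), None, Some(11)) for addi a1, a5, 1 gives 6, matching the examples.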
Execution speed was never a priority for this use case, so I was willing to at least consider trading it for reduced resource usage. I took Bravo from the thought experiment stage to an actual implementation because I wanted to see how this trade-off would work out in practice, and whether the reduction in resource usage would outweigh the increased complexity of Bravo's instruction execution logic compared to Alfa's.
Bravo goes through the following steps when executing an instruction:
---------------------------------------------------------------------------------------
FetchInsnLow           x      (all instruction classes)
FetchInsnHigh          o      (all instruction classes)
---------------------------------------------------------------------------------------
Step                   LOAD   OPIMM  AUIPC  STORE  OP     LUI    BR     JALR   JAL
---------------------------------------------------------------------------------------
FetchRs1Low            o      o      -      o      o      -      o      o      -
FetchRs1High           +      +      -      +      +      -      +      +      -
---------------------------------------------------------------------------------------
FetchRs2Low            -      -      -      o      o      -      o      -      -
FetchRs2High           -      -      -      +      +      -      +      -      -
LoadLow                x      -      -      -      -      -      -      -      -
LoadHigh               o      -      -      -      -      -      -      -      -
---------------------------------------------------------------------------------------
StoreLow               -      -      -      x      -      -      -      -      -
StoreHigh              -      -      -      o      -      -      -      -      -
WriteRdFromAluLow      -      o      o      -      o      o      -      o      o
WriteRdFromAluHigh     -      +      +      -      +      +      -      +      +
WriteRdFromLoadLow     o      -      -      -      -      -      -      -      -
WriteRdFromLoadHigh    +      -      -      -      -      -      -      -      -
---------------------------------------------------------------------------------------
x: required step
o: optional step
+: required follow-up step for preceding optional step
-: step does not apply to this instruction class
The FetchRs1* steps are skipped for an instruction if that instruction's rs1 register is x0. Similarly, FetchRs2* are skipped if rs2 is x0, and WriteRdFrom* are skipped if rd is x0.
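The step selection can be sketched as a function from a decoded instruction to a list of step names. This is an illustrative Rust model, not Bravo's actual Rust prototype: the Insn struct and its field names are hypothetical, and it assumes that the optional LoadHigh and StoreHigh steps are taken only for 32-bit accesses.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Class { Load, OpImm, Auipc, Store, Op, Lui, Branch, Jalr, Jal }

/// Hypothetical summary of a decoded instruction.
struct Insn {
    class: Class,
    compressed: bool,  // 16-bit encoding, so no FetchInsnHigh
    rs1: u8,           // ignored if the class has no rs1
    rs2: u8,           // ignored if the class has no rs2
    rd: u8,            // ignored if the class has no rd
    word_access: bool, // assumption: only 32-bit loads/stores need the *High step
}

fn steps(i: &Insn) -> Vec<&'static str> {
    use Class::*;
    let uses_rs1 = matches!(i.class, Load | OpImm | Store | Op | Branch | Jalr);
    let uses_rs2 = matches!(i.class, Store | Op | Branch);
    let writes_rd = !matches!(i.class, Store | Branch);

    let mut s = vec!["FetchInsnLow"];
    if !i.compressed {
        s.push("FetchInsnHigh");
    }
    if uses_rs1 && i.rs1 != 0 {
        s.extend_from_slice(&["FetchRs1Low", "FetchRs1High"]);
    }
    if uses_rs2 && i.rs2 != 0 {
        s.extend_from_slice(&["FetchRs2Low", "FetchRs2High"]);
    }
    match i.class {
        Load => {
            s.push("LoadLow");
            if i.word_access { s.push("LoadHigh"); }
        }
        Store => {
            s.push("StoreLow");
            if i.word_access { s.push("StoreHigh"); }
        }
        _ => {}
    }
    if writes_rd && i.rd != 0 {
        if i.class == Load {
            s.extend_from_slice(&["WriteRdFromLoadLow", "WriteRdFromLoadHigh"]);
        } else {
            s.extend_from_slice(&["WriteRdFromAluLow", "WriteRdFromAluHigh"]);
        }
    }
    s
}
```

Since each step takes one clock cycle, the length of the returned list is the instruction's cycle count under these assumptions.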
Compared to Alfa, Bravo has a formal split between instruction decoding and instruction execution. Bravo uses an internal decoded instruction format, and its RV32I and RV32C instruction decoders both decode directly to that format, whereas in Alfa the RV32C decoder outputs RV32I instructions. As a result, it is possible, at least in theory, to disable Bravo's RV32I instruction decoder so that it only supports executing RV32C instructions. (This would probably not be very useful in practice.)
Bravo and its relevant simulation wrapper logic can be found in the bravo/ directory.
Both Alfa and Bravo support the decoding of instructions that use registers x16-x31, as there is no additional complexity involved in supporting this in the decoder. However, both cores should only be asked to execute instructions that use registers x0-x15, in order not to invoke undefined behavior. That is, their instruction decoding logic is not RV32E(C)-specific, but their execution logic is. It would be easy to modify either core to support the full 32 registers, by expanding the register file for Alfa or by tweaking the register base RAM address for Bravo. I don't really see a use case for this, though, since the programs you would run in the sort of constrained execution environment that Alfa and Bravo represent typically don't need more than 15 registers to run well.
Note that neither Alfa nor Bravo signals an exception if you try to use registers in the x16-x31 range, and neither handles illegal instructions gracefully. As an implementation artifact, Alfa will jump to address zero if it detects an invalid instruction, effectively restarting execution of the configured program, while Bravo will invoke undefined behavior. The instruction decoders of both cores output a signal that indicates when an invalid instruction is seen, and that signal could be used to handle invalid-instruction conditions more gracefully if the execution environment were to require that.
The Rust code in this repository consists of prototyping and scaffolding code for Bravo -- the Bravo instruction decoder and execution logic were first modeled in Rust, and then implemented in Verilog.
There are tests for all the immediate encodings supported by the Rust RV32I and RV32C instruction decoders, and there is a test proving that the Rust and Verilog instruction decoders are formally equivalent, by exhaustively decoding all possible instruction words and verifying that the decoders produce the same output. However, while all simple test programs that I have tried to run on Alfa and Bravo seem to work as expected, there is currently no test or proof that either of these decoders or their corresponding execution logic conform to the RISC-V Unprivileged Architecture specification. This should be excused by the fact that I ultimately really only wrote these CPU cores to improve my understanding of and augment my experience with digital sequential logic design, but it also means that you should not use either of these cores as-is for anything important!
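The idea behind such an exhaustive comparison can be sketched as follows. This is not the repository's actual test: the helper first_mismatch and the two stand-in "decoders" (which merely extract the rd field in two different ways) are hypothetical, and in the real setup one side would be the Verilog decoder driven through simulation.

```rust
/// Return the first instruction word on which the two decoders disagree, if any.
fn first_mismatch<T: PartialEq>(
    mut words: impl Iterator<Item = u32>,
    reference: impl Fn(u32) -> T,
    candidate: impl Fn(u32) -> T,
) -> Option<u32> {
    words.find(|&w| reference(w) != candidate(w))
}

// Two stand-in decoders that should always agree.
fn rd_by_shift(w: u32) -> u32 {
    (w >> 7) & 0x1f
}

fn rd_by_mask(w: u32) -> u32 {
    (w & 0x0000_0f80) >> 7
}
```

Calling first_mismatch(0..=0xffff, rd_by_shift, rd_by_mask) covers every 16-bit word and returns None when the two functions agree everywhere. Passing 0..=u32::MAX instead covers every 32-bit word, at the cost of a much longer run.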
The current assembly start-up code expects main() to never return. If main() does return, the start-up code will try to restart the program by calling main() again. When this happens, the BSS section is cleared again, but the data sections cannot be reinitialized, because no copies of their original contents are kept around. Initialized global and static variables will therefore not be reset to their original values when the program is restarted in this way, so you probably don't ever want main() to return if you use any initialized global or static variables.