# 🧩 Multi-Cycle RISC-V Simulator: Stage Design Overview

This document explains the functional design of each stage in the **multi-cycle RISC-V simulator** implemented in `mc_core.c`.
The design divides instruction execution into **five sequential stages**, each representing a major phase of instruction processing.
The overall control flow is managed by a simple **finite-state machine (FSM)** that transitions between these stages according to the instruction type.

------

## ⚙️ Overview of the Multi-Cycle FSM

Each instruction passes through some or all of the following stages, one at a time. `STAGE_DONE` is a terminal marker rather than a sixth processing stage:

| Stage        | Name               | Description                                                  |
| ------------ | ------------------ | ------------------------------------------------------------ |
| `STAGE_IF`   | Instruction Fetch  | Fetch instruction from memory using the current PC           |
| `STAGE_ID`   | Instruction Decode | Decode instruction, identify type, extract operands and immediates |
| `STAGE_EX`   | Execute            | Perform ALU computation or branch/jump target calculation    |
| `STAGE_MEM`  | Memory Access      | Perform load or store operations                             |
| `STAGE_WB`   | Write-Back         | Write results to the register file                           |
| `STAGE_DONE` | Completion         | Instruction execution finished                               |

The FSM transitions between these stages using the helper function `push_stage()`:

- It examines the **instruction type** (`TYPE_R`, `TYPE_I`, `TYPE_S`, `TYPE_B`, `TYPE_U`, `TYPE_J`) and decides which stage should follow.
- For example, R-type instructions skip `MEM`, while load/store instructions include it.

------

## Stage Details

### 🟦 1. Instruction Fetch (IF)

**Function:** `void mc_IF(Decode *s)`

**Main Tasks:**

- Read the instruction from instruction memory using the current **program counter (PC)**.
- Compute the sequential next PC (`snpc = PC + 4`).
- Initialize `dnpc` (the dynamic next PC) to the same value, so that sequential execution is the default.

**Pseudo-code:**

```c
s->pc   = cpu.pc;               // latch the current PC
s->inst = inst_fetch(s->pc);    // read the raw instruction word
s->snpc = s->pc + 4;            // sequential next PC
s->dnpc = s->snpc;              // default next PC; later stages may override it
```

**Key Notes:**

- This stage consumes one cycle for memory access.
- No register or memory modification occurs.
- The output is the raw (not yet decoded) instruction word `s->inst`, which is passed to the ID stage.

------

### 🟩 2. Instruction Decode (ID)

**Function:** `void mc_ID(Decode *s)`

**Main Tasks:**

- Decode the fetched instruction into **operation type**, **register indices**, and **immediate values**.
- Identify the instruction category (`TYPE_R`, `TYPE_I`, `TYPE_S`, etc.).
- Set the flag `s->is_load` to `1` for load instructions and to `0` for all others.
- For `jal` and `jalr`, compute a preliminary `dnpc` (jump target).

**Key Features:**

- Uses the `INSTPAT` macro pattern system to match binary opcodes.
- Initializes control information that determines later behavior.
- Increments `global_cycle_count` to represent the decode phase delay.

**Example:**

```c
INSTPAT("??????? ????? ????? 000 ????? 11001 11", jalr, I, s->is_load = 0);
INSTPAT("??????? ????? ????? 010 ????? 00000 11", lw,   I, s->is_load = 1);
```

**Outputs:**

- Instruction classification (`s->type`)
- Immediate value and register IDs
- Preliminary jump target in `dnpc` (for `jal` and `jalr`)
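
The field extraction performed in this stage can be sketched as follows. This is a minimal illustration based on the standard RV32/RV64 instruction layout, not the code in `mc_core.c`; the `DecodedFields` struct and the helpers `bits`, `sext` and `decode_fields` are hypothetical names introduced for this example.

```c
#include <stdint.h>

/* Hypothetical container for decoded fields (not taken from mc_core.c). */
typedef struct {
    uint32_t opcode, rd, funct3, rs1, rs2;
    int64_t  imm_i;   /* sign-extended I-type immediate */
    int64_t  imm_s;   /* sign-extended S-type immediate */
} DecodedFields;

/* Extract bits [hi:lo] of a 32-bit instruction word (field width < 32). */
static uint32_t bits(uint32_t inst, int hi, int lo) {
    return (inst >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

/* Sign-extend the low `width` bits of `value` to 64 bits. */
static int64_t sext(uint32_t value, int width) {
    uint64_t sign = 1ull << (width - 1);
    return (int64_t)((value ^ sign) - sign);
}

/* Split an instruction word into register indices and immediates. */
static DecodedFields decode_fields(uint32_t inst) {
    DecodedFields d;
    d.opcode = bits(inst, 6, 0);
    d.rd     = bits(inst, 11, 7);
    d.funct3 = bits(inst, 14, 12);
    d.rs1    = bits(inst, 19, 15);
    d.rs2    = bits(inst, 24, 20);
    d.imm_i  = sext(bits(inst, 31, 20), 12);
    /* S-type: imm[11:5] sits in bits 31..25, imm[4:0] in bits 11..7. */
    d.imm_s  = sext((bits(inst, 31, 25) << 5) | bits(inst, 11, 7), 12);
    return d;
}
```

For instance, `decode_fields(0xFFF10093)` (the encoding of `addi x1, x2, -1`) yields `rd = 1`, `rs1 = 2` and `imm_i = -1`.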

------

### 🟥 3. Execute (EX)

**Function:** `void mc_EX(Decode *s, uint64_t *alu_result)`

**Main Tasks:**

- Perform arithmetic, logic, shift, and comparison operations via the ALU.
- Compute branch or jump target addresses.
- Update `dnpc` if a branch is taken.
- For load/store instructions, compute the **effective address**.

**Example Behaviors:**

- `add` / `sub`: perform integer arithmetic.
- `sll`, `sra`, `and`, `or`, `xor`: perform shift and bitwise operations.
- `beq`, `bne`, `blt`, etc.: compare operands and update `s->dnpc`.
- `jalr`: set jump target to `(src1 + imm) & ~1`.
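
The branch and `jalr` behaviors listed above can be sketched as below. This is an illustrative sketch following the RISC-V base ISA semantics; the `BranchOp` enum and the functions `branch_taken`, `branch_dnpc` and `jalr_target` are hypothetical and do not appear in the original source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical branch selector; the simulator itself matches INSTPAT patterns. */
typedef enum { BR_BEQ, BR_BNE, BR_BLT, BR_BGE, BR_BLTU, BR_BGEU } BranchOp;

/* Evaluate the branch condition on the two source operands. */
static bool branch_taken(BranchOp op, uint64_t src1, uint64_t src2) {
    switch (op) {
    case BR_BEQ:  return src1 == src2;
    case BR_BNE:  return src1 != src2;
    case BR_BLT:  return (int64_t)src1 <  (int64_t)src2;  /* signed compare */
    case BR_BGE:  return (int64_t)src1 >= (int64_t)src2;
    case BR_BLTU: return src1 <  src2;                    /* unsigned compare */
    case BR_BGEU: return src1 >= src2;
    }
    return false;
}

/* Next PC for a branch: pc + imm if taken, otherwise the sequential pc + 4. */
static uint64_t branch_dnpc(BranchOp op, uint64_t pc,
                            uint64_t src1, uint64_t src2, int64_t imm) {
    return branch_taken(op, src1, src2) ? pc + (uint64_t)imm : pc + 4;
}

/* jalr target: (src1 + imm) with the least significant bit cleared. */
static uint64_t jalr_target(uint64_t src1, int64_t imm) {
    return (src1 + (uint64_t)imm) & ~(uint64_t)1;
}
```

Note that `blt`/`bge` compare the operands as signed values while `bltu`/`bgeu` compare them as unsigned, which is why the same bit pattern can produce different outcomes.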

**Performance modeling:**

- Arithmetic and logical operations add 1 to `global_cycle_count`.
- Multiplication and division instructions add more (up to +39 cycles) to model their multi-cycle latency.

**Output:**

- `*alu_result` contains the computed result or, for loads and stores, the effective memory address.

------

### 🟨 4. Memory Access (MEM)

**Function:** `void mc_MEM(Decode *s, uint64_t alu_result, uint64_t *mem_result)`

**Main Tasks:**

- For **load** instructions: read from memory at address `alu_result`.
- For **store** instructions: write the value of `src2` to memory at address `alu_result`.
- Pass the loaded data to the next stage through `*mem_result`.

**Implementation Highlights:**

```c
// Load
INSTPAT("??????? ????? ????? 011 ????? 00000 11", ld, I, *mem_result = Mr(alu_result, 8));
// Store
INSTPAT("??????? ????? ????? 010 ????? 01000 11", sw, S, Mw(alu_result, 4, src2));
```

**Cycle accounting:**

- Each load/store operation increments the global cycle counter by 1.

**Output:**

- `*mem_result` holds the loaded value (for loads).
- Stores produce no result to write back.
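
To make the load and store accesses concrete, here is a minimal sketch of little-endian memory helpers over a flat byte array. It is an assumption about how helpers such as `Mr` and `Mw` might behave, not their actual implementation; `mem`, `MEM_SIZE`, `mem_read`, `mem_write` and `load_word_signed` are hypothetical names, and bounds checking is omitted for brevity.

```c
#include <stdint.h>

#define MEM_SIZE 4096            /* hypothetical memory size for the example */
static uint8_t mem[MEM_SIZE];    /* flat, byte-addressed memory */

/* Read `len` bytes (1, 2, 4 or 8) at `addr`, little-endian, zero-extended. */
static uint64_t mem_read(uint64_t addr, int len) {
    uint64_t value = 0;
    for (int i = 0; i < len; i++)
        value |= (uint64_t)mem[addr + i] << (8 * i);
    return value;
}

/* Write the low `len` bytes of `data` at `addr`, little-endian. */
static void mem_write(uint64_t addr, int len, uint64_t data) {
    for (int i = 0; i < len; i++)
        mem[addr + i] = (uint8_t)(data >> (8 * i));
}

/* lw on RV64 sign-extends the loaded 32-bit word to 64 bits. */
static uint64_t load_word_signed(uint64_t addr) {
    return (uint64_t)(int64_t)(int32_t)mem_read(addr, 4);
}
```

In this sketch a `sw` corresponds to `mem_write(alu_result, 4, src2)` and an `ld` to `mem_read(alu_result, 8)`, mirroring the two patterns shown above.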

------

### 🟧 5. Write-Back (WB)

**Function:** `void mc_WB(Decode *s, uint64_t alu_result, uint64_t mem_result)`

**Main Tasks:**

- Write results back into the register file (`R(rd)`).
- Choose between `alu_result` (for arithmetic operations) and `mem_result` (for loads).
- Handle jump instructions (`jal`, `jalr`) by writing the return address (`pc + 4`) to the destination register.

**Examples:**

```c
INSTPAT("??????? ????? ????? ??? ????? 01101 11", lui, U, R(rd) = imm);
// lw uses funct3 = 010 (funct3 = 000 would match lb)
INSTPAT("??????? ????? ????? 010 ????? 00000 11", lw,  I, R(rd) = mem_result);
INSTPAT("0000000 ????? ????? 000 ????? 01100 11", add, R, R(rd) = alu_result);
```

**Key Points:**

- Enforces `R(0) = 0` (as required by RISC-V).
- Marks the completion of the instruction (`STAGE_DONE`).
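
The selection between the possible write-back values can be sketched as a small multiplexer. This is an illustrative sketch only; the `gpr` array, the `WbSource` enum and the `write_back` function are hypothetical and stand in for the `INSTPAT`-driven logic in the simulator.

```c
#include <stdint.h>

static uint64_t gpr[32];   /* hypothetical register file; gpr[0] is x0 */

/* Hypothetical selector for the value written to the destination register. */
typedef enum { WB_ALU, WB_MEM, WB_LINK } WbSource;

static void write_back(int rd, WbSource src, uint64_t alu_result,
                       uint64_t mem_result, uint64_t pc) {
    uint64_t value;
    switch (src) {
    case WB_MEM:  value = mem_result; break;  /* loads */
    case WB_LINK: value = pc + 4;     break;  /* jal / jalr return address */
    default:      value = alu_result; break;  /* arithmetic and logic */
    }
    gpr[rd] = value;
    gpr[0] = 0;   /* x0 is hard-wired to zero, so undo any write to it */
}
```

Resetting `gpr[0]` after every write is one simple way to keep `x0` at zero; checking `rd != 0` before the write would work equally well.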

------

## Stage Transition Summary

The **`push_stage()`** function governs how each instruction advances through the stages:

| Current Stage | Next Stage             | Condition                                                    |
| ------------- | ---------------------- | ------------------------------------------------------------ |
| `IF`          | `ID`                   | Always                                                       |
| `ID`          | `EX` or `WB`           | Depends on instruction type (`J` jumps directly to `WB`)     |
| `EX`          | `MEM`, `WB`, or `DONE` | Loads/stores → `MEM`; branches → `DONE`; all others → `WB`   |
| `MEM`         | `WB` or `DONE`         | Loads → `WB`; stores → `DONE`                                |
| `WB`          | `DONE`                 | Always                                                       |

This ensures that:

- R-type instructions: **IF → ID → EX → WB → DONE**
- I-type arithmetic: **IF → ID → EX → WB → DONE**
- Loads: **IF → ID → EX → MEM → WB → DONE**
- Stores: **IF → ID → EX → MEM → DONE**
- Branches: **IF → ID → EX → DONE**
- Jumps: **IF → ID → WB → DONE**
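
The transition rules above can be expressed as a small next-state function. The sketch below reuses the stage and type names from this document, but the function `next_stage`, its signature, and the helper `stages_visited` are assumptions made for illustration; the real `push_stage()` may be structured differently. It also assumes that U-type instructions follow the same path as R-type ones, which the table does not state explicitly.

```c
typedef enum { STAGE_IF, STAGE_ID, STAGE_EX, STAGE_MEM, STAGE_WB, STAGE_DONE } Stage;
typedef enum { TYPE_R, TYPE_I, TYPE_S, TYPE_B, TYPE_U, TYPE_J } InstType;

/* Hypothetical next-state function mirroring the transition table. */
static Stage next_stage(Stage cur, InstType type, int is_load) {
    switch (cur) {
    case STAGE_IF:
        return STAGE_ID;
    case STAGE_ID:
        return type == TYPE_J ? STAGE_WB : STAGE_EX;    /* J skips EX */
    case STAGE_EX:
        if (type == TYPE_B) return STAGE_DONE;          /* branches end here */
        if (type == TYPE_S || is_load) return STAGE_MEM;
        return STAGE_WB;
    case STAGE_MEM:
        return is_load ? STAGE_WB : STAGE_DONE;         /* stores end here */
    case STAGE_WB:
        return STAGE_DONE;
    default:
        return STAGE_DONE;
    }
}

/* Count the processing stages an instruction visits before STAGE_DONE. */
static int stages_visited(InstType type, int is_load) {
    int n = 0;
    for (Stage s = STAGE_IF; s != STAGE_DONE; s = next_stage(s, type, is_load))
        n++;
    return n;
}
```

Under these assumptions, `stages_visited(TYPE_R, 0)` walks IF, ID, EX, WB, while `stages_visited(TYPE_I, 1)` (a load) additionally passes through MEM.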

------

## 🧩 Cycle Count Modeling

Each stage contributes to the global cycle count:

| Stage | Typical Cycle Cost | Notes                                  |
| ----- | ------------------ | -------------------------------------- |
| IF    | +1                 | Fetch from memory                      |
| ID    | +1                 | Decode delay                           |
| EX    | +1                 | ALU op; multiply/divide adds up to +39 |
| MEM   | +1                 | Memory access                          |
| WB    | +1                 | Register write delay                   |

Summing these costs along the paths listed in the transition summary, a typical **R-type** instruction (IF, ID, EX, WB) takes about 4 cycles, a **load** (IF, ID, EX, MEM, WB) about 5 cycles, and a **DIV** roughly 40 cycles because of its extended EX latency.
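
The per-instruction totals can be estimated by adding up the table's costs for the stages an instruction visits. The sketch below takes the table at face value (+1 per visited stage plus any extra EX latency); the `InstPath` struct and `estimate_cycles` function are hypothetical, and the simulator's own counter may charge cycles that are not listed in the table.

```c
/* Hypothetical description of one instruction's path through the FSM. */
typedef struct {
    int uses_ex;          /* 1 if the EX stage is visited */
    int uses_mem;         /* 1 if the MEM stage is visited */
    int uses_wb;          /* 1 if the WB stage is visited */
    int extra_ex_cycles;  /* extra EX latency, e.g. for multiply/divide */
} InstPath;

/* IF and ID are always visited (+1 each); every other visited stage adds +1. */
static int estimate_cycles(InstPath p) {
    return 2 + p.uses_ex + p.uses_mem + p.uses_wb + p.extra_ex_cycles;
}
```

With this model, an R-type path `{1, 0, 1, 0}` gives 4 cycles and a load path `{1, 1, 1, 0}` gives 5. A division charged the full 39 extra cycles on top of the base EX cycle would come to 43, in the same range as the rough figure of 40 quoted for `DIV`.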