A progressive, from-scratch implementation of a RISC-V RV32I processor in Verilog, developed across multiple evaluation levels. The project begins with a single-cycle baseline, advances through a 5-stage pipelined design, and culminates in a 2-wide dual-issue superscalar core. The level2–level4 board builds target the Xilinx Spartan-7 50T FPGA (xc7s50csga324-1) using Vivado 2025.2; the historical eval2 designs target an Artix-7 part.
- Repository Layout
- Architecture Overview
- ISA Coverage Summary
- Level-by-Level Description
- Pipeline Architecture
- Hazard Handling
- Module Reference
- Simulation
- Synthesis and Implementation
- Hardware Deployment — Board I/O
- Synthesis Results
- Eval2 Architecture Comparison
- Known Issues
riscv/
├── src/ # Root: standalone single-cycle RV32I core
├── sim/ # Root: single-cycle simulation hex + testbench
├── mem/ # Root: output from gen_hex.py
├── gen_hex.py # Level-0 test program generator (19 instructions)
├── comparison.txt # Eval2 architecture comparison table
├── eval2_report.txt # Eval2 detailed synthesis + simulation report
│
├── eval2/ # Historical: single-cycle baseline (eval2)
├── eval2_multicycle/ # Historical: FSM multicycle (eval2)
├── eval2_pipeline/ # Historical: 4-stage pipeline IF/ID/EX/WB (eval2)
├── eval2_superscalar/ # Historical: early 2-wide superscalar attempt (eval2)
├── eval2_complete/ # Historical: Vivado project version (eval2)
├── eval1_presentation/ # LaTeX/PDF presentation slides
│
├── level2/pipeline/ # 5-stage pipeline: R/I/load/store/branch
│ ├── src/ # 19 Verilog source files
│ ├── sim/ # Testbench, program.hex, run.tcl
│ ├── synth.tcl # Synthesis script (riscv_core standalone)
│ ├── impl.tcl # Implementation script (riscv_core standalone)
│ ├── board_impl_boolean.tcl # Full boolean_top implementation + program
│ ├── boolean.xdc # Boolean Board constraints
│ ├── arty_s7.xdc # Arty S7-50 constraints
│ ├── impl_out/ # Synthesis reports: riscv_core standalone
│ └── board_out_boolean/ # Synthesis reports: boolean_top full build
│
├── level3/pipeline/ # Level2 + JAL/JALR/LUI/AUIPC
│ ├── src/ # Verilog source files
│ ├── sim/ # Testbenches, program.hex, program_new.hex
│ ├── assemble.py # Python RV32I assembler
│ ├── boolean.xdc
│ ├── synth.tcl
│ └── board_out_boolean/ # Synthesis reports
│
├── level4/pipeline/ # Level3 + comprehensive 69-instruction test
│ ├── src/ # Verilog source files (identical RTL to level3)
│ ├── sim/ # Testbench, 69-instruction program.hex
│ ├── assemble.py # Enhanced assembler (all RV32I instructions)
│ ├── boolean.xdc
│ ├── synth.tcl
│ └── board_out_boolean/ # Synthesis reports
│
└── l4_superscalar/ # 2-wide dual-issue superscalar (simulation only)
├── src/ # Monolithic riscv_core.v + supporting modules
└── sim/ # Testbench, program.hex
The project follows a linear progression:
Level 0 (root src/) Single-cycle — all instructions in one clock
|
Eval2 variants Single-cycle / multicycle FSM / 4-stage pipeline /
| early superscalar (historical, for comparison)
|
Level 2 (5-stage) IF → ID → EX → MEM → WB
| R/I-ALU, LW, SW, all 6 branch types
|
Level 3 (5-stage) Level2 + LUI, AUIPC, JAL, JALR
|
Level 4 (5-stage) Level3 RTL, 69-instruction comprehensive test suite
|
L4 Superscalar 2-wide dual-issue, 4-stage, full RV32I + byte/half memory
All pipelined designs (level2–level4) share the same 5-stage pipeline skeleton with a common hazard/forwarding architecture. The superscalar is a separate, monolithic design.
| Instruction Group | Level 2 | Level 3 | Level 4 | L4 Superscalar |
|---|---|---|---|---|
| R-type (ADD/SUB/AND/OR/XOR/SLL/SRL/SRA/SLT/SLTU) | Yes | Yes | Yes | Yes |
| I-type ALU (ADDI/ANDI/ORI/XORI/SLTI/SLTIU/SLLI/SRLI/SRAI) | Yes | Yes | Yes | Yes |
| LW | Yes | Yes | Yes | Yes |
| SW | Yes | Yes | Yes | Yes |
| LB / LH (byte/halfword loads) | No | No | No | Yes |
| SB / SH (byte/halfword stores) | No | No | No | Yes |
| BEQ / BNE | Yes | Yes | Yes | Yes |
| BLT / BGE / BLTU / BGEU | Yes | Yes | Yes | Yes |
| LUI | No | Yes | Yes | Yes |
| AUIPC | No | Yes | Yes | Yes |
| JAL | No | Yes | Yes | Yes |
| JALR | No | Yes | Yes | Yes |
| FENCE / ECALL / EBREAK | No | No | No | Decoded (NOP) |
A standalone single-cycle RV32I processor in src/riscv_core.v. All seven source files are self-contained (no stage hierarchy):
| File | Description |
|---|---|
| `riscv_core.v` | Top-level single-cycle core; PC, decode, execute, memory, writeback all in one module |
| `alu.v` | 32-bit ALU |
| `control_unit.v` | Combinational control decoder |
| `imm_gen.v` | Immediate generator (all types) |
| `regfile.v` | 32×32 register file |
| `instr_mem.v` | Synchronous instruction memory ($readmemh) |
| `data_mem.v` | Data memory |
The test program in sim/program.hex is generated by gen_hex.py (19 instructions: ADDI, ADD, SUB, AND, OR, XOR, SLL, SRLI, SLT, SLTU, ANDI, ORI, XORI, SLLI, SRLI, SRAI, JAL x0 halt).
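As a rough illustration of what a hex generator of this kind does, the sketch below packs R-type and I-type fields into 32-bit words and formats them one word per line, the layout `$readmemh` reads. The function names and the three-instruction program are hypothetical and are not taken from gen_hex.py; only the RV32I field layout is standard.

```python
# Hypothetical helpers; not the actual contents of gen_hex.py.

def encode_r(funct7, rs2, rs1, funct3, rd, opcode=0b0110011):
    """Pack an R-type instruction (register-register ALU op)."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

def encode_i(imm, rs1, funct3, rd, opcode=0b0010011):
    """Pack an I-type instruction; the immediate is truncated to 12 bits."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

def to_hex_lines(words):
    """One 8-digit hex word per line, as $readmemh expects."""
    return [f"{w:08x}" for w in words]

program = [
    encode_i(5, 0, 0b000, 1),     # addi x1, x0, 5
    encode_i(7, 0, 0b000, 2),     # addi x2, x0, 7
    encode_r(0, 2, 1, 0b000, 3),  # add  x3, x1, x2
]

if __name__ == "__main__":
    print("\n".join(to_hex_lines(program)))
```

Writing the joined lines to a file would give an image in the same one-word-per-line form as sim/program.hex.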
Four designs implemented for comparative evaluation (eval2/, eval2_multicycle/, eval2_pipeline/, eval2_superscalar/). All target the Artix-7 xc7a35tcpg236-1 and use a 12-instruction ALU test program. See Section 12 for a full comparison table.
Path: level2/pipeline/
The first fully pipelined design. Implements a classic 5-stage RISC pipeline with full forwarding and hazard detection.
Supported ISA:
- All 10 R-type: ADD, SUB, AND, OR, XOR, SLL, SRL, SRA, SLT, SLTU
- All 9 I-type ALU: ADDI, ANDI, ORI, XORI, SLTI, SLTIU, SLLI, SRLI, SRAI
- LW (word load)
- SW (word store)
- All 6 branches: BEQ, BNE, BLT, BGE, BLTU, BGEU
Immediate types supported: I-type, S-type, B-type (no U or J).
Boards: Boolean Board (boolean.xdc) and Arty S7-50 (arty_s7.xdc). Both use the same device xc7s50csga324-1 (Spartan-7 50T).
Test program (sim/program.hex): 29 instructions. Exercises all supported instruction types, including a loop that sums 1–5 (expected result: 15 in x10).
Source files:
| File | Description |
|---|---|
| `riscv_core.v` | Top-level: instantiates all pipeline stages |
| `if_stage.v` | Instruction fetch; PC register and IMem read |
| `if_id_reg.v` | IF/ID pipeline register |
| `id_stage.v` | Instruction decode; regfile read |
| `id_ex_reg.v` | ID/EX pipeline register |
| `ex_stage.v` | Execute; ALU, forwarding muxes, branch logic |
| `ex_mem_reg.v` | EX/MEM pipeline register |
| `mem_stage.v` | Memory access; data memory read/write |
| `mem_wb_reg.v` | MEM/WB pipeline register |
| `wb_stage.v` | Writeback; selects ALU result or load data |
| `hazard_fwd_unit.v` | Load-use stall detection + forwarding control |
| `alu.v` | 32-bit ALU (11 operations) |
| `decoder.v` | Combinational instruction decoder |
| `imm_gen.v` | Immediate sign-extension (I/S/B types) |
| `regfile.v` | 32×32 register file, write-through |
| `instr_mem.v` | Synchronous instruction memory |
| `data_mem.v` | Synchronous word-addressed data memory |
| `boolean_top.v` | Boolean Board top-level wrapper |
| `arty_s7_top.v` | Arty S7-50 top-level wrapper |
| `seg7_ctrl.v` | Time-multiplexed 4-digit hex display driver |
Path: level3/pipeline/
Extends level2 with the remaining non-memory, non-ALU base instructions: LUI, AUIPC, JAL, JALR. This completes the control-flow subset of RV32I.
New in level3 vs level2:
| Feature | Detail |
|---|---|
| `imm_gen.v` | Adds U-type (lui/auipc) and J-type (jal) immediates |
| `decoder.v` | Adds output signals: jump, jalr, lui, auipc, link |
| `id_ex_reg.v` | Carries 5 additional control bits through the pipeline |
| `ex_stage.v` | alu_a mux: selects PC for AUIPC; alu_result mux: selects PC+4 for link (JAL/JALR); computes jalr_target = (rs1+imm) & ~1, jal_target = PC+imm |
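The jump-target formulas in the `ex_stage.v` row can be checked with a small software model. The sketch below is only a reference model under the assumption of 32-bit wraparound arithmetic; the function names are illustrative and do not correspond to signals in the RTL.

```python
MASK32 = 0xFFFFFFFF  # all values are kept as unsigned 32-bit words

def jal_target(pc, imm):
    """JAL: PC-relative target, jal_target = PC + imm."""
    return (pc + imm) & MASK32

def jalr_target(rs1, imm):
    """JALR: register-relative target with bit 0 cleared, (rs1 + imm) & ~1."""
    return (rs1 + imm) & ~1 & MASK32

def link_value(pc):
    """Value written to rd by JAL/JALR: the address of the next instruction."""
    return (pc + 4) & MASK32

if __name__ == "__main__":
    print(hex(jal_target(0x10, -8)), hex(jalr_target(0x103, 0)), hex(link_value(0x10)))
```

Clearing bit 0 in the JALR target is what the RV32I specification requires, which is why the model masks after the addition rather than before.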
assemble.py: Full Python RV32I assembler. Supports all R/I/S/B/U/J instruction formats, forward and backward label references, and all 6 branch types. Assembles source text directly to sim/program.hex.
Test programs:
- `sim/program.hex`: 17-instruction nested loop / function-call test via JAL
- `sim/program_new.hex`: 69-instruction comprehensive test covering all level3 instructions (assembled by `assemble.py`)
Testbenches:
- `sim/tb_riscv_core.v`: Stack operations, nested loops, JAL call/return
- `sim/tb_new_program.v`: Elaborate per-instruction pass/fail checking for all level3 ISA
Board: Boolean Board only.
Path: level4/pipeline/
The RTL is identical to level3 — same source files, same module hierarchy, same synthesis results. Level4 is distinguished by its enhanced assemble.py and test suite, which provide exhaustive coverage of all implemented instructions.
Test program (sim/program.hex): 69 instructions assembled by assemble.py. Covers:
- LUI: loads `0x12345000` into x3
- AUIPC: loads `PC + 0xABCDE000` into x4 (expected: `0xABCDE008`)
- All 10 R-type operations
- All 9 I-type ALU operations
- SW + LW round-trip (pattern: `0xDEAD`)
- All 6 branch types (counter incremented once per taken branch, final expected: 6)
- JAL + JALR with stack push/pop via SP (x2)
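The expected LUI and AUIPC values in the list above follow directly from the U-type semantics. A minimal sketch, assuming the AUIPC sits at byte address 8 (inferred from the expected result `0xABCDE008`, not read from the program source):

```python
MASK32 = 0xFFFFFFFF

def lui(imm20):
    """LUI: place the 20-bit immediate in bits [31:12], low 12 bits zero."""
    return (imm20 << 12) & MASK32

def auipc(pc, imm20):
    """AUIPC: add the shifted 20-bit immediate to the instruction's own PC."""
    return (pc + (imm20 << 12)) & MASK32

if __name__ == "__main__":
    print(hex(lui(0x12345)))        # value expected in x3
    print(hex(auipc(8, 0xABCDE)))   # value expected in x4, assuming PC = 8
```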
Testbench (sim/tb_riscv_core.v): Checks each result with explicit $display/$fatal assertions. Final pass message confirms all checks passed.
Board: Boolean Board only.
Path: l4_superscalar/
A 2-wide dual-issue superscalar processor. This is a simulation-only design (no board top, no XDC, no synthesis TCL). Architecture differs significantly from the single-issue levels.
Key design decisions:
| Feature | Detail |
|---|---|
| Fetch | Fetches 2 instructions per cycle (instr0, instr1) from instr_mem |
| Issue gate (`dual_ok`) | Issues pair only when: both slots valid; neither is a control instruction (branch/jal/jalr); no intra-group RAW hazard (a hazard exists when slot0.rd matches slot1.rs1 or slot1.rs2); not both memory ops; no FENCE |
| PC advance | PC + 8 when dual_ok, PC + 4 when single-issue, hold on stall, redirect on branch taken |
| Pipeline stages | 4 stages: IF/ID → EX → MEM → WB (pairs tracked as e0/e1, m0/m1, w0/w1) |
| Dual ALUs | u_alu0 + u_alu1 operating in parallel |
| Forwarding | 2-bit fa0/fb0/fa1/fb1: 00=none, 01=MEM slot0, 10=MEM slot1, 11=WB either |
| Register file | Dual-write port (we0/we1), 4 read ports (rs1_0/rs2_0/rs1_1/rs2_1), write-through on all read ports |
| Data memory | Byte-addressable with width[1:0] and mem_unsigned for LB/LH/LW/SB/SH/SW |
| `control_unit.v` | Extended decoder: adds mem_to_reg, fence, ecall, ebreak, mem_width[2:0], mem_unsigned |
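The issue gate and PC-advance rules in the table can be expressed as a small behavioural model. This is a sketch, not the RTL: the `Slot` fields, the exclusion of x0 from the RAW check, and the priority of redirect over stall are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Slot:
    """Decoded view of one issue slot (hypothetical field names)."""
    valid: bool = True
    rd: int = 0
    rs1: int = 0
    rs2: int = 0
    writes_rd: bool = True
    is_ctrl: bool = False   # branch / jal / jalr
    is_mem: bool = False    # load or store
    is_fence: bool = False

def dual_ok(s0, s1):
    """True when both slots may issue together in the same cycle."""
    # Intra-group RAW: slot1 reads the register slot0 writes (x0 ignored, an assumption).
    raw = s0.writes_rd and s0.rd != 0 and s0.rd in (s1.rs1, s1.rs2)
    return (s0.valid and s1.valid
            and not (s0.is_ctrl or s1.is_ctrl)
            and not raw
            and not (s0.is_mem and s1.is_mem)
            and not (s0.is_fence or s1.is_fence))

def next_pc(pc, dual, stall=False, redirect=None):
    """PC update: redirect on taken branch, hold on stall, else +8 or +4."""
    if redirect is not None:
        return redirect
    if stall:
        return pc
    return pc + (8 if dual else 4)

if __name__ == "__main__":
    producer = Slot(rd=1, rs1=2, rs2=3)
    consumer = Slot(rd=4, rs1=1, rs2=5)   # reads x1 written by producer
    print(dual_ok(producer, consumer))     # dependent pair cannot dual-issue
```

A model like this can serve as a golden reference in a testbench: feed it the same decoded fields as the core and compare its `dual_ok` result with the per-cycle trace.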
Note: instr_mem.v contains a hardcoded Windows-style default path for MEM_FILE. This must be overridden at instantiation on Linux (pass MEM_FILE as a parameter or edit the default).
Test program (sim/program.hex): 17-instruction simple loop and function-call sequence. Testbench prints a full per-cycle pipeline trace (both slots, EX/MEM/WB stages) and checks x1–x19 against expected values.
All level2/3/4 designs share this 5-stage pipeline:
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
clk ──>│ IF │───>│ ID │───>│ EX │───>│ MEM │───>│ WB │
└────────┘ └────────┘ └────────┘ └────────┘ └────────┘
│ if_id │ id_ex │ ex_mem │ mem_wb │
│ ══════> │ ══════> │ ══════> │ ══════> │
│ │ │ │ │
PC + IMem Decode + ALU + DMem Regfile
RegRead Branch Write
Resolve
Pipeline registers: if_id_reg, id_ex_reg, ex_mem_reg, mem_wb_reg — all synchronous, reset-to-NOP on flush.
Key microarchitectural properties:
| Property | Value / Description |
|---|---|
| Clock | 100 MHz (10 ns period), single sys_clk domain |
| Reset | Active-low synchronous (rst_n) |
| Instruction memory | Synchronous read, word-addressed, $readmemh initialized |
| Data memory | Synchronous write, combinational read, word-addressed via addr[31:2] |
| Register file | Write-through: WB write visible to ID-stage reads in same cycle |
| Branch resolution | EX stage — 2-cycle penalty; IF and ID stages flushed on redirect |
| Forwarding | EX→EX (from EX/MEM register) and MEM→EX (from MEM/WB register) |
| Debug port | dbg_addr[4:0] / dbg_data[31:0] on regfile for board inspection |
ALU operations (4-bit control):
| Code | Operation | Code | Operation |
|---|---|---|---|
| 0000 | ADD | 0101 | SRL |
| 0001 | SUB | 0110 | SLT |
| 0010 | AND | 0111 | SLTU |
| 0011 | OR | 1000 | SRA |
| 0100 | XOR | 1001 | SLL |
| — | — | 1010 | PASS_B (for LUI) |
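For checking simulation results against the control codes in the table above, a software reference model of the ALU can be handy. The following is a sketch that assumes standard RV32I semantics (shift amount taken from the low 5 bits of operand B, signed comparison for SLT); it is not derived from alu.v.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit word as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def alu(op, a, b):
    """Reference model keyed by the 4-bit ALU control code."""
    a &= MASK32
    b &= MASK32
    sh = b & 0x1F  # shift amount: low 5 bits of operand B
    ops = {
        0b0000: lambda: a + b,                           # ADD
        0b0001: lambda: a - b,                           # SUB
        0b0010: lambda: a & b,                           # AND
        0b0011: lambda: a | b,                           # OR
        0b0100: lambda: a ^ b,                           # XOR
        0b0101: lambda: a >> sh,                         # SRL (logical)
        0b0110: lambda: int(to_signed(a) < to_signed(b)),  # SLT
        0b0111: lambda: int(a < b),                      # SLTU
        0b1000: lambda: to_signed(a) >> sh,              # SRA (arithmetic)
        0b1001: lambda: a << sh,                         # SLL
        0b1010: lambda: b,                               # PASS_B (for LUI)
    }
    if op not in ops:
        raise ValueError(f"unknown ALU control code {op:04b}")
    return ops[op]() & MASK32

if __name__ == "__main__":
    print(hex(alu(0b1000, 0x80000000, 4)))  # SRA keeps the sign bit
```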
Implemented in hazard_fwd_unit.v (instantiated inside ex_stage.v).
Forward paths resolve RAW hazards without stalling when the producing instruction has already computed its result:
| `fwd_a` / `fwd_b` | Meaning |
|---|---|
| `2'b00` | No forwarding; use register file output |
| `2'b01` | Forward from EX/MEM register (one cycle ago) |
| `2'b10` | Forward from MEM/WB register (two cycles ago) |
Forwarding conditions (for fwd_a — fwd_b is symmetric):
// EX/MEM forward (higher priority)
if (ex_mem_reg_write && ex_mem_rd != 0 && ex_mem_rd == id_ex_rs1)
fwd_a = 2'b01;
// MEM/WB forward
else if (mem_wb_reg_write && mem_wb_rd != 0 && mem_wb_rd == id_ex_rs1)
fwd_a = 2'b10;
A 1-cycle bubble is inserted when a load is immediately followed by a dependent instruction. Detected in riscv_core.v:
stall = id_ex_mem_read && (id_ex_rd != 0) &&
(id_ex_rd == id_rs1 || id_ex_rd == id_rs2);
On stall: PC is held, IF/ID register is held, ID/EX register is flushed to NOP.
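The forwarding and load-use conditions quoted in this section translate directly into a software model, which can be useful when cross-checking a pipeline trace by hand. The sketch below mirrors the Verilog expressions; the function names are illustrative, and the argument names simply reuse the signal names from the snippets.

```python
def forward_select(ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd, id_ex_rs):
    """Return the 2-bit forwarding select for one ALU operand."""
    # EX/MEM has priority because it holds the most recent result.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return 0b01
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return 0b10
    return 0b00

def load_use_stall(id_ex_mem_read, id_ex_rd, id_rs1, id_rs2):
    """True when the instruction in ID needs the result of a load still in EX."""
    return bool(id_ex_mem_read and id_ex_rd != 0 and id_ex_rd in (id_rs1, id_rs2))

if __name__ == "__main__":
    # lw x3, 0(x1) followed immediately by add x4, x3, x2 needs one bubble.
    print(load_use_stall(True, 3, 3, 2))
```

The `rd != 0` terms matter in both functions: x0 is hardwired to zero, so a write to it must never be forwarded or cause a stall.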
Branches are resolved in the EX stage. On a taken branch:
- `pc_redirect` signal asserted, `pc_target` driven to branch/jump target
- IF and ID pipeline registers flushed (2-cycle penalty)
- Not-taken branches incur no penalty (no branch prediction — assume not-taken)
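The penalties described above give a simple first-order cycle estimate for a program run on the 5-stage pipeline. This is a back-of-the-envelope sketch: it assumes 4 fill cycles, 1 bubble per load-use stall, and 2 cycles per taken branch or jump, and it ignores any other effects. The instruction counts in the example are hypothetical.

```python
def estimate_cycles(instructions, load_use_stalls, taken_redirects, stages=5):
    """First-order cycle count: fill + one cycle per instruction + penalties."""
    fill = stages - 1                    # cycles before the first instruction retires
    return fill + instructions + load_use_stalls + 2 * taken_redirects

def effective_cpi(instructions, load_use_stalls, taken_redirects, stages=5):
    """Cycles per instruction under the same assumptions."""
    return estimate_cycles(instructions, load_use_stalls, taken_redirects, stages) / instructions

if __name__ == "__main__":
    # Hypothetical run: 100 instructions, 3 load-use stalls, 10 taken branches/jumps.
    print(estimate_cycles(100, 3, 10), effective_cpi(100, 3, 10))
```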
The following table describes every module used in the level2/3/4 pipeline (with notes on level3/4 additions):
| Module | Instantiated as | Description |
|---|---|---|
| `riscv_core` | `u_core` | Pipeline top-level; wires all stages; contains stall logic |
| `if_stage` | `u_if` | PC register, PC+4/redirect mux, instruction memory read |
| `if_id_reg` | `u_if_id` | IF/ID pipeline register; flush-to-NOP on redirect, held on stall |
| `id_stage` | `u_id` | Decoder, immediate generator, register file read |
| `id_ex_reg` | `u_id_ex` | ID/EX pipeline register; carries control + data; has synchronous reset signal `flush_decode` |
| `ex_stage` | `u_ex` | ALU + forwarding muxes + branch condition evaluation; level3+ adds jump/lui/auipc muxes |
| `ex_mem_reg` | `u_ex_mem` | EX/MEM pipeline register |
| `mem_stage` | `u_mem` | Data memory instantiation; LW/SW |
| `mem_wb_reg` | `u_mem_wb` | MEM/WB pipeline register |
| `wb_stage` | `u_wb` | Writeback mux: ALU result or load data |
| `hazard_fwd_unit` | `u_haz` | Forwarding logic (inside ex_stage); stall signal to core |
| `alu` | `u_alu` | 32-bit combinational ALU; 4-bit control |
| `decoder` | `u_dec` | Combinational control signal generation from opcode/funct3/7 |
| `imm_gen` | `u_imm` | Sign-extended immediate; level3+ adds U and J types |
| `regfile` | `u_rf` | 32×32 register file; x0 hardwired to zero; write-through |
| `instr_mem` | `u_imem` | Synchronous ROM; parameterized `MEM_FILE` |
| `data_mem` | `u_dmem` | Synchronous RAM; word-addressed |
| `boolean_top` | (top) | Boolean Board wrapper: clock buffer, reset, seg7_ctrl, switch/LED I/O |
| `arty_s7_top` | (top) | Arty S7-50 wrapper (level2 only) |
| `seg7_ctrl` | `u_seg7` | 4-digit 7-segment display driver; time-multiplexed |
- Icarus Verilog (`iverilog`/`vvp`)
- (Optional) GTKWave for VCD waveform viewing
cd level2/pipeline/sim
# Compile
iverilog -o sim.out -s tb_riscv_core \
tb_riscv_core.v \
../src/riscv_core.v \
../src/if_stage.v \
../src/if_id_reg.v \
../src/id_stage.v \
../src/id_ex_reg.v \
../src/ex_stage.v \
../src/ex_mem_reg.v \
../src/mem_stage.v \
../src/mem_wb_reg.v \
../src/wb_stage.v \
../src/hazard_fwd_unit.v \
../src/alu.v \
../src/decoder.v \
../src/imm_gen.v \
../src/regfile.v \
../src/instr_mem.v \
../src/data_mem.v
# Run (program.hex must be in the working directory or MEM_FILE path correct)
vvp sim.out
# View waveforms
gtkwave tb_riscv_core.vcd
Or use the provided TCL script from Vivado:
source sim/run.tcl
Same flow as level2. Level3 has two testbenches:
# Basic JAL/stack test
iverilog -o sim.out -s tb_riscv_core tb_riscv_core.v [sources...]
# Comprehensive all-instruction test
iverilog -o sim.out -s tb_new_program tb_new_program.v [sources...]
The assemble.py script in level3 and level4 assembles RV32I assembly source to hex:
cd level3/pipeline # or level4/pipeline
python3 assemble.py # reads inline assembly, writes sim/program.hex
To modify the test program, edit the assembly string inside assemble.py and re-run.
cd l4_superscalar/sim
iverilog -o sim.out -s tb_riscv_core \
tb_riscv_core.v \
../src/riscv_core.v \
../src/control_unit.v \
../src/alu.v \
../src/regfile.v \
../src/instr_mem.v \
../src/data_mem.v
vvp sim.out
Note: Edit instr_mem.v to fix the MEM_FILE default path (currently a Windows absolute path) or pass MEM_FILE as a parameter.
- Vivado 2025.2 (lin64), Build 6299465
- Device: `xc7s50csga324-1` (Spartan-7 50T)
- Boolean Board clock: 100 MHz, pin F14, LVCMOS33
- Arty S7 clock: 100 MHz, pin P14
# In Vivado TCL console or batch mode:
source level2/pipeline/synth.tcl # Synthesize riscv_core only
source level2/pipeline/impl.tcl # Implement riscv_core (no board I/O)
Reports land in level2/pipeline/impl_out/.
source level2/pipeline/board_impl_boolean.tcl
# This script also calls program_board.tcl to flash the bitstream
Reports land in level2/pipeline/board_out_boolean/.
source level3/pipeline/synth.tcl # or level4/pipeline/synth.tcl
Reports land in levelN/pipeline/board_out_boolean/.
All three levels (2, 3, 4) use an identical boolean_top.v wrapper.
| Signal | Direction | Pins / Width | Description |
|---|---|---|---|
| `clk` | Input | F14 | 100 MHz system clock |
| `btn[0]` | Input | 1 bit | Active-high reset (synchronous) |
| `sw[4:0]` | Input | 5 bits | Register select: choose which x0–x31 register to display |
| `led[15:0]` | Output | 16 bits | Lower 16 bits of the selected register |
| `seg[6:0]` | Output | 7 bits | 7-segment cathode signals (active-low) |
| `an[3:0]` | Output | 4 bits | 7-segment anode enables (active-low, time-multiplexed) |
| `dp` | Output | 1 bit | Decimal point (driven low / unused) |
Usage:
- Program the board with the generated bitstream
- Press `btn[0]` to reset the CPU; it will begin executing from address 0
- Set `sw[4:0]` to the register number you want to inspect (e.g., `01010` = x10)
- `led[15:0]` shows the lower 16 bits of that register immediately
- The 4-digit hex display shows the full 32-bit value of the selected register
Segment display encoding: seg7_ctrl cycles through digits 0–3 at a frequency derived from the 100 MHz clock (typically ~1 kHz digit refresh). Each digit displays one hex nibble (0–F).
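To make the time-multiplexing concrete, the sketch below splits a 16-bit value into the four nibbles a 4-digit display can show and computes how many 100 MHz clock cycles each digit would be held for a ~1 kHz digit rate. The digit ordering (digit 0 as the least significant nibble) and the exact divider value are assumptions; seg7_ctrl may use different ones.

```python
def digit_nibbles(value16):
    """Split a 16-bit value into four hex nibbles; index 0 is the least significant."""
    return [(value16 >> (4 * i)) & 0xF for i in range(4)]

def cycles_per_digit(clk_hz, digit_hz):
    """Clock cycles each digit stays enabled for a given digit switching rate."""
    return clk_hz // digit_hz

def counter_bits(cycles):
    """Width of a counter able to count 0 .. cycles-1."""
    return (cycles - 1).bit_length()

if __name__ == "__main__":
    n = cycles_per_digit(100_000_000, 1_000)
    print(digit_nibbles(0xBEEF), n, counter_bits(n))
```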
Level2 only. Same functional mapping as boolean_top but with Arty S7 pin assignments.
| Signal | Direction | Pin | Description |
|---|---|---|---|
| `clk` | Input | P14 | 100 MHz system clock |
| `btn[0]` | Input | - | Reset |
| `sw[3:0]` | Input | - | Register select (lower 4 bits) |
| `led[3:0]` | Output | - | Lower 4 bits of selected register |
| Resource | Total Available |
|---|---|
| Slice LUTs | 32,600 |
| Slice Registers | 65,200 |
| Block RAM | 75 (36Kb each) |
| DSP48E1 | 120 |
| IOBs | 210 |
Note: Most logic was optimized away (constant inputs); these numbers reflect the pruned implementation, not the full core.
| Resource | Used | Available | Utilization |
|---|---|---|---|
| Slice LUTs | 2 | 32,600 | < 0.01% |
| Slice Registers | 30 | 65,200 | 0.05% |
| Block RAM | 0 | 75 | 0% |
Timing (post-route, standalone): WNS = +7.161 ns — timing met. Critical path: PC adder carry chain, 9 logic levels, 2.839 ns.
Power (post-route, standalone): Total = 0.101 W, Dynamic = 0.030 W.
| Metric | Level 2 | Level 3 | Level 4 |
|---|---|---|---|
| Slice LUTs (total) | 1,463 | 1,666 | 1,666 |
| — LUT as Logic | 885 | 1,088 | 1,088 |
| — LUT as Dist. RAM | 578 | 578 | 578 |
| CARRY4 | 40 | 66 | 66 |
| Slice Registers (FFs) | 344 | 437 | 437 |
| F7/F8 Muxes | 384 | 385 | 385 |
| Slices | 465 | 529 | 529 |
| Block RAM | 0 | 0 | 0 |
| IOBs | 53 (25.2%) | 53 (25.2%) | 53 (25.2%) |
| BUFGCTRL | 1 | 1 | 1 |
LUT utilization: L2 = 4.49%, L3/L4 = 5.11% of device.
Level2 → Level3/4 increase: +203 LUTs (+14%), +93 FFs (+27%). This overhead comes from the 5 additional control signals (jump, jalr, lui, auipc, link) propagating through the ID/EX pipeline register, the EX-stage mux tree, and the extended imm_gen.
Level3 vs Level4: Identical RTL, identical synthesis results. The difference is only in the test program and testbench.
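The utilization and growth percentages quoted above can be reproduced from the raw counts in the table. A small sketch, using only numbers that appear in this section:

```python
DEVICE_LUTS = 32_600  # Slice LUTs available on the xc7s50 (from the device table)

def percent(part, whole):
    """Part as a percentage of whole."""
    return 100.0 * part / whole

luts = {"level2": 1_463, "level3": 1_666}
ffs = {"level2": 344, "level3": 437}

if __name__ == "__main__":
    for level, used in luts.items():
        print(level, f"{percent(used, DEVICE_LUTS):.2f}% of device LUTs")
    print("LUT growth:", f"{percent(luts['level3'] - luts['level2'], luts['level2']):.0f}%")
    print("FF growth:", f"{percent(ffs['level3'] - ffs['level2'], ffs['level2']):.0f}%")
```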
| Level | WNS (ns) | TNS (ns) | Failing Endpoints | Status |
|---|---|---|---|---|
| Level 2 | -2.306 | -335.187 | 162 / 6,592 | NOT MET |
| Level 3 | -3.407 | -743.564 | 256 / 6,802 | NOT MET |
| Level 4 | -3.407 | -743.564 | 256 / 6,802 | NOT MET |
All hold-time constraints are met (WHS > 0 in all cases).
Level 2 critical path: id_ex.out_rs2[0] → forwarding unit → ALU (SRL carry chain, bits [4–8]) → zero flag → pc_redirect flop reset. 12 logic levels, 11.811 ns data path delay, 9.805 ns routing. Slack = -2.306 ns against the 10 ns constraint gives a minimum period of 12.306 ns → actual Fmax ≈ 81.3 MHz.
Level 3/4 critical path: ex_mem.out_rd[2] → forwarding unit (fwd_a) → ALU input mux → ALU bit [26] carry chain → zero flag → pc_redirect → id_ex pipeline register reset. 14 logic levels, 12.901 ns data path delay, 10.709 ns routing. Slack = -3.407 ns against the 10 ns constraint gives a minimum period of 13.407 ns → actual Fmax ≈ 74.6 MHz.
Why timing fails: The critical path passes through: (1) the MEM/WB or EX/MEM forwarding comparators, (2) ALU operand selection muxes, (3) the 32-bit ALU carry chain (especially for shift/compare operations that the synthesizer cannot map to CARRY4 efficiently), (4) the zero/branch-taken combinational logic, and (5) the conditional flush/redirect path to the pipeline register reset pin — all in a single cycle. The routing component (83% of path delay) indicates the design is spread across the fabric, which inflates net delays significantly. Reducing the clock to ~70 MHz or retiming the branch-taken → flush path would resolve violations.
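A common way to turn a reported worst negative slack into an achievable clock frequency is Fmax = 1 / (T_constraint - WNS). The sketch below applies that estimate and also computes the routing share of a path; it assumes the 10 ns constraint stated in this document and is only an approximation, since re-running place and route at a different constraint can change the result.

```python
def fmax_mhz(period_ns, wns_ns):
    """Estimated maximum frequency from the constrained period and the WNS."""
    min_period_ns = period_ns - wns_ns   # negative slack lengthens the period
    return 1000.0 / min_period_ns

def routing_fraction(routing_ns, data_path_ns):
    """Share of the data path delay spent in routing."""
    return routing_ns / data_path_ns

if __name__ == "__main__":
    for name, wns in (("level2", -2.306), ("level3/4", -3.407)):
        print(name, f"{fmax_mhz(10.0, wns):.1f} MHz")
    print(f"routing share: {routing_fraction(9.805, 11.811):.0%}")
```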
| Metric | Level 2 | Level 3 | Level 4 |
|---|---|---|---|
| Total On-Chip Power (W) | 0.078 | 0.132 | 0.132 |
| Dynamic Power (W) | 0.006 | 0.061 | 0.061 |
| Device Static (W) | 0.072 | 0.072 | 0.072 |
| Junction Temperature (C) | 25.4 | 25.7 | 25.7 |
| Confidence Level | Low | Low | Low |
The low confidence level is expected (no simulation activity file provided). The dominant dynamic power consumer in level3/4 is I/O (0.041 W at Vcco33), followed by signals (0.010 W) and slice logic (0.005 W).
Historical designs implemented for direct architectural comparison. All run at 100 MHz reference; device: Artix-7 xc7a35tcpg236-1; 12-instruction ALU-only test program.
| Design | Stages | CPI (eff.) | IPC (eff.) | Throughput @ 100 MHz | Fmax (est.) |
|---|---|---|---|---|---|
| Single-cycle | 1 | 1.00 | 1.00 | 100 MIPS | ~110–140 MHz |
| Multicycle FSM | 4 (FSM) | 4.00 | 0.25 | 25 MIPS | ~150 MHz |
| 4-stage pipeline | 4 (IF/ID/EX/WB) | 1.25 | 0.80 | 80 MIPS | ~133 MHz |
| 2-wide superscalar | 4 (IF/ID/EX/WB) | 0.75 | 1.33 | 133 MIPS | — |
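The IPC and throughput columns follow from the effective CPI at the 100 MHz reference clock: IPC = 1 / CPI and throughput = f / CPI. A short sketch that recomputes them from the CPI values in the table:

```python
def ipc(cpi):
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    return 1.0 / cpi

def throughput_mips(clock_mhz, cpi):
    """Millions of instructions per second at a given clock and CPI."""
    return clock_mhz / cpi

designs = {
    "Single-cycle": 1.00,
    "Multicycle FSM": 4.00,
    "4-stage pipeline": 1.25,
    "2-wide superscalar": 0.75,
}

if __name__ == "__main__":
    for name, cpi in designs.items():
        print(f"{name}: IPC {ipc(cpi):.2f}, {throughput_mips(100, cpi):.0f} MIPS")
```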
Eval2 Resource Utilization (Artix-7 xc7a35tcpg236-1):
| Design | LUTs | FFs | BRAM |
|---|---|---|---|
| Single-cycle | 487 | 32 | 0 |
| Multicycle FSM | 517 | 200 | 0 |
| 4-stage pipeline | 570 | 216 | 1 |
Full details: see eval2_report.txt and comparison.txt.
| Issue | Affected Level | Description |
|---|---|---|
| Timing not met at 100 MHz | L2, L3, L4 board builds | Critical path through forwarding + ALU + branch-taken → pipeline flush. Actual Fmax from the reported WNS: about 81 MHz (L2) and 74.6 MHz (L3/L4). No functional impact in simulation. For deployment, reduce clock to 50–70 MHz via a clock divider or MMCM. |
| Hardcoded Windows path in `instr_mem.v` | `l4_superscalar/src/instr_mem.v` | The default MEM_FILE parameter uses an absolute Windows path. Must be overridden at instantiation on Linux/Mac. |
| No board support for superscalar | `l4_superscalar/` | No `boolean_top.v`, no XDC, no synthesis TCL. Simulation only. |
| Branch prediction: assume not-taken | All pipeline levels | All branches incur a 2-cycle penalty when taken. No dynamic predictor. |
| Word-only memory in L2–L4 | L2, L3, L4 | Data memory is word-addressed; no byte or halfword access (LB/LH/SB/SH not supported). Only l4_superscalar implements sub-word memory access. |