rignitc/CarbonX


RISC-V RV32I CPU — Progressive Implementation

A progressive, from-scratch implementation of a RISC-V RV32I processor in Verilog, developed across multiple evaluation levels. The project begins with a single-cycle baseline, advances through a 5-stage pipelined design, and culminates in a 2-wide dual-issue superscalar core. The board builds (level2 through level4) target the Xilinx Spartan-7 50T FPGA (xc7s50csga324-1) using Vivado 2025.2; the historical eval2 variants target an Artix-7 part, and the superscalar core is simulation-only.


Table of Contents

  1. Repository Layout
  2. Architecture Overview
  3. ISA Coverage Summary
  4. Level-by-Level Description
  5. Pipeline Architecture
  6. Hazard Handling
  7. Module Reference
  8. Simulation
  9. Synthesis and Implementation
  10. Hardware Deployment — Board I/O
  11. Synthesis Results
  12. Eval2 Architecture Comparison
  13. Known Issues

1. Repository Layout

riscv/
├── src/                        # Root: standalone single-cycle RV32I core
├── sim/                        # Root: single-cycle simulation hex + testbench
├── mem/                        # Root: output from gen_hex.py
├── gen_hex.py                  # Level-0 test program generator (19 instructions)
├── comparison.txt              # Eval2 architecture comparison table
├── eval2_report.txt            # Eval2 detailed synthesis + simulation report
│
├── eval2/                      # Historical: single-cycle baseline (eval2)
├── eval2_multicycle/           # Historical: FSM multicycle (eval2)
├── eval2_pipeline/             # Historical: 4-stage pipeline IF/ID/EX/WB (eval2)
├── eval2_superscalar/          # Historical: early 2-wide superscalar attempt (eval2)
├── eval2_complete/             # Historical: Vivado project version (eval2)
├── eval1_presentation/         # LaTeX/PDF presentation slides
│
├── level2/pipeline/            # 5-stage pipeline: R/I/load/store/branch
│   ├── src/                    # 19 Verilog source files
│   ├── sim/                    # Testbench, program.hex, run.tcl
│   ├── synth.tcl               # Synthesis script (riscv_core standalone)
│   ├── impl.tcl                # Implementation script (riscv_core standalone)
│   ├── board_impl_boolean.tcl  # Full boolean_top implementation + program
│   ├── boolean.xdc             # Boolean Board constraints
│   ├── arty_s7.xdc             # Arty S7-50 constraints
│   ├── impl_out/               # Synthesis reports: riscv_core standalone
│   └── board_out_boolean/      # Synthesis reports: boolean_top full build
│
├── level3/pipeline/            # Level2 + JAL/JALR/LUI/AUIPC
│   ├── src/                    # Verilog source files
│   ├── sim/                    # Testbenches, program.hex, program_new.hex
│   ├── assemble.py             # Python RV32I assembler
│   ├── boolean.xdc
│   ├── synth.tcl
│   └── board_out_boolean/      # Synthesis reports
│
├── level4/pipeline/            # Level3 + comprehensive 69-instruction test
│   ├── src/                    # Verilog source files (identical RTL to level3)
│   ├── sim/                    # Testbench, 69-instruction program.hex
│   ├── assemble.py             # Enhanced assembler (all RV32I instructions)
│   ├── boolean.xdc
│   ├── synth.tcl
│   └── board_out_boolean/      # Synthesis reports
│
└── l4_superscalar/             # 2-wide dual-issue superscalar (simulation only)
    ├── src/                    # Monolithic riscv_core.v + supporting modules
    └── sim/                    # Testbench, program.hex

2. Architecture Overview

The project follows a linear progression:

Level 0 (root src/)       Single-cycle — all instructions in one clock
        |
Eval2 variants            Single-cycle / multicycle FSM / 4-stage pipeline /
        |                 early superscalar (historical, for comparison)
        |
Level 2 (5-stage)         IF → ID → EX → MEM → WB
        |                 R/I-ALU, LW, SW, all 6 branch types
        |
Level 3 (5-stage)         Level2 + LUI, AUIPC, JAL, JALR
        |
Level 4 (5-stage)         Level3 RTL, 69-instruction comprehensive test suite
        |
L4 Superscalar            2-wide dual-issue, 4-stage, full RV32I + byte/half memory

All pipelined designs (level2–level4) share the same 5-stage pipeline skeleton with a common hazard/forwarding architecture. The superscalar is a separate, monolithic design.


3. ISA Coverage Summary

| Instruction Group | Level 2 | Level 3 | Level 4 | L4 Superscalar |
|---|---|---|---|---|
| R-type (ADD/SUB/AND/OR/XOR/SLL/SRL/SRA/SLT/SLTU) | Yes | Yes | Yes | Yes |
| I-type ALU (ADDI/ANDI/ORI/XORI/SLTI/SLTIU/SLLI/SRLI/SRAI) | Yes | Yes | Yes | Yes |
| LW | Yes | Yes | Yes | Yes |
| SW | Yes | Yes | Yes | Yes |
| LB / LH (byte/halfword loads) | No | No | No | Yes |
| SB / SH (byte/halfword stores) | No | No | No | Yes |
| BEQ / BNE | Yes | Yes | Yes | Yes |
| BLT / BGE / BLTU / BGEU | Yes | Yes | Yes | Yes |
| LUI | No | Yes | Yes | Yes |
| AUIPC | No | Yes | Yes | Yes |
| JAL | No | Yes | Yes | Yes |
| JALR | No | Yes | Yes | Yes |
| FENCE / ECALL / EBREAK | No | No | No | Decoded (NOP) |

4. Level-by-Level Description

Root src/ — Single-Cycle Core

A standalone single-cycle RV32I processor in src/riscv_core.v. All seven source files are self-contained (no stage hierarchy):

File Description
riscv_core.v Top-level single-cycle core; PC, decode, execute, memory, writeback all in one module
alu.v 32-bit ALU
control_unit.v Combinational control decoder
imm_gen.v Immediate generator (all types)
regfile.v 32×32 register file
instr_mem.v Synchronous instruction memory ($readmemh)
data_mem.v Data memory

The test program in sim/program.hex is generated by gen_hex.py (19 instructions: ADDI, ADD, SUB, AND, OR, XOR, SLL, SRLI, SLT, SLTU, ANDI, ORI, XORI, SLLI, SRLI, SRAI, JAL x0 halt).


Eval2 Historical Variants

Four designs implemented for comparative evaluation (eval2/, eval2_multicycle/, eval2_pipeline/, eval2_superscalar/). All target the Artix-7 xc7a35tcpg236-1 and use a 12-instruction ALU test program. See Section 12 for a full comparison table.


Level 2 — 5-Stage Pipeline (Baseline)

Path: level2/pipeline/

The first fully pipelined design. Implements a classic 5-stage RISC pipeline with full forwarding and hazard detection.

Supported ISA:

  • All 10 R-type: ADD, SUB, AND, OR, XOR, SLL, SRL, SRA, SLT, SLTU
  • All 9 I-type ALU: ADDI, ANDI, ORI, XORI, SLTI, SLTIU, SLLI, SRLI, SRAI
  • LW (word load)
  • SW (word store)
  • All 6 branches: BEQ, BNE, BLT, BGE, BLTU, BGEU

Immediate types supported: I-type, S-type, B-type (no U or J).

Boards: Boolean Board (boolean.xdc) and Arty S7-50 (arty_s7.xdc). Both use the same device xc7s50csga324-1 (Spartan-7 50T).

Test program (sim/program.hex): 29 instructions. Exercises all supported instruction types, including a loop that sums 1–5 (expected result: 15 in x10).

Source files:

File Description
riscv_core.v Top-level: instantiates all pipeline stages
if_stage.v Instruction fetch; PC register and IMem read
if_id_reg.v IF/ID pipeline register
id_stage.v Instruction decode; regfile read
id_ex_reg.v ID/EX pipeline register
ex_stage.v Execute; ALU, forwarding muxes, branch logic
ex_mem_reg.v EX/MEM pipeline register
mem_stage.v Memory access; data memory read/write
mem_wb_reg.v MEM/WB pipeline register
wb_stage.v Writeback; selects ALU result or load data
hazard_fwd_unit.v Load-use stall detection + forwarding control
alu.v 32-bit ALU (11 operations)
decoder.v Combinational instruction decoder
imm_gen.v Immediate sign-extension (I/S/B types)
regfile.v 32×32 register file, write-through
instr_mem.v Synchronous instruction memory
data_mem.v Synchronous word-addressed data memory
boolean_top.v Boolean Board top-level wrapper
arty_s7_top.v Arty S7-50 top-level wrapper
seg7_ctrl.v Time-multiplexed 4-digit hex display driver

Level 3 — Full Control Flow

Path: level3/pipeline/

Extends level2 with the upper-immediate instructions (LUI, AUIPC) and the unconditional jumps (JAL, JALR). Together with the level2 branches, this completes the control-flow subset of RV32I.

New in level3 vs level2:

Feature Detail
imm_gen.v Adds U-type (lui/auipc) and J-type (jal) immediates
decoder.v Adds output signals: jump, jalr, lui, auipc, link
id_ex_reg.v Carries 5 additional control bits through the pipeline
ex_stage.v alu_a mux: selects PC for AUIPC; alu_result mux: selects PC+4 for link (JAL/JALR); computes jalr_target = (rs1+imm) & ~1, jal_target = PC+imm
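
The target arithmetic in the ex_stage.v row can be modelled in a few lines of Python. This is a behavioural sketch only, not the RTL: the function names and the `MASK32` helper are assumptions introduced here for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep results in 32 bits, like the hardware datapath


def jal_target(pc: int, imm: int) -> int:
    """JAL: target is PC plus the sign-extended J-type immediate."""
    return (pc + imm) & MASK32


def jalr_target(rs1: int, imm: int) -> int:
    """JALR: target is rs1 + imm with bit 0 cleared, i.e. (rs1+imm) & ~1."""
    return (rs1 + imm) & MASK32 & ~1


def link_value(pc: int) -> int:
    """Both jumps write PC+4 to rd as the return address."""
    return (pc + 4) & MASK32


if __name__ == "__main__":
    print(hex(jal_target(0x10, -8)))     # backward jump
    print(hex(jalr_target(0x1003, 4)))   # odd sum, bit 0 is cleared
    print(hex(link_value(0x20)))
```

Clearing bit 0 of the JALR target is what the RV32I specification requires; the mask to 32 bits stands in for the fixed-width adder.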

assemble.py: Full Python RV32I assembler. Supports all R/I/S/B/U/J instruction formats, forward and backward label references, and all 6 branch types. Assembles source text directly to sim/program.hex.
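
For readers unfamiliar with what such an assembler emits, the sketch below encodes one I-type and one R-type instruction into the 8-digit hex words that `$readmemh` expects. It follows the standard RV32I field layout; the helper names (`encode_i`, `encode_r`, `to_hex_line`) are hypothetical and are not taken from assemble.py.

```python
def encode_r(funct7: int, rs2: int, rs1: int, funct3: int, rd: int,
             opcode: int = 0b0110011) -> int:
    """Pack an R-type instruction: funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


def encode_i(imm: int, rs1: int, funct3: int, rd: int,
             opcode: int = 0b0010011) -> int:
    """Pack an I-type instruction: imm[11:0] | rs1 | funct3 | rd | opcode."""
    return (((imm & 0xFFF) << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


def to_hex_line(word: int) -> str:
    """Format one instruction word as a line of a $readmemh file."""
    return f"{word:08x}"


if __name__ == "__main__":
    print(to_hex_line(encode_i(5, 0, 0b000, 1)))     # addi x1, x0, 5
    print(to_hex_line(encode_r(0, 2, 1, 0b000, 3)))  # add  x3, x1, x2
```

Masking the immediate to 12 bits lets negative values such as -1 be passed directly as Python integers.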

Test programs:

  • sim/program.hex: 17-instruction nested loop / function-call test via JAL
  • sim/program_new.hex: 69-instruction comprehensive test covering all level3 instructions (assembled by assemble.py)

Testbenches:

  • sim/tb_riscv_core.v: Stack operations, nested loops, JAL call/return
  • sim/tb_new_program.v: Elaborate per-instruction pass/fail checking for all level3 ISA

Board: Boolean Board only.


Level 4 — Comprehensive Verification

Path: level4/pipeline/

The RTL is identical to level3 — same source files, same module hierarchy, same synthesis results. Level4 is distinguished by its enhanced assemble.py and test suite, which provide exhaustive coverage of all implemented instructions.

Test program (sim/program.hex): 69 instructions assembled by assemble.py. Covers:

  • LUI: loads 0x12345000 into x3
  • AUIPC: loads PC + 0xABCDE000 into x4 (expected: 0xABCDE008)
  • All 10 R-type operations
  • All 9 I-type ALU operations
  • SW + LW round-trip (pattern: 0xDEAD)
  • All 6 branch types (counter incremented once per taken branch, final expected: 6)
  • JAL + JALR with stack push/pop via SP (x2)
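
The LUI and AUIPC expectations in the list above can be checked by hand. The sketch below assumes the AUIPC instruction sits at byte address 8, which is inferred from the expected result rather than stated in the test program; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def lui(imm20: int) -> int:
    """LUI places the 20-bit immediate in bits [31:12]; the low 12 bits are zero."""
    return (imm20 << 12) & MASK32


def auipc(pc: int, imm20: int) -> int:
    """AUIPC adds the shifted immediate to the PC of the AUIPC instruction itself."""
    return (pc + (imm20 << 12)) & MASK32


if __name__ == "__main__":
    print(hex(lui(0x12345)))           # value expected in x3
    print(hex(auipc(0x8, 0xABCDE)))    # value expected in x4, assuming PC = 8
```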

Testbench (sim/tb_riscv_core.v): Checks each result with explicit $display/$fatal assertions. Final pass message confirms all checks passed.

Board: Boolean Board only.


L4 Superscalar — 2-Wide Dual-Issue

Path: l4_superscalar/

A 2-wide dual-issue superscalar processor. This is a simulation-only design (no board top, no XDC, no synthesis TCL). The architecture differs significantly from the single-issue levels.

Key design decisions:

Feature Detail
Fetch Fetches 2 instructions per cycle (instr0, instr1) from instr_mem
Issue gate (dual_ok) Issues a pair only when all of the following hold: both slots are valid; neither is a control instruction (branch/jal/jalr); there is no intra-group RAW hazard (slot0.rd matches neither slot1.rs1 nor slot1.rs2); the two are not both memory ops; no FENCE is present
PC advance PC + 8 when dual_ok, PC + 4 when single-issue, hold on stall, redirect on branch taken
Pipeline stages 4 stages: IF/ID → EX → MEM → WB (pairs tracked as e0/e1, m0/m1, w0/w1)
Dual ALUs u_alu0 + u_alu1 operating in parallel
Forwarding 2-bit fa0/fb0/fa1/fb1: 00=none, 01=MEM slot0, 10=MEM slot1, 11=WB (either slot)
Register file Dual-write port (we0/we1), 4 read ports (rs1_0/rs2_0/rs1_1/rs2_1), write-through on all read ports
Data memory Byte-addressable with width[1:0] and mem_unsigned for LB/LH/LW/SB/SH/SW
control_unit.v Extended decoder: adds mem_to_reg, fence, ecall, ebreak, mem_width[2:0], mem_unsigned
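
The issue gate and PC-advance rows can be read as the following behavioural model. It is a sketch under stated assumptions, not a transcription of riscv_core.v: the `Slot` fields are invented for illustration, the RAW check follows the table literally (so a write to x0 also blocks pairing), and the priority of redirect over stall is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Decoded view of one fetch slot (hypothetical field names)."""
    valid: bool
    rd: int = 0
    rs1: int = 0
    rs2: int = 0
    is_control: bool = False  # branch / jal / jalr
    is_mem: bool = False      # load or store
    is_fence: bool = False


def dual_ok(s0: Slot, s1: Slot) -> bool:
    """True when slot0 and slot1 may issue together."""
    raw = s0.rd in (s1.rs1, s1.rs2)  # slot1 reads what slot0 writes
    return (s0.valid and s1.valid
            and not (s0.is_control or s1.is_control)
            and not raw
            and not (s0.is_mem and s1.is_mem)
            and not (s0.is_fence or s1.is_fence))


def next_pc(pc: int, dual: bool, stall: bool = False,
            redirect: bool = False, target: int = 0) -> int:
    """PC update: redirect, hold on stall, else +8 (pair) or +4 (single)."""
    if redirect:          # assumed to take priority over stall
        return target
    if stall:
        return pc
    return pc + (8 if dual else 4)
```

For example, `add x1, x2, x3` followed by `sub x4, x1, x6` fails the RAW check and issues one instruction at a time, so the PC advances by 4.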

Note: instr_mem.v contains a hardcoded Windows-style default path for MEM_FILE. This must be overridden at instantiation on Linux (pass MEM_FILE as a parameter or edit the default).

Test program (sim/program.hex): 17-instruction simple loop and function-call sequence. Testbench prints a full per-cycle pipeline trace (both slots, EX/MEM/WB stages) and checks x1–x19 against expected values.


5. Pipeline Architecture

All level2/3/4 designs share this 5-stage pipeline:

         ┌────────┐    ┌────────┐    ┌────────┐    ┌────────┐    ┌────────┐
  clk ──>│   IF   │───>│   ID   │───>│   EX   │───>│  MEM   │───>│   WB   │
         └────────┘    └────────┘    └────────┘    └────────┘    └────────┘
              │    if_id   │    id_ex   │    ex_mem  │    mem_wb  │
              │   ══════>  │   ══════>  │   ══════>  │   ══════>  │
              │            │            │            │            │
         PC + IMem    Decode +      ALU +        DMem         Regfile
                       RegRead     Branch                      Write
                                   Resolve

Pipeline registers: if_id_reg, id_ex_reg, ex_mem_reg, mem_wb_reg — all synchronous, reset-to-NOP on flush.

Key microarchitectural properties:

Property Value / Description
Clock 100 MHz (10 ns period), single sys_clk domain
Reset Active-low synchronous (rst_n)
Instruction memory Synchronous read, word-addressed, $readmemh initialized
Data memory Synchronous write, combinatorial read, word-addressed via addr[31:2]
Register file Write-through: WB write visible to ID-stage reads in same cycle
Branch resolution EX stage — 2-cycle penalty; IF and ID stages flushed on redirect
Forwarding EX→EX (from EX/MEM register) and MEM→EX (from MEM/WB register)
Debug port dbg_addr[4:0] / dbg_data[31:0] on regfile for board inspection

ALU operations (4-bit control):

| Code | Operation | Code | Operation |
|---|---|---|---|
| 0000 | ADD | 0101 | SRL |
| 0001 | SUB | 0110 | SLT |
| 0010 | AND | 0111 | SLTU |
| 0011 | OR | 1000 | SRA |
| 0100 | XOR | 1001 | SLL |
| 1010 | PASS_B (for LUI) | | |
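
A small reference model of these eleven operations can be handy when checking testbench expectations. The sketch below uses the control codes from the table and standard RV32I semantics (shift amount taken from the low 5 bits of operand B); it is not derived from alu.v, and the names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit value as two's-complement."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op: int, a: int, b: int) -> int:
    """Behavioural model of the 4-bit-controlled ALU."""
    a &= MASK32
    b &= MASK32
    sh = b & 0x1F  # shift amount: low 5 bits of operand B
    ops = {
        0b0000: lambda: a + b,                               # ADD
        0b0001: lambda: a - b,                               # SUB
        0b0010: lambda: a & b,                               # AND
        0b0011: lambda: a | b,                               # OR
        0b0100: lambda: a ^ b,                               # XOR
        0b0101: lambda: a >> sh,                             # SRL (logical)
        0b0110: lambda: int(to_signed(a) < to_signed(b)),    # SLT
        0b0111: lambda: int(a < b),                          # SLTU
        0b1000: lambda: to_signed(a) >> sh,                  # SRA (arithmetic)
        0b1001: lambda: a << sh,                             # SLL
        0b1010: lambda: b,                                   # PASS_B (LUI)
    }
    if op not in ops:
        raise ValueError(f"unknown ALU op {op:04b}")
    return ops[op]() & MASK32
```

The SLT/SLTU pair shows why the signed reinterpretation matters: with A = 0xFFFFFFFF and B = 1, SLT returns 1 (A is -1) while SLTU returns 0.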

6. Hazard Handling

Implemented in hazard_fwd_unit.v (instantiated inside ex_stage.v).

Forwarding

Forward paths resolve RAW hazards without stalling when the producing instruction has already computed its result:

fwd_a / fwd_b Meaning
2'b00 No forwarding — use register file output
2'b01 Forward from EX/MEM register (one cycle ago)
2'b10 Forward from MEM/WB register (two cycles ago)

Forwarding conditions (shown for fwd_a; fwd_b is symmetric):

// EX/MEM forward (higher priority)
if (ex_mem_reg_write && ex_mem_rd != 0 && ex_mem_rd == id_ex_rs1)
    fwd_a = 2'b01;
// MEM/WB forward
else if (mem_wb_reg_write && mem_wb_rd != 0 && mem_wb_rd == id_ex_rs1)
    fwd_a = 2'b10;
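
The same priority logic, including the implicit default of no forwarding, can be expressed as a Python model for quick experiments. This is an illustrative sketch: `forward_select` and `operand` are hypothetical helper names, and the signal names are borrowed from the snippet above.

```python
def forward_select(ex_mem_reg_write: bool, ex_mem_rd: int,
                   mem_wb_reg_write: bool, mem_wb_rd: int,
                   id_ex_rs: int) -> int:
    """Return the 2-bit forwarding select for one source operand."""
    # EX/MEM holds the younger producer, so it wins when both stages match.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return 0b01
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return 0b10
    return 0b00  # default: use the register file value


def operand(sel: int, rf_value: int, ex_mem_value: int, mem_wb_value: int) -> int:
    """Forwarding mux: pick the operand according to the select code."""
    return {0b00: rf_value, 0b01: ex_mem_value, 0b10: mem_wb_value}[sel]
```

The `rd != 0` guard keeps writes to x0 from being forwarded, since x0 must always read as zero.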

Load-Use Stall

A 1-cycle bubble is inserted when a load is immediately followed by a dependent instruction. Detected in riscv_core.v:

stall = id_ex_mem_read && (id_ex_rd != 0) &&
        (id_ex_rd == id_rs1 || id_ex_rd == id_rs2);

On stall: PC is held, IF/ID register is held, ID/EX register is flushed to NOP.
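
The stall condition and its effect on the front end can be sketched as follows. This is a behavioural model for illustration only: `front_end_step` and the representation of the ID/EX bubble as the `addi x0, x0, 0` word are assumptions, not the structure of the RTL.

```python
NOP = 0x00000013  # addi x0, x0, 0, the canonical RV32I NOP encoding


def load_use_stall(id_ex_mem_read: bool, id_ex_rd: int,
                   id_rs1: int, id_rs2: int) -> bool:
    """True when the instruction in ID needs the result of a load still in EX."""
    return bool(id_ex_mem_read and id_ex_rd != 0
                and id_ex_rd in (id_rs1, id_rs2))


def front_end_step(stall: bool, pc: int, if_id, fetched, decoded):
    """Return (next_pc, next_if_id, next_id_ex) for one clock edge."""
    if stall:
        # Hold PC and IF/ID so the dependent instruction retries; bubble into ID/EX.
        return pc, if_id, NOP
    return pc + 4, fetched, decoded
```

After the one-cycle bubble the load has reached MEM/WB, so its data reaches the dependent instruction through the MEM/WB forwarding path.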

Branch Penalty

Branches are resolved in the EX stage. On a taken branch:

  • pc_redirect signal asserted, pc_target driven to branch/jump target
  • IF and ID pipeline registers flushed (2-cycle penalty)
  • Not-taken branches incur no penalty (no branch prediction — assume not-taken)
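
A rough way to see what the stall and flush penalties cost is to fold them into an effective CPI. The sketch below assumes an ideal CPI of 1 and uses made-up instruction-mix fractions; `effective_cpi` and its parameters are illustrative and no such measurement appears in the repository.

```python
def effective_cpi(taken_redirect_frac: float, load_use_frac: float,
                  redirect_penalty: int = 2, load_use_penalty: int = 1) -> float:
    """Ideal CPI of 1 plus the average stall and flush cycles per instruction."""
    return (1.0
            + taken_redirect_frac * redirect_penalty
            + load_use_frac * load_use_penalty)


if __name__ == "__main__":
    # Hypothetical mix: 10% taken branches/jumps, 5% load-use pairs.
    print(f"{effective_cpi(0.10, 0.05):.2f}")
```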

7. Module Reference

The following table describes every module used in the level2/3/4 pipeline (with notes on level3/4 additions):

Module Instantiated as Description
riscv_core u_core Pipeline top-level; wires all stages; contains stall logic
if_stage u_if PC register, PC+4/redirect mux, instruction memory read
if_id_reg u_if_id IF/ID pipeline register; flush-to-NOP on redirect, held on stall
id_stage u_id Decoder, immediate generator, register file read
id_ex_reg u_id_ex ID/EX pipeline register; carries control + data; has synchronous reset signal flush_decode
ex_stage u_ex ALU + forwarding muxes + branch condition evaluation; level3+ adds jump/lui/auipc muxes
ex_mem_reg u_ex_mem EX/MEM pipeline register
mem_stage u_mem Data memory instantiation; LW/SW
mem_wb_reg u_mem_wb MEM/WB pipeline register
wb_stage u_wb Writeback mux: ALU result or load data
hazard_fwd_unit u_haz Forwarding logic (inside ex_stage); stall signal to core
alu u_alu 32-bit combinational ALU; 4-bit control
decoder u_dec Combinational control signal generation from opcode/funct3/7
imm_gen u_imm Sign-extended immediate; level3+ adds U and J types
regfile u_rf 32×32 register file; x0 hardwired to zero; write-through
instr_mem u_imem Synchronous ROM; parameterized MEM_FILE
data_mem u_dmem Synchronous RAM; word-addressed
boolean_top (top) Boolean Board wrapper: clock buffer, reset, seg7_ctrl, switch/LED I/O
arty_s7_top (top) Arty S7-50 wrapper (level2 only)
seg7_ctrl u_seg7 4-digit 7-segment display driver; time-multiplexed

8. Simulation

Prerequisites

  • Icarus Verilog (iverilog / vvp)
  • (Optional) GTKWave for VCD waveform viewing

Level 2 — Quick Start

cd level2/pipeline/sim

# Compile
iverilog -o sim.out -s tb_riscv_core \
    tb_riscv_core.v \
    ../src/riscv_core.v \
    ../src/if_stage.v \
    ../src/if_id_reg.v \
    ../src/id_stage.v \
    ../src/id_ex_reg.v \
    ../src/ex_stage.v \
    ../src/ex_mem_reg.v \
    ../src/mem_stage.v \
    ../src/mem_wb_reg.v \
    ../src/wb_stage.v \
    ../src/hazard_fwd_unit.v \
    ../src/alu.v \
    ../src/decoder.v \
    ../src/imm_gen.v \
    ../src/regfile.v \
    ../src/instr_mem.v \
    ../src/data_mem.v

# Run (program.hex must be in the working directory or MEM_FILE path correct)
vvp sim.out

# View waveforms
gtkwave tb_riscv_core.vcd

Or use the provided TCL script from Vivado:

source sim/run.tcl

Level 3 / Level 4

Same flow. Level3 has two testbenches:

# Basic JAL/stack test
iverilog -o sim.out -s tb_riscv_core  tb_riscv_core.v  [sources...]
# Comprehensive all-instruction test
iverilog -o sim.out -s tb_new_program tb_new_program.v [sources...]

Assembling a New Program

The assemble.py in level3 and level4 assembles RV32I assembly source to hex:

cd level3/pipeline   # or level4/pipeline
python3 assemble.py  # reads inline assembly, writes sim/program.hex

To modify the test program, edit the assembly string inside assemble.py and re-run.

L4 Superscalar

cd l4_superscalar/sim

iverilog -o sim.out -s tb_riscv_core \
    tb_riscv_core.v \
    ../src/riscv_core.v \
    ../src/control_unit.v \
    ../src/alu.v \
    ../src/regfile.v \
    ../src/instr_mem.v \
    ../src/data_mem.v

vvp sim.out

Note: Edit instr_mem.v to fix the MEM_FILE default path (currently a Windows absolute path) or pass MEM_FILE as a parameter.


9. Synthesis and Implementation

Toolchain

  • Vivado 2025.2 (lin64), Build 6299465
  • Device: xc7s50csga324-1 (Spartan-7 50T)
  • Boolean Board clock: 100 MHz, pin F14, LVCMOS33
  • Arty S7 clock: 100 MHz, pin P14

Level 2 — Standalone Core Synthesis

# In Vivado TCL console or batch mode:
source level2/pipeline/synth.tcl   # Synthesize riscv_core only
source level2/pipeline/impl.tcl    # Implement riscv_core (no board I/O)

Reports land in level2/pipeline/impl_out/.

Level 2 — Full Board Build (Boolean Board)

source level2/pipeline/board_impl_boolean.tcl
# This script also calls program_board.tcl to flash the bitstream

Reports land in level2/pipeline/board_out_boolean/.

Level 3 / Level 4 — Board Build

source level3/pipeline/synth.tcl   # or level4/pipeline/synth.tcl

Reports land in levelN/pipeline/board_out_boolean/.


10. Hardware Deployment — Board I/O

Boolean Board (boolean_top.v)

All three levels (2, 3, 4) use an identical boolean_top.v wrapper.

| Signal | Direction | Pins / Width | Description |
|---|---|---|---|
| clk | Input | F14 | 100 MHz system clock |
| btn[0] | Input | 1 bit | Active-high reset (synchronous) |
| sw[4:0] | Input | 5 bits | Register select: choose which x0–x31 register to display |
| led[15:0] | Output | 16 bits | Lower 16 bits of the selected register |
| seg[6:0] | Output | 7 bits | 7-segment cathode signals (active-low) |
| an[3:0] | Output | 4 bits | 7-segment anode enables (active-low, time-multiplexed) |
| dp | Output | 1 bit | Decimal point (driven low / unused) |

Usage:

  1. Program the board with the generated bitstream
  2. Press btn[0] to reset the CPU; it will begin executing from address 0
  3. Set sw[4:0] to the register number you want to inspect (e.g., 01010 = x10)
  4. led[15:0] shows the lower 16 bits of that register immediately
  5. The 4-digit hex display shows the full 32-bit value of the selected register

Segment display encoding: seg7_ctrl cycles through digits 0–3 at a frequency derived from the 100 MHz clock (typically ~1 kHz digit refresh). Each digit displays one hex nibble (0–F).

Arty S7-50 (arty_s7_top.v)

Level2 only. Same functional mapping as boolean_top but with Arty S7 pin assignments.

| Signal | Direction | Pin | Description |
|---|---|---|---|
| clk | Input | P14 | 100 MHz system clock |
| btn[0] | Input | | Reset |
| sw[3:0] | Input | | Register select (lower 4 bits) |
| led[3:0] | Output | | Lower 4 bits of selected register |

11. Synthesis Results

Device Capacity Reference (xc7s50csga324-1, Spartan-7 50T)

Resource Total Available
Slice LUTs 32,600
Slice Registers 65,200
Block RAM 75 (36Kb each)
DSP48E1 120
IOBs 210

Level 2 — Standalone riscv_core (no board I/O, impl_out/)

Note: Most logic was optimized away (constant inputs); these numbers reflect the pruned implementation, not the full core.

| Resource | Used | Available | Utilization |
|---|---|---|---|
| Slice LUTs | 2 | 32,600 | < 0.01% |
| Slice Registers | 30 | 65,200 | 0.05% |
| Block RAM | 0 | 75 | 0% |

Timing (post-route, standalone): WNS = +7.161 ns — timing met. Critical path: PC adder carry chain, 9 logic levels, 2.839 ns.

Power (post-route, standalone): Total = 0.101 W, Dynamic = 0.030 W.


Board Builds — Boolean Board (board_out_boolean/)

| Metric | Level 2 | Level 3 | Level 4 |
|---|---|---|---|
| Slice LUTs (total) | 1,463 | 1,666 | 1,666 |
| - LUT as Logic | 885 | 1,088 | 1,088 |
| - LUT as Dist. RAM | 578 | 578 | 578 |
| CARRY4 | 40 | 66 | 66 |
| Slice Registers (FFs) | 344 | 437 | 437 |
| F7/F8 Muxes | 384 | 385 | 385 |
| Slices | 465 | 529 | 529 |
| Block RAM | 0 | 0 | 0 |
| IOBs | 53 (25.2%) | 53 (25.2%) | 53 (25.2%) |
| BUFGCTRL | 1 | 1 | 1 |

LUT utilization: L2 = 4.49%, L3/L4 = 5.11% of device.

Level2 → Level3/4 increase: +203 LUTs (+14%), +93 FFs (+27%). This overhead comes from the 5 additional control signals (jump, jalr, lui, auipc, link) propagating through the ID/EX pipeline register, the EX-stage mux tree, and the extended imm_gen.
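
The utilization and growth percentages quoted above follow directly from the table. The short script below reproduces them from the reported counts; the variable names are illustrative.

```python
DEVICE_LUTS = 32_600  # Slice LUTs available on xc7s50csga324-1

luts = {"level2": 1463, "level3": 1666}
ffs = {"level2": 344, "level3": 437}


def pct(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of the device's LUTs used by each build.
lut_util = {name: round(pct(count, DEVICE_LUTS), 2) for name, count in luts.items()}

# Absolute and relative growth from level2 to level3/4.
lut_delta = luts["level3"] - luts["level2"]
ff_delta = ffs["level3"] - ffs["level2"]
lut_growth = round(pct(lut_delta, luts["level2"]))
ff_growth = round(pct(ff_delta, ffs["level2"]))

if __name__ == "__main__":
    print(lut_util)
    print(f"+{lut_delta} LUTs (+{lut_growth}%), +{ff_delta} FFs (+{ff_growth}%)")
```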

Level3 vs Level4: Identical RTL, identical synthesis results. The difference is only in the test program and testbench.


Timing — Boolean Board (post-route, 100 MHz constraint)

| Level | WNS (ns) | TNS (ns) | Failing Endpoints | Status |
|---|---|---|---|---|
| Level 2 | -2.306 | -335.187 | 162 / 6,592 | NOT MET |
| Level 3 | -3.407 | -743.564 | 256 / 6,802 | NOT MET |
| Level 4 | -3.407 | -743.564 | 256 / 6,802 | NOT MET |

All hold-time constraints are met (WHS > 0 in all cases).

Level 2 critical path: id_ex.out_rs2[0] → forwarding unit → ALU (SRL carry chain, bits [4–8]) → zero flag → pc_redirect flop reset. 12 logic levels, 11.811 ns data path delay, 9.805 ns routing. Slack = -2.306 ns → actual Fmax ≈ 79.5 MHz.

Level 3/4 critical path: ex_mem.out_rd[2] → forwarding unit (fwd_a) → ALU input mux → ALU bit [26] carry chain → zero flag → pc_redirect → id_ex pipeline register reset. 14 logic levels, 12.901 ns data path delay, 10.709 ns routing. Slack = -3.407 ns → actual Fmax ≈ 73.5 MHz.

Why timing fails: The critical path passes through: (1) the MEM/WB or EX/MEM forwarding comparators, (2) ALU operand selection muxes, (3) the 32-bit ALU carry chain (especially for shift/compare operations that the synthesizer cannot map to CARRY4 efficiently), (4) the zero/branch-taken combinational logic, and (5) the conditional flush/redirect path to the pipeline register reset pin — all in a single cycle. The routing component (83% of path delay) indicates the design is spread across the fabric, which inflates net delays significantly. Reducing the clock to ~70 MHz or retiming the branch-taken → flush path would resolve violations.
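
As a cross-check, a first-order Fmax estimate can be derived from the constraint period and the worst negative slack, and the routing share from the reported path delays. The sketch below uses the simple relation Fmax = 1 / (T - WNS); the function names are illustrative. With the WNS values above it comes out near 81 MHz (level2) and 75 MHz (level3/4), somewhat above the more conservative Fmax figures quoted in the text.

```python
def fmax_mhz(period_ns: float, wns_ns: float) -> float:
    """First-order Fmax: the shortest period that would leave zero slack."""
    return 1000.0 / (period_ns - wns_ns)


def routing_fraction(route_ns: float, data_path_ns: float) -> float:
    """Share of the data path delay spent in routing (net delay)."""
    return route_ns / data_path_ns


if __name__ == "__main__":
    for name, wns, route, path in [
        ("level2", -2.306, 9.805, 11.811),
        ("level3/4", -3.407, 10.709, 12.901),
    ]:
        print(f"{name}: Fmax ~ {fmax_mhz(10.0, wns):.1f} MHz, "
              f"routing {100 * routing_fraction(route, path):.0f}%")
```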


Power — Boolean Board (post-route, 100 MHz, typical process)

| Metric | Level 2 | Level 3 | Level 4 |
|---|---|---|---|
| Total On-Chip Power (W) | 0.078 | 0.132 | 0.132 |
| Dynamic Power (W) | 0.006 | 0.061 | 0.061 |
| Device Static (W) | 0.072 | 0.072 | 0.072 |
| Junction Temperature (C) | 25.4 | 25.7 | 25.7 |
| Confidence Level | Low | Low | Low |

The low confidence level is expected (no simulation activity file provided). The dominant dynamic power consumer in level3/4 is I/O (0.041 W at Vcco33), followed by signals (0.010 W) and slice logic (0.005 W).


12. Eval2 Architecture Comparison

Historical designs implemented for direct architectural comparison. All run at 100 MHz reference; device: Artix-7 xc7a35tcpg236-1; 12-instruction ALU-only test program.

| Design | Stages | CPI (eff.) | IPC (eff.) | Throughput @ 100 MHz | Fmax (est.) |
|---|---|---|---|---|---|
| Single-cycle | 1 | 1.00 | 1.00 | 100 MIPS | ~110–140 MHz |
| Multicycle FSM | 4 (FSM) | 4.00 | 0.25 | 25 MIPS | ~150 MHz |
| 4-stage pipeline | 4 (IF/ID/EX/WB) | 1.25 | 0.80 | 80 MIPS | ~133 MHz |
| 2-wide superscalar | 4 (IF/ID/EX/WB) | 0.75 | 1.33 | 133 MIPS | |
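
The IPC and throughput columns are both derived from the effective CPI: IPC = 1 / CPI, and throughput in MIPS = clock in MHz / CPI. A minimal sketch of that arithmetic, with illustrative function names:

```python
def ipc(cpi: float) -> float:
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    return 1.0 / cpi


def mips(clock_mhz: float, cpi: float) -> float:
    """Throughput in millions of instructions per second."""
    return clock_mhz / cpi


if __name__ == "__main__":
    for design, cpi in [("single-cycle", 1.00), ("multicycle", 4.00),
                        ("4-stage pipeline", 1.25), ("2-wide superscalar", 0.75)]:
        print(f"{design}: IPC {ipc(cpi):.2f}, {mips(100.0, cpi):.0f} MIPS")
```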

Eval2 Resource Utilization (Artix-7 xc7a35tcpg236-1):

| Design | LUTs | FFs | BRAM |
|---|---|---|---|
| Single-cycle | 487 | 32 | 0 |
| Multicycle FSM | 517 | 200 | 0 |
| 4-stage pipeline | 570 | 216 | 1 |

Full details: see eval2_report.txt and comparison.txt.


13. Known Issues

Issue Affected Level Description
Timing not met at 100 MHz L2, L3, L4 board builds Critical path through forwarding + ALU + branch-taken → pipeline flush. Actual Fmax: ~79 MHz (L2), ~73 MHz (L3/L4). No functional impact in simulation. For deployment, reduce clock to 50–70 MHz via a clock divider or MMCM.
Hardcoded Windows path in instr_mem.v l4_superscalar/src/instr_mem.v The default MEM_FILE parameter uses an absolute Windows path. Must be overridden at instantiation on Linux/Mac.
No board support for superscalar l4_superscalar/ No boolean_top.v, no XDC, no synthesis TCL. Simulation only.
Branch prediction: assume not-taken All pipeline levels All branches incur a 2-cycle penalty when taken. No dynamic predictor.
Word-only memory in L2–L4 L2, L3, L4 Data memory is word-addressed; no byte or halfword access (LB/LH/SB/SH not supported). Only l4_superscalar implements sub-word memory access.
