This repository implements an enhanced Branch Prediction Unit for the CVA6 (formerly Ariane) open-source RISC-V core.
The project extends the CVA6 frontend architecture by integrating high-performance GShare and TAGE (Tagged Geometric History Length) branch predictors.
The predictors are evaluated using both microbenchmarks and application benchmarks to analyze their impact on branch prediction accuracy and overall CPU performance.
The goal is to improve processor performance by increasing IPC (Instructions Per Cycle) and reducing branch misprediction penalties.
This project extends the branch prediction unit of the CVA6 open-source RISC-V processor by implementing GShare and TAGE branch predictors.
The implementation focuses primarily on modifications to the frontend stage of the processor pipeline.
For detailed architectural specifications of the TAGE branch predictor, refer to the README.md in the core directory.
For details about the original architecture, refer to the official CVA6 repository.
Original CVA6 repository:
https://github.com/openhwgroup/cva6
The branch predictor implementation required modifications to several frontend and core pipeline components:
core/cva6.svcore/branch_unit.svcore/include/ariane_pkg.svcore/include/build_config_pkg.svcore/include/config_pkg.svcore/frontend/frontend.svcore/frontend/instr_queue.svcore/frontend/GShareTable.svcore/frontend/TageBaseTable.svcore/frontend/TageTable.sv
Branch predictor parameters can be configured by modifying the following files:
core/include/cv32a60x_config_pkg.svcore/include/cv32a65x_config_pkg.sv
Currently, the implementation supports the cv32a60x and cv32a65x configurations.
Support for additional CVA6 configurations will be extended in future work.
| Branch Predictor | Num. of Tables | Num. of Entries | Bits per Entry | Total Cost (Bytes) |
|---|---|---|---|---|
| Baseline (BHT) | 1 | 8192 | 2-bit counter + 1-bit valid | 3,072 B |
| GShare | 1 | 8192 | 2-bit counter + 1-bit valid | 3,072 B |
| TAGE | 4 | 64 / 256 / 512 / 1024 | 2-bit counter/useful + 1-bit valid + tag (0/4/6/8 bit) | 2,696 B |
Two types of benchmarks are used for evaluation.
Synthetic workloads designed to stress specific branch behavior patterns.
-
Aliasing
Evaluates how different predictors handle destructive aliasing caused by multiple branches mapping to the same prediction entries. -
Alternating Pattern
Tests the ability of predictors to learn regular alternating branch patterns (taken/not-taken). -
Correlated Branch
Measures how well predictors capture inter-branch correlations using global history information. -
Correlated Periodic
Evaluates the effectiveness of predictors in recognizing long periodic branch patterns generated from multiple correlated conditions.
Real workloads with control-flow intensive behavior.
-
Binary Search
Evaluates prediction accuracy for nested conditional branches commonly found in search algorithms. -
N-Queens
Generates complex branch behavior due to recursive backtracking and deep control-flow exploration. -
QuickSort
Tests predictor performance on recursive divide-and-conquer algorithms with data-dependent branching.
The figure shows the normalized IPC improvement of different branch predictors across microbenchmarks and application benchmarks.
GShare and TAGE significantly improve performance compared to the baseline BHT, particularly for structured branch patterns such as alternating and correlated branches.
The effectiveness of TAGE can vary depending on the hashing scheme and the number of tag/index bits, which influence the level of aliasing in prediction tables.
The figure shows the branch miss rate across different predictors.
GShare and TAGE significantly outperform the baseline BHT for structured branch patterns, demonstrating the benefits of history-based branch prediction mechanisms.
The performance gap varies across applications due to differences in branch patterns and aliasing behavior within the predictor tables.
To evaluate prediction efficiency relative to hardware overhead, we introduce a hardware-cost-normalized MPKI metric.
Cost-Normalized MPKI = (Branch Misses / Total Instructions) × 1000 / Hardware Cost (Bytes)
This metric highlights the performance of branch predictors relative to its hardware cost.
| Branch Predictor | Aliasing | Alternating | Correlated Branch | Correlated Periodic | Binary Search | N-Queens | QuickSort |
|---|---|---|---|---|---|---|---|
| Baseline (BHT) | 15.93 | 14.14 | 11.41 | 10.60 | 5.79 | 3.43 | 0.58 |
| GShare | 3.15 | 0.0028 | 0.0081 | 2.57 | 5.03 | 3.59 | 0.24 |
| TAGE | 11.48 | 0.0032 | 0.00074 | 0.5106 | 6.35 | 4.37 | 0.41 |
To run the branch prediction benchmarks, execute the following commands:
export DV_SIMULATORS=veri-testharness
cd verif/sim
./TageBenchmarks.sh