j-silv/autoppa

AutoPPA - automatic RTL code optimization

Introduction

This repository introduces AutoPPA, an AI agent that optimizes RTL code for power, performance, and area (PPA). It also includes a benchmark of five optimization tasks and several baselines to compare the agent's results against on each task.

Pre-requisites

Install Icarus Verilog for simulation:

sudo apt install iverilog

Install Yosys for synthesis:

sudo apt install yosys

Install OpenSTA for power estimation. It must be built from source because the distribution package has a bug that prevents it from parsing Yosys-synthesized designs. I used Docker for the build, so this project assumes Docker is installed.

git clone https://github.com/parallaxsw/OpenSTA.git
cd OpenSTA
docker build --file Dockerfile.ubuntu22.04 --tag opensta .
# docker run -v "$PWD":/autoppa opensta  # all sta commands are run like this

Next, clone this repository and install it as a pip package:

git clone https://github.com/j-silv/autoppa.git
cd autoppa
pip install -e .

Finally, for power analysis, download and extract the SkyWater SKY130 high-density (sky130hd) standard-cell library for the typical corner, in Liberty format:

wget -P benchmark https://github.com/parallaxsw/OpenSTA/raw/refs/heads/master/examples/sky130hd_tt.lib.gz
gzip -d benchmark/sky130hd_tt.lib.gz
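Before running the benchmark, it can help to confirm that the setup steps above succeeded. The following is a minimal sketch and not part of autoppa; the function names missing_tools and check_setup are hypothetical. It assumes you run it from the repository root so that benchmark/sky130hd_tt.lib resolves, and it does not check for the opensta Docker image.

```shell
# Hypothetical setup check (not part of autoppa): reports missing prerequisites.

# missing_tools: print each argument that is not found as a command.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# check_setup: check the commands and the Liberty file from the setup steps.
# Run from the repository root; returns non-zero if anything is missing.
check_setup() {
  local missing status=0
  missing=$(missing_tools iverilog yosys docker autoppa)
  if [ -n "$missing" ]; then
    echo "missing commands: $(echo "$missing" | tr '\n' ' ')" >&2
    status=1
  fi
  if [ ! -f benchmark/sky130hd_tt.lib ]; then
    echo "missing file: benchmark/sky130hd_tt.lib (run the wget/gzip step)" >&2
    status=1
  fi
  return "$status"
}

# Example: check_setup && echo "ready to run the benchmark"
```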

Benchmark

The benchmark uses RTL code taken from the PicoRV32 project. PicoRV32 was chosen because the core is configurable with respect to PPA, so benchmark tasks can be built by pairing a non-optimized configuration of a module (the reference) with a different, optimized configuration.

Task breakdown

Task 1. Increase performance of the picorv32_pcpi_mul module

  • Reference: picorv32_pcpi_mul module from the picorv32.v source file
  • Optimized: picorv32_pcpi_fast_mul module, which is enabled with the ENABLE_FAST_MUL configuration flag

Task 2. Reduce the area of the picorv32_pcpi_mul module

  • Reference: picorv32_pcpi_fast_mul module, reused from Task #1 (its optimized design), which has better performance but higher area
  • Optimized: picorv32_pcpi_mul module, reused from Task #1 (its reference design)

Task 3. Increase performance of the picorv32_pcpi_div module

  • Reference: picorv32_pcpi_div module from the picorv32.v source file
  • Optimized: a faster divide module generated by iteratively prompting ChatGPT

Task 4. Reduce the area of the picorv32_pcpi_div module

  • Reference: the optimized picorv32_pcpi_div module, reused from Task #3, which has better performance but higher area
  • Optimized: picorv32_pcpi_div module, reused from Task #3 (its reference design)

Task 5. Reduce the power of the picorv32_pcpi_div module

  • Reference: picorv32_pcpi_div module, reused from Task #3 (its reference design)
  • Optimized: a power-efficient divide module generated by iteratively prompting ChatGPT
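The task list pairs each task with a module and a target metric. If you script around the benchmark, a lookup like the following can encode that mapping. These are hypothetical helpers, not part of autoppa; the metric names follow the "Metric:" line that autoppa benchmark prints for Task #1, and the output labels are copied from the example runs in the next section, assuming the other tasks print the same labels.

```shell
# Hypothetical lookup helpers encoding the task list (not part of autoppa).

# task_module TASK: module being optimized in the given task.
task_module() {
  case "$1" in
    1|2)   echo picorv32_pcpi_mul ;;
    3|4|5) echo picorv32_pcpi_div ;;
    *)     echo "unknown task: $1" >&2; return 1 ;;
  esac
}

# task_metric TASK: metric the task tries to improve.
task_metric() {
  case "$1" in
    1|3) echo performance ;;
    2|4) echo area ;;
    5)   echo power ;;
    *)   echo "unknown task: $1" >&2; return 1 ;;
  esac
}

# metric_label METRIC: label of the matching line in autoppa's output.
metric_label() {
  case "$1" in
    performance) echo "Execution time (ns)" ;;
    area)        echo "Area (number of cells)" ;;
    power)       echo "Power (mW)" ;;
    *)           echo "unknown metric: $1" >&2; return 1 ;;
  esac
}

# Example: metric_label "$(task_metric 2)"   # prints: Area (number of cells)
```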

Running a benchmark

Let's run through an example with Task #1.

First, we compile and simulate the Verilog code:

Simulation (performance)

$ autoppa sim 1 baseline/reference/task1.v

> The simulation passed successfully
> Execution time (ns) == 3100

Next, we synthesize the design to get the area metric:

Synthesize (area)

$ autoppa synth baseline/reference/task1.v

> The synthesis completed successfully
> Area (number of cells) == 1550

After running both simulation and synthesis, you can extract power information:

Power

$ autoppa power baseline/reference/task1.v

> The power analysis completed successfully
> Power (mW) == 3.61

You can run all three steps with a single benchmark command:

$ autoppa benchmark 1 reference
==========================================================================
Task number: 1
Description: Increase the speed (performance) of this multiply module
Metric: performance
Reference performance: 3100 ns
==========================================================================

-------- BASELINE reference --------

The simulation passed successfully
Execution time (ns) == 3100

The synthesis completed successfully
Area (number of cells) == 1516

The power analysis completed successfully
Power (mW) == 3.7100
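Each metric in this output sits on its own line in the form "Label == value". To collect results from a script, a small filter like the following can pull out one value. It is a hypothetical helper, not part of autoppa, and relies only on the output format shown above (it also tolerates the "> " prefix used in the single-step examples); it assumes autoppa writes these lines to stdout.

```shell
# Hypothetical helper (not part of autoppa).
# extract_metric LABEL: read autoppa output on stdin and print the value from the
# first line containing "LABEL == value". Exits non-zero if no such line is found.
extract_metric() {
  awk -v label="$1" '
    !found {
      i = index($0, label " == ")
      if (i > 0) {
        # skip past the label and the " == " separator (4 characters)
        print substr($0, i + length(label) + 4)
        found = 1
      }
    }
    END { if (!found) exit 1 }
  '
}

# Example: autoppa benchmark 1 reference | extract_metric 'Execution time (ns)'
```

The filter keeps reading after the first match instead of exiting early, so a long-running autoppa command piped into it is not cut off by a closed pipe.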

Baselines

The following baselines are considered for the benchmark:

  1. Reference design (trivial)
  2. Optimized design (PicoRV32 configuration flags or iteratively prompted ChatGPT output, depending on the task)
  3. Yosys optimization command-line flags (not yet implemented)
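To compare a baseline (or the agent) against the reference design, it can be convenient to express the change as a percentage. This is a minimal sketch under the assumption that all three metrics are lower-is-better (execution time in ns, cell count, and power in mW); pct_improvement is a hypothetical name and not part of autoppa.

```shell
# Hypothetical helper (not part of autoppa).
# pct_improvement REFERENCE NEW: percentage reduction of a lower-is-better metric.
# Positive output means NEW is better than REFERENCE; negative means it is worse.
pct_improvement() {
  awk -v ref="$1" -v new="$2" 'BEGIN {
    if (ref == 0) exit 1   # avoid division by zero
    printf "%.2f\n", (ref - new) / ref * 100
  }' || { echo "pct_improvement: reference must be non-zero" >&2; return 1; }
}

# Example with hypothetical numbers:
#   pct_improvement 3100 2000   # prints 35.48 (a 2000 ns design vs. the 3100 ns reference)
#   pct_improvement 1516 1600   # prints -5.54 (more cells than the reference)
```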

Agent

The 'agent' command has the AI agent attempt one of the benchmark optimization tasks, selected by task number. For example, to run it on Task #1:

autoppa agent 1
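To run the agent on every task in one go, a loop over the five task numbers is enough. This is a minimal sketch, assuming autoppa agent takes only the task number (as in the example above) and exits with a non-zero status when a run fails; run_agent_all and the per-task log layout are hypothetical. Each run's output goes to its own log file, and the loop keeps going if a task fails.

```shell
# Hypothetical wrapper (not part of autoppa).
# run_agent_all [OUTDIR]: run the agent on tasks 1-5, saving each run's output to
# OUTDIR/taskN.log (default OUTDIR: agent_logs). Returns non-zero if any run failed.
run_agent_all() {
  local outdir="${1:-agent_logs}" task status=0
  mkdir -p "$outdir"
  for task in 1 2 3 4 5; do
    if ! autoppa agent "$task" >"$outdir/task$task.log" 2>&1; then
      echo "task $task failed (see $outdir/task$task.log)" >&2
      status=1
    fi
  done
  return "$status"
}
```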

Server

Another way to interact with the agent is through a Streamlit server. This also makes it easier to follow how the design evolves over the course of an optimization task.

make serve

Acknowledgements

About

PPAgent for RTL code optimization
