This repository introduces AutoPPA, an AI agent that optimizes RTL code for power, performance, and area (PPA). In addition to the agent, a benchmark containing 5 optimization tasks is included, along with several baselines that show how the agent's performance compares across the benchmark tasks.
Install Icarus Verilog for simulation:
sudo apt install iverilog
Install Yosys for synthesis:
sudo apt install yosys
Build OpenSTA from source for power estimation. The distribution package contains a bug that prevents Yosys designs from being parsed, so it cannot be used directly. This project uses Docker for the build, and therefore assumes you have Docker installed.
git clone https://github.com/parallaxsw/OpenSTA.git
cd OpenSTA
docker build --file Dockerfile.ubuntu22.04 --tag opensta .
# docker run -v $PWD:/autoppa opensta # all sta commands will be run like this
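To avoid retyping the mount flags, the `docker run` line above can be wrapped in a small shell function. This is only a convenience sketch: the `run_sta` name is ours (not part of OpenSTA), and it assumes the image exposes the `sta` binary on its PATH.

```shell
# Hypothetical convenience wrapper (run_sta is our own name): invoke the
# containerized sta binary with the current directory mounted at
# /autoppa, mirroring the docker run line above.
run_sta() {
  docker run --rm -v "$PWD":/autoppa opensta sta "$@"
}

# Example (assumes the image puts `sta` on its PATH):
# run_sta -version
```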
Next, clone this repository and install it as a pip package:
git clone https://github.com/j-silv/autoppa.git
cd autoppa
pip install -e .
Finally, for power analysis, you'll also need to download and extract the SkyWater cell libraries.
wget -P benchmark https://github.com/parallaxsw/OpenSTA/raw/refs/heads/master/examples/sky130hd_tt.lib.gz
gzip -d benchmark/sky130hd_tt.lib.gz
The benchmark uses RTL code gathered from the PicoRV32 project. This core was chosen because it is configurable with respect to PPA, so useful benchmarks can be created by pairing non-optimized RTL (one particular configuration) with optimized RTL (a different configuration).
Task #1
- Reference: `picorv32_pcpi_mul` module in the `picorv32.v` top-level module
- Optimized: `picorv32_pcpi_fast_mul` module, which is enabled with the `ENABLE_FAST_MUL` configuration flag

Task #2
- Reference: re-used `picorv32_pcpi_fast_mul` module (Task #1), which has improved performance but higher area usage
- Optimized: re-used `picorv32_pcpi_mul` module (Task #1)

Task #3
- Reference: `picorv32_pcpi_div` module in the `picorv32.v` top-level module
- Optimized: faster divide module generated by iteratively prompting ChatGPT

Task #4
- Reference: re-used optimized `picorv32_pcpi_div` module (Task #3), which has improved performance but higher area usage
- Optimized: re-used `picorv32_pcpi_div` module (Task #3)

Task #5
- Reference: re-used `picorv32_pcpi_div` module (Task #3)
- Optimized: power-efficient divide module generated by iteratively prompting ChatGPT
Let's run through an example with Task #1.
First, we compile and simulate the Verilog code:
$ autoppa sim 1 baseline/reference/task1.v
> The simulation passed successfully
> Execution time (ns) == 3100
Next we synthesize the design to get area metrics:
$ autoppa synth baseline/reference/task1.v
> The synthesis completed successfully
> Area (number of cells) == 1550
After running both simulation and synthesis, you can extract power information:
$ autoppa power baseline/reference/task1.v
> The power analysis completed successfully
> Power (mW) == 3.61
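Each subcommand prints its metric on a `name == value` line, so results are easy to scrape from a captured log. The `parse_metric` helper below is our own sketch, not part of the autoppa CLI; the sample report reuses numbers from the runs above.

```shell
# Hypothetical helper (not part of autoppa): pull the numeric value out
# of a "metric == value" report line.
parse_metric() {
  grep "$1" | awk -F' == ' '{print $2}'
}

# Sample log, copied from the example runs above.
report='Execution time (ns) == 3100
Area (number of cells) == 1550
Power (mW) == 3.61'

echo "$report" | parse_metric "Execution time"   # prints 3100
echo "$report" | parse_metric "Power"            # prints 3.61
```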
You can run all of the previous steps at once with the benchmark command:
$ autoppa benchmark 1 reference
==========================================================================
Task number: 1
Description: Increase the speed (performance) of this multiply module
Metric: performance
Reference performance: 3100 ns
==========================================================================
-------- BASELINE reference --------
The simulation passed successfully
Execution time (ns) == 3100
The synthesis completed successfully
Area (number of cells) == 1516
The power analysis completed successfully
Power (mW) == 3.7100
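Comparing a baseline run against an optimized one is then simple arithmetic. As an illustration, the 3100 ns reference execution time comes from the output above, while the 1550 ns optimized time is a made-up number used only to demonstrate the calculation:

```shell
ref=3100   # reference execution time (ns), from the run above
opt=1550   # hypothetical optimized execution time (ns), for illustration
awk -v r="$ref" -v o="$opt" \
  'BEGIN { printf "Improvement: %.1f%%\n", (r - o) / r * 100 }'
# prints: Improvement: 50.0%
```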
The following baselines are considered for the benchmark:
- Reference design (trivial)
- Optimized design (a mix of PicoRV32 configuration flags and iteratively prompted ChatGPT output)
- Yosys optimization command-line flags (not yet implemented)
If you run the 'agent' step, the AI will attempt to complete one of the specified optimization tasks. For example:
autoppa agent 1
Another way to interact with the agent is to set up a Streamlit server, which also makes the evolution of the optimization task easier to follow.
make serve