Getting Started with Jacquard

Caveats: Jacquard currently supports only non-interactive testbenches. This means the input to the circuit must be a static waveform (e.g., a VCD file). Registers and clock gates inside the circuit are allowed, but latches and other asynchronous sequential logic are currently unsupported.

Dataset: Some input data (namely, netlists after the AIG transformation in Steps 1-2 below, and reference VCDs) is available here.

Step 0. Download the AIG Process Kit

Go to the aigpdk directory, where you can download aigpdk.lib, aigpdk_nomem.lib, aigpdk.v, and memlib_yosys.txt. You will need them later in the flow.

Before continuing, make sure your design contains only synchronous logic. If your RTL code implements clock gates, replace them manually with instantiations of the CKLNQD module in aigpdk.v. You should also know where memory blocks (e.g., caches) are implemented in your design, so that you can check later that they are mapped correctly.

Step 1. Memory Synthesis with Yosys

This step makes use of the open-source Yosys synthesizer to recognize and map the memory blocks automatically.

Download and compile the latest version of Yosys. Then run the Yosys shell with the following synthesis script.

# replace this with paths to your RTL code, and add `-I`, `-D`, `-sv` etc when necessary
read_verilog xx.v yy.v top.v

# replace TOP_MODULE with your top module name
hierarchy -check -top TOP_MODULE

# simplify design before mapping
proc;;
opt_expr; opt_dff; opt_clean
memory -nomap

# map the rams
# point -lib path to your downloaded memlib_yosys.txt
memory_libmap -lib path/to/memlib_yosys.txt -logic-cost-rom 100 -logic-cost-ram 100

The memory_libmap command will output a list of RAMs it found and mapped.

  • If you see $__RAMGEM_SYNC_ (naming inherited from GEM), it means the mapping is successful.
  • If you see $__RAMGEM_ASYNC_, the RAM was found to have an asynchronous read port. Confirm whether that is really the case.
    • If the RAM is actually synchronous but was recognized as asynchronous, you might need to patch the RTL code. There are several possible reasons a RAM is not recognized as synchronous, for example, different read and write clocks.
    • If it is indeed asynchronous, check its size. If it is small enough to be synthesized affordably with registers and mux trees (which is very expensive for large RAM banks), you can remove the $__RAMGEM_ASYNC_ block from memlib_yosys.txt and re-run Yosys to force the use of registers.
  • If you see using FF mapping for memory, the memory was recognized but is nonstandard (e.g., it has a special global reset or nontrivial initialization), so Jacquard falls back to registers and mux trees. For a small memory this is usually not an issue. Otherwise, consider trying a different implementation of the memory.
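
The checks in the list above can be partly automated. The following is a minimal Python sketch, assuming you have saved the Yosys output to a text log. It only counts log lines containing the three marker strings named above; the helper name classify_memory_log and the sample log lines are hypothetical, and the exact wording Yosys prints around the markers may differ.

```python
from collections import Counter

# Marker strings taken from this guide. The text around them in a real
# Yosys log is NOT assumed; only substring matches are used.
MARKERS = {
    "$__RAMGEM_SYNC_": "sync",              # mapped successfully
    "$__RAMGEM_ASYNC_": "async",            # asynchronous read port detected
    "using FF mapping for memory": "ff_fallback",  # registers and mux trees
}


def classify_memory_log(log_text: str) -> Counter:
    """Count how many log lines mention each memory-mapping outcome."""
    counts = Counter()
    for line in log_text.splitlines():
        for marker, label in MARKERS.items():
            if marker in line:
                counts[label] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = (
        "mapping memory top.cache via $__RAMGEM_SYNC_\n"
        "mapping memory top.regfile via $__RAMGEM_ASYNC_\n"
        "using FF mapping for memory top.fifo\n"
    )
    for label, n in sorted(classify_memory_log(sample).items()):
        print(label, n)
```

Any nonzero async or ff_fallback count is a prompt to review that memory by hand as described above, not a verdict on its own.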

After a successful mapping, use the following command to write out the mapped RTL as a single Verilog file.

write_verilog memory_mapped.v

Check the correctness of this step by simulating memory_mapped.v with your reference CPU simulator.

Step 2. Logic Synthesis

This step maps all combinational and sequential logic into a special set of standard cells we defined in aigpdk.lib. The quality of synthesis is directly tied to Jacquard's final performance, so we suggest you use a commercial synthesis tool like DC. You can also use Yosys to complete this if you do not have access to a commercial synthesis tool.

Check the correctness of this step by simulating gatelevel.gv with your reference CPU simulator.

Use Synopsys DC

First, you need to compile aigpdk.lib to aigpdk.db using Library Compiler.

With that, synthesize the memory_mapped.v obtained in Step 1 against aigpdk.db.

Some key commands you may use on top of your existing DC flow:

# change path/to/aigpdk.db to a correct path. same for other commands.
set_app_var link_path path/to/aigpdk.db
set_app_var target_library path/to/aigpdk.db
read_file -format db $target_library

# elaborate TOP_MODULE
# current_design TOP_MODULE

# timing settings like create_clock ... are recommended. Jacquard benefits from timing-driven synthesis.

compile_ultra -no_seq_output_inversion -no_autoungroup
optimize_netlist -area

write -format verilog -hierarchy -out gatelevel.gv

Use Yosys: Example script

# if you exited Yosys after Step 1, you can read your memory_mapped.v back in yourself.
# read_verilog memory_mapped.v
# hierarchy -check -top TOP_MODULE

# synthesis
synth -flatten
delete t:$print

# change path/to/aigpdk_nomem.lib to a correct path. same for other commands.
dfflibmap -liberty path/to/aigpdk_nomem.lib
opt_clean -purge
abc -liberty path/to/aigpdk_nomem.lib
opt_clean -purge
techmap
abc -liberty path/to/aigpdk_nomem.lib
opt_clean -purge

# write out
write_verilog gatelevel.gv

Step 3. Download and Compile Jacquard

Download and install the Rust toolchain. This is as simple as a one-liner in your terminal. We recommend https://rustup.rs.

Clone Jacquard along with its dependencies.

git clone https://github.com/ChipFlow/Jacquard.git
cd Jacquard
git submodule update --init --recursive

Jacquard supports two GPU backends: CUDA (NVIDIA GPUs on Linux) and Metal (Apple Silicon Macs).

All functionality is accessed through the jacquard CLI, which provides map, sim, and cosim subcommands:

# Mapping (no GPU features needed)
cargo run -r --bin jacquard -- map --help

# Simulation (Metal - macOS)
cargo run -r --features metal --bin jacquard -- sim --help

# Simulation (CUDA - Linux, requires CUDA toolkit)
cargo run -r --features cuda --bin jacquard -- sim --help

Simulate the Design

Jacquard automatically partitions the design at startup using mt-kahypar-sc hypergraph partitioning.

If partitioning fails because the circuit is too deep (this often shows up as an attempt to partition a circuit with only 0 or 1 endpoints), try adding the --level-split option to force a stage split, for example --level-split 30 or --level-split 20,40.

Metal (macOS)

Use NUM_BLOCKS=1 for Metal.

cargo run -r --features metal --bin jacquard -- sim path/to/gatelevel.gv path/to/input.vcd path/to/output.vcd 1

CUDA (Linux)

Replace NUM_BLOCKS with twice the number of physical streaming multiprocessors (SMs) of your GPU.

cargo run -r --features cuda --bin jacquard -- sim path/to/gatelevel.gv path/to/input.vcd path/to/output.vcd NUM_BLOCKS
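
The NUM_BLOCKS rule for both backends can be captured in a small helper. This is a sketch in Python, assuming you already know the SM count of your GPU (for example, from the multiProcessorCount field of CUDA's device properties); the function names num_blocks and sim_command are hypothetical and only assemble the same command line shown above.

```python
import shlex
from typing import List, Optional


def num_blocks(backend: str, sm_count: Optional[int] = None) -> int:
    """Return the NUM_BLOCKS value this guide suggests for a backend."""
    if backend == "metal":
        return 1  # the guide says to use NUM_BLOCKS=1 for Metal
    if backend == "cuda":
        if sm_count is None or sm_count <= 0:
            raise ValueError("CUDA needs a positive SM count")
        return 2 * sm_count  # twice the number of physical SMs
    raise ValueError(f"unknown backend: {backend!r}")


def sim_command(backend: str, netlist: str, input_vcd: str,
                output_vcd: str, sm_count: Optional[int] = None) -> List[str]:
    """Build the argument list for the `jacquard sim` invocation."""
    return [
        "cargo", "run", "-r", "--features", backend,
        "--bin", "jacquard", "--", "sim",
        netlist, input_vcd, output_vcd,
        str(num_blocks(backend, sm_count)),
    ]


if __name__ == "__main__":
    # Example: a GPU with 108 SMs gives NUM_BLOCKS = 216.
    cmd = sim_command("cuda", "gatelevel.gv", "input.vcd", "output.vcd",
                      sm_count=108)
    print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run if you prefer to launch the simulation from a script.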

VCD Scope Handling

Jacquard automatically detects the correct VCD scope containing your design's ports. In most cases, you don't need to specify --input-vcd-scope. If auto-detection fails or you need to override it, use:

# Metal
cargo run -r --features metal --bin jacquard -- sim path/to/gatelevel.gv path/to/input.vcd path/to/output.vcd 1 --input-vcd-scope "testbench/dut"

# CUDA
cargo run -r --features cuda --bin jacquard -- sim path/to/gatelevel.gv path/to/input.vcd path/to/output.vcd NUM_BLOCKS --input-vcd-scope "testbench/dut"

Use slash separators (/) for hierarchical paths, not dots. See troubleshooting-vcd.md for details.
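
If auto-detection fails and you are unsure which scope to pass, it can help to list the scopes declared in the VCD header. The sketch below is a minimal Python example that relies only on the standard VCD $scope / $upscope / $enddefinitions keywords; the function names are hypothetical, and it is not how Jacquard itself detects the scope.

```python
from typing import List


def list_vcd_scopes(vcd_text: str) -> List[str]:
    """Return every scope in the VCD header as a slash-separated path."""
    tokens = vcd_text.split()
    stack: List[str] = []
    scopes: List[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "$enddefinitions":
            break  # end of header, value changes follow
        if tok == "$scope" and i + 2 < len(tokens):
            # Layout: $scope <type> <name> $end
            stack.append(tokens[i + 2])
            scopes.append("/".join(stack))
            i += 3
            continue
        if tok == "$upscope" and stack:
            stack.pop()
        i += 1
    return scopes


def to_slash_scope(dotted: str) -> str:
    """Convert a dotted hierarchical path to the slash form used here."""
    return "/".join(part for part in dotted.split(".") if part)


if __name__ == "__main__":
    header = """
    $scope module testbench $end
    $scope module dut $end
    $var wire 1 ! clk $end
    $upscope $end
    $upscope $end
    $enddefinitions $end
    """
    print(list_vcd_scopes(header))
    print(to_slash_scope("testbench.dut"))
```

Pick the listed path that contains your design's ports and pass it to --input-vcd-scope.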

The simulated output port values are stored in output.vcd.

Caveat: Jacquard also reports the actual GPU simulation runtime. You might see a long delay before the GPU starts, because input.vcd has to be read and parsed first. We recommend developing your own pipeline to feed the input waveform into Jacquard's GPU kernels.
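
Once output.vcd has been written, you may want to compare it with a reference VCD from your CPU simulator. The following Python sketch shows one way to do that under simplifying assumptions: it handles only plain $var declarations and scalar/vector value changes, compares vector values textually, and expects you to supply the scope of the ports in each file. All names (parse_vcd, mismatched_ports, the scope strings) are hypothetical and not part of Jacquard.

```python
from typing import Dict, List, Tuple

History = List[Tuple[int, str]]  # (time, value) pairs


def parse_vcd(text: str) -> Dict[str, History]:
    """Map 'scope/.../name' to its list of value changes."""
    tokens = text.split()
    names: Dict[str, List[str]] = {}  # id code -> full signal names
    changes: Dict[str, History] = {}
    scope: List[str] = []
    i = 0
    # Header: collect scopes and variable declarations.
    while i < len(tokens) and tokens[i] != "$enddefinitions":
        tok = tokens[i]
        if tok == "$scope" and i + 2 < len(tokens):
            scope.append(tokens[i + 2])
        elif tok == "$upscope" and scope:
            scope.pop()
        elif tok == "$var" and i + 4 < len(tokens):
            # Layout: $var <type> <width> <id code> <name> ... $end
            code, name = tokens[i + 3], tokens[i + 4]
            names.setdefault(code, []).append("/".join(scope + [name]))
            i += 4
        i += 1
    # Body: timestamps and value changes.
    time = 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok.startswith("#"):
            time = int(tok[1:])
        elif tok.startswith("$"):
            continue  # $enddefinitions, $dumpvars, $end, ...
        else:
            if tok[0] in "bBrR":  # vector or real: value, then id code
                if i >= len(tokens):
                    break
                value, code = tok[1:], tokens[i]
                i += 1
            else:  # scalar: one value character followed by id code
                value, code = tok[0], tok[1:]
            for name in names.get(code, []):
                hist = changes.setdefault(name, [])
                if not hist or hist[-1][1] != value.lower():
                    hist.append((time, value.lower()))
    return changes


def mismatched_ports(ref: Dict[str, History], out: Dict[str, History],
                     ref_scope: str, out_scope: str,
                     ports: List[str]) -> List[str]:
    """Return the ports whose change histories differ between the files."""
    return [p for p in ports
            if ref.get(f"{ref_scope}/{p}") != out.get(f"{out_scope}/{p}")]
```

A typical use would be to call parse_vcd on the text of both files and then mismatched_ports with your output port names; an empty list means the recorded histories agree for those ports.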