Wireguard FPGA Simulation Test Bench

Introduction

The Wireguard FPGA test bench aims to have a flexible approach to simulation which allows a common test environment to be used whilst selecting between alternative CPU components, one of which uses the VProc virtual processor co-simulation element. This allows simulations to be fully HDL, with a RISC-V processor RTL implementation such as picoRV32, IBEX, or EDUBOS5, or to co-simulate software using the virtual processor, with a significant speed up in simulation times.

The VProc component is wrapped up into an soc_cpu.VPROC component with identical interfaces to the RTL. Some conversion logic is added to this BFM to convert between VProc's generic memory mapped interface and the soc_if defined interface. This is very lightweight logic, with fewer than ten combinatorial gates to match the control signals. In addition, the soc_cpu.VPROC component has a mem_model component instantiated. This is a 'memory port' to the mem_model C software implementation of a sparse memory model, allowing updates to the RISC-V program, if using the rv32 RISC-V ISS model (see below). The diagram below shows a block diagram of the test bench HDL.

Shown in the diagram is the Wireguard FPGA top level component (top) with the soc_cpu.VPROC component instantiated in it as one of three possible selected devices for the soc_cpu. The IMEM write port is connected to the UART for program updates and the soc_if from soc_cpu.VPROC is connected to the interconnect fabric (soc_fabric), just as for the two RTL components. The test bench around the top level Wireguard component has a driver for the UART (bfm_uart), and the four GMII/RGMII interfaces (including MDIO signals) coming from the Wireguard core are connected to some verification IP (bfm_ethernet) to drive this signalling. This BFM implementation is based around the udpIpPg GMII/RGMII VProc based VIP. In addition there are MDIO slave models for reading and writing over the PHY MDIO signals from Wireguard, and these also read and write to allocated VProc memory for access by VProc soc_cpu code. MDIO register accesses are also displayed to the console, for logging purposes, showing the name of the clause 22 register accessed. Finally, the test bench generates clocks and key press resets that go to the top level's clk_rst_gen and debounce components.

VerilatorSimCtrl (interactive run control)

What it does:

  • Interactive CLI inside the simulation (node 15) for run/for/until/finish and forces wave.fst flush so GTKWave sees fresh samples without restarting.

  • Opens GTKWave in a separate thread (see GTKWAVEOPTS and WAVESAVEFILE in MakefileVProc.mk).

  • Reload GTKWave to view new samples: File -> Reload Waveform (or Ctrl+Shift+R), then zoom/end to the latest time.

How to enable:

  • Pass DISABLE_SIM_CTRL=0, e.g.:
    make -f MakefileVProc.mk BUILD=ISS DISABLE_SIM_CTRL=0 rungui
  • DISABLE_SIM_CTRL=1 disables it (default). Instance is in tb.sv (node 15).

Key commands at the VerSimCtrl> prompt:

  • run for <N> <units> (e.g., run for 100 ns, run for 500 cycles)
  • run until <N> <units> (e.g., run until 5 us)
  • continue/c — same as run
  • finish/quit/exit — ends simulation ($finish)
  • Units: ps | ns | us | ms | s | cycle(s); default is cycles if omitted.
  • Each command flushes waves; in GTKWave hit reload + zoom-to-end to see the new range.

Auto-selection of soc_cpu Component

The Wireguard's top level component has the required RTL files listed in 1.hw/top.filelist. This includes files for the soc_cpu, under the directory ip.cpu. The simulation build make file (see below) will process the top.filelist file to generate a new local copy, having removed all references to the files under the ip.cpu directory. Since the VProc soc_cpu component is a test model, the soc_cpu.VPROC.sv HDL file is placed in 4.sim/models whilst the rest of the HDL files come from the VProc and mem_model repositories (auto-checked out by the make file, if necessary). These are referenced within the make file, along with the other test models that are used in the test bench. Thus the VProc device is selected for the simulation as the CPU component.
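The file list filtering described above can be pictured with a short sketch. It assumes the filter is a plain substring match on each line, which is consistent with the SOCCPUMATCH variable described later but is not taken from the make file itself. The function name and the example file names are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Copy a file list, dropping every line that mentions `match`
// (e.g. "ip.cpu"). Hypothetical helper, not part of the make file.
std::string filter_filelist(const std::string& filelist, const std::string& match)
{
    assert(!match.empty());

    std::istringstream in(filelist);
    std::string        out;
    std::string        line;

    while (std::getline(in, line))
    {
        // Keep only the lines that do not reference the soc_cpu RTL
        if (line.find(match) == std::string::npos)
        {
            out += line;
            out += '\n';
        }
    }

    return out;
}
```

The result is the local copy of the list with the soc_cpu RTL entries removed, leaving the soc_cpu.VPROC.sv model to fill that role.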

VProc Software

The VProc software consists of DPI-C code for communication and synchronisation with the simulation, for both the memory model and VProc itself. On top of this are the APIs for VProc and mem_model for use by the running code. In the case of VProc there is a low level C API or, if preferred, a C++ API. In Wireguard, the VProc soc_cpu is node 0, and so the entry point for user software is VUserMain0, in place of main.

The VProc software is compiled into libraries located in 4.sim/models/cosim/lib, with the headers in 4.sim/models/cosim/include (see here for more details). The C++ API is defined in a class VProc (defined in VProcClass.h), and a constructor creates an API object, defining the node to which it is connected:

VProc (const uint32_t node);

For the C++ VProc API there are two basic word access methods:

    int  write (const unsigned   addr, const unsigned    data, const int delta=0);
    int  read  (const unsigned   addr,       unsigned   *data, const int delta=0);

For these methods, the address argument is agnostic to being a byte address or a word address, but for the Wireguard implementation these are byte addresses. The delta argument is unused in Wireguard, and should be left at its default value, with just the address and data arguments used in the call to these methods. Along with these basic methods is a method to advance simulation time without doing a read or write transaction.

int  tick (const unsigned ticks);

The ticks argument is in units of clock cycles, as per the clock that the VProc HDL is connected to. A basic VProc program, then, is shown below:

#include "VProcClass.h"
extern "C" {
#include "mem.h"
}

static const int node    = 0;

extern "C" void VUserMain0(void)
{
    // Create VProc access object for this node
    VProc* vp0 = new VProc(node);

    // Wait a bit
    vp0->tick(100);

    uint32_t addr  = 0x10001000;
    uint32_t wdata = 0x900dc0de;

    vp0->write(addr, wdata);
    VPrint("Written   0x%08x  to  addr 0x%08x\n", wdata, addr);

    vp0->tick(3);

    uint32_t rdata;
    vp0->read(addr, &rdata);

    if (rdata == wdata)
    {
        VPrint("Read back 0x%08x from addr 0x%08x\n", rdata, addr);
    }
    else
    {
        VPrint("***ERROR: data mis-match at addr = 0x%08x. Got 0x%08x, expected 0x%08x\n", addr, rdata, wdata);
    }

    // Sleep forever
    while(true)
        vp0->tick(GO_TO_SLEEP);
}

The above code is a slightly abbreviated version of the code in 4.sim/usercode. Note that the VUserMain0 function must have C linkage as the VProc software that calls it is in C (as all the programming logic interfaces, including DPI-C, are C). The API also has a set of other methods for finer access control which are listed below, and more details can be found in the VProc manual.

    int  writeByte    (const unsigned   byteaddr, const unsigned    data, const int delta=0);
    int  writeHword   (const unsigned   byteaddr, const unsigned    data, const int delta=0);
    int  writeWord    (const unsigned   byteaddr, const unsigned    data, const int delta=0);
    int  readByte     (const unsigned   byteaddr,       unsigned   *data, const int delta=0);
    int  readHword    (const unsigned   byteaddr,       unsigned   *data, const int delta=0);
    int  readWord     (const unsigned   byteaddr,       unsigned   *data, const int delta=0);

The other methods in this class are not, at this point, used by Wireguard. These methods can now be used to write test code to drive the soc_if bus of the soc_cpu component, and are the basic means of writing test code software. As well as the VProc API, the user software can have direct access to the sparse memory model API by including mem.h, which declares a set of C functions (and mem.h must be included as extern "C" in C++ code). The functions relevant to Wireguard are shown below:

void     WriteRamByte  (const uint64_t addr, const uint32_t data, const uint32_t node);
void     WriteRamHWord (const uint64_t addr, const uint32_t data, const int little_endian, const uint32_t node);
void     WriteRamWord  (const uint64_t addr, const uint32_t data, const int little_endian, const uint32_t node);
uint32_t ReadRamByte   (const uint64_t addr, const uint32_t node);
uint32_t ReadRamHWord  (const uint64_t addr, const int little_endian, const uint32_t node);
uint32_t ReadRamWord   (const uint64_t addr, const int little_endian, const uint32_t node);

Note that, as C functions, there are no default parameters and the little_endian and node arguments must be passed in, even though they are constant. The little_endian argument is non-zero for little endian and zero for big endian. The node argument is not the same as for VProc, but allows multiple separate memory spaces to be modelled, just as VProc allows multiple virtual processor instantiations. For Wireguard, this is always 0. All instantiated mem_model components in the HDL have (through the DPI) access to the same memory space model as the API, and so data can be exchanged between the simulation and the running code, such as the RISC-V programs over the IMEM write interface.

Compiling co-designed application code, either for the native host machine or to run on the rv32 RISC-V ISS, will need further layers on top of these APIs, which will be virtualised away by that point (see the sections below). The diagram below summarises the software layers that make up a program running on the VProc HDL component. The "native test code" use case, shown at the top left, is for the case just described above that uses the APIs directly.

Other Software Use Cases

Natively Compiled Application

As well as the native test code case seen in the previous section, the Wireguard application can be compiled natively for the host machine, including the hardware access layer (HAL), generated from SystemRDL. The HAL software output from this is processed to generate a version that makes accesses to the VProc and mem_model APIs in place of accesses with pointers to and from memory (see the Co-simulation HAL section below). The rest of the application software has these details hidden away in the HAL and sees the same API as presented by the auto-generated code. In both cases transactions happen on the soc_if bus port of the soc_cpu component. The main entry point is also swapped for VUserMain0.

RISC-V Compiled Application

To execute RISC-V compiled application code, the rv32 instruction set simulator is used as the code running on the virtual processor. The VUserMain0 program now becomes software that creates an ISS object and integrates it with VProc. This uses the ISS's external memory access callback function to direct loads and stores either towards the sparse memory model, the VProc API for simulation transactions, or back to the ISS itself to handle. This ISS integration VUserMain0 program is located in 4.sim/models/rv32/usercode. When built, the code here is compiled and uses the pre-built library in 4.sim/models/rv32/lib/librv32lnx.a containing the ISS, with the headers for it in 4.sim/models/rv32/include. More details of the integration code and methods can be found here.
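The routing decision made in the external memory access callback can be sketched as a simple address decode. This is an illustration only, not the integration code: the function and enumeration names are hypothetical, the case of handing an access back to the ISS is left out, and the top address is assumed to be exclusive, in line with the example later in this document where -X 0x20000000 gives a region ending at 0x1FFFFFFF.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical destinations for a load or store seen by the callback
enum class mem_target_e { SIMULATION, SPARSE_MEMORY };

// Decide where an access goes, given the external region [ext_base, ext_top)
mem_target_e route_access(uint32_t addr, uint32_t ext_base, uint32_t ext_top)
{
    assert(ext_base <= ext_top);

    // Inside the external region: becomes a bus transaction in the HDL
    if (addr >= ext_base && addr < ext_top)
        return mem_target_e::SIMULATION;

    // Everything else is served by the C sparse memory model
    return mem_target_e::SPARSE_MEMORY;
}
```

With equal base and top addresses the region is empty, so under these assumptions every access would stay in the sparse memory model and no simulation transactions would be generated.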

The ISS supports interrupts, but these are not currently used on Wireguard. The integration software can read a configuration file, if present in the 4.sim/ directory, called vusermain.cfg. This allows the ISS and other features to be configured at run-time. The configuration file is in lieu of command line options and the entries in the file are formatted as if they were such, with a command matching the VUserMain program:

vusermain0 [options]

One of the options is -h for a help message, which is as shown below:

Usage:vusermain0 -t <test executable> [-hHebdrgxXRcI][-n <num instructions>]
      [-S <start addr>][-A <brk addr>][-D <debug o/p filename>][-p <port num>]
      [-l <line bytes>][-w <ways>][-s <sets>][-j <imem base addr>][-J <imem top addr>]
      [-P <cycles>][-x <base addr>][-X <top addr>][-V <core>]
   -t specify test executable/binary file (default test.exe)
   -B specify to load a raw binary file (default load ELF executable)
   -L specify address to load binary, if -B specified (default 0x00000000)
   -n specify number of instructions to run (default 0, i.e. run until unimp)
   -d Enable disassemble mode (default off)
   -r Enable run-time disassemble mode (default off. Overridden by -d)
   -C Use cycle count for internal mtime timer (default real-time)
   -a display ABI register names when disassembling (default x names)
   -T Use external memory mapped timer model (default internal)
   -H Halt on unimplemented instructions (default trap)
   -e Halt on ecall instruction (default trap)
   -E Halt on ebreak instruction (default trap)
   -b Halt at a specific address (default off)
   -A Specify halt address if -b active (default 0x00000040)
   -D Specify file for debug output (default stdout)
   -R Dump x0 to x31 on exit (default no dump)
   -c Dump CSR registers on exit (default no dump)
   -g Enable remote gdb mode (default disabled)
   -p Specify remote GDB port number (default 49152)
   -S Specify start address (default 0)
   -I Enable instruction cache timing model (default disabled)
   -l Specify number of bytes in icache line (default 8)
   -w Specify number of ways in icache (default 2)
   -s Specify number of sets in icache (default 256)
   -j Specify cached IMEM base address (default 0x00000000)
   -J Specify cached IMEM top address (default 0x7fffffff)
   -P Specify penalty, in cycles, of one slow mem access (default 4)
   -x Specify base address of external access region (default 0xFFFFFFFF)
   -X Specify top address of external access region (default 0xFFFFFFFF)
   -V Specify RISC-V core timing model to use (default "DEFAULT")
   -h display this help message

With these options the model can load an elf executable to memory directly and be set up with some execution termination conditions. Disassembly output can also be switched on and registers dumped on exit. More details of all these features can be found in the rv32 ISS manual.
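Since the configuration file entry is formatted like a command line, one way to think about its processing is as a split into argv-style tokens followed by an option lookup. The sketch below assumes whitespace separation with no quoting, and both function names are hypothetical. It is not the parser used by the integration code.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split one configuration line into whitespace-separated tokens,
// argv-style. Quoting is not handled in this sketch.
std::vector<std::string> split_cfg_line(const std::string& line)
{
    std::istringstream       in(line);
    std::vector<std::string> args;
    std::string              tok;

    while (in >> tok)
        args.push_back(tok);

    return args;
}

// Return the token following option `opt` (e.g. "-t"), or `dflt` if absent
std::string option_value(const std::vector<std::string>& args,
                         const std::string& opt, const std::string& dflt)
{
    assert(!opt.empty());

    for (std::size_t i = 0; i + 1 < args.size(); i++)
        if (args[i] == opt)
            return args[i + 1];

    return dflt;
}
```

Grouped single letter flags, such as -rEHRca, arrive as one token and would need a further pass over their characters, which is omitted here.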

Specific to the Wireguard project is the ability to specify the region where memory loads and stores will make external simulation transactions rather than use internal memory modelling or peripherals, using the -x and -X options. This is useful to allow access to the CSR registers in the HDL whilst mapping all of the memory internally using the sparse C memory model of mem_model. The cache model can be enabled with the -I option and the cache configured. The -l option specifies the number of bytes in a cache line, which can be 4, 8 or 16. The number of ways is set with -w and can be either 1 or 2, and the number of sets is specified with the -s option and can be 128, 256, 512 or 1024. The Wireguard project also has the option to load a raw binary file to memory in place of reading an ELF file. The -B option selects this mode (with -t still specifying the file name), and the load address can be changed from 0 with the -L option. A set of pre-configured timing models can be specified with the -V option. The argument must be one of the following:

  • DEFAULT
  • PICORV32
  • EDUBOS5STG2
  • EDUBOS5STG3
  • IBEXMULSGL
  • IBEXMULFAST
  • IBEXMULSLOW

This reflects the available models as detailed in the Configuring ISS timing model section below.
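The instruction cache options described above constrain each dimension to a small set of values. A minimal sketch of checking a configuration and working out the modelled capacity is given below. The structure and function names are hypothetical, and the capacity is the usual line bytes × ways × sets product rather than anything taken from the ISS source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the -l, -w and -s settings
struct icache_cfg_t
{
    uint32_t line_bytes; // -l : 4, 8 or 16
    uint32_t ways;       // -w : 1 or 2
    uint32_t sets;       // -s : 128, 256, 512 or 1024
};

// Check each setting against the values listed in the text
bool icache_cfg_valid(const icache_cfg_t& c)
{
    bool line_ok = c.line_bytes == 4 || c.line_bytes == 8 || c.line_bytes == 16;
    bool ways_ok = c.ways == 1 || c.ways == 2;
    bool sets_ok = c.sets == 128 || c.sets == 256 || c.sets == 512 || c.sets == 1024;

    return line_ok && ways_ok && sets_ok;
}

// Total modelled capacity in bytes
uint32_t icache_bytes(const icache_cfg_t& c)
{
    assert(icache_cfg_valid(c));

    return c.line_bytes * c.ways * c.sets;
}
```

For the default settings from the help message (8 byte lines, 2 ways, 256 sets) this gives 4096 bytes.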

Building and Running Code

A MakefileVProc.mk file is provided in the 4.sim/ directory to compile the user VProc software, for both the soc_cpu and udpIpPg components, and to build and run the test bench HDL. The make file will compile all the user code or, where an ISS build is selected (see make file variables below) the provided soc_cpu user code that's the rv32 ISS integration software. By default, the make file will compile the VUserMain0.cpp user code for soc_cpu and VUserMainUdp.cpp for udpIpPg located in 4.sim/usercode, but the directory and list of files to compile can be specified on the command line (see below). The VUserMainUdp.cpp file contains the VUserMain<n> entry points for all four instantiated udpIpPg modules (nodes 1 to 4). To alter which files to compile, the make file USER_C variable can be updated to list a set of C++ files for the soc_cpu. Similarly, the UDP_C variable can be updated with a list of files for the Ethernet components. The location of the source files is in the variable USRCODEDIR, which may also be altered. Any modifications can be done to the make file itself, or on the command line. E.g., to add additional files to the soc_cpu build:

make -f MakefileVProc.mk USER_C="VUserMain0.cpp MyTest1Class.cpp"

If many variants of software build are required then either scripts can be constructed with the various command line variable modification calls to make, or other make files can be written which set these variables and call the common make file. This is useful in managing source code for multiple tests located in different directories, compiling for ISS (perhaps also calling the RISC-V application build), or for compiling application code natively which will have a different set of source files.

The user software is compiled into a local static library, libuser.a, which is linked to the simulation code within Verilator along with the precompiled libcosimlnx.a (or libcosimwin.a for MSYS2/mingw64 on Windows) located in 4.sim/models/cosim/lib and containing the precompiled code for VProc and mem_model. The headers for the VProc and mem_model API software are in 4.sim/models/cosim/include. The HDL required for these models' use in the Wireguard test bench can be found in 4.sim/models/cosim, and the make file picks these up from there to compile with the rest of the test bench HDL.

The MakefileVProc.mk make file has a target help, which produces the following output:

make -f MakefileVProc.mk help          Display this message
make -f MakefileVProc.mk               Build C/C++ and HDL code without running simulation
make -f MakefileVProc.mk run           Build and run batch simulation
make -f MakefileVProc.mk rungui/gui    Build and run GUI simulation
make -f MakefileVProc.mk clean         clean previous build artefacts

Command line configurable variables:
  USER_C:       list of user source code files (default VUserMain0.cpp)
  UDP_C:        list of user source code files (default VUserMainUdp.cpp)
  USRCODEDIR:   directory containing user source code (default $(CURDIR)/usercode)
  OPTFLAG:      Optimisation flag for user VProc code (default -g)
  TIMINGOPT:    Verilator timing flags (default --timing)
  TRACEOPTS:    Verilator trace flags (default --trace-fst --trace-structs)
  TOPFILELIST:  RTL file list name (default top.filelist)
  SOCCPUMATCH:  string to match for soc_cpu filtering in h/w file list (default ip.cpu)
  USRSIMOPTS:   additional Verilator flags, such as setting generics (default blank)
  WAVESAVEFILE: name of .gtkw file to use when displaying waveforms (default waves.gtkw)
  BUILD:        Select build type from DEFAULT or ISS (default DEFAULT)
  TIMEOUTUS:    Test bench timeout period in microseconds (default 15000)

By default, without a named target, the simulation executable will be built but not run. With a run target, the simulation executable is built and then executed in batch mode. To fire up waveforms after the run, a target of rungui or gui can be used. A target of clean removes all intermediate files of previous compilations.

The make file has a set of variables (with default settings) that can be overridden on running make. E.g. make VAR=NewVal. The help output shows these variables with brief descriptions. Entries with multiple values should be enclosed in double quotes. By default native test code is built, but if BUILD is set to ISS, then the rv32 ISS and VProc program is compiled and, in this case, USER_C and USRCODEDIR are ignored as the make file compiles the supplied source code for the ISS.

The USER_C and USRCODEDIR make file variables allow different (and multiple) user source file names to override the defaults, and change where the user code is located (if not the ISS build). This allows different programs to be run by simply changing these variables, and the source code for different tests to be organised in different directories. By default, the VProc code is compiled for debugging (-g), but this can be overridden by changing OPTFLAG. The trace and timing options can also be overridden to allow a faster executable. The Wireguard top.filelist filename can be overridden to allow multiple configurations to be selected from, if required. The processing of this file to remove the listed soc_cpu HDL files is selected on a pattern (ip.cpu) but this can be changed using SOCCPUMATCH. If any additional options for Verilator are required, then these can be added to USRSIMOPTS. The GTKWave waveform file can be selected with WAVESAVEFILE.

Control of when the simulation exits can be specified with the TIMEOUTUS variable in units of microseconds. Some example commands using the make file are shown below:

make -f MakefileVProc.mk run                                                   # Build and run default VUserMain0.cpp code in usercode/
make -f MakefileVProc.mk                                                       # Build but don't run default code
make -f MakefileVProc.mk USER_C="test1.cpp subfuncs.cpp" USRCODEDIR=test1 run  # Build and run test1.cpp and subfuncs.cpp in test1/
make -f MakefileVProc.mk BUILD=ISS gui                                         # Build and run ISS simulation and show waves
make -f MakefileVProc.mk clean                                                 # Clean all intermediate files

Configuring ISS timing model

Configuration of the timing model can be done from the supplied integration code in VUserMain0.cpp. The main pre_run_setup() function, in VUserMain0.cpp, creates an rv32_timing_config object (rv32_time_cfg) which has an update_timing method that takes a pointer to the iss object and an enumerated type to select the model to use for the particular core timings required. This second argument is selected from one of the following:

  • rv32_timing_config::risc_v_core_e::DEFAULT      : Default timing values
  • rv32_timing_config::risc_v_core_e::PICORV32     : picoRV32 timings
  • rv32_timing_config::risc_v_core_e::EDUBOS5STG2  : 2 stage eduBOS5
  • rv32_timing_config::risc_v_core_e::EDUBOS5STG3  : 3 stage eduBOS5
  • rv32_timing_config::risc_v_core_e::IBEXMULSGL   : IBEX single cycle multiplier
  • rv32_timing_config::risc_v_core_e::IBEXMULFAST  : IBEX fast multi-cycle multiplier
  • rv32_timing_config::risc_v_core_e::IBEXMULSLOW  : IBEX slow multi-cycle multiplier

As detailed in the RISC-V Compiled Application section above, the ISS can be configured via the vusermain.cfg file using the -V option.
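The relationship between the -V option string and the enumerated timing model can be pictured as a simple lookup. The sketch below uses a local stand-in enumeration (core_e) rather than the real rv32_timing_config::risc_v_core_e type, and the parse_core function is hypothetical. Whether the ISS integration code matches names case sensitively is not stated here, so the exact match is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// Local stand-in for rv32_timing_config::risc_v_core_e (illustration only)
enum class core_e
{
    DEFAULT, PICORV32, EDUBOS5STG2, EDUBOS5STG3,
    IBEXMULSGL, IBEXMULFAST, IBEXMULSLOW
};

// Map a -V argument onto a timing model. Returns false, leaving
// `core` untouched, if the name is not one of the listed models.
bool parse_core(const std::string& name, core_e& core)
{
    static const std::map<std::string, core_e> table = {
        {"DEFAULT",     core_e::DEFAULT},
        {"PICORV32",    core_e::PICORV32},
        {"EDUBOS5STG2", core_e::EDUBOS5STG2},
        {"EDUBOS5STG3", core_e::EDUBOS5STG3},
        {"IBEXMULSGL",  core_e::IBEXMULSGL},
        {"IBEXMULFAST", core_e::IBEXMULFAST},
        {"IBEXMULSLOW", core_e::IBEXMULSLOW}
    };

    auto it = table.find(name);

    if (it == table.end())
        return false;

    core = it->second;
    return true;
}
```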

Running ISS code

When the test bench is built for the rv32 ISS, the actual 'user' application code is run on the RISC-V ISS model itself, and is compiled using the normal RISC-V GNU toolchain to produce a binary file that the ISS can load and run. As described above, the code that is run is selected with the vusermain.cfg file and the -t option. The various flags configure the ISS and determine when the ISS is halted (if at all). An example assembly file is provided in 4.sim/models/rv32/riscvtest/main.s (as well as a pre-compiled main.bin). This assembly code reproduces the functionality of the example VUserMain0.cpp program discussed previously, writing to memory, reading back and comparing for a mismatch. The example assembly code is compiled with:

$riscv64-unknown-elf-as.exe -fpic -march=rv32imafdc -aghlms=main.list -o main.o main.s
$riscv64-unknown-elf-ld.exe main.o -Ttext 0 -Tdata 1000 -melf32lriscv -o main.bin

In this instance, the code is assembled with the MAFDC extensions enabled (integer multiply/divide, atomic, single precision float, double precision float and compressed). To run this code the vusermain.cfg is set to:

vusermain0 -x 0x10000000 -X 0x20000000 -rEHRca -t ./models/rv32/riscvtest/main.bin

This sets the address region that will be sent to the HDL soc_cpu bus to be between byte addresses 0x10000000 and 0x1FFFFFFF. All other accesses will use the direct memory model's API, with no simulation transactions. The next set of options turns on run-time disassembly (-r), exit on ebreak (-E) or unimplemented instruction (-H), dumping of registers (-R) and CSR registers (-c), and display of the registers in ABI format (-a). The pre-compiled example program binary is then selected with the -t option. Of course, many of these options are not necessary and, for example, the output flags (-rRca) can be removed and the program will still run correctly. In the 4.sim/ directory, using make to build and run the code gives the following output:

$make -f MakefileVProc.mk BUILD=ISS run
- V e r i l a t i o n   R e p o r t: Verilator 5.024 2024-04-05 rev v5.024-42-gc561fe8ba
- Verilator: Built from 2.145 MB sources in 40 modules, into 0.556 MB in 20 C++ files needing 0.001 MB
- Verilator: Walltime 0.298 s (elab=0.020, cvt=0.087, bld=0.000); cpu 0.000 s on 1 threads; alloced 14.059 MB
Archive ar -rcs Vtb__ALL.a Vtb__ALL.o
VInit(0): initialising DPI-C interface
  VProc version 1.11.4. Copyright (c) 2004-2024 Simon Southwell.
                   0 TOP.tb.error_mon (0) - ERROR_CLEARED

  ******************************
  *   Wyvern Semiconductors    *
  * rv32 RISC-V ISS (on VProc) *
  *     Copyright (c) 2024     *
  ******************************

00000000: 0x00001197    auipc     gp, 0x00000001
00000004: 0x0101a183    lw        gp, 16(gp)
00000008: 0x0001a103    lw        sp, 0(gp)
0000000c: 0x10001237    lui       tp, 0x00010001
00000010: 0x00222023    sw        sp, 0(tp)
00000014: 0x00022283    lw        t0, 0(tp)
00000018: 0x00229663    bne       t0, sp, 12
0000001c: 0x00004505'   addi      a0, zero, 1
0000001e: 0x00004501'   addi      a0, zero, 0
00000020: 0x05d00893    addi      a7, zero, 93
00000024: 0x00009002'   ebreak
    *

Register state:

  zero = 0x00000000   ra = 0x00000000   sp = 0x900dc0de   gp = 0x00001000
    tp = 0x10001000   t0 = 0x900dc0de   t1 = 0x00000000   t2 = 0x00000000
    s0 = 0x00000000   s1 = 0x00000000   a0 = 0x00000000   a1 = 0x00000000
    a2 = 0x00000000   a3 = 0x00000000   a4 = 0x00000000   a5 = 0x00000000
    a6 = 0x00000000   a7 = 0x0000005d   s2 = 0x00000000   s3 = 0x00000000
    s4 = 0x00000000   s5 = 0x00000000   s6 = 0x00000000   s7 = 0x00000000
    s8 = 0x00000000   s9 = 0x00000000  s10 = 0x00000000  s11 = 0x00000000
    t3 = 0x00000000   t4 = 0x00000000   t5 = 0x00000000   t6 = 0x00000000

CSR state:

  mstatus    = 0x00003800
  mie        = 0x00000000
  mvtec      = 0x00000000
  mscratch   = 0x00000000
  mepc       = 0x00000000
  mcause     = 0x00000000
  mtval      = 0x00000000
  mip        = 0x00000000
  mcycle     = 0x0000000000000037
  minstret   = 0x000000000000000b
  mtime      = 0x0006263f2bfc6bcf
  mtimecmp   = 0xffffffffffffffff
Exited running ./models/rv32/riscvtest/main.bin
- /mnt/hgfs/winhome/simon/git/Wireguard-fpga/4.sim/tb.sv:44: Verilog $finish

Note that the disassembled output is a mixture of 32-bit and compressed 16-bit instructions, with the compressed instruction hexadecimal values shown followed by a ' character and the instruction hexadecimal value in the lower 16 bits. Unlike for the native compiled code use cases, unless the HDL has changed, the test bench does not need to be re-built when the RISC-V source code is changed or a different binary is to be run. Only the RISC-V code needs re-compiling, or the vusermain.cfg updating to point to a different binary file.
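The mix of instruction sizes in the disassembly follows from the standard RISC-V length encoding, where a 16-bit compressed instruction has its two least significant bits not equal to 0b11. A small sketch of this check is shown below, using hypothetical function names. It illustrates the encoding rule and is not code from the ISS.

```cpp
#include <cassert>
#include <cstdint>

// A compressed (16-bit) RISC-V instruction has bits [1:0] != 0b11
bool is_compressed(uint32_t instr)
{
    return (instr & 0x3u) != 0x3u;
}

// Address of the next sequential instruction
uint32_t next_pc(uint32_t pc, uint32_t instr)
{
    // With the C extension, instructions are 16-bit aligned
    assert((pc & 0x1u) == 0);

    return pc + (is_compressed(instr) ? 2u : 4u);
}
```

Applied to the output above, the value 0x00004505 at address 0x1c is compressed, so the next instruction is at 0x1e, whereas 0x05d00893 at address 0x20 is a full 32-bit instruction, followed by the ebreak at 0x24.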

PicoRV32 RTL-Only Simulation Makefile

A standalone Makefile (located in 4.sim/) is provided for cycle-accurate Verilator simulation using the real picoRV32 RTL core. It:

  • Drives the UDP/IPv4 BFMs (nodes 1–4) via usercode/VUserMainUdp.cpp, using VProc’s DPI-C engine for the Ethernet VIP.
  • Generates C++ sources with:
    • --cc -sv --timing --trace-fst --trace-structs
    • +define+SIM_ONLY and +define+VPROC_SV
    • File lists top.filelist & simple_tb.filelist to pull in picoRV32 RTL.
  • Compiles into output/ and links against:
    • libcosimlnx.a (co-simulation)
    • libudplnx.a (UDP/IP VIP)
    • DPI headers in models/cosim/include and models/udpIpPg/include.
  • Provides standard targets:
    • compile → generate & build
    • sim → run ./output/Vtb (logs to sim.log)
    • wave → open wave.fst in GTKWave
    • clean → remove output/, tb.xml, tb.stems

Debugging Code

Software in each of the three use cases can be debugged using gdb, either the host computer's gdb or the GNU RISC-V toolchain's gdb.

Natively Compiled code

For natively compiled code, whether test code or natively compiled application code, the Verilator compiled simulation is an executable file (compiled into an output/ directory) that contains all the compiled user code. So long as this code was compiled with the -g flag set (see above for make file options), debugging just requires running this executable under the host computer's gdb. E.g., from the 4.sim/ directory:

gdb output/Vtb

Debugging then proceeds just as for any other executable.

ISS Software

The ISS has a remote gdb interface (enable with the -g option in the vusermain.cfg file) allowing the loading of programs via this connection, and of doing all the normal debugging steps of the RISC-V code. The ISS manual details how to use the gdb remote debug interface but, to summarise, when the ISS is run in GDB mode, it will create a TCP socket and advertise the port number to the screen (e.g. RV32GDB: Using TCP port number: 49152). The RISC-V gdb is then run and a remote connection is made with a command:

(gdb) target remote :49152

A blank before the colon character in the port number indicates the connection is on the local host, but a remote host name can be used to do remote debugging from another machine on the network, or even over the internet, given sufficient access permissions. The program (if not already loaded by other means) can be loaded over this connection and debugging can then commence as normal.

The ISS manual has more details on this and also has an appendix showing how to setup an Eclipse IDE project to debug the code via gdb.

The mem_model Co-Simulation Sparse Memory Model

The Wireguard FPGA test bench makes use of the mem_model co-simulation component. This consists of a sparse memory model, written in C with a software API for read and write transactions. It can map a 64-bit address space, with pages allocated on demand to restrict the actual memory required. The API can be accessed from any VProc running code to share this memory space. This model can also be accessed from the HDL using the mem_model HDL component, which may be instantiated any number of times, but always accesses the same memory. This allows multiple VProc virtual processors and the simulated test bench logic to access a common memory space.

Currently, the soc_cpu.VPROC component has a mem_model instantiated for program writes via a UART, and the software running on the VProc virtual processor can access the memory directly via the API. The software running on the VProc used in the udpIpPg modules in the bfm_ethernet driver also has access to the same API and memory space, and the bfm_phy_mdio uses instantiated mem_model modules to access this space as well. Thus, the memory model becomes the common connection within the test bench, allowing end-to-end access and verification of results.

Details of the memory model can be found in the README.md in 4.sim/models/cosim.
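The idea of a sparse memory with pages allocated on demand can be sketched in a few lines. This is not the mem_model implementation: the class name, the 4096 byte page size and the return of zero for unwritten locations are all assumptions made for illustration. The little_endian flag mirrors the argument of the same name in the mem.h functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>

class sparse_mem
{
public:
    static constexpr uint64_t PAGE_BYTES = 4096; // assumed page size

    void write_byte(uint64_t addr, uint8_t data)
    {
        page(addr)[addr % PAGE_BYTES] = data;
    }

    // Reads never allocate; unwritten locations read as 0 (assumption)
    uint8_t read_byte(uint64_t addr) const
    {
        auto it = pages.find(addr / PAGE_BYTES);
        return it == pages.end() ? uint8_t(0) : (*it->second)[addr % PAGE_BYTES];
    }

    void write_word(uint64_t addr, uint32_t data, bool little_endian)
    {
        for (unsigned i = 0; i < 4; i++)
        {
            // Little endian puts the least significant byte at the lowest address
            unsigned shift = little_endian ? 8 * i : 8 * (3 - i);
            write_byte(addr + i, static_cast<uint8_t>((data >> shift) & 0xff));
        }
    }

    uint32_t read_word(uint64_t addr, bool little_endian) const
    {
        uint32_t data = 0;
        for (unsigned i = 0; i < 4; i++)
        {
            unsigned shift = little_endian ? 8 * i : 8 * (3 - i);
            data |= static_cast<uint32_t>(read_byte(addr + i)) << shift;
        }
        return data;
    }

    std::size_t allocated_pages() const { return pages.size(); }

private:
    using page_t = std::array<uint8_t, PAGE_BYTES>;

    // Fetch the page containing addr, allocating a zeroed one on first write
    page_t& page(uint64_t addr)
    {
        auto& p = pages[addr / PAGE_BYTES];
        if (!p)
            p = std::make_unique<page_t>();
        return *p;
    }

    std::unordered_map<uint64_t, std::unique_ptr<page_t>> pages;
};
```

Writing to two addresses at opposite ends of the 64-bit space allocates only two pages, which is what keeps the host memory requirement small.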

Driving the Wireguard Logic Ethernet Ports

The Wireguard FPGA logic has interfaces for four Ethernet ports, transferring UDP/IPv4 packets over GMII for 1GbE. Accompanying each Ethernet port is also an MDIO interface used for configuring the PHY with IEEE 802.3 Clause 22 registers. In order to drive these interfaces, the test bench has a bfm_ethernet module based on the udpIpPg VIP to generate packets over GMII, along with bfm_phy_mdio blocks to respond to the MDIO transactions and which map the registers to address ranges within the mem_model's memory space via instantiated mem_model HDL components.

More details on the ethernet driver and udpIpPg can be found in the README.md file in 4.sim/models/udPgIp.

udpIpPg Software

Since the udpIpPg bus functional model is based around a VProc component, just as the soc_cpu.VPROC is, the basic structure and usage are the same, but with a different API suitable for UDP datagram generation and payload reception. For each of the four instantiated udpIpPg blocks in the bfm_ethernet module, there is a VUserMain<n> entry point, where <n> ranges from 1 to 4 for these models. The basic API for the user code is fairly simple and is as follows:

class udpIpPg  : public udpVProc
{
public:
    // Constructor
    udpIpPg(int nodeIn);

    // Function to register user callback function to receive packets
    void           registerUsrRxCbFunc (pUsrRxCbFunc_t pFunc, void* hdlIn);

    // Method to generate a UDP/IPv4 packet
    uint32_t       genUdpIpPkt         (udpConfig_t &cfg, uint32_t* frm_buf, uint32_t* payload, uint32_t payload_len);

    // Method to send a pre-prepared (raw) ethernet frame
    uint32_t       UdpVpSendRawEthFrame(uint32_t* frame, uint32_t len);

    // Method to idle for specified number of cycles
    uint32_t       UdpVpSendIdle(uint32_t ticks);

    // Method to set the halt output signal
    void           UdpVpSetHalt(uint32_t val);

};

Construction of the API object is just a matter of calling the constructor with the node number associated with the VProc for this instance. This should match the VUserMain<n> "main" entry point. The registerUsrRxCbFunc method allows a user-supplied callback function to be registered that will be called for each received packet. As well as the pointer to the callback function, an optional hdl handle void pointer can be specified, which will be passed to the callback function when called. This can be used, for example, as a pointer to a buffer or queue in which to place received data. The callback function itself has two parameters. The second is the hdl pointer given at registration; the first is of type rxInfo_t, which is a structure containing certain information about the received packet.

// Structure for received packet information
typedef struct {
    uint64_t mac_src_addr;
    uint32_t ipv4_src_addr;
    uint32_t udp_src_port;
    uint32_t udp_dst_port;
    uint8_t  rx_payload[ETH_MTU];
    uint32_t rx_len;
} rxInfo_t;

The callback structure parameter has information about the source MAC and IPv4 addresses, as well as the source and destination ports. The payload, if any, will be in the rx_payload[] buffer, with its length given in rx_len.
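A receive callback that uses the handle as a queue pointer might look like the sketch below. The callback signature (an rxInfo_t passed by value followed by the void* handle), the ETH_MTU value, and the PayloadQueue type are assumptions made so that the example is self-contained; the real definitions come from the udpIpPg headers.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

constexpr uint32_t ETH_MTU = 1500;   // assumed value for this sketch only

// Copy of the structure shown above, repeated to keep the sketch standalone
typedef struct {
    uint64_t mac_src_addr;
    uint32_t ipv4_src_addr;
    uint32_t udp_src_port;
    uint32_t udp_dst_port;
    uint8_t  rx_payload[ETH_MTU];
    uint32_t rx_len;
} rxInfo_t;

// Hypothetical queue type used to collect received payloads
using PayloadQueue = std::deque<std::vector<uint8_t>>;

// Assumed callback shape: packet information first, user handle second
void rxCallback(rxInfo_t info, void* hdl)
{
    // Ignore calls without a queue, and guard against a bad length
    if (hdl == nullptr || info.rx_len > ETH_MTU) {
        return;
    }

    // The handle is the queue pointer supplied at registration
    PayloadQueue* queue = static_cast<PayloadQueue*>(hdl);
    queue->emplace_back(info.rx_payload, info.rx_payload + info.rx_len);
}
```

With a udpIpPg object named pUdp, registration would then pass the queue's address as the handle, e.g. pUdp.registerUsrRxCbFunc(rxCallback, &queue), and the test code can drain the queue between transmissions.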

Generating and transmitting packets is done in two stages. An Ethernet packet, encoded with IPv4 and UDP, is constructed into a frame buffer using the genUdpIpPkt method. The first parameter is a simple configuration structure (see below), followed by a frame buffer pointer with sufficient space for the packet, which won't be more than 2 Kbytes.

// Structure definition for transmit parameters
typedef class {
public:
    // UDP controls
    uint32_t dst_port;

    // IPV4 parameters
    uint32_t ip_dst_addr ;

    // MAC parameters
    uint64_t mac_dst_addr;
} udpConfig_t;

A pointer to a buffer with a payload, followed by the length of the payload, make up the last two parameters. Note that the type of the buffers is uint32_t, but each entry represents a byte or symbol, since the encoding processes can internally expand the byte data to more than 8 bits for certain protocols, and consistency is maintained between all buffers in the encoding. The method returns the total length of the packet.
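The one-entry-per-byte buffer convention is easy to get wrong when starting from ordinary byte data, so a small conversion helper can be useful. The following sketch shows one way to do this; toPayloadBuf is a hypothetical helper name and is not part of the udpIpPg API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: expand byte data so that each uint32_t entry
// holds exactly one payload byte, as the udpIpPg buffers expect.
std::vector<uint32_t> toPayloadBuf(const std::string& bytes)
{
    std::vector<uint32_t> buf;
    buf.reserve(bytes.size());

    // Go via unsigned char so bytes >= 0x80 are not sign extended
    for (unsigned char c : bytes) {
        buf.push_back(static_cast<uint32_t>(c));
    }
    return buf;
}
```

The resulting vector's data() pointer and size() would then serve as the payload and payload_len arguments of genUdpIpPkt.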

Once the packet has been constructed, it is sent over the GMII interface in the logic simulation, via VProc, with a call to UdpVpSendRawEthFrame, which is provided with the frame buffer pointer and the length returned by genUdpIpPkt. When no packet is to be transmitted, the interface must send idle symbols, and the UdpVpSendIdle method does just this for the specified number of clock cycles. This is comparable to the tick method for the soc_cpu.VPROC software.

System-Level Ethernet Simulation

With the addition of the udpIpPg virtual processor module and accompanying C++ driver, it is possible to perform full end-to-end Ethernet packet tests directly from user code:

  • Frame generation: a C++ application (in 4.sim/usercode/VUserMainUdp.cpp) uses the udpIpPg class to

    1. configure the destination MAC address, IP address, and UDP port,
    2. build a complete Ethernet + IPv4 + UDP frame via genUdpIpPkt(), and
    3. transmit it on GMII using UdpVpSendRawEthFrame(). Idle symbols are driven between frames with UdpVpSendIdle() to keep the link alive.
  • Frame reception: the same udpIpPg API supports a user callback, registered with

    pUdp.registerUsrRxCbFunc(rxCallback, nullptr);
  • This mechanism makes system-level testing of the Ethernet data path straightforward, with no manual RTL test bench changes required. You can script packet streams, verify traffic, and log every received packet directly in your C++ test harness.

  • Example Console Log:

MAC src      = D89EF3887EC3
IPv4 src     = C0A81908
UDP src port = 0400
UDP dst port = 0401
Payload (64 bytes):
00 04 08 0C 10 14 18 1C 20 24 28 2C 30 34 38 3C ......
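A logging routine producing output in the shape of the console log above could be written along the following lines. This is only a sketch: formatRxPacket is a hypothetical name, the 16-bytes-per-line payload layout is inferred from the single line shown, and the function takes the individual rxInfo_t fields as arguments so that it stands alone.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical formatter: returns the log text rather than printing it,
// so that a callback can print it, store it, or compare it in a test.
std::string formatRxPacket(uint64_t mac_src, uint32_t ipv4_src,
                           uint32_t udp_src, uint32_t udp_dst,
                           const uint8_t* payload, uint32_t len)
{
    char buf[64];
    std::string out;

    std::snprintf(buf, sizeof(buf), "MAC src      = %012llX\n",
                  static_cast<unsigned long long>(mac_src));
    out += buf;
    std::snprintf(buf, sizeof(buf), "IPv4 src     = %08lX\n",
                  static_cast<unsigned long>(ipv4_src));
    out += buf;
    std::snprintf(buf, sizeof(buf), "UDP src port = %04lX\n",
                  static_cast<unsigned long>(udp_src));
    out += buf;
    std::snprintf(buf, sizeof(buf), "UDP dst port = %04lX\n",
                  static_cast<unsigned long>(udp_dst));
    out += buf;
    std::snprintf(buf, sizeof(buf), "Payload (%lu bytes):\n",
                  static_cast<unsigned long>(len));
    out += buf;

    // Payload bytes as two-digit hex, 16 to a line (assumed layout)
    for (uint32_t i = 0; i < len; i++) {
        const bool end_of_line = (i % 16 == 15) || (i + 1 == len);
        std::snprintf(buf, sizeof(buf), "%02X%c",
                      static_cast<unsigned>(payload[i]), end_of_line ? '\n' : ' ');
        out += buf;
    }
    return out;
}
```

Called from a receive callback with the fields of the rxInfo_t argument, the returned string can be printed directly or compared against expected traffic.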

PCAP replay/record quickstart (VUserMainPcap)

  • Generate a test PCAP (writes into tools/):
    python tools/gen_udp_pcap.py --frames 5 --interval-us 500 --out ./tools/test_udp_rand.pcap
  • Build fresh and run the Ethernet replay/record simulation (uses PCAP_IN_1 default ./tools/test_udp_rand.pcap):
    make -f MakefileVProc.mk clean
    make -f MakefileVProc.mk UDP_C=VUserMainPcap.cpp BUILD=ISS run
    • Outputs of interest in ./output/:
      • node2_out.pcap, node4_out.pcap – RX captures with corrected timestamps
      • merge_node2.pcap, merge_node4.pcap – TX+RX merged on a single timeline
    • In Wireshark: set Time Display to “Seconds Since Epoch” or “Date and Time of Day”. Start-to-start latency (TX→RX) should match what you measure in wave.fst via GTKWave.

Co-simulation HAL

Using the HAL

The HAL provides hierarchical access to the registers via a chain of pointer dereferences and a final access method (for reads and writes of registers and their bit fields) that reflects the hierarchy of the RDL specification. The following shows some example accesses, based on the 3.build/csr_build/csr.rdl (as at its first revision on 10/11/2024):

    #include "wireguard_regs.h"

    // Create a CSR register object. A base address can be specified
    // but defaults to the address specified in the RDL
    csr_vp_t* csr = new csr_vp_t();

    // Write to address field and read back.
    csr->ip_lookup_engine->table[0]->allowed_ip[0]->address(0x12345678);
    printf("address = 0x%08lx\n\n", csr->ip_lookup_engine->table[0]->allowed_ip[0]->address());

    // Write to whole endpoint register
    csr->ip_lookup_engine->table[3]->endpoint->full(0x5555555555555555ULL);

    // Write to bit field in endpoint register
    csr->ip_lookup_engine->table[3]->endpoint->interface(0x7);

    // Read back bit field in endpoint register
    printf("interface = 0x%1lx\n\n", csr->ip_lookup_engine->table[3]->endpoint->interface());

The above code will compile either natively for VProc or for the RISC-V hardware, with the appropriate header, as described above. Write accesses use a method named after the final register bit field, with an appropriate argument (either a uint64_t or uint32_t, as appropriate to the register's definition). A read access is done in the same manner but without an argument, and returns a value (either a uint64_t or uint32_t as appropriate).

By convention, a whole register is accessed with a 'bit field' access method named full, while bit field accesses use their declared names, as normal. The script as it stands makes some assumptions based on the current csr.rdl (but new features can be added). The main one currently is that arrays can't be multi-dimensional (hierarchy can be used to achieve the same thing), and the script raises an error if one is detected.
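The accessor convention can be pictured with a small stand-in class, in which an overloaded method writes when given an argument and reads when given none. This is a sketch of the idea only and not the generated HAL code: the EndpointReg class, the position and width of the interface field, and the plain member variable standing in for the hardware register are all invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit register with one bit field (layout is invented)
class EndpointReg {
public:
    // Whole-register access uses the name 'full'
    void     full(uint64_t val)      { reg_ = val; }
    uint64_t full() const            { return reg_; }

    // Bit field write: read-modify-write of only the field's bits
    void interface(uint32_t val) {
        reg_ = (reg_ & ~(MASK << LSB)) | ((static_cast<uint64_t>(val) & MASK) << LSB);
    }

    // Bit field read: shift down and mask
    uint32_t interface() const {
        return static_cast<uint32_t>((reg_ >> LSB) & MASK);
    }

private:
    static constexpr unsigned LSB  = 48;    // assumed field position
    static constexpr uint64_t MASK = 0x7;   // assumed 3-bit field
    uint64_t reg_ = 0;                      // stands in for the real register
};
```

In the generated HAL the final method would issue a bus or co-simulation access rather than touch a member variable, but the calling code looks the same in both cases, which is what lets one source file compile for either target.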

Other Co-simulation considerations

The HAL software abstracts away the details of hardware and co-simulation register accesses, but a couple of other considerations are needed to allow code to compile both for hardware and simulation. The first of these is the main entry point.

A normal application compiled for the target has a main() entry point function. In VProc co-simulation, this is not the case as the logic simulation itself has a main() function already defined and there can be multiple VProc node instantiations, each with their own entry point. These are named VUserMain<n>, where <n> is the node number. So, node 0 has an entry point function VUserMain0. The auto-generated HAL co-simulation headers include a WGMAIN definition that is either main for the hardware code or VUserMain0 for VProc code (assuming node 0 for soc_cpu). This is then used in place of main at the top level application code.

#include "wireguard_regs.h"

// Application top level
void WGMAIN (void)
{
  // Top level source code here
}

The second consideration is the use of delay functions. This can be in the form of standard C functions, such as usleep, or application-specific functions using instruction loops. In either case, these should be wrapped in a commonly named function, e.g., wg_usleep(int time). The wrapper delay library function will then need code, selected by the VPROC definition, to either call the application-specific target delay function or convert the specified time to clock cycles and call the VProc API function VTick (or its C++ API equivalent) to advance simulation time by the appropriate amount. The co-simulation auto-generated HAL header has SOC_CPU_CLK_PERIOD_PS defined, which can be configured on the 3.build/sysrdl_cosim.py command line with -C or --clk_period, but defaults to the equivalent of the 80 MHz that the test bench uses for the soc_cpu. A SOC_CPU_VPNODE is also defined, defaulting to 0, for use when calling the VProc C API functions directly. The definition is set by the -v or --vp_node command line options of 3.build/sysrdl_cosim.py.
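The co-simulation side of such a wrapper might be sketched as follows. Several things here are assumptions: the fallback values for SOC_CPU_CLK_PERIOD_PS and SOC_CPU_VPNODE (normally supplied by the generated HAL header), the round-up policy, and VTickStub, which stands in for the VProc VTick call so that the example runs without the simulator.

```cpp
#include <cassert>
#include <cstdint>

// Normally provided by the auto-generated HAL header; fallbacks assumed here
#ifndef SOC_CPU_CLK_PERIOD_PS
# define SOC_CPU_CLK_PERIOD_PS 12500      // 80 MHz clock
#endif
#ifndef SOC_CPU_VPNODE
# define SOC_CPU_VPNODE 0
#endif

// Stand-in for VProc's VTick: just accumulates the requested ticks
static uint64_t g_ticks = 0;
static int VTickStub(unsigned ticks, unsigned node)
{
    (void)node;
    g_ticks += ticks;
    return 0;
}

// Convert microseconds to soc_cpu clock cycles, rounding up
static uint64_t usToCycles(uint32_t us)
{
    const uint64_t ps = static_cast<uint64_t>(us) * 1000000ULL;
    return (ps + SOC_CPU_CLK_PERIOD_PS - 1) / SOC_CPU_CLK_PERIOD_PS;
}

// Co-simulation variant of the common delay wrapper. A target build
// would instead call the application-specific delay function.
void wg_usleep(int time)
{
    if (time <= 0) {
        return;
    }
    // Note: delays beyond about 53 s at 80 MHz exceed a 32-bit tick count
    VTickStub(static_cast<unsigned>(usToCycles(static_cast<uint32_t>(time))),
              SOC_CPU_VPNODE);
}
```

At the default 12500 ps period, one microsecond corresponds to 80 clock cycles, so wg_usleep(10) advances the stub by 800 ticks.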

Tool Versions

  • Verilator v5.024
  • VProc v1.12.2
  • Mem Model v1.0.0
  • rv32 ISS v1.1.4
  • udpIpPg v1.0.3

Finally, the

References: