Releases: pulp-platform/ara
v2.2.0
Fixed
- Fix typo on the build instructions of the README
- Fix Gnuplot installation on GitHub's CI
- The number of elements requested by the Store Unit and the Element Requester now depends both on the requested eew and the past eew of the vector of the used register
- When the VRF is written and EMUL > 1, the eew of all the interested registers is updated
- Memory operations can change EMUL when EEW != VSEW
- The LSU now correctly handles bursts with a saturated length of 256 beats
- AXI transactions on an opposite channel w.r.t. the channel currently in use are started only after the completion of the previous transactions
- Fix the number of elements to be requested for a vslidedown instruction
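The 256-beat figure in the LSU fix above matches the AXI4 limit for INCR bursts, where the burst length field encodes the number of beats minus one. As a rough software illustration only (not Ara's RTL), a long transfer can be cut into bursts capped at 256 beats as sketched below. The names `axi_burst_t` and `split_bursts` are hypothetical, and the sketch deliberately ignores the separate AXI rule that a burst must not cross a 4 KiB boundary.

```c
#include <stddef.h>
#include <stdint.h>

#define AXI_MAX_BEATS 256u /* AXI4 INCR burst limit */

/* Hypothetical burst descriptor, for illustration only. */
typedef struct {
  uint64_t addr; /* start address of the burst */
  uint8_t len;   /* AXI length encoding: number of beats minus one */
} axi_burst_t;

/* Split `total_beats` beats starting at `addr` into bursts of at most
 * 256 beats. Up to `max_bursts` descriptors are written to `out`; the
 * return value is the number of bursts the transfer needs. */
size_t split_bursts(uint64_t addr, uint64_t total_beats,
                    uint32_t bytes_per_beat, axi_burst_t *out,
                    size_t max_bursts) {
  size_t n = 0;
  while (total_beats > 0) {
    /* Saturate the burst at the maximum legal number of beats. */
    uint64_t beats =
        total_beats > AXI_MAX_BEATS ? AXI_MAX_BEATS : total_beats;
    if (n < max_bursts) {
      out[n].addr = addr;
      out[n].len = (uint8_t)(beats - 1); /* 256 beats encode as 255 */
    }
    n++;
    addr += beats * bytes_per_beat;
    total_beats -= beats;
  }
  return n;
}
```

With a 600-beat request, the sketch yields two saturated bursts (length field 255) followed by one 88-beat burst (length field 87).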
Added
- benchmarks app to benchmark Ara
- CI task to create roofline plots of imatmul and fmatmul, available as artifacts
- Vector floating-point compare instructions (vmfeq, vmfne, vmflt, vmfle, vmfgt, vmfge)
- Vector single-width floating-point/integer type-convert instructions (vfcvt.xu.f, vfcvt.x.f, vfcvt.rtz.xu.f, vfcvt.rtz.x.f, vfcvt.f.xu, vfcvt.f.x)
- Vector widening floating-point/integer type-convert instructions (vfwcvt.xu.f, vfwcvt.x.f, vfwcvt.rtz.xu.f, vfwcvt.rtz.x.f, vfwcvt.f.xu, vfwcvt.f.x, vfwcvt.f.f)
- Vector narrowing floating-point/integer type-convert instructions (vfncvt.xu.f, vfncvt.x.f, vfncvt.rtz.xu.f, vfncvt.rtz.x.f, vfncvt.f.xu, vfncvt.f.x, vfncvt.f.f)
- Vector whole-register move instruction vmv<nr>
- Vector whole-register load/store vl1r, vs1r
- Vector load/store mask vle1, vse1
- Whole-register instructions are executed also if vtype.vl == 0
- Makefile option (trace=1) to generate waveform traces when running simulations with Verilator
Changed
- Add spill register at the lane edge, to cut the timing-critical interface between the Mask unit and the VFUs
- Increase latency of the 16-bit multiplier from 0 to 1 to cut an in-lane timing-critical path
- Widen CVA6's cache lines
- Implement back-to-back accelerator instruction issue mechanism on CVA6
- Use https protocol when cloning DTC from main Makefile
- Use https protocol for newlib-cygwin in .gitmodules
- Cut a timing-critical path from Addrgen to Sequencer (1 cycle more to start an AXI transaction)
- Cut a timing-critical path in the VSTU, relative to the calculation of the pointer to the VRF word received from the lanes
- Create ara_system wrapper containing Ara, Ariane, and an AXI mux, instantiated from within Ara's SoC
- Retime address calculation of the addrgen
- Push MASKU operand muxing from the lanes to the Mask Unit
- Reduce CVA6's default cache size
- Update Verilator to v4.214
- Update bender to v0.23.1
v2.1.0
Fixed
- Fix calculation of vstu's vector length
- Fix vslideup and vslidedown operand's vector length trimming
- Mute mask requests on idle lanes
- Mute instructions with vector length zero on the respective lane_sequencer and operand_requester
- Fix simd_div's offset calculation
- Delay acknowledgment of memory requests if the axi_inval_filter is busy
Added
- Format source files in the apps folder with clang-format by running make format
- Support for the 2_lanes, 8_lanes, and 16_lanes configurations, besides the default 4_lanes one
Changed
- Compile Verilator and Ara's verilated model with LLVM, for a faster compile time.
- Verilator updated to version v4.210.
- Verilation is done with a hierarchical verilation flow
- Replace ara_soc's LLC with a simple main memory
- Reduce number of words on the main memory, for faster Verilation
- Update common_cells to v1.22.1
- Update axi to v0.29.1
v2.0.0
Added
- Script to align all the elf sections to the AXI Data Width (the testbench requires it)
- RISC-V V intrinsics can now be compiled
- Add support for vsetivli, vmv<nr>r.v instructions
- Add support for strided memory operations
- Add support for stores misaligned w.r.t. the AXI Data Width
Changed
- Alignment with lowRISC's coding guidelines
- Update Ara support for RISC-V V extension to V 0.10, with the exception of the instructions that were already missing
- Replace toolchain from GCC to LLVM when compiling for RISC-V V extension
- Update toolchain and SPIKE support to RISC-V V 0.10
- Patches for GCC and SPIKE are no longer required
- Ara benchmarks are now compatible with RISC-V V 0.10
Fixed
- Fix vrf_seq_byte definition in the Load Unit
- Fix check to discriminate a valid byte in the VRF word, in the Load Unit
- Fix axi_addrgen_d.len calculation in the Address Generation Unit
- Correctly check whether the generated address corresponds to the vector load or the store unit
- Typos on the ChangeLog's dates
- Remove unwanted latches in the addrgen, simd_div, instr_queue, and decoder
- Fix vl == 0 memory operations bug. Ara correctly tells Ariane that the memory operation is over
v1.2.0
Added
- Hardware support for:
  - Vector slide instructions (vslideup, vslide1up, vfslide1up, vslidedown, vslide1down, vfslide1down)
- Software implementation of an integer 2D convolution kernel
- CI job to check the conv2d execution on Ara
Fixed
- Removed dependency on a specific gcc/g++ version in the Makefile
- Arithmetic and memory vector instructions with vl == 0 are considered as a NOP
- Increment bit width of the vector length type (vlen_t), accounting for vectors whose length is VLMAX
- Fix vector length calculation for the MaskB operand, which depends on vsew
- Fix typo on the vrf_pnt updating logic at the Mask Unit
- Update README to highlight dependency with Spike
- Update Bender's link dependency to the public CVA6 repository
- Retrigger the compile module if the ModelSim compilation did not succeed
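The vlen_t entry above comes down to a counting argument: a vector length can take every value from 0 up to and including VLMAX, so when VLMAX is a power of two it needs one more bit than an element index does. The sketch below illustrates this with hypothetical helper names (`bits_for_range`, `vlmax`). It assumes an integer LMUL of at least 1 and, purely as an example, VLEN = 4096 bits; it is not taken from Ara's sources.

```c
#include <stdint.h>

/* Number of bits needed to represent every integer in [0, max_value]. */
unsigned bits_for_range(uint64_t max_value) {
  unsigned bits = 1; /* the value 0 alone still occupies one bit */
  while (max_value >>= 1) {
    bits++;
  }
  return bits;
}

/* VLMAX = LMUL * VLEN / SEW, here restricted to integer LMUL >= 1. */
uint64_t vlmax(uint64_t vlen_bits, unsigned sew_bits, unsigned lmul) {
  return (vlen_bits / sew_bits) * lmul;
}
```

Under the example parameters (VLEN = 4096, SEW = 8, LMUL = 8), VLMAX is 4096: an element index in [0, 4095] fits in 12 bits, while a length that may equal 4096 needs 13.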
Changed
- The encoding.h in the common Ara runtime is now a copy from the encoding.h in the Spike submodule
v1.1.1
Added
- Parametrization for FPU and FPU-specific formats support, through the FPUSupport ara_soc parameter
v1.1.0
1.1.0 - 2020-03-18
Added
- GitHub Actions-based CI
- Hardware support for:
  - Vector single-width floating-point fused multiply-add instructions (vfnmacc, vfmsac, vfnmsac, vfnmadd, vfmsub, vfnmsub)
  - Vector floating-point sign-injection instructions (vfsgnj, vfsgnjn, vfsgnjx)
  - Vector widening floating-point add/subtract instructions (vfwadd, vfwsub, vfwadd.w, vfwsub.w)
  - Vector widening floating-point multiply instructions (vfwmul)
  - Vector widening floating-point fused multiply-add instructions (vfwmacc, vfwnmacc, vfwmsac, vfwnmsac)
  - Vector floating-point merge instruction (vfmerge)
  - Vector floating-point move instruction (vfmv)
Changed
- Contributing guidelines updated to include commit message and C++ code style guidelines
v1.0.0
Added
- Hardware support for:
  - Vector single-width floating-point add/subtract instructions (vfadd, vfsub, vfrsub)
  - Vector single-width floating-point multiply instructions (vfmul)
  - Vector single-width floating-point fused multiply-add instructions (vfmacc, vfmadd)
  - Vector single-width floating-point min/max instructions (vfmin, vfmax)
- Software implementation of a floating-point matrix multiplication kernel
v0.6.0
Added
- Support for a coherent mode between Ara and Ariane
  - Snoop AW channel from Ara to L2
  - Invalidate Ariane's L1 cache sets accordingly
  - Coherent mode can be toggled together with consistent mode using the LSB of CSR 0x702
Changed
- Ariane's data cache is active by default
- The matrix multiplication kernel achieves better performance
  - It reports the performance and the utilization for several matrix sizes
v0.5.0
Added
- Hardware support for:
  - Vector single-width integer divide instructions (vdivu, vdiv, vremu, vrem)
  - Vector integer comparison instructions (vmseq, vmsne, vmsltu, vmslt, vmsleu, vmsle, vmsgtu, vmsgt)
- Runtime measurement functions
- Consistent mode which orders scalar and vector loads/stores.
  - Conservative ordering without address comparison
  - Consistent mode is enabled by default and can be disabled by clearing the LSB of CSR 0x702.
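As a hedged illustration of the CSR control described above, the sketch below clears and sets the LSB of CSR 0x702 with the standard RISC-V csrci/csrsi instructions. The function names are hypothetical and not part of Ara's runtime. The code assumes it runs at a privilege level allowed to access that CSR, and that this release's CSR layout applies. On a non-RISC-V host it falls back to a plain variable so the bit logic can still be exercised.

```c
#if defined(__riscv)
/* Real CSR accesses; privilege requirements are an assumption. */
static inline void consistent_mode_disable(void) {
  __asm__ volatile("csrci 0x702, 1"); /* clear bit 0 */
}
static inline void consistent_mode_enable(void) {
  __asm__ volatile("csrsi 0x702, 1"); /* set bit 0 */
}
static inline unsigned long mode_csr_read(void) {
  unsigned long v;
  __asm__ volatile("csrr %0, 0x702" : "=r"(v));
  return v;
}
#else
/* Host fallback: a variable standing in for CSR 0x702. It starts at 1
 * to mirror "enabled by default". */
static unsigned long fake_csr_0x702 = 1ul;
static inline void consistent_mode_disable(void) { fake_csr_0x702 &= ~1ul; }
static inline void consistent_mode_enable(void) { fake_csr_0x702 |= 1ul; }
static inline unsigned long mode_csr_read(void) { return fake_csr_0x702; }
#endif
```

A kernel that does not need scalar/vector memory ordering could call `consistent_mode_disable()` before its main loop and `consistent_mode_enable()` afterwards.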
Fixed
- Ariane's accelerator dispatcher module was rewritten, fixing a bug where instructions would get skipped.
- The Vector Store unit takes the EEW of the source vector register into account to shuffle the elements before writing them to memory.
Changed
- Vector mask instructions (vmand, vmnand, vmandnot, vmxor, vmor, vmnor, vmornot, vmxnor) no longer require the non-compliant constraint that the vector length is divisible by eight.
v0.4.0
Added
- Hardware compilation with Verilator
- Software implementation of a matrix multiplication kernel
Changed
- The riscv_tests_simc Makefile target was deprecated. The riscv-tests are now run with the Verilated design, which can be called through the riscv_tests_simv Makefile target.
- The operand queues now take as a parameter the type conversions they support (currently, SupportIntExt2, SupportIntExt4, and SupportIntExt8)
- The Vector Multiplier unit now has independent pipelines for each element width.