Performance of the RISC-V GEMV kernels is degraded by the combination of manual 4-way unrolling and LMUL > 2. Together, these two choices can demand more vector registers than the hardware provides, which forces the compiler to spill vector registers to the stack. The inner loops of gemv_n_vector and gemv_t_vector should be reworked to avoid this.
FYI: RVV has only 32 vector registers (v0 to v31). With LMUL = 8, each vector variable occupies a group of 8 registers, so only 4 such variables can be live at once.
Testing should be done with RVV 1.0 on both the RISCV64_ZVL128B and RISCV64_ZVL256B targets.
5427
5211