@tingboliao tingboliao commented Jan 15, 2025

The original implementation of gemm_tcopy_8_rvv causes some sgemmt/dgemmt test cases in openblas_utest_ext to fail when the vector length (vlen) is 128 or 256.
The optimized version passes the functional tests at vector lengths of 128, 256, 512, and 1024.
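As background for why a kernel can behave differently across vector lengths, the sketch below uses the RVV relation VLMAX = LMUL × VLEN / SEW to show how many elements one register group holds at each vlen listed above. The helper names (`vlmax`, `min_lmul_for`) are hypothetical and the block width of 8 is only inferred from the kernel name; this is an illustration of the arithmetic, not a statement of what the actual failure was or of how the fix works.

```python
# Illustration only: element capacity of an RVV register group.
# VLMAX = LMUL * VLEN / SEW (integer LMUL values 1, 2, 4, 8 considered here).

def vlmax(vlen_bits, sew_bits, lmul=1):
    """Number of elements of width sew_bits held by a register group."""
    return vlen_bits * lmul // sew_bits


def min_lmul_for(n_elems, vlen_bits, sew_bits):
    """Smallest integer LMUL whose register group holds n_elems, or None."""
    for lmul in (1, 2, 4, 8):
        if vlmax(vlen_bits, sew_bits, lmul) >= n_elems:
            return lmul
    return None


BLOCK = 8  # assumed block width, taken from the "_8_" in the kernel name

for vlen in (128, 256, 512, 1024):
    for name, sew in (("float", 32), ("double", 64)):
        print(
            f"vlen={vlen:4d} {name:6s} "
            f"VLMAX(LMUL=1)={vlmax(vlen, sew):2d} "
            f"min LMUL for {BLOCK} elements={min_lmul_for(BLOCK, vlen, sew)}"
        )
```

At vlen = 128, a single register holds only 4 floats or 2 doubles, so code that assumes 8 elements fit in one LMUL=1 register would not be portable to that configuration.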

Furthermore, on the relevant benchmark cases, the optimized version outperforms both the scalar and the original RVV versions on the K230 [C908, vlen = 128] and the K1 [C908, vlen = 256].
The performance data are shown below:

Parameter setting: OPENBLAS_LOOPS = 10000.

1. K230 [C908, vlen = 128]:

| Case | Scalar (MFlops) | Original RVV (MFlops) | Optimized RVV (MFlops) |
| --- | --- | --- | --- |
| ssyr2k.goto | 1277.22 | 3596.13 | 3865.84 |
| dsyr2k.goto | 1027.01 | 1802.04 | 1845.73 |
| ssyrk.goto | 1283.67 | 3271.49 | 3308.92 |
| dsyrk.goto | 995.20 | 1578.81 | 1603.34 |

2. K1 [C908, vlen = 256]:

| Case | Scalar (MFlops) | Original RVV (MFlops) | Optimized RVV (MFlops) |
| --- | --- | --- | --- |
| ssyr2k.goto | 1255.74 | 5044.36 | 5953.40 |
| dsyr2k.goto | 1000.17 | 2552.43 | 2649.93 |
| ssyrk.goto | 1289.06 | 4631.15 | 4977.29 |
| dsyrk.goto | 1010.85 | 2204.08 | 2230.21 |

In the data above, a higher MFlops value means better performance.
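To make the comparison easier to read, the following sketch computes the speedup of the optimized RVV kernel relative to the original RVV and scalar versions. The MFlops figures are copied from the two result sets above; the names `RESULTS`, `speedups`, and `summarize` are introduced here purely for illustration.

```python
# MFlops figures from the results above: (scalar, original RVV, optimized RVV).
RESULTS = {
    "K230 (vlen=128)": {
        "ssyr2k.goto": (1277.22, 3596.13, 3865.84),
        "dsyr2k.goto": (1027.01, 1802.04, 1845.73),
        "ssyrk.goto": (1283.67, 3271.49, 3308.92),
        "dsyrk.goto": (995.20, 1578.81, 1603.34),
    },
    "K1 (vlen=256)": {
        "ssyr2k.goto": (1255.74, 5044.36, 5953.40),
        "dsyr2k.goto": (1000.17, 2552.43, 2649.93),
        "ssyrk.goto": (1289.06, 4631.15, 4977.29),
        "dsyrk.goto": (1010.85, 2204.08, 2230.21),
    },
}


def speedups(scalar, original, optimized):
    """Return (optimized / original RVV, optimized / scalar)."""
    return optimized / original, optimized / scalar


def summarize(results):
    """Flatten the results into (board, case, vs_original, vs_scalar) rows."""
    rows = []
    for board, cases in results.items():
        for case, (scalar, original, optimized) in cases.items():
            vs_original, vs_scalar = speedups(scalar, original, optimized)
            rows.append((board, case, vs_original, vs_scalar))
    return rows


for board, case, vs_original, vs_scalar in summarize(RESULTS):
    print(
        f"{board:16s} {case:12s} "
        f"{vs_original:5.3f}x vs original RVV, {vs_scalar:5.3f}x vs scalar"
    )
```

On these figures, the largest gain over the original RVV kernel is ssyr2k.goto on the K1, at roughly 1.18x, while the double-precision cases gain about 1% to 4%.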

@tingboliao tingboliao changed the title from "Optimize the zgemm_tcopy_4_rvv to be compatible with the vlens 128 and 256." to "Optimize the gemm_tcopy_8_rvv to be compatible with the vlens 128 and 256." on Jan 15, 2025
@martin-frbg martin-frbg added this to the 0.3.30 milestone Jan 16, 2025
@martin-frbg martin-frbg merged commit a7483d1 into OpenMathLib:develop Jan 16, 2025
83 of 84 checks passed