Conversation

@xiazhuozhao (Contributor) commented Jan 2, 2026

Description

This PR adds support for the FP16 (half-precision) data type to the RISC-V Vector (RVV) softmax primitive.
It adds compute_softmax_f16_rvv, which uses Zvfh intrinsics. To preserve precision, the kernel performs the accumulation and the exponential/logarithm computations in f32 (via the widening conversions __riscv_vfwcvt) and converts back to f16 only for the final output.
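
For reviewers unfamiliar with the pattern, the widen-compute-narrow flow described above looks roughly like the sketch below. This is illustrative only, not the PR's kernel: the function name softmax_f16_rvv_sketch and the scratch-buffer argument are invented here, and the exp step is a scalar expf stand-in kept for brevity. It assumes a toolchain with Zvfh intrinsics enabled (e.g. -march=rv64gcv_zvfh).

```c
#include <math.h>
#include <stddef.h>
#include <riscv_vector.h>

// Hypothetical sketch of the widen-compute-narrow pattern: load f16,
// widen to f32 with __riscv_vfwcvt, reduce and normalize in f32, and
// narrow back to f16 only on the final store.
static void softmax_f16_rvv_sketch(
        _Float16 *dst, const _Float16 *src, float *scratch, size_t n) {
    // Pass 1: row maximum, accumulated in f32.
    float max_val = -INFINITY;
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e16m1(n - i);
        vfloat16m1_t vh = __riscv_vle16_v_f16m1(src + i, vl);
        vfloat32m2_t vw = __riscv_vfwcvt_f_f_v_f32m2(vh, vl); // f16 -> f32
        vfloat32m1_t acc = __riscv_vfmv_s_f_f32m1(max_val, 1);
        acc = __riscv_vfredmax_vs_f32m2_f32m1(vw, acc, vl);
        max_val = __riscv_vfmv_f_s_f32m1_f32(acc);
        i += vl;
    }
    // Pass 2: exp(x - max) and the running sum, both in f32.
    // Scalar expf is a placeholder; the real kernel vectorizes this step.
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        scratch[i] = expf((float)src[i] - max_val);
        sum += scratch[i];
    }
    // Pass 3: scale by 1/sum in f32, then narrow to f16 for the output.
    float inv_sum = 1.0f / sum;
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e32m2(n - i);
        vfloat32m2_t vw = __riscv_vle32_v_f32m2(scratch + i, vl);
        vw = __riscv_vfmul_vf_f32m2(vw, inv_sum, vl);
        vfloat16m1_t vh = __riscv_vfncvt_f_f_w_f16m1(vw, vl); // f32 -> f16
        __riscv_vse16_v_f16m1(dst + i, vh, vl);
        i += vl;
    }
}
```

Keeping the reductions and exp in f32 avoids the overflow/rounding issues of accumulating in f16; the only f16 rounding happens once, at the final narrowing store.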

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

Performance improvements

  • Have you submitted performance data that demonstrates performance improvements?

Performance was evaluated on an SG2044 (RISC-V) machine using 16 pinned cores. The comparison is between the new RVV f16 implementation and the previous behavior (falling back to the reference implementation).

./benchdnn --mode=P --softmax --batch=test_softmax_float16

(Note: only FWD_D cases were collected.)

speedup_ratio.csv
with_f16_softmax.csv
without_f16_softmax.csv

Average Speedup: ~23.40x

@xiazhuozhao xiazhuozhao force-pushed the f16_softmax branch 2 times, most recently from 2481c54 to 066e7e7 on January 2, 2026 17:21
@xiazhuozhao xiazhuozhao marked this pull request as ready for review January 7, 2026 11:46
@xiazhuozhao xiazhuozhao requested a review from a team as a code owner January 7, 2026 11:46
xiazhuozhao and others added 2 commits January 14, 2026 00:49
Co-authored-by: Fei Zhang <zhangfei@iscas.ac.cn>
Co-authored-by: Fei Zhang <zhangfei@iscas.ac.cn>
@zhangjian29 zhangjian29 merged commit 97eb3d8 into uxlfoundation:main Jan 15, 2026
12 checks passed