LoongArch64: fixed cscal and zscal #5078
|
I wonder if this will lead us down the same path of adding a special flag for array zeroing vs IEEE compliance as with non-complex SCAL :( |
cb8cc3f to 038e0fb
|
You reminded me that we also need to add flags for cscal and zscal. Test output using MKL version 2024.2: Reference BLAS version 3.12.0 produces the same output as MKL. |
|
This PR will introduce new issues to s/zscal and needs to be revised. (It seems that other platforms also need modifications to avoid the above issues.) |
|
I submitted a PR #5081 attempting to fix the implementation in C. |
|
Thanks, I'll try to take a stab at the other implementations over the weekend. |
038e0fb to 6b27f17
|
Unfortunately this appears to have broken most of the pre-existing special handling of |
|
I removed some special-case handling code because I felt its correctness was questionable. The output is: |
6b27f17 to 2da86b8
|
Lines 61 to 67 in 76db346
Adding special value checks for each number is likely to cause a performance drop and make optimization with assembly much more difficult. Should we consider simplifying it? |
|
(spurious merge conflict was caused by pengxu's #5248 which contained your changes to cscal_lasx.c - sorry for not noticing this at the time I merged that PR) |
|
No problem, I just wanted to check whether you'd like me to revert the merged changes in my PR. |
|
I think it is all sorted, thank you. |
For the parameters `float x[2] = {NaN, NaN}` and `float alpha[2] = {0.0, 0.0}`, the optimized `cscal` interface does not directly copy `0.0` to `x` but continues performing complex multiplication, resulting in an output of `{NaN, NaN}`. The optimized `zscal` has the same issue. This problem was detected in LAPACK tests, but the existing OpenBLAS test cases do not cover this scenario. It may be considered for inclusion in future test cases.
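
The two behaviours discussed in this thread can be illustrated with a minimal C sketch. Both functions below are hypothetical helpers written only for illustration; they are not the OpenBLAS kernels, and their names and signatures are assumptions. `cscal_multiply` always performs the complex multiplication, so a NaN in `x` survives a zero `alpha`, while `cscal_zero_shortcut` writes zeros whenever `alpha` is exactly zero.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not OpenBLAS code): scale n interleaved complex
 * floats by alpha using plain complex multiplication. NaN in x propagates
 * even when alpha is zero, because 0.0f * NaN is NaN. */
static void cscal_multiply(size_t n, const float alpha[2], float *x)
{
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        float re = x[2 * i];
        float im = x[2 * i + 1];
        x[2 * i]     = alpha[0] * re - alpha[1] * im;
        x[2 * i + 1] = alpha[0] * im + alpha[1] * re;
    }
}

/* Hypothetical helper: same operation, but when alpha is exactly zero the
 * vector is overwritten with zeros instead of being multiplied. */
static void cscal_zero_shortcut(size_t n, const float alpha[2], float *x)
{
    assert(x != NULL || n == 0);
    if (alpha[0] == 0.0f && alpha[1] == 0.0f) {
        for (size_t i = 0; i < 2 * n; i++)
            x[i] = 0.0f;
        return;
    }
    cscal_multiply(n, alpha, x);
}
```

Calling either function with `n = 1`, `alpha = {0.0f, 0.0f}` and `x = {NAN, NAN}` reproduces the difference described above: the first leaves `{NaN, NaN}` in `x`, the second leaves `{0.0, 0.0}`. Which of the two results is the desired one is the question debated in this conversation; the sketch takes no position on it.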