LoongArch64: fixed cscal and zscal #5078

XiWeiGu · 2025-01-16T12:41:37Z

For the parameters float x[2] = {NaN, NaN} and float alpha[2] = {0.0, 0.0}, the optimized cscal interface does not directly copy 0.0 to x but continues performing complex multiplication, resulting in an output of {NaN, NaN}.
The optimized zscal has the same issue. This problem was detected in LAPACK tests, but the existing OpenBLAS test cases do not cover this scenario. It may be considered for inclusion in future test cases.

martin-frbg · 2025-01-16T13:14:37Z

I wonder if this will lead us down the same path of adding a special flag for array zeroing vs IEEE compliance as with non-complex SCAL :(

XiWeiGu · 2025-01-17T06:24:15Z

You reminded me that we also need to add flags for cscal and zscal.
For the following code:

#include <stdio.h>
#include "cblas.h"
#include <complex.h>
int main() {
    // 向量长度
    int N = 1;

    // 复数缩放因子 alpha
    float alpha[2] = {0.0, 0.0}; 

    // 复数向量 X
    float X[2] = {NAN, NAN};

    // 打印缩放前的向量
    printf("Before scaling:\n");
    for (int i = 0; i < N; i++) {
        printf("X[%d] = %f + %f\n", i, X[i], X[i + 1]);
    }

    // 缩放向量
    cblas_cscal(N, alpha, X, 1);

    // 打印缩放后的向量
    printf("\nAfter scaling:\n");
    for (int i = 0; i < N; i++) {
        printf("X[%d] = %f + %f\n", i, X[i], X[i + 1]);
    }

    return 0;
}

Test output using MKL 2024.2 version：

Before scaling:
X[0] = nan + nan

After scaling:
X[0] = nan + nan

The same output as MKL when using reference BLAS Version 3.12.0.
Test output using OpenBLAS V0.3.29(TARGET=HASWELL)：

Before scaling:
X[0] = nan + nan

After scaling:
X[0] = 0.000000 + 0.000000

XiWeiGu · 2025-01-17T06:29:04Z

This PR will introduce new issues to s/zscal and needs to be revised. (It seems that other platforms also need modifications to avoid the above issues.)

XiWeiGu · 2025-01-17T08:33:41Z

I submitted a PR #5081 attempting to fix the implementation in C.

martin-frbg · 2025-01-17T11:30:14Z

Thanks, I'll try to take a stab at the other implementations over the weekend.

martin-frbg · 2025-01-21T16:44:51Z

Unfortunately this appears to have broken most of the pre-existing special handling of 0*NAN and 0*INF cases (which mostly covered only one of the real and imaginary components being "special") ?

XiWeiGu · 2025-01-22T04:11:06Z

I removed some special-case handling code because I felt its correctness was questionable.
Test the following code using MKL 2024.02:

#include <stdio.h>
#include <mkl_cblas.h>
#include <complex.h>
#include <string.h>

int main() {
    int i;
    int N = 17;
    int incX = 1;
    int incY = 1;
    float alpha[2] = {0.0, 0.0};
    float beta[2] = {2.0, 0.0};
    char  trans = 'N';
    float A[17 * 17 * 4];
    float X[17 * 2];
    float Y[17 * 2];

    memset(A, 0, sizeof(A));
    memset(X, 0, sizeof(X));
    for (i = 0; i < (2 * N - 2); i += 4)
    {
        Y[i]     = NAN;
        Y[i + 1] = 1.0;

        Y[i + 2] = INFINITY;
        Y[i + 3] = 1.0;
    }
    Y[2 * N - 2] = NAN;
    Y[2 * N - 1] = 1.0;
    cblas_cgemv(CblasColMajor, CblasNoTrans, N, N, alpha, A, N, X, incX, beta, Y, incY);
    for (i = 0; i < 2 * N; i += 2) {
            printf("%f, %f, ", Y[i], Y[i+1]);
    }
}

The output is:
nan, nan, inf, -nan, nan, nan, inf, -nan, nan, nan, inf, -nan, nan, nan, inf, -nan, nan, nan, inf, -nan, nan, nan, inf, -nan, nan, nan, inf, -nan, nan, nan, inf, -nan, nan, nan
The same output as MKL when using reference BLAS Version 3.12.0.
However, when using OpenBLAS v0.3.29 (TARGET=GENERIC, using the C kernel), the output result is:
nan, 2.000000, inf, 2.000000, nan, 2.000000, inf, 2.000000, nan, 2.000000, inf, 2.000000, nan, 2.000000, inf, 2.000000, nan, 2.000000, inf, 2.000000, nan, 2.000000, inf, 2.000000, nan, 2.000000, inf, 2.000000, nan, 2.000000, inf, 2.000000, nan, 2.000000
I think for the upper-level interfaces that call c/zscal_k (excluding c/zscal), we should also no longer reduce computations.

XiWeiGu · 2025-01-22T09:51:37Z

OpenBLAS/kernel/arm/zscal.c

Lines 61 to 67 in 76db346

    
           { 
        
           	temp = - da_i * x[ip+1] ; 
        
           	if (isnan(x[ip]) || isinf(x[ip])) temp = NAN; 
        
           	if (!isinf(x[ip+1]))  
        
           		x[ip+1] = da_i * x[ip]  ; 
        
           	else 	x[ip+1] = NAN; 
        
           }

Adding special value checks for each number is likely to cause a performance drop and make optimization with assembly much more difficult. Should we consider simplifying it?

martin-frbg · 2025-06-11T21:36:56Z

(spurious merge conflict was caused by pengxu's #5248 which contained your changes to cscal_lasx.c - sorry for not noticing this at the time I merged that PR)

XiWeiGu · 2025-06-12T02:35:04Z

No problem, Just wanted to check if you'd like me to revert the merged changes in my PR.

martin-frbg · 2025-06-12T08:02:10Z

I think it is all sorted, thank you.

XiWeiGu changed the title ~~La64 fixed cscal zscal~~ LoongArch64: fixed cscal and zscal Jan 16, 2025

XiWeiGu force-pushed the la64_fixed_cscal_zscal branch from cb8cc3f to 038e0fb Compare January 17, 2025 01:41

XiWeiGu force-pushed the la64_fixed_cscal_zscal branch from 038e0fb to 6b27f17 Compare January 20, 2025 06:37

LoongArch64: Fixed LSX version of cscal and zscal

b2117bb

XiWeiGu added 2 commits January 22, 2025 14:15

LoongArch64: Fixed LASX version of cscal and zscal

5392f6d

LoongArch64: Fixed scalar version of cscal and zscal

2da86b8

XiWeiGu force-pushed the la64_fixed_cscal_zscal branch from 6b27f17 to 2da86b8 Compare January 22, 2025 06:33

This was referenced May 23, 2025

kernel/x86_64: fixed cscal and zscal #5280

Merged

kernel/arm64: fixed cscal and zscal #5281

Merged

Merge branch 'develop' into la64_fixed_cscal_zscal

f4194fc

martin-frbg added this to the 0.3.30 milestone Jun 11, 2025

martin-frbg merged commit 2e2691b into OpenMathLib:develop Jun 12, 2025
90 of 93 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

LoongArch64: fixed cscal and zscal #5078

LoongArch64: fixed cscal and zscal #5078

Uh oh!

XiWeiGu commented Jan 16, 2025

Uh oh!

martin-frbg commented Jan 16, 2025

Uh oh!

XiWeiGu commented Jan 17, 2025

Uh oh!

XiWeiGu commented Jan 17, 2025

Uh oh!

XiWeiGu commented Jan 17, 2025

Uh oh!

martin-frbg commented Jan 17, 2025

Uh oh!

martin-frbg commented Jan 21, 2025

Uh oh!

XiWeiGu commented Jan 22, 2025

Uh oh!

XiWeiGu commented Jan 22, 2025

Uh oh!

martin-frbg commented Jun 11, 2025

Uh oh!

XiWeiGu commented Jun 12, 2025

Uh oh!

martin-frbg commented Jun 12, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

LoongArch64: fixed cscal and zscal #5078

LoongArch64: fixed cscal and zscal #5078

Uh oh!

Conversation

XiWeiGu commented Jan 16, 2025

Uh oh!

martin-frbg commented Jan 16, 2025

Uh oh!

XiWeiGu commented Jan 17, 2025

Uh oh!

XiWeiGu commented Jan 17, 2025

Uh oh!

XiWeiGu commented Jan 17, 2025

Uh oh!

martin-frbg commented Jan 17, 2025

Uh oh!

martin-frbg commented Jan 21, 2025

Uh oh!

XiWeiGu commented Jan 22, 2025

Uh oh!

XiWeiGu commented Jan 22, 2025

Uh oh!

martin-frbg commented Jun 11, 2025

Uh oh!

XiWeiGu commented Jun 12, 2025

Uh oh!

martin-frbg commented Jun 12, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants