The 5-arg mul!(C, A, B, α, β) returns the wrong result if A and B are empty matrices (of correct sizes). Here's a MWE:
```julia
julia> using CUDA

julia> using LinearAlgebra

julia> A = rand(3, 3);

julia> mul!(A, zeros(3, 0), zeros(0, 3), 1.0, 1.0)
3×3 Matrix{Float64}:
 0.752731   0.43537   0.830489
 0.0522754  0.915373  0.462034
 0.319501   0.120835  0.769065

julia> mul!(cu(A), CUDA.zeros(3, 0), CUDA.zeros(0, 3), 1.0, 1.0)
3×3 CuArray{Float32, 2, CUDA.DeviceMemory}:
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
```
If either A or B has a zero dimension, the matmul instead hits this early return (line 359 as of 64df292):

`return LinearAlgebra.rmul!(C, 0)`

which zeros out all values in C. It should instead be `LinearAlgebra.rmul!(C, beta)`.
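A minimal sketch of the proposed change (hypothetical names: `matmul_5arg!` stands in for the actual CUDA.jl dispatch function, which this does not reproduce):

```julia
using LinearAlgebra

# Hypothetical illustration of the fixed early-return branch.
function matmul_5arg!(C, A, B, alpha, beta)
    if size(A, 2) == 0
        # With an empty inner dimension, alpha*A*B contributes nothing,
        # so C = alpha*A*B + beta*C reduces to C = beta*C.
        return LinearAlgebra.rmul!(C, beta)  # not rmul!(C, 0)
    end
    return mul!(C, A, B, alpha, beta)  # regular GEMM path
end
```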
However, relying on `rmul!` to zero out uninitialized arrays is insufficient, because the CUDA implementation of `rmul!` (which calls `scal!`) does not respect `false` as a strong zero (#2607): NaNs in the uninitialized memory would remain. The 3-arg `mul!`, and thereby also the `*` operator, assume correct strong-zero behavior.
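For reference, the strong-zero semantics that the 3-arg path depends on can be observed with plain Base arrays (a CPU-only illustration, not CUDA code):

```julia
using LinearAlgebra

# On the CPU, `false` is a strong zero: NaN * false == 0.0,
# so rmul!(C, false) clears even NaN-filled (uninitialized) memory.
C = fill(NaN, 2, 2)
rmul!(C, false)
# C is now all zeros; rmul!(C, 0) would instead leave NaN * 0 == NaN.
```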
A clean way to fix `rmul!` would be to rely on the GPUArrays.jl implementation and avoid using `scal!` from the CUDA libraries. Would this affect performance?
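Such a fallback would amount to a broadcast, which goes through the scalar `*` and therefore preserves strong-zero semantics (a sketch; `strongzero_rmul!` is a hypothetical name, not an existing API):

```julia
# Broadcasting applies the scalar `*` elementwise, so `false`
# annihilates NaN, unlike a BLAS scal, which does a plain
# floating-point multiply.
strongzero_rmul!(C, s) = (C .= C .* s; C)

strongzero_rmul!(fill(NaN, 2, 2), false)   # all zeros
strongzero_rmul!([1.0 2.0; 3.0 4.0], 2.0)  # regular scaling
```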