Scaling loop instead of broadcasting in strided matrix exp (#56463)
First, an explicit scaling loop is easier to read than the broadcast
expressions. Second, it merges the two loops into one. Third, it avoids
the compilation latency of broadcasting.
```julia
julia> using LinearAlgebra
julia> A = rand(2,2);
julia> @time LinearAlgebra.exp!(A);
0.952597 seconds (2.35 M allocations: 116.574 MiB, 2.67% gc time, 99.01% compilation time) # master
0.877404 seconds (2.17 M allocations: 106.293 MiB, 2.65% gc time, 99.99% compilation time) # this PR
```
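As a rough illustration of the kind of change described above, the sketch below contrasts two in-place broadcast updates with a single explicit loop that performs both updates in one pass. The function names (`update_broadcast!`, `update_loop!`) and the arguments `U`, `V`, `P`, `cu`, `cv` are hypothetical and chosen only for this example; this is not the code in `LinearAlgebra.exp!`.

```julia
# Hypothetical helpers for illustration only; not taken from LinearAlgebra.

# Two broadcast statements: each one is its own pass over P.
function update_broadcast!(U, V, P, cu, cv)
    U .+= cu .* P
    V .+= cv .* P
    return U, V
end

# One explicit loop: both updates are done in a single pass over P.
function update_loop!(U, V, P, cu, cv)
    for ind in eachindex(P, U, V)
        U[ind] += cu * P[ind]
        V[ind] += cv * P[ind]
    end
    return U, V
end

P = [1.0 2.0; 3.0 4.0]
U1, V1 = update_broadcast!(zeros(2, 2), ones(2, 2), P, 2.0, 0.5)
U2, V2 = update_loop!(zeros(2, 2), ones(2, 2), P, 2.0, 0.5)

# Both variants are expected to produce identical results.
@assert U1 == U2 && V1 == V2
```

The loop form trades the compactness of dot syntax for a single traversal and no broadcast machinery for the compiler to specialize, which is the trade-off the commit message argues for.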
The performance also improves as there are fewer allocations in the
first branch (`opnorm(A, 1) <= 2.1`):
```julia
julia> B = diagm(0=>im.*(float.(1:200))./200, 1=>(1:199)./400, -1=>(1:199)./400);
julia> opnorm(B,1)
1.9875
julia> using BenchmarkTools
julia> @btime exp($B);
5.066 ms (30 allocations: 4.89 MiB) # nightly v"1.12.0-DEV.1581"
4.926 ms (27 allocations: 4.28 MiB) # this PR
```