jishnub (Member) commented Apr 3, 2025

The idea is that if alpha is known to be non-zero and a Bool, it must be true. We may therefore hardcode the value to reduce the branches in @stable_muladdmul.
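
The reasoning above can be sketched with a toy dispatcher. This is only an illustration under stated assumptions: `kernel`, `dispatch_generic`, and `dispatch_nonzero_bool` are hypothetical names, not the actual `@stable_muladdmul` implementation, and the sketch assumes the caller has already handled the `iszero(alpha)` case.

```julia
# Hypothetical kernel: computes alpha*x + beta*y, with two flags known at compile time.
function kernel(::Val{alpha_is_one}, ::Val{beta_is_zero}, alpha, beta, x, y) where {alpha_is_one, beta_is_zero}
    ax = alpha_is_one ? x : alpha * x
    return beta_is_zero ? ax : ax + beta * y
end

# Generic case: four branches, so up to four kernel specializations get compiled.
function dispatch_generic(alpha, beta, x, y)
    if isone(alpha)
        if iszero(beta)
            kernel(Val(true), Val(true), alpha, beta, x, y)
        else
            kernel(Val(true), Val(false), alpha, beta, x, y)
        end
    else
        if iszero(beta)
            kernel(Val(false), Val(true), alpha, beta, x, y)
        else
            kernel(Val(false), Val(false), alpha, beta, x, y)
        end
    end
end

# Non-zero Bool alpha (assumption: iszero(alpha) was handled earlier).
# The only remaining value is `true`, so it is hardcoded and two branches suffice.
function dispatch_nonzero_bool(alpha::Bool, beta, x, y)
    if iszero(beta)
        kernel(Val(true), Val(true), true, beta, x, y)
    else
        kernel(Val(true), Val(false), true, beta, x, y)
    end
end
```

Both dispatchers return the same value for `alpha = true`; the second simply has fewer branches for the compiler to specialize.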

TTFX:

julia> using LinearAlgebra

julia> D = Diagonal(1:4); A = zeros(4,4);

julia> @time A * D;
  0.079938 seconds (139.62 k allocations: 6.952 MiB, 99.92% compilation time) # master
  0.058087 seconds (126.77 k allocations: 6.290 MiB, 99.88% compilation time) # this PR

The TTFX in D * A does not change by much, but the allocations go down.

julia> @time D * A;
  0.062484 seconds (176.66 k allocations: 8.696 MiB, 99.91% compilation time) # master
  0.059009 seconds (133.34 k allocations: 6.572 MiB, 99.91% compilation time) # this PR

codecov bot commented Apr 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.00%. Comparing base (e53b50c) to head (e298921).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1256      +/-   ##
==========================================
- Coverage   92.03%   92.00%   -0.04%     
==========================================
  Files          34       34              
  Lines       15459    15463       +4     
==========================================
- Hits        14227    14226       -1     
- Misses       1232     1237       +5     


jishnub merged commit b1bcca1 into master Apr 3, 2025
3 of 4 checks passed
jishnub deleted the jishnub/diagmul_alphabool branch April 3, 2025 16:06
jishnub added a commit that referenced this pull request Apr 26, 2025
Similar to #1256, we
may reduce branches in `@stable_muladdmul` for a non-zero `Bool` alpha
in `Bidiagonal` matmul. The idea is that if `alpha::Bool` is non-zero,
it must be `true`. We may therefore hardcode this value and reduce
branches in `@stable_muladdmul`. In addition, if `beta` is unused in a
method, we may hardcode `beta = false` as well, which further helps with
compilation.
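
As a small sanity check of why hardcoding `alpha = true` and `beta = false` is safe, the sketch below uses only the public `mul!` API. It assumes a recent Julia version in which five-argument `mul!` supports `Bidiagonal` operands with a dense destination; the variable names are illustrative.

```julia
using LinearAlgebra

B = Bidiagonal([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0], :U)
expected = Matrix(B) * Matrix(B)

# Three-argument mul! overwrites the destination with B * B.
C3 = ones(4, 4)
mul!(C3, B, B)

# Five-argument mul! computes B * B * alpha + C * beta.
# With alpha = true and beta = false this is the same operation.
C5 = ones(4, 4)
mul!(C5, B, B, true, false)

# `false` acts as a strong zero, so the old contents of C never leak through.
strong_zero = false * NaN
```

Here `C3` and `C5` should both equal `expected`, and `strong_zero` is zero rather than `NaN`, which is why a `Bool` `beta` of `false` lets the destination be overwritten unconditionally.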

```julia
julia> using LinearAlgebra

julia> B = Bidiagonal(1:4, 1:3, :U); D = Diagonal(1:4); v = (1:4);
```
With this,
```julia
julia> @time B * B;
  0.406036 seconds (2.34 M allocations: 105.696 MiB, 4.31% gc time, 99.99% compilation time) # nightly
  0.141480 seconds (588.18 k allocations: 26.049 MiB, 99.95% compilation time) # this PR
```
The rest are mainly reductions in allocation:
```julia
julia> @time B * D;
  0.141903 seconds (487.43 k allocations: 24.557 MiB, 99.96% compilation time) # nightly
  0.147749 seconds (382.60 k allocations: 19.385 MiB, 99.96% compilation time) # this PR
```
```julia
julia> @time D * B;
  0.136308 seconds (491.83 k allocations: 24.782 MiB, 99.95% compilation time) # nightly
  0.136909 seconds (386.35 k allocations: 19.591 MiB, 99.94% compilation time) # this PR
```
```julia
julia> @time B * v;
  0.087207 seconds (428.64 k allocations: 21.620 MiB, 99.93% compilation time) # master
  0.089002 seconds (342.46 k allocations: 17.306 MiB, 99.92% compilation time) # this PR
```

This also improves performance for small `Bidiagonal` multiplication, as
there are fewer operations to carry out.
```julia
julia> using BenchmarkTools

julia> n = 1; T = Bidiagonal(ones(n), ones(max(n-1,0)), :U); C = Matrix(T);

julia> @btime mul!($C, $T, $T);
  33.360 ns (0 allocations: 0 bytes) # nightly
  23.773 ns (0 allocations: 0 bytes) # this PR

julia> n = 2; T = Bidiagonal(ones(n), ones(max(n-1,0)), :U); C = Matrix(T);

julia> @btime mul!($C, $T, $T);
  78.685 ns (0 allocations: 0 bytes) # nightly
  31.388 ns (0 allocations: 0 bytes) # this PR

julia> n = 3; T = Bidiagonal(ones(n), ones(max(n-1,0)), :U); C = Matrix(T);

julia> @btime mul!($C, $T, $T);
  161.577 ns (0 allocations: 0 bytes) # nightly
  41.256 ns (0 allocations: 0 bytes) # this PR
```