@jishnub jishnub commented Apr 3, 2025

Similar to #1256, we may reduce branches in @stable_muladdmul for a non-zero Bool alpha in Bidiagonal matmul. The idea is that if alpha::Bool is non-zero, it must be true, so we may hardcode this value instead of branching on it in @stable_muladdmul. In addition, if beta is unused in a method, we may hardcode beta = false as well, which further reduces the amount of code that needs to be compiled.
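To illustrate the idea in isolation, here is a minimal sketch using a toy scaled update `C = X*alpha + C*beta`. The function names `axpby_generic!` and `axpby_boolalpha!` are hypothetical and written only for this example; they are not the LinearAlgebra internals and do not reproduce what `@stable_muladdmul` expands to. The sketch only shows why knowing that a non-zero `Bool` alpha equals `true` removes the branches on alpha.

```julia
# Toy illustration only: these names are hypothetical, not LinearAlgebra internals.

# Generic path: branch on the runtime values of alpha and beta (four cases).
function axpby_generic!(C, X, alpha::Number, beta::Number)
    if isone(alpha)
        if iszero(beta)
            C .= X
        else
            C .= X .+ C .* beta
        end
    else
        if iszero(beta)
            C .= X .* alpha
        else
            C .= X .* alpha .+ C .* beta
        end
    end
    return C
end

# Bool path: a non-zero Bool alpha can only be `true`,
# so after one early exit no branch on alpha is left.
function axpby_boolalpha!(C, X, alpha::Bool, beta::Number)
    if !alpha
        # alpha == false: X does not contribute, only the beta scaling remains
        iszero(beta) ? fill!(C, zero(eltype(C))) : (C .*= beta)
        return C
    end
    # alpha is known to be `true` from here on
    if iszero(beta)
        C .= X
    else
        C .= X .+ C .* beta
    end
    return C
end

X = [1.0, 2.0]
C1 = axpby_generic!([10.0, 20.0], X, true, 2.0)
C2 = axpby_boolalpha!([10.0, 20.0], X, true, 2.0)
@assert C1 == C2
```

Both functions compute the same result; the second simply has fewer code paths to compile when alpha is a `Bool`.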

julia> using LinearAlgebra

julia> B = Bidiagonal(1:4, 1:3, :U); D = Diagonal(1:4); v = (1:4);

With this,

julia> @time B * B;
  0.406036 seconds (2.34 M allocations: 105.696 MiB, 4.31% gc time, 99.99% compilation time) # nightly
  0.141480 seconds (588.18 k allocations: 26.049 MiB, 99.95% compilation time) # this PR

The rest are mainly reductions in allocation:

julia> @time B * D;
  0.141903 seconds (487.43 k allocations: 24.557 MiB, 99.96% compilation time) # nightly
  0.147749 seconds (382.60 k allocations: 19.385 MiB, 99.96% compilation time) # this PR
julia> @time D * B;
  0.136308 seconds (491.83 k allocations: 24.782 MiB, 99.95% compilation time) # nightly
  0.136909 seconds (386.35 k allocations: 19.591 MiB, 99.94% compilation time) # this PR
julia> @time B * v;
  0.087207 seconds (428.64 k allocations: 21.620 MiB, 99.93% compilation time) # master
  0.089002 seconds (342.46 k allocations: 17.306 MiB, 99.92% compilation time) # this PR

This also improves performance for small Bidiagonal multiplication, as there are fewer operations to carry out.

julia> using BenchmarkTools

julia> n = 1; T = Bidiagonal(ones(n), ones(max(n-1,0)), :U); C = Matrix(T);

julia> @btime mul!($C, $T, $T);
  33.360 ns (0 allocations: 0 bytes) # nightly
  23.773 ns (0 allocations: 0 bytes) # this PR

julia> n = 2; T = Bidiagonal(ones(n), ones(max(n-1,0)), :U); C = Matrix(T);

julia> @btime mul!($C, $T, $T);
  78.685 ns (0 allocations: 0 bytes) # nightly
  31.388 ns (0 allocations: 0 bytes) # this PR

julia> n = 3; T = Bidiagonal(ones(n), ones(max(n-1,0)), :U); C = Matrix(T);

julia> @btime mul!($C, $T, $T);
  161.577 ns (0 allocations: 0 bytes) # nightly
  41.256 ns (0 allocations: 0 bytes) # this PR
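Since the small-size cases go through changed code paths, a quick sanity check against dense multiplication may be useful alongside the timings. The following is a sketch using only the public `mul!` API; the helper name `check_bidiag_mul` is made up for this example and is not part of the PR.

```julia
using LinearAlgebra

# Hypothetical helper: compare Bidiagonal mul! against dense multiplication.
function check_bidiag_mul(n)
    T = Bidiagonal(ones(n), ones(max(n - 1, 0)), :U)
    dense = Matrix(T) * Matrix(T)

    # Three-argument form: C is overwritten with T * T.
    C = Matrix(T)
    mul!(C, T, T)
    ok3 = C ≈ dense

    # Five-argument form with Bool alpha and beta: also T * T.
    Cb = ones(n, n)
    mul!(Cb, T, T, true, false)
    okbool = Cb ≈ dense

    # Five-argument form with general alpha and beta: 2 * T * T + 3 * C0.
    C0 = ones(n, n)
    Cg = copy(C0)
    mul!(Cg, T, T, 2.0, 3.0)
    okgen = Cg ≈ 2.0 .* dense .+ 3.0 .* C0

    return ok3 && okbool && okgen
end

@assert all(check_bidiag_mul, 1:3)
```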

codecov bot commented Apr 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.15%. Comparing base (27de9e9) to head (cb026ae).
Report is 18 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1257      +/-   ##
==========================================
- Coverage   92.15%   92.15%   -0.01%     
==========================================
  Files          34       34              
  Lines       15604    15645      +41     
==========================================
+ Hits        14380    14417      +37     
- Misses       1224     1228       +4     

@jishnub jishnub force-pushed the jishnub/bidiag_alphabool branch from 68ece4d to df7fef1 Compare April 4, 2025 09:48
@jishnub jishnub added arrays [a, r, r, a, y, s] ttfx The change pertains to first-call latency labels Apr 5, 2025
@jishnub jishnub requested a review from dkarrasch April 8, 2025 08:30
@jishnub jishnub force-pushed the jishnub/bidiag_alphabool branch from 597cc18 to 3a9e3db Compare April 11, 2025 17:20
@jishnub jishnub added the performance Must go faster label Apr 12, 2025
@jishnub jishnub force-pushed the jishnub/bidiag_alphabool branch from f99cb9e to cb026ae Compare April 22, 2025 12:01
@jishnub jishnub merged commit 5d3d02a into master Apr 26, 2025
4 checks passed
@jishnub jishnub deleted the jishnub/bidiag_alphabool branch April 26, 2025 11:00