Skip to content

Conversation

@jishnub
Copy link
Member

@jishnub jishnub commented Apr 14, 2025

By dealing with the alpha == 0 case separately, we ensure that if alpha::Bool, it must be true. This reduces the branches in _lscale_add from 4 to 2 in the common case of 3-argument mul!. This leads to a latency reduction, as each branch has to compile a different broadcast expression, and we currently compile four but use only one. Primarily, this PR leads to a reduction in allocations.

julia> using LinearAlgebra

julia> v = 1:4; w = similar(v);

julia> @time mul!(w, 1, v);
  0.171120 seconds (1.04 M allocations: 52.799 MiB, 99.98% compilation time) # nightly
  0.163178 seconds (702.63 k allocations: 35.533 MiB, 99.98% compilation time) # this PR

Something similar usually doesn't lead to a big gain in the _rscale_add method, as s * alpha often has the same type as s, and therefore the branches on alpha compile the same code.

@jishnub jishnub force-pushed the jishnub/mul_lscale_alphabool branch from 4153569 to 613935b Compare April 14, 2025 17:36
@codecov
Copy link

codecov bot commented Apr 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.11%. Comparing base (830ea2f) to head (b937fae).
Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1286      +/-   ##
==========================================
+ Coverage   92.03%   92.11%   +0.08%     
==========================================
  Files          34       34              
  Lines       15499    15507       +8     
==========================================
+ Hits        14265    14285      +20     
+ Misses       1234     1222      -12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jishnub jishnub added the ttfx The change pertains to first-call latency label Apr 19, 2025
@jishnub jishnub merged commit 07725da into master Apr 20, 2025
4 checks passed
@jishnub jishnub deleted the jishnub/mul_lscale_alphabool branch April 20, 2025 06:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ttfx The change pertains to first-call latency

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants