Skip to content

Conversation

penelopeysm
Copy link

This PR (sort of) closes #1989. I didn't attempt to make rand() faster because the performance drop there was smaller, but the changes here improve the performance of logpdf on DiagNormal and IsoNormal by about 2–3x.

The offending code that made the previous logpdf slower was the calculation of x .- d.µ:

sqmahal(d::MvNormal, x::AbstractVector) = invquad(d.Σ, x .- d.μ)

The new methods avoid this and are thus both faster and avoid allocations.

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.30%. Comparing base (abb151c) to head (0b592c7).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1991      +/-   ##
==========================================
+ Coverage   86.28%   86.30%   +0.02%     
==========================================
  Files         146      146              
  Lines        8787     8801      +14     
==========================================
+ Hits         7582     7596      +14     
  Misses       1205     1205              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could just use

sqmahal(d::MvNormal, x::AbstractVector) = invquad(d.Σ, LazyArrays.@~(x .- d.μ))

as LazyArrays seems to optimize FillArrays broacasts as well, so we might be able to keep the optimizations for Zeros means.

@@ -261,6 +261,25 @@ logdetcov(d::MvNormal) = logdet(d.Σ)

sqmahal(d::MvNormal, x::AbstractVector) = invquad(d.Σ, x .- d.μ)

function sqmahal(d::DiagNormal, x::AbstractVector)
# Faster than above as this avoids calculating (x .- d.µ)
T = promote_type(partype(d), eltype(x))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct eg if the parameters and x are arrays of integers. One could either compute the first element outside of the loop or possibly use a functional approach (e.g. with mapreduce and Base.Broadcast.broadcasted(...)).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functional approach probably also works better with GPU arrays.

end

function sqmahal(d::IsoNormal, x::AbstractVector)
T = promote_type(partype(d), eltype(x))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very artificial, but this is also not guaranteed to be correct e.g. for Boolean values.

Comment on lines +268 to +269
for i in eachindex(x)
@inbounds sum += abs2(x[i] - d.μ[i]) / d.Σ[i, i]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indices might not be compatible.

T = promote_type(partype(d), eltype(x))
sum = zero(T)
for i in eachindex(x)
@inbounds sum += abs2(x[i] - d.μ[i])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indices might not be compatible

for i in eachindex(x)
@inbounds sum += abs2(x[i] - d.μ[i])
end
return sum / d.Σ[1, 1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might not use one-based indexing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DiagNormal is substantially slower than Product of Normals
3 participants