Conversation

@albertomercurio
Contributor

This PR fixes #246 and the remaining allocations of #247.

This is related to this issue in the SparseArrays.jl package.

@albertomercurio
Contributor Author

Ok, I'm done.

@albertomercurio
Contributor Author

I've also noticed that the time-dependent case is still not fixed. As an example:

using LinearAlgebra, SparseArrays
using SciMLOperators
using BenchmarkTools

T = ComplexF64
N = 500
M = 10
# A1 = MatrixOperator(rand(T, N, N))
# A2 = MatrixOperator(rand(T, N, N))
A1 = MatrixOperator(sprand(T, N, N, 5 / N))
A2 = MatrixOperator(sprand(T, N, N, 5 / N))
A3 = MatrixOperator(sprand(T, N, N, 5 / N))

coeff1(a, u, p, t) = sin(p.ω * t)
coeff2(a, u, p, t) = cos(p.ω * t)
coeff3(a, u, p, t) = sin(p.ω * t) * cos(p.ω * t)

c1 = ScalarOperator(rand(T), coeff1)
c2 = ScalarOperator(rand(T), coeff2)
c3 = ScalarOperator(rand(T), coeff3)

H = c1 * A1 + c2 * A2 + c3 * A3

u = rand(T, N);
du = similar(u);
p = (ω = 0.1,);
t = 0.1;

@benchmark $H($du, $u, $p, $t)
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.964 μs …  10.530 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.045 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.079 μs ± 211.087 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▂▆▇██▇▆▅▃▂▁           ▁▁▁ ▁       ▁▁▁▁                 ▂
  ▄▁▄▃▁█████████████▆▆▅▆▆▆▆▆█████████▆▇▇██████▇▅▇▆▅▆▄▄▄▅▅▃▄▁▄ █
  4.96 μs      Histogram: log(frequency) by time      5.48 μs <

 Memory estimate: 752 bytes, allocs estimate: 8.

And the number of allocations scales with the number of operators in the sum. I don't actually understand what the problem is.

Anyhow, the time-independent code in #246 is fixed.

@albertomercurio
Contributor Author

Ok, the time-dependent case should also be fixed, at least for the case I studied. I basically followed the discussion on this thread, where they recommended using generated functions when dealing with tuples of different types.

Indeed, the memory allocations are absent now.

@albertomercurio albertomercurio changed the title from "Improve mul! to remove memory allocations" to "Improve mul!, AddedOperator, and update_coefficients! to remove memory allocations" on Oct 15, 2024
end

function LinearAlgebra.mul!(v::AbstractVecOrMat,
@inline function LinearAlgebra.mul!(v::AbstractVecOrMat,
Member

why is this needed?

Contributor Author

It is related to this issue in SparseArrays.jl. We currently need it to avoid extra allocations.

Comment on lines -459 to +511
for op in L.ops
iszero(op) && continue
mul!(v, op, u, α, true)
ops_types = L.parameters[2].parameters
N = length(ops_types)
Member

is this actually required?

Contributor Author

The AddedOperator contains a Tuple of objects with different types. A plain for loop over them would lead to runtime dispatch and extra allocations. Following this thread, they recommended using @generated functions. This way, we work directly with the individual types of the Tuple and unroll the for loop.
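To illustrate the idea, here is a minimal self-contained sketch (MyAddedOperator and apply_added! are illustrative names, not the actual SciMLOperators code): because a @generated function sees the concrete tuple type at compile time, the loop over operators can be unrolled into N statically dispatched mul! calls.

```julia
using LinearAlgebra

# Illustrative stand-in for AddedOperator (NOT the actual SciMLOperators type).
struct MyAddedOperator{T<:Tuple}
    ops::T
end

# The generated body is built from the tuple's element types, so the loop over
# operators is unrolled at compile time: no runtime dispatch, no allocations.
@generated function apply_added!(v, L::MyAddedOperator{T}, u, α, β) where {T}
    N = length(T.parameters)  # number of summed operators, known statically
    quote
        mul!(v, L.ops[1], u, α, β)  # first term: v = α * A₁ * u + β * v
        Base.Cartesian.@nexprs $(N - 1) i -> mul!(v, L.ops[i + 1], u, α, true)
        v
    end
end

A, B = rand(3, 3), rand(3, 3)
v, u = zeros(3), rand(3)
apply_added!(v, MyAddedOperator((A, B)), u, 1.0, 0.0)  # v ≈ (A + B) * u
```

Each summand is applied with a fully concrete type, which is what removes the per-iteration allocations seen with the heterogeneous-tuple loop.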

Member

A recursive implementation is probably cleaner and would get more compilation reuse?

Contributor Author

How exactly?
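For context, a recursive version might look like the following sketch (illustrative names, not code from this PR, assuming the same mul!-accumulation semantics): each call handles the first operator and recurses on Base.tail, so the compiler specializes one small method per tuple suffix rather than one @generated body per full tuple type.

```julia
using LinearAlgebra

# Base case: no operators left, v already holds the accumulated sum.
add_mul!(v, ops::Tuple{}, u, α) = v

# Accumulate α * first(ops) * u into v, then recurse on the remaining operators.
function add_mul!(v, ops::Tuple, u, α)
    mul!(v, first(ops), u, α, true)
    add_mul!(v, Base.tail(ops), u, α)
end

A, B = rand(3, 3), rand(3, 3)
v, u = zeros(3), rand(3)     # v must start zeroed (or pre-scaled by β)
add_mul!(v, (A, B), u, 1.0)  # v ≈ (A + B) * u
```

Because each method only closes over one tuple suffix type, two AddedOperators sharing a common tail can reuse the same compiled methods.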

@ChrisRackauckas
Member

Needs tests.

@albertomercurio
Contributor Author

I didn't add any new functionality, so the current tests should be fine. This was just to remove extra allocations and improve performance. But if you want to add some tests, do you have something specific in mind?

@ChrisRackauckas
Member

A test for zero allocations so it doesn't regress

@albertomercurio
Contributor Author

Ok, I added the tests. It seems that @allocations returns 1 inside the test, instead of the 0 reported by BenchmarkTools.jl. Outside the test, @allocations also returns 0.

Anyways, the number of allocations on the current master branch is on the order of hundreds.

@ChrisRackauckas
Member

It's allocating a return because of the global scope IIUC. Wrap it in a function that then returns nothing.

@albertomercurio
Contributor Author

Still the same. Outside the tests it returns 0 allocations with both @allocations and BenchmarkTools.jl. Inside the test it returns 1.

@ChrisRackauckas
Member

@oscardssmith rfc

@oscardssmith
Member

This happens because apply_op! is a dispatch, so it's allocating for the method call. It does seem to me that this might be a bug in the @allocations macro, but I'm not entirely sure. The easiest way to get around it is to do something like

test_apply_noalloc(H, du, u, p, t) = @test (@allocations apply_op!(H, du, u, p, t)) == 0

since that will introduce a function barrier.
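A self-contained illustration of the same pattern, with a toy f! standing in for apply_op! (Base.@allocations requires Julia 1.9 or newer):

```julia
using Test

f!(du, u) = (du .= 2 .* u; nothing)  # toy in-place kernel

# The wrapper is a function barrier: the measured call is fully compiled and
# dispatched by the time @allocations runs, so no allocation is attributed
# to the method call itself.
function test_noalloc(du, u)
    f!(du, u)  # warm-up call so compilation is not measured
    @test (@allocations f!(du, u)) == 0
end

du, u = zeros(4), rand(4)
test_noalloc(du, u)
```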

@albertomercurio
Contributor Author

Ok, now it returns 0 allocations even in the runtests.

@ChrisRackauckas ChrisRackauckas merged commit 7180a9e into SciML:master Oct 17, 2024
5 of 10 checks passed

Successfully merging this pull request may close these issues.

AddedOperator of Sparse matrices allocates memory
